Extreme Cloud Administration Toolkit

xCAT stands for Extreme Cloud Administration Toolkit.

xCAT offers complete management of clouds, clusters, HPC, grids, datacenters, renderfarms, online gaming infrastructure, and whatever tomorrow's next buzzword may be.

xCAT enables the administrator to:
  1. Discover the hardware servers
  2. Execute remote system management
  3. Provision operating systems on physical or virtual machines
  4. Provision machines in Diskful (stateful) and Diskless (stateless)
  5. Install and configure user applications
  6. Parallel system management
  7. Integrate xCAT in Cloud

You’ve reached the xCAT documentation site. The main product page is http://xcat.org

xCAT is an open source project hosted on GitHub. Go to GitHub to view the source, open issues, ask questions, and participate in the project.

Enjoy!

Table of Contents

Overview

xCAT enables you to easily manage a large number of servers for any type of technical computing workload. xCAT is known for exceptional scaling, a wide variety of supported hardware, operating systems, and virtualization platforms, and complete “day0” setup capabilities.

Differentiators

  • xCAT Scales

    Scales beyond all IT budgets, up to 100,000s of nodes, with a distributed architecture.

  • Open Source

    Eclipse Public License. Support contracts are also available; contact IBM.

  • Supports Multiple Operating Systems

    RHEL, SLES, Ubuntu, Debian, CentOS, Fedora, Scientific Linux, Oracle Linux, Windows, Esxi, RHEV, and more!

  • Supports Multiple Hardware Platforms

    IBM Power, IBM Power LE, x86_64

  • Supports Multiple Virtualization Infrastructures

    IBM PowerKVM, KVM, IBM zVM, ESXI, XEN

  • Supports Multiple Installation Options

    Diskful (Install to Hard Disk), Diskless (Runs in memory), Cloning

  • Built-in Automatic Discovery

    No need to power on one machine at a time for discovery. Nodes that fail can be replaced and brought back into action simply by powering the new one on.

  • RESTful API

    Provides a REST API interface for third-party software to integrate with.

Features

  1. Discover the hardware servers
    • Manually define
    • MTMS-based discovery
    • Switch-based discovery
    • Sequential-based discovery
  2. Execute remote system management against the discovered server
    • Remote power control
    • Remote console support
    • Remote inventory/vitals information query
    • Remote event log query
  3. Provision Operating Systems on physical (Bare-metal) or virtual machines
    • RHEL
    • SLES
    • Ubuntu
    • Debian
    • Fedora
    • CentOS
    • Scientific Linux
    • Oracle Linux
    • PowerKVM
    • Esxi
    • RHEV
    • Windows
  4. Provision machines in
    • Diskful (Scripted install, Clone)
    • Stateless
  5. Install and configure user applications
    • During OS install
    • After the OS install
    • HPC products - GPFS, Parallel Environment, LSF, compilers …
    • Big Data - Hadoop, Symphony
    • Cloud - Openstack, Chef
  6. Parallel system management
    • Parallel shell (Run shell command against nodes in parallel)
    • Parallel copy
    • Parallel ping
  7. Integrate xCAT in Cloud
    • Openstack
    • SoftLayer

Operating System & Hardware Support Matrix

          Power  Power LE  zVM  Power KVM  x86_64  x86_64 KVM  x86_64 Esxi
RHEL      yes    yes       yes  yes        yes     yes         yes
SLES      yes    yes       yes  yes        yes     yes         yes
Ubuntu    no     yes       no   yes        yes     yes         yes
CentOS    no     no        no   no         yes     yes         yes
Windows   no     no        no   no         yes     yes         yes

Architecture

The following diagram shows the basic structure of xCAT:

[Figure: xCAT architecture diagram (Xcat-arch.png)]
xCAT Management Node (xCAT Mgmt Node):
The server where the xCAT software is installed, used as the single point from which to perform system management over the entire cluster. On this node, a database is configured to store the xCAT node definitions. Network services (dhcp, tftp, http, etc.) are enabled to respond to operating system deployment requests.
Service Node:
One or more defined “slave” servers operating under the Management Node to assist with system management and reduce the load (CPU, network bandwidth) on a single Management Node. This concept is necessary when managing very large clusters.
Compute Node:
The compute nodes are the target servers which xCAT is managing.
Network Services (dhcp, tftp, http,etc):
The various network services necessary to perform Operating System deployment over the network. xCAT will bring up and configure the network services automatically without any intervention from the System Administrator.
Service Processor (SP):
A module embedded in the hardware server used to perform the out-of-band hardware control. (e.g. Integrated Management Module (IMM), Flexible Service Processor (FSP), Baseboard Management Controller (BMC), etc)
Management network:
The network used by the Management Node (or Service Node) to install operating systems and manage the nodes. The Management Node and in-band Network Interface Card (NIC) of the nodes are connected to this network. If you have a large cluster utilizing Service Nodes, sometimes this network is segregated into separate VLANs for each Service Node.
Service network:
The network used by the Management Node (or Service Node) to control the nodes out-of-band via the Service Processor. If the Service Processor is configured in shared mode (meaning the Service Processor shares a NIC with the host), then this network can be combined with the management network.
Application network:
The network used by the applications on the Compute nodes to communicate among each other.
Site (Public) network:
The network used by users to access the Management Nodes or access the Compute Nodes directly.
RestAPIs:
The REST API interface can be used by third-party applications to integrate with xCAT.
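
As a minimal illustration, assuming the management node hostname xcatmn and an xCAT user root with password cluster (all placeholder values), the list of defined nodes can be queried from the REST API with curl:

# query the defined nodes through the xCAT REST API (-k skips certificate verification)
curl -k 'https://xcatmn/xcatws/nodes?userName=root&userPW=cluster&pretty=1'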

xCAT2 Release Information

The following tables document the xCAT release versions and release dates. For more detailed information regarding new functions, supported OSs, bug fixes, and download links, refer to the specific release notes.

xCAT 2.16.x

2.16.x Release Information
Version Release Date New OS Supported Release Notes
2.16.3 2021-11-17 RHEL 8.4 2.16.3 Release Notes
2.16.2 2021-05-25 RHEL 8.3 2.16.2 Release Notes
2.16.1 2020-11-06 RHEL 8.2 2.16.1 Release Notes
2.16.0 2020-06-17 RHEL 8.1,SLES 15 2.16.0 Release Notes

xCAT 2.15.x

2.15.x Release Information
Version Release Date New OS Supported Release Notes
2.15.1 2020-03-06 RHEL 7.7 2.15.1 Release Notes
2.15.0 2019-11-11 RHEL 8.0 2.15.0 Release Notes

xCAT 2.14.x

2.14.x Release Information
Version Release Date New OS Supported Release Notes
2.14.6 2019-03-29   2.14.6 Release Notes
2.14.5 2018-12-07 RHEL 7.6 2.14.5 Release Notes
2.14.4 2018-10-19 Ubuntu 18.04.1 2.14.4 Release Notes
2.14.3 2018-08-24 SLES 12.3 2.14.3 Release Notes
2.14.2 2018-07-13 RHEL 6.10, Ubuntu 18.04 2.14.2 Release Notes
2.14.1 2018-06-01 RHV 4.2, RHEL 7.5 (Power8) 2.14.1 Release Notes
2.14.0 2018-04-20 RHEL 7.5 2.14.0 Release Notes

xCAT 2.13.x

2.13.x Release Information
Version Release Date New OS Supported Release Notes
2.13.11 2018-03-09   2.13.11 Release Notes
2.13.10 2018-01-26   2.13.10 Release Notes
2.13.9 2017-12-18   2.13.9 Release Notes
2.13.8 2017-11-03   2.13.8 Release Notes
2.13.7 2017-09-22   2.13.7 Release Notes
2.13.6 2017-08-10 RHEL 7.4 2.13.6 Release Notes
2.13.5 2017-06-30   2.13.5 Release Notes
2.13.4 2017-05-09 RHV 4.1 2.13.4 Release Notes
2.13.3 2017-04-14 RHEL 6.9 2.13.3 Release Notes
2.13.2 2017-02-24   2.13.2 Release Notes
2.13.1 2017-01-13   2.13.1 Release Notes
2.13.0 2016-12-09 SLES 12.2 2.13.0 Release Notes

xCAT 2.12.x

2.12.x Release Information
Version Release Date New OS Supported Release Notes
2.12.4 2016-11-11 RHEL 7.3 LE, RHEV 4.0 2.12.4 Release Notes
2.12.3 2016-09-30   2.12.3 Release Notes
2.12.2 2016-08-19 Ubuntu 16.04.1 2.12.2 Release Notes
2.12.1 2016-07-08   2.12.1 Release Notes
2.12.0 2016-05-20 RHEL 6.8, Ubuntu 14.4.4 LE, Ubuntu 16.04 2.12.0 Release Notes

xCAT 2.11.x

xCAT Version New OS New Hardware New Feature
xCAT 2.11.1
2016/04/22
   
  • Bug fix
xCAT 2.11
2015/12/11
  • RHEL 7.2 LE
  • UBT 14.4.3 LE
  • UBT 15.10 LE
  • PowerKVM 3.1
  • S822LC(GCA)
  • S822LC(GTA)
  • S812LC
  • NeuCloud OP
  • ZoomNet RP
  • NVIDIA GPU for OpenPOWER
  • Infiniband for OpenPOWER
  • SW KIT support for OpenPOWER
  • renergy command for OpenPOWER
  • rflash command for OpenPOWER
  • Add xCAT Troubleshooting Log
  • xCAT Log Classification
  • RAID Configuration
  • Accelerate genimage process
  • Add bmcdiscover Command
  • Enhance xcatdebugmode
  • new xCAT doc in ReadTheDocs

xCAT 2.10.x

xCAT Version New OS New Hardware New Feature
xCAT 2.10
2015/07/31
  • RHEL 7.1 LE
  • UBT 15.4 LE
  • SLES 12 LE
  • RHEL 6.7
  • CentOS 7.1
  • SLES 11 SP4
  • Power 8 LE
  • Ubuntu LE -> RH 7.1 Mix
  • Cuda install for Ubuntu 14.4.2
  • additional kernel parameters
  • customized disk part (Ubuntu)
  • RAID configure base iprconfig
  • New command: switchdiscover
  • New command: makentp
  • New command: bmcdiscovery
  • Support getmacs –noping
  • site.xcatdebugmode
  • validate netboot attribute
  • buildcore on local server
  • copycds generates fewer osimage
  • nodeset only accepts osimage=

xCAT 2.9.x

xCAT Version New OS New Hardware New Feature
xCAT 2.9.3 for AIX
2016/03/11
  • AIX 7.2.0
  • AIX 7.1.4.1
 
  • new format in synclist (node)
xCAT 2.9.2 for AIX
2015/11/11
  • AIX 6.1.8.6
  • AIX 6.1.9.5
  • AIX 7.1.3.5
  • Power 8 for AIX
  • ssl version control in xcatd
xCAT 2.9.1 [1]
2015/03/20
  • RHEL 7.1
  • UBT 14.04.2
  • SLES 11 SP3 and later ONLY
 
  • Nvidia GPU
  • Ubuntu Local Mirror
  • SLES12 diskless
  • Energy management for Power 8
  • RHEL 7.1 LE -> BE mix cluster
  • nics.nicextraparams
  • xCAT in Docker Image
  • confluent replaces conserver
  • TLSv1 in xcatd
  • New GPG key for xCAT packages
  • fast restart xcatd (systemd)
  • netboot method: grub2-tftp
  • netboot method: grub2-http
xCAT 2.9
2014/12/12
  • UBT 14.4 LE
  • UBT 14.4.1 LE
  • UBT 14.10
  • SLES 12
  • RHEL 6.6
  • AIX 7.1.3.15
  • PowerKVM
  • Power 8 LE
  • sysclone enhancements
  • site.auditnosyslog
  • site.nmapoptions
  • customize postscripts
  • Power 8 LE hw discover
  • IB support for P8 LE
[1]xCAT 2.9.1 onwards provides support for Kernel-based Virtual Machines (KVM) and requires an operating system that ships the perl-Sys-Virt package.

xCAT 2.8.x

xCAT Version New OS New Hardware New Feature
xCAT 2.8.4
2014/03/23
  • RHEL 6.5
  • RHEL 5.10
 
  • RHEL 7 experimental,
  • support xCAT clusterzones
  • commands enhancements
xCAT 2.8.3
2013/11/15
  • AIX 7.3.1.1
  • AIX 7.3.1.0
  • AIX 7.1.2
  • Xeon Phi (P2)
  • NS nx360M4
  • xcatd flow control
  • sysclone x86_64 image
  • enhance genitird and nodeset
  • enhance confignics, KIT
  • enhance sequential discovery
  • deploy OpenStack on Ubuntu
xCAT 2.8.2
2013/06/26
  • SLES 11 SP3
  • Xeon Phi (P1)
  • HPC KIT for ppc64
  • sysclone x86_64 image (P1)
  • enhance xdsh, updatenode
  • localdisk for diskless
  • enhance sequential discovery
  • deploy OpenStack on Ubuntu
xCAT 2.8.1
2013/06/26
  • RHEL 6.4
  • RHEL 5.9
 
  • energy management for flex
  • sequential discovery
  • KIT enhancements
  • osimage enhancements
  • IPv6 enhancements
  • def/xdsh/xdcp enhancements
  • updatenode enhancements
xCAT 2.8
2013/02/28
  • UBT 12.04
  • WIN S 2012
  • WIN 8 Hv
 
  • Flex IMM setup
  • Multiple Hostname
  • KIT support
  • KVM/zVM enhancements
  • RHEV Support
  • Localdisk for statelite
  • Manage MN itself
  • site auditskipcmds
  • precreate postscripts
  • mypostscript templates
  • pasu command
  • postscripts on stateful boot
  • node update status attrs
  • updatenode enhancements

xCAT 2.7.x

xCAT Version New OS New Hardware New Feature
xCAT 2.7.8
2014/01/24
  • AIX 7.1.3.1
  • AIX 7.1.3.0
  • AIX 6.1.9.1
   
xCAT 2.7.7
2013/03/17
  • RHEL 6.4
 
  • sinv for devices
  • Flex energy mgt and rbeacon
xCAT 2.7.6
2012/11/30
  • SLES 10 SP4
  • AIX 6.1.8
  • AIX 7.1.2
 
  • HPC Integration updates
xCAT 2.7.5
2012/10/29
  • RHEL 6.3
 
  • virtualization with RHEV
  • hardware discovery for x Flex
  • enhanced AIX HASN
xCAT 2.7.4
2012/08/27
  • SLES11 SP2
  • Flex
  • improved IPMI for large systems
xCAT 2.7.3
2012/06/22
  • SLES11 SP2
  • RHEL 6.2
  • Flex
  • HPC Integration updates
xCAT 2.7.2
2012/05/25
  • AIX 7.1.1.3
  • Power 775
  • Flex for P
  • SLES 11 kdump
  • HPC Integration updates
xCAT 2.7.1
2012/04/20
  • RHEL 6.3
 
  • minor enhancements
  • bug fixes
xCAT 2.7
2012/03/19
  • RHEL 6.2
 
  • xcatd memory usage reduced
  • xcatdebug for xcatd and plugins
  • lstree command
  • x86_64 genesis boot image
  • ipmi throttles
  • rpower suspend select IBM hw
  • stateful ESXi5
  • xnba UEFI boot
  • httpd for postscripts
  • rolling updates
  • Nagios monitoring plugin

Install Guides

Installation Guide for Red Hat Enterprise Linux

For the current list of operating systems supported and verified by the development team for the different releases of xCAT, see the xCAT2 Release Notes.

Disclaimer These instructions are intended to only be guidelines and specific details may differ slightly based on the operating system version. Always refer to the operating system documentation for the latest recommended procedures.

Prepare the Management Node

These steps prepare the Management Node for xCAT Installation

Install an OS on the Management Node

Install one of the supported operating systems on your target management node.

The system requirements for your xCAT management node largely depend on the size of the cluster you plan to manage and the type of provisioning used (diskful, diskless, system clones, etc). The majority of system load comes during cluster provisioning time.

Memory Requirements:

Cluster Size Memory (GB)
Small (< 16) 4-6
Medium 6-8
Large > 16
Configure the Base OS Repository

xCAT uses the yum package manager on RHEL Linux distributions to install and resolve dependency packages provided by the base operating system. Follow this section to create the repository for the base operating system on the Management Node

  1. Copy the DVD iso file to /tmp on the Management Node. This example will use file RHEL-LE-7.1-20150219.1-Server-ppc64le-dvd1.iso

  2. Mount the iso to /mnt/iso/rhels7.1 on the Management Node.

    mkdir -p /mnt/iso/rhels7.1
    mount -o loop /tmp/RHEL-LE-7.1-20150219.1-Server-ppc64le-dvd1.iso /mnt/iso/rhels7.1
    
  3. Create a yum repository file /etc/yum.repos.d/rhels71-dvd.repo that points to the locally mounted iso image from the above step. The file contents should appear as the following:

    [rhel-7.1-dvd-server]
    name=RHEL 7 SERVER packages
    baseurl=file:///mnt/iso/rhels7.1/Server
    enabled=1
    gpgcheck=1
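
    Optionally, confirm that the repository is usable; the repository id rhel-7.1-dvd-server defined above should appear in the output of:

    yum repolist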
    
Configure the Management Node

Setting properties on the Management Node before installing the xCAT software allows xCAT to automatically configure key attributes in the xCAT site table during the install.

  1. Ensure a hostname is configured on the management node by issuing the hostname command. [It’s recommended to use a fully qualified domain name (FQDN) when setting the hostname]

    1. To set the hostname of xcatmn.cluster.com:

      hostname xcatmn.cluster.com
      
    2. Add the hostname to the /etc/sysconfig/network in order to persist the hostname on reboot.

    3. Reboot the server and verify the hostname by running the following commands:

      • hostname
      • hostname -d - should display the domain
  2. Reduce the risk of the Management Node IP address being lost by setting the IP to STATIC in the /etc/sysconfig/network-scripts/ifcfg-<dev> configuration files (see the example after this list).

  3. Configure any domain search strings and nameservers to the /etc/resolv.conf file.
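
The exact contents of the ifcfg-<dev> file depend on your network. The following is only an illustrative sketch of a static configuration; the device name and addresses are placeholders:

# /etc/sysconfig/network-scripts/ifcfg-eth0 (example values)
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.0.2
NETMASK=255.255.255.0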

Installing xCAT

The following sections describe the different methods for installing xCAT.

Automatic Install Using go-xcat

go-xcat is a tool that can be used to fully install or update xCAT. go-xcat will automatically download the correct package manager repository file from xcat.org and use the public repository to install xCAT. If the xCAT management node does not have internet connectivity, use the process described in the Manual Installation section of the guide.

  1. Download the go-xcat tool using wget:

    wget https://raw.githubusercontent.com/xcat2/xcat-core/master/xCAT-server/share/xcat/tools/go-xcat -O - >/tmp/go-xcat
    chmod +x /tmp/go-xcat
    
  2. Run the go-xcat tool:

    /tmp/go-xcat install            # installs the latest stable version of xCAT
    /tmp/go-xcat -x devel install   # installs the latest development version of xCAT
    
Manual Install Using Software Repositories

xCAT consists of two software packages: xcat-core and xcat-dep

  1. xcat-core is xCAT’s main software package and is provided in one of the following options:

    • Latest Release (Stable) Builds

      This is the latest GA (General Availability) build that has been tested thoroughly

    • Development Builds

      These are snapshot builds of the new version of xCAT in development. This version has not been released yet; use at your own risk

  2. xcat-dep xCAT’s dependency package. This package is provided as a convenience for the user and contains dependency packages required by xCAT that are not provided by the operating system.

xCAT is installed by configuring software repositories for xcat-core and xcat-dep and using the yum package manager. The repositories can be publicly hosted or locally hosted.

Configure xCAT Software Repository

xCAT software and repo files can be obtained from: http://xcat.org/download.html

Internet Repository

[xcat-core]

For the xCAT version you want to install, download the xcat-core.repo file and copy it to /etc/yum.repos.d

[xcat-dep]

From the xCAT-dep Online Repository, navigate to the correct subdirectory for the target machine and download the xcat-dep.repo file and copy it to /etc/yum.repos.d.

Continue to the next section to install xCAT.

Local Repository

[xcat-core]

  1. Download xcat-core:

    # downloading the latest stable build, xcat-core-<version>-linux.tar.bz2
    mkdir -p ~/xcat
    cd ~/xcat/
    wget http://xcat.org/files/xcat/xcat-core/<version>.x_Linux/xcat-core/xcat-core-<version>-linux.tar.bz2
    
  2. Extract xcat-core:

    tar jxvf xcat-core-<version>-linux.tar.bz2
    
  3. Configure the local repository for xcat-core by running mklocalrepo.sh script in the xcat-core directory:

    cd ~/xcat/xcat-core
    ./mklocalrepo.sh
    

[xcat-dep]

Unless you are downloading xcat-dep to match a specific version of xCAT, it’s recommended to download the latest version of xcat-dep.

  1. Download xcat-dep:

    # downloading the latest stable version, xcat-dep-<version>-linux.tar.bz2
    mkdir -p ~/xcat/
    cd ~/xcat
    wget http://xcat.org/files/xcat/xcat-dep/2.x_Linux/xcat-dep-<version>-linux.tar.bz2
    
  2. Extract xcat-dep:

    tar jxvf xcat-dep-<version>-linux.tar.bz2
    
  3. Configure the local repository for xcat-dep by switching to the architecture and os subdirectory of the node you are installing on, then run the mklocalrepo.sh script:

    cd ~/xcat/xcat-dep/
    # On redhat 7.1 ppc64le: cd rh7/ppc64le
    cd <os>/<arch>
    ./mklocalrepo.sh
    
Install xCAT

Install xCAT with the following command:

yum clean all (optional)
yum install xCAT

Note: During the install, you must accept the xCAT Security Key to continue:

Retrieving key from file:///root/xcat/xcat-dep/rh6/ppc64/repodata/repomd.xml.key
Importing GPG key 0xC6565BC9:
 Userid: "xCAT Security Key <xcat@cn.ibm.com>"
 From  : /root/xcat/xcat-dep/rh6/ppc64/repodata/repomd.xml.key
Is this ok [y/N]:
Verify xCAT Installation

Quick verification of the xCAT install can be done by running the following steps:

  1. Source the profile to add xCAT Commands to your path:

    source /etc/profile.d/xcat.sh
    
  2. Check the xCAT version:

    lsxcatd -a
    
  3. Check to verify that the xCAT database is initialized by dumping out the site table:

    tabdump site
    

    The output should be similar to the following:

    #key,value,comments,disable
    "blademaxp","64",,
    "domain","pok.stglabs.ibm.com",,
    "fsptimeout","0",,
    "installdir","/install",,
    "ipmimaxp","64",,
    "ipmiretries","3",,
    ...
    
Starting and Stopping

xCAT is started automatically after the installation, but the following commands can be used to start, stop, restart, and check xCAT status.

  • start xCAT:

    service xcatd start
    [systemd] systemctl start xcatd.service
    
  • stop xCAT:

    service xcatd stop
    [systemd] systemctl stop xcatd.service
    
  • restart xCAT:

    service xcatd restart
    [systemd] systemctl restart xcatd.service
    
  • check xCAT status:

    service xcatd status
    [systemd] systemctl status xcatd.service
    
Updating xCAT

If at a later date you want to update xCAT, first, update the software repositories and then run:

yum clean metadata # or, yum clean all
yum update '*xCAT*'

# To check and update the packages provided by xcat-dep:
yum update '*xcat*'

Installation Guide for SUSE Linux Enterprise Server

For the current list of operating systems supported and verified by the development team for the different releases of xCAT, see the xCAT2 Release Notes.

Disclaimer These instructions are intended to only be guidelines and specific details may differ slightly based on the operating system version. Always refer to the operating system documentation for the latest recommended procedures.

Prepare the Management Node

These steps prepare the Management Node for xCAT Installation

Install an OS on the Management Node

Install one of the supported operating systems on your target management node.

The system requirements for your xCAT management node largely depend on the size of the cluster you plan to manage and the type of provisioning used (diskful, diskless, system clones, etc). The majority of system load comes during cluster provisioning time.

Memory Requirements:

Cluster Size Memory (GB)
Small (< 16) 4-6
Medium 6-8
Large > 16
Configure the Base OS Repository

xCAT uses the zypper package manager on SLES Linux distributions to install and resolve dependency packages provided by the base operating system. Follow this section to create the repository for the base operating system on the Management Node

  1. Copy the DVD iso file to /tmp on the Management Node:

    # This example will use SLE-12-Server-DVD-ppc64le-GM-DVD1.iso
    
  2. Mount the iso to /mnt/iso/sles12 on the Management Node.

    mkdir -p /mnt/iso/sles12
    mount -o loop /tmp/SLE-12-Server-DVD-ppc64le-GM-DVD1.iso /mnt/iso/sles12
    
  3. Create a zypper repository file /etc/zypp/repos.d/sles12le-base.repo that points to the locally mounted iso image from the above step. The file contents should appear as the following:

    [sles-12-le-server]
    name=SLES 12 ppc64le Server Packages
    baseurl=file:///mnt/iso/sles12/suse
    enabled=1
    gpgcheck=1
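
    Optionally, confirm that the repository is visible to zypper; the repository alias sles-12-le-server defined above should appear in the output of:

    zypper lr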
    
Configure the Management Node

Setting properties on the Management Node before installing the xCAT software allows xCAT to automatically configure key attributes in the xCAT site table during the install.

  1. Ensure a hostname is configured on the management node by issuing the hostname command. [It’s recommended to use a fully qualified domain name (FQDN) when setting the hostname]

    1. To set the hostname of xcatmn.cluster.com:

      hostname xcatmn.cluster.com
      
    2. Add the hostname to the /etc/hostname in order to persist the hostname on reboot.

    3. Reboot the server and verify the hostname by running the following commands:

      • hostname
      • hostname -d - should display the domain
  2. Reduce the risk of the Management Node IP address being lost by setting the IP to STATIC in the /etc/sysconfig/network/ifcfg-<dev> configuration files.

  3. Configure any domain search strings and nameservers to the /etc/resolv.conf file.

Installing xCAT

The following sections describe the different methods for installing xCAT.

Automatic Install Using go-xcat

go-xcat is a tool that can be used to fully install or update xCAT. go-xcat will automatically download the correct package manager repository file from xcat.org and use the public repository to install xCAT. If the xCAT management node does not have internet connectivity, use the process described in the Manual Installation section of the guide.

  1. Download the go-xcat tool using wget:

    wget https://raw.githubusercontent.com/xcat2/xcat-core/master/xCAT-server/share/xcat/tools/go-xcat -O - >/tmp/go-xcat
    chmod +x /tmp/go-xcat
    
  2. Run the go-xcat tool:

    /tmp/go-xcat install            # installs the latest stable version of xCAT
    /tmp/go-xcat -x devel install   # installs the latest development version of xCAT
    
Manual Install Using Software Repositories

xCAT consists of two software packages: xcat-core and xcat-dep

  1. xcat-core is xCAT’s main software package and is provided in one of the following options:

    • Latest Release (Stable) Builds

      This is the latest GA (General Availability) build that has been tested thoroughly

    • Development Builds

      These are snapshot builds of the new version of xCAT in development. This version has not been released yet; use at your own risk

  2. xcat-dep xCAT’s dependency package. This package is provided as a convenience for the user and contains dependency packages required by xCAT that are not provided by the operating system.

xCAT is installed by configuring software repositories for xcat-core and xcat-dep and using the zypper package manager. The repositories can be publicly hosted or locally hosted.

Configure xCAT Software Repository

xCAT software and repo files can be obtained from: http://xcat.org/download.html

Internet Repository

[xcat-core]

For the xCAT version you want to install, download the xcat-core.repo file and copy it to /etc/zypp/repos.d

[xcat-dep]

From the xCAT-dep Online Repository, navigate to the correct subdirectory for the target machine and download the xcat-dep.repo file and copy it to /etc/zypp/repos.d.

Continue to the next section to install xCAT.

Local Repository

[xcat-core]

  1. Download xcat-core:

    # downloading the latest stable build, xcat-core-<version>-linux.tar.bz2
    mkdir -p ~/xcat
    cd ~/xcat/
    wget http://xcat.org/files/xcat/xcat-core/<version>.x_Linux/xcat-core/xcat-core-<version>-linux.tar.bz2
    
  2. Extract xcat-core:

    tar jxvf xcat-core-<version>-linux.tar.bz2
    
  3. Configure the local repository for xcat-core by running mklocalrepo.sh script in the xcat-core directory:

    cd ~/xcat/xcat-core
    ./mklocalrepo.sh
    

[xcat-dep]

Unless you are downloading xcat-dep to match a specific version of xCAT, it’s recommended to download the latest version of xcat-dep.

  1. Download xcat-dep:

    # downloading the latest stable version, xcat-dep-<version>-linux.tar.bz2
    mkdir -p ~/xcat/
    cd ~/xcat
    wget http://xcat.org/files/xcat/xcat-dep/2.x_Linux/xcat-dep-<version>-linux.tar.bz2
    
  2. Extract xcat-dep:

    tar jxvf xcat-dep-<version>-linux.tar.bz2
    
  3. Configure the local repository for xcat-dep by switching to the architecture and os subdirectory of the node you are installing on, then run the mklocalrepo.sh script:

    cd ~/xcat/xcat-dep/
    # On redhat 7.1 ppc64le: cd rh7/ppc64le
    cd <os>/<arch>
    ./mklocalrepo.sh
    
Install xCAT

Install xCAT with the following command:

zypper clean all (optional)
zypper install xCAT

Note: During the install, you must accept the xCAT Security Key to continue

Verify xCAT Installation

Quick verification of the xCAT install can be done by running the following steps:

  1. Source the profile to add xCAT Commands to your path:

    source /etc/profile.d/xcat.sh
    
  2. Check the xCAT version:

    lsxcatd -a
    
  3. Check to verify that the xCAT database is initialized by dumping out the site table:

    tabdump site
    

    The output should be similar to the following:

    #key,value,comments,disable
    "blademaxp","64",,
    "domain","pok.stglabs.ibm.com",,
    "fsptimeout","0",,
    "installdir","/install",,
    "ipmimaxp","64",,
    "ipmiretries","3",,
    ...
    
Starting and Stopping

xCAT is started automatically after the installation, but the following commands can be used to start, stop, restart, and check xCAT status.

  • start xCAT:

    service xcatd start
    [systemd] systemctl start xcatd.service
    
  • stop xCAT:

    service xcatd stop
    [systemd] systemctl stop xcatd.service
    
  • restart xCAT:

    service xcatd restart
    [systemd] systemctl restart xcatd.service
    
  • check xCAT status:

    service xcatd status
    [systemd] systemctl status xcatd.service
    
Updating xCAT

If at a later date you want to update xCAT, first, update the software repositories and then run:

zypper refresh
zypper update "*xCAT*"

# To check and update the packages provided by xcat-dep:
zypper update "*xcat*"

Installation Guide for Ubuntu Server LTS

For the current list of operating systems supported and verified by the development team for the different releases of xCAT, see the xCAT2 Release Notes.

Disclaimer These instructions are intended to only be guidelines and specific details may differ slightly based on the operating system version. Always refer to the operating system documentation for the latest recommended procedures.

Prepare the Management Node

These steps prepare the Management Node for xCAT Installation

Install an OS on the Management Node

Install one of the supported operating systems on your target management node.

The system requirements for your xCAT management node largely depend on the size of the cluster you plan to manage and the type of provisioning used (diskful, diskless, system clones, etc). The majority of system load comes during cluster provisioning time.

Memory Requirements:

Cluster Size Memory (GB)
Small (< 16) 4-6
Medium 6-8
Large > 16
Configure the Base OS Repository

xCAT uses the apt package manager on Ubuntu Linux distributions to install and resolve dependency packages provided by the base operating system. Follow this section to create the repository for the base operating system on the Management Node

  1. Copy the DVD iso file to /tmp on the Management Node:

    # This example will use ubuntu-18.04-server-ppc64el.iso
    cp /path/to/ubuntu-18.04-server-ppc64el.iso /tmp
    
  2. Mount the iso to /mnt/iso/ubuntu on the Management Node.

    mkdir -p /mnt/iso/ubuntu
    mount -o loop /tmp/ubuntu-18.04-server-ppc64el.iso /mnt/iso/ubuntu
    
  3. Create an apt sources list entry, for example in /etc/apt/sources.list.d/ubuntu18-dvd.list, that points to the locally mounted iso image from the above step. The entry should appear similar to the following:

    deb [trusted=yes] file:///mnt/iso/ubuntu bionic main
    
Configure the Management Node

Setting properties on the Management Node before installing the xCAT software allows xCAT to automatically configure key attributes in the xCAT site table during the install.

  1. Ensure a hostname is configured on the management node by issuing the hostname command. [It’s recommended to use a fully qualified domain name (FQDN) when setting the hostname]

    1. To set the hostname of xcatmn.cluster.com:

      hostname xcatmn.cluster.com
      
    2. Add the hostname to the /etc/hostname and /etc/hosts to persist the hostname on reboot.

    3. Reboot or run service hostname restart to allow the hostname to take effect and verify the hostname command returns correctly:

      • hostname
      • hostname -d - should display the domain
  2. Reduce the risk of the Management Node IP address being lost by setting the interface IP to STATIC in the /etc/network/interfaces configuration file (see the example after this list).

  3. Configure any domain search strings and nameservers using the resolvconf command.
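
The exact contents of /etc/network/interfaces depend on your network. The following is only an illustrative sketch of a static configuration; the interface name and addresses are placeholders:

# /etc/network/interfaces (example values)
auto eth0
iface eth0 inet static
    address 192.168.0.2
    netmask 255.255.255.0
    gateway 192.168.0.1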

Installing xCAT

The following sections describe the different methods for installing xCAT.

Automatic Install Using go-xcat

go-xcat is a tool that can be used to fully install or update xCAT. go-xcat will automatically download the correct package manager repository file from xcat.org and use the public repository to install xCAT. If the xCAT management node does not have internet connectivity, use the process described in the Manual Installation section of the guide.

  1. Download the go-xcat tool using wget:

    wget https://raw.githubusercontent.com/xcat2/xcat-core/master/xCAT-server/share/xcat/tools/go-xcat -O - >/tmp/go-xcat
    chmod +x /tmp/go-xcat
    
  2. Run the go-xcat tool:

    /tmp/go-xcat install            # installs the latest stable version of xCAT
    /tmp/go-xcat -x devel install   # installs the latest development version of xCAT
    
Manual Install Using Software Repositories

xCAT consists of two software packages: xcat-core and xcat-dep

  1. xcat-core is xCAT’s main software package and is provided in one of the following options:

    • Latest Release (Stable) Builds

      This is the latest GA (General Availability) build that has been tested thoroughly

    • Development Builds

      These are snapshot builds of the new version of xCAT in development. This version has not been released yet; use at your own risk

  2. xcat-dep xCAT’s dependency package. This package is provided as a convenience for the user and contains dependency packages required by xCAT that are not provided by the operating system.

xCAT is installed by configuring software repositories for xcat-core and xcat-dep and using the apt package manager. The repositories can be publicly hosted or locally hosted.

Configure xCAT Software Repository

xCAT software and repo files can be obtained from: http://xcat.org/download.html

Internet Repository

[xcat-core]

From the xCAT download page, find the build you want to install and add to /etc/apt/sources.list.

To configure the xCAT stable build, add the following line to /etc/apt/sources.list:

[For x86_64 servers]
deb [arch=amd64] http://xcat.org/files/xcat/repos/apt/latest/xcat-core bionic main
[For ppc64el servers]
deb [arch=ppc64el] http://xcat.org/files/xcat/repos/apt/latest/xcat-core bionic main

[xcat-dep]

To configure the xCAT deps online repository, add the following line to /etc/apt/sources.list:

[For x86_64 servers]
deb [arch=amd64] http://xcat.org/files/xcat/repos/apt/latest/xcat-dep bionic main
[For ppc64el servers]
deb [arch=ppc64el] http://xcat.org/files/xcat/repos/apt/latest/xcat-dep bionic main

Continue to the next section to install xCAT.

Local Repository

[xcat-core]

  1. Download xcat-core:

    # downloading the latest stable build, xcat-core-<version>-ubuntu.tar.bz2
    mkdir -p ~/xcat
    cd ~/xcat/
    wget http://xcat.org/files/xcat/xcat-core/<version>.x_Ubuntu/xcat-core/xcat-core-<version>-ubuntu.tar.bz2
    
  2. Extract xcat-core:

    tar jxvf xcat-core-<version>-ubuntu.tar.bz2
    
  3. Configure the local repository for xcat-core by running mklocalrepo.sh script in the xcat-core directory:

    cd ~/xcat/xcat-core
    ./mklocalrepo.sh
    

[xcat-dep]

Unless you are downloading xcat-dep to match a specific version of xCAT, it’s recommended to download the latest version of xcat-dep.

  1. Download xcat-dep:

    # downloading the latest stable version, xcat-dep-<version>-ubuntu.tar.bz2
    mkdir -p ~/xcat/
    cd ~/xcat
    wget http://xcat.org/files/xcat/xcat-dep/2.x_Ubuntu/xcat-dep-<version>-ubuntu.tar.bz2
    
  2. Extract xcat-dep:

    tar jxvf xcat-dep-<version>-ubuntu.tar.bz2
    
  3. Configure the local repository for xcat-dep by running the mklocalrepo.sh script:

    cd ~/xcat/xcat-dep/
    ./mklocalrepo.sh
    
Install xCAT

The xCAT GPG Public Key must be added for apt to verify the xCAT packages

wget -O - "http://xcat.org/files/xcat/repos/apt/apt.key" | apt-key add -

Add the necessary apt-repositories to the management node

# Install the add-apt-repository command
apt-get install software-properties-common

# For x86_64:
add-apt-repository "deb http://archive.ubuntu.com/ubuntu $(lsb_release -sc) main"
add-apt-repository "deb http://archive.ubuntu.com/ubuntu $(lsb_release -sc)-updates main"
add-apt-repository "deb http://archive.ubuntu.com/ubuntu $(lsb_release -sc) universe"
add-apt-repository "deb http://archive.ubuntu.com/ubuntu $(lsb_release -sc)-updates universe"

# For ppc64el:
add-apt-repository "deb http://ports.ubuntu.com/ubuntu-ports $(lsb_release -sc) main"
add-apt-repository "deb http://ports.ubuntu.com/ubuntu-ports $(lsb_release -sc)-updates main"
add-apt-repository "deb http://ports.ubuntu.com/ubuntu-ports $(lsb_release -sc) universe"
add-apt-repository "deb http://ports.ubuntu.com/ubuntu-ports $(lsb_release -sc)-updates universe"

Install xCAT [1] with the following command:

apt-get clean all
apt-get update
apt-get install xcat
[1]Starting with Ubuntu 16.04, the package name ‘xCAT’ is required to be all lowercase
Verify xCAT Installation

Quick verification of the xCAT install can be done by running the following steps:

  1. Source the profile to add xCAT Commands to your path:

    source /etc/profile.d/xcat.sh
    
  2. Check the xCAT version:

    lsxcatd -a
    
  3. Check to verify that the xCAT database is initialized by dumping out the site table:

    tabdump site
    

    The output should be similar to the following:

    #key,value,comments,disable
    "blademaxp","64",,
    "domain","pok.stglabs.ibm.com",,
    "fsptimeout","0",,
    "installdir","/install",,
    "ipmimaxp","64",,
    "ipmiretries","3",,
    ...
    
Starting and Stopping

xCAT is started automatically after the installation, but the following commands can be used to start, stop, restart, and check xCAT status.

  • start xCAT:

    service xcatd start
    [systemd] systemctl start xcatd.service
    
  • stop xCAT:

    service xcatd stop
    [systemd] systemctl stop xcatd.service
    
  • restart xCAT:

    service xcatd restart
    [systemd] systemctl restart xcatd.service
    
  • check xCAT status:

    service xcatd status
    [systemd] systemctl status xcatd.service
    
Updating xCAT

If at a later date you want to update xCAT, first, update the software repositories and then run:

apt-get update
apt-get -y --only-upgrade install .*xcat.*

Maintenance

Backup and Restore xCAT

It is useful to back up xCAT data from time to time. For example, you may need to upgrade to another version of xCAT, change the management server and move xCAT from one machine to another, or make regular backups so that the production environment can be restored after an accident. The sections below will help you back up and restore xCAT data.

Backup User Data

If you need to back up the xCAT database, you can use the dumpxCATdb command as shown below.

dumpxCATdb -p <path_to_save_the_database>

[Note] If you need to dump some environment data for a problem report when you hit a defect, you can use the xcatsnap command as shown below.

xcatsnap -B -d <path_to_save_the_data>
Restore User Data

If you need to restore the xCAT environment after an xCAT software installation, you can restore the xCAT DB using the restorexCATdb command, pointing it to the data files dumped in the past.

restorexCATdb -p  <path_to_save_the_database>
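
As an illustration (the backup path is a placeholder), a typical backup and restore cycle might look like:

# back up the xCAT database before making changes
dumpxCATdb -p /tmp/xcatdb.backup

# ... reinstall or upgrade xCAT ...

# restore the database from the saved files
restorexCATdb -p /tmp/xcatdb.backup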

Remove xCAT

We’re sorry to see you go! Here are some steps for removing the xCAT product.

Stop xCAT Service
  1. Stop xCAT service

    service xcatd stop
    
  2. Stop xCAT related services (optional)

xCAT uses various network services on the management node and service nodes. The network services set up by xCAT may need to be cleaned up on the management node and service nodes before uninstalling xCAT.
  • NFS : Stop nfs service, unexport all the file systems exported by xCAT, and remove the xCAT file systems from /etc/exports.
  • HTTP: Stop http service, remove the xcat.conf in the http configuration directory.
  • TFTP: Stop tftp service, remove the tftp files created by xCAT in tftp directory.
  • DHCP: Stop dhcp service, remove the configuration made by xCAT in dhcp configuration files.
  • DNS : Stop the named service, remove the named entries created by xCAT from the named database.
Remove xCAT Files
  1. Remove xCAT Packages

To automatically remove all xCAT packages, run the following command

/opt/xcat/share/xcat/tools/go-xcat uninstall

There is no easy way to identify all xCAT packages. For packages shipped by xCAT, you can manually remove them by using one of the commands below.

[RHEL]

yum remove conserver-xcat elilo-xcat goconserver grub2-xcat ipmitool-xcat perl-xCAT syslinux-xcat xCAT xCAT-SoftLayer xCAT-buildkit xCAT-client xCAT-confluent xCAT-csm xCAT-genesis-base-ppc64 xCAT-genesis-base-x86_64 xCAT-genesis-scripts-ppc64 xCAT-genesis-scripts-x86_64 xCAT-openbmc-py xCAT-probe xCAT-server xnba-undi yaboot-xcat

[SLES]

zypper remove conserver-xcat elilo-xcat goconserver grub2-xcat ipmitool-xcat perl-xCAT syslinux-xcat xCAT xCAT-SoftLayer xCAT-buildkit xCAT-client xCAT-confluent xCAT-csm xCAT-genesis-base-ppc64 xCAT-genesis-base-x86_64 xCAT-genesis-scripts-ppc64 xCAT-genesis-scripts-x86_64 xCAT-openbmc-py xCAT-probe xCAT-server xnba-undi yaboot-xcat

[Ubuntu]

apt-get remove conserver-xcat elilo-xcat goconserver grub2-xcat ipmitool-xcat perl-xcat syslinux-xcat xcat xcat-buildkit xcat-client xcat-confluent xcat-genesis-base-amd64 xcat-genesis-base-ppc64 xcat-genesis-scripts-amd64 xcat-genesis-scripts-ppc64 xcat-probe xcat-server xcat-test xcat-vlan xcatsn xnba-undi

To do an even more thorough cleanup, use the links below to get a list of RPMs installed by xCAT. Some RPMs may not be installed in a specific environment.

  • XCAT Core Packages List (xcat-core)

    [RHEL and SLES]

    http://xcat.org/files/xcat/repos/yum/<version>/xcat-core/
    

    [Ubuntu]

    http://xcat.org/files/xcat/repos/apt/<version>/xcat-core/pool/main
    
  • XCAT Dependency Packages (xcat-dep)

    [RHEL and SLES]

    http://xcat.org/files/xcat/repos/yum/xcat-dep/<os>/<arch>
    

    [Ubuntu]

    http://xcat.org/files/xcat/repos/apt/xcat-dep/pool/main
    

When yum install xCAT is used to install xCAT, dependency RPMs provided by the Operating System will be installed. Keeping those rpms installed on the system is harmless.

  2. Remove xCAT certificate file

    rm -rf /root/.xcat
    
  3. Remove xCAT data files

By default, xCAT uses SQLite, remove SQLite data files under /etc/xcat/.

rm -rf /etc/xcat
  4. Remove xCAT related files (optional)

xCAT might also have created the additional files and directories listed below. Take caution when removing these files as they may be used for other purposes in your environment.

/install
/tftpboot
/etc/yum.repos.d/xCAT-*
/etc/sysconfig/xcat
/etc/apache2/conf.d/xCAT-*
/etc/logrotate.d/xCAT-*
/etc/rsyslogd.d/xCAT-*
/var/log/xcat
/opt/xcat/
/mnt/xcat

Get Started

Quick Start Guide

xCAT can be a comprehensive system to manage infrastructure elements in the data center: bare-metal servers, switches, PDUs, and operating system distributions. This quick start guide will show you how to set up an xCAT system and manage an IPMI-managed bare metal server running a Red Hat-based distribution in 15 minutes.

The steps below focus on RHEL 7; however, they should work for other distributions such as CentOS, SLES, etc. For details, see the Operating System & Hardware Support Matrix.

Prerequisites

Assume there are two servers named xcatmn.mydomain.com and cn1.mydomain.com.

  1. They are in the same subnet 192.168.0.0.
  2. cn1.mydomain.com has a BMC which xcatmn.mydomain.com can access.
  3. xcatmn.mydomain.com has Red Hat OS installed, and uses IP 192.168.0.2.
  4. xcatmn.mydomain.com has access to the internet.
  5. cn1.mydomain.com BMC IP address is 10.4.40.254.
  6. Prepare a full DVD ISO for OS provisioning (not a Live CD ISO). This example will use the RHEL-7.6-20181010.0-Server-x86_64-dvd1.iso ISO, which you can download from the Red Hat website.

All the following steps should be executed on xcatmn.mydomain.com.

Prepare the Management Node xcatmn.mydomain.com
  1. Disable SELinux:

    echo 0 > /selinux/enforce
    sed -i 's/^SELINUX=.*$/SELINUX=disabled/' /etc/selinux/config
    
  2. Set the hostname of xcatmn.mydomain.com:

    hostname xcatmn.mydomain.com
    
  3. Set the IP to STATIC in the /etc/sysconfig/network-scripts/ifcfg-<proc_nic> file

  4. Update your /etc/resolv.conf with DNS settings and make sure that the node can reach GitHub and the xCAT official website.

  5. Configure any domain search strings and nameservers to the /etc/resolv.conf file

  6. Add xcatmn into /etc/hosts:

    192.168.0.2 xcatmn xcatmn.mydomain.com
    
  7. Install xCAT:

    wget https://raw.githubusercontent.com/xcat2/xcat-core/master/xCAT-server/share/xcat/tools/go-xcat -O - >/tmp/go-xcat
    chmod +x /tmp/go-xcat
    go-xcat --yes install
    source /etc/profile.d/xcat.sh
    
  8. Configure the system password for the root user on the compute nodes:

    chtab key=system passwd.username=root passwd.password=abc123
    

Stage 1 Add your first node and control it with out-of-band BMC interface

  1. Define compute node cn1:

    mkdef -t node cn1 --template x86_64-template ip=192.168.0.3 mac=42:3d:0a:05:27:0c bmc=10.4.40.254 bmcusername=USERID bmcpassword=PASSW0RD
    
  2. Configure DNS:

    makehosts cn1
    makedns -n
    
  3. Check cn1 Hardware Control:

cn1 power management:

rpower cn1 on
rpower cn1 state
cn1: on

cn1 firmware information:

rinv cn1 firm
cn1: UEFI Version: 1.31 (TDE134EUS  2013/08/27)
cn1: Backup UEFI Version: 1.00 (TDE112DUS )
cn1: Backup IMM Version: 1.25 (1AOO26K 2012/02/23)
cn1: BMC Firmware: 3.10 (1AOO48H 2013/08/22 18:49:44)

Stage 2 Provision a node and manage it with parallel shell

  1. In order to PXE boot, you need a DHCP server to hand out addresses and direct the booting system to the TFTP server where it can download the network boot files. Configure DHCP:

    makedhcp -n
    
  2. Copy all contents of Distribution ISO into /install directory, create OS repository and osimage for OS provision:

    copycds RHEL-7.6-20181010.0-Server-x86_64-dvd1.iso
    

    After copycds, the corresponding basic osimage definitions will be generated automatically, and you can list the new osimage names. You can refer to the documentation to customize the package list or postscripts for the target compute nodes, but here we just use the default one:

    lsdef -t osimage
    
  3. Use xcatprobe to precheck xCAT management node ready for OS provision:

    xcatprobe xcatmn
    [mn]: Checking all xCAT daemons are running...                                      [ OK ]
    [mn]: Checking xcatd can receive command request...                                 [ OK ]
    [mn]: Checking 'site' table is configured...                                        [ OK ]
    [mn]: Checking provision network is configured...                                   [ OK ]
    [mn]: Checking 'passwd' table is configured...                                      [ OK ]
    [mn]: Checking important directories(installdir,tftpdir) are configured...          [ OK ]
    [mn]: Checking SELinux is disabled...                                               [ OK ]
    [mn]: Checking HTTP service is configured...                                        [ OK ]
    [mn]: Checking TFTP service is configured...                                        [ OK ]
    [mn]: Checking DNS service is configured...                                         [ OK ]
    [mn]: Checking DHCP service is configured...                                        [ OK ]
    ... ...
    [mn]: Checking dhcpd.leases file is less than 100M...                               [ OK ]
    =================================== SUMMARY ====================================
    [MN]: Checking on MN...                                                             [ OK ]
    
  4. Start the Diskful OS Deployment:

    rinstall cn1 osimage=rhels7.6-x86_64-install-compute
    
  5. Monitor Installation Process:

    makegocons cn1
    rcons cn1
    

    Note: The keystroke ctrl+e c . will disconnect you from the console.

    After 5-10 minutes, verify that the provision status is booted:

    lsdef cn1 -i status
    Object name: cn1
    status=booted
    

    Use xdsh to check the cn1 OS version and confirm the OS provision was successful:

    xdsh cn1 more /etc/*release
    cn1: ::::::::::::::
    cn1: /etc/os-release
    cn1: ::::::::::::::
    cn1: NAME="Red Hat Enterprise Linux Server"
    cn1: VERSION="7.6 (Maipo)"
    ... ...
    

Workflow Guide

If xCAT looks suitable for your requirements, the following steps are recommended to set up an xCAT cluster.

  1. Find a server for xCAT management node

    The server can be a bare-metal server or a virtual machine. The major factor in selecting a server is the number of machines in your cluster: the bigger the cluster, the better the server performance needs to be.

    The architecture of the xCAT management node is recommended to be the same as that of the target compute nodes in the cluster.

  2. Install xCAT on your selected server

    The server where xCAT is installed will be the xCAT Management Node.

    Refer to the doc: xCAT Install Guide to learn how to install xCAT on a server.

    Refer to the doc: xCAT Admin Guide to learn how to manage xCAT Management server.

  3. Discover target compute nodes in the cluster

    Define the target nodes in the xCAT database before managing them.

    For a small cluster (fewer than 5 nodes), you can collect the information of the target nodes one by one and then define them manually through the mkdef command.

    For a bigger cluster, you can use the automatic method to discover the target nodes. The discovered nodes will be defined in the xCAT database. You can use lsdef to display them.

    Refer to the doc: xCAT discovery Guide to learn how to discover and define compute nodes.

  4. Perform hardware control operations against the target compute nodes

    Verify the hardware control for defined nodes. e.g. rpower <node> stat.

    Refer to the doc: Hardware Management to learn how to perform the remote hardware control.

  5. Deploy OS on the target nodes

    • Prepare the OS images
    • Customize the OS images (Optional)
    • Perform the OS deployment

    Refer to the doc: Diskful Install, Diskless Install to learn how to deploy OS for a target node.

  6. Update the OS after the deployment

    You may need to update the OS of certain target nodes after the OS deployment; try the updatenode command. The updatenode command can execute the following tasks for target nodes:

    • Install additional software/application for the target nodes
    • Sync some files to the target nodes
    • Run postscripts on the target nodes

    Refer to the doc: Updatenode to learn how to use updatenode command.

  7. Run parallel commands

    When managing a cluster with hundreds or thousands of nodes, operating on many nodes in parallel might be necessary. xCAT has some parallel commands for that.

    • Parallel shell
    • Parallel copy
    • Parallel ping

    Refer to the Parallel Commands documentation to learn how to use the parallel commands; a short example follows after this list.

  8. Contribute to xCAT (Optional)

    While using xCAT, if you find something (code, documentation, …) that can be improved and you want to contribute it to xCAT, please do so for your benefit and that of other xCAT users. Welcome to the xCAT community!

    Refer to the Developers to learn how to contribute to xCAT community.
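
As a quick illustration of the parallel commands mentioned in step 7 (the group name compute is a placeholder for any node group defined in your cluster):

# run a shell command on all nodes of the group in parallel
xdsh compute "uptime"

# copy a file to all nodes of the group in parallel
xdcp compute /etc/hosts /etc/hosts

# ping all nodes of the group in parallel
pping compute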

Admin Guide

The admin guide is intended to help with learning how to manage a cluster using xCAT with the following major sections:

  • Basic Concepts Introduces some of the basic concepts in xCAT.
  • Manage Cluster Describes managing clusters under xCAT. The management procedures are organized based on the hardware type since management may vary depending on the hardware architecture.
  • Reference xCAT reference sections.

Basic Concepts

xCAT is not hard to use but you still need to learn some basic concepts of xCAT before starting to manage a real cluster.

  • xCAT Objects

    The unit which can be managed in xCAT is defined as an object. xCAT abstracts several types of objects from the cluster information to represent the physical or logical entities in the cluster. Each xCAT object has a set of attributes, and each attribute is mapped from a specified field of an xCAT database table. xCAT users can get cluster information and perform cluster management work through operations against the objects.

  • xCAT Database

    All the data for the xCAT Objects (node, group, network, osimage, policy … and global configuration) are stored in the xCAT Database. Tens of tables serve as the back end of the xCAT Objects. Generally the data in the database is accessed by the user through xCAT Objects, but xCAT also offers a set of commands to handle the database directly (see the example after this list).

  • Global Configuration

    xCAT has a set of Global Configuration options that allow the xCAT user to control the behavior of xCAT. Some of the configuration items are mandatory for an xCAT cluster and must be set correctly before starting to use xCAT.

  • xCAT Network

    xCAT’s goal is to manage and configure a significant number of servers remotely and automatically through a central management server. All the hardware discovery/management, OS deployment/configuration and application install/configuration are performed over the network. You need a deep understanding of how xCAT will use the network before setting up a cluster.
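
As a small illustration of working with the database directly (tabdump, tabedit, and chtab are standard xCAT database commands; the domain value is a placeholder):

# dump the site table to the screen
tabdump site

# open the nodelist table in an editor
tabedit nodelist

# change a single entry of the site table from the command line
chtab key=domain site.value=cluster.com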

Get Into the Details of the Concepts:

xCAT Objects

Basically, xCAT has 20 types of objects. They are:

auditlog    boottarget    eventlog    firmware        group
kit         kitcomponent  kitrepo     monitoring      network
node        notification  osdistro    osdistroupdate  osimage
policy      rack          route       site            zone

This section will introduce you to several important types of objects and give you an overview of how to view and manipulate them.

You can get the detail description of each object by man <object type> e.g. man node.

  • node Object

    The node is the most important object in xCAT. Any physical server, virtual machine or SP (Service Processor for Hardware Control) can be defined as a node object.

    For example, I have a physical server which has the following attributes:

    groups: all,x86_64
        The groups that this node belongs to.
    arch: x86_64
        The architecture of the server is x86_64.
    bmc: 10.4.14.254
        The IP of BMC which will be used for hardware control.
    bmcusername: ADMIN
        The username of bmc.
    bmcpassword: admin
        The password of bmc.
    mac: 6C:AE:8B:1B:E8:52
        The mac address of the ethernet adapter that will be used to
        deploy OS for the node.
    mgt: ipmi
        The management method which will be used to manage the node.
        This node will use ipmi protocol.
    netboot: xnba
        The network bootloader that will be used to deploy OS for the node.
    provmethod: rhels7.1-x86_64-install-compute
        The osimage that will be deployed to the node.
    

    I want to name the node to be cn1 (Compute Node #1) in xCAT. Then I define this node in xCAT with following command:

    $mkdef -t node cn1 groups=all,x86_64 arch=x86_64 bmc=10.4.14.254
                       bmcusername=ADMIN bmcpassword=admin mac=6C:AE:8B:1B:E8:52
                       mgt=ipmi netboot=xnba provmethod=rhels7.1-x86_64-install-compute
    

    After the define, I can use lsdef command to display the defined node:

    $lsdef cn1
    Object name: cn1
        arch=x86_64
        bmc=10.4.14.254
        bmcpassword=admin
        bmcusername=ADMIN
        groups=all,x86_64
        mac=6C:AE:8B:1B:E8:52
        mgt=ipmi
        netboot=xnba
        postbootscripts=otherpkgs
        postscripts=syslog,remoteshell,syncfiles
        provmethod=rhels7.1-x86_64-install-compute
    

    Then I can try to remotely power on the node cn1:

    $rpower cn1 on
    
  • group Object

    group is an object which includes multiple node objects. When you set the group attribute of a node object to a group name like x86_64, the group x86_64 is automatically generated and the node is assigned to the group.

    The benefits of using group object:

    • Handle multiple nodes through group

      I defined another server cn2 which is similar with cn1, then my group x86_64 has two nodes: cn1 and cn2.

      $ lsdef -t group x86_64
      Object name: x86_64
        cons=ipmi
        members=cn1,cn2
      

      Then I can power on all the nodes in the group x86_64.

      $ rpower x86_64 on
      
    • Inherit attributes from group

      If the group object of a node object has a certain attribute that the node object does not have, the node will inherit this attribute from its group.

      I set the cons attribute for the group object x86_64.

      $ chdef -t group x86_64 cons=ipmi
        1 object definitions have been created or modified.
      
      $ lsdef -t group x86_64
      Object name: x86_64
         cons=ipmi
         members=cn1,cn2
      

      Then I can see that cn1 inherits the attribute cons from the group x86_64:

      $ lsdef cn1
      Object name: cn1
          arch=x86_64
          bmc=10.4.14.254
          bmcpassword=admin
          bmcusername=ADMIN
          cons=ipmi
          groups=all,x86_64
          mac=6C:AE:8B:1B:E8:52
          mgt=ipmi
          netboot=xnba
          postbootscripts=otherpkgs
          postscripts=syslog,remoteshell,syncfiles
          provmethod=rhels7.1-x86_64-install-compute
      

      It is useful to define common attributes in the group object so that newly added nodes will inherit them automatically. Since the attributes are defined in the group object, you don’t need to touch the individual nodes’ attributes.

    • Use Regular Expression to generate value for node attributes

      This is a powerful feature in xCAT: you can generate individual attribute values from the node name instead of assigning them one by one. Refer to Use Regular Expression in xCAT Database Table.
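
      For a quick taste (a minimal illustration, assuming the nodes cn1 and cn2 defined above), a single regular expression set on the group can give every member its own IP address derived from its name:

      $ chdef -t group x86_64 ip="|cn(\d+)|10.0.0.($1+0)|"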

  • osimage Object

    An osimage object represents an Operating System which can be deployed in xCAT. xCAT generates several default osimage objects for an Operating System when the copycds command is executed to create the package repository for that OS.

    You can display all the defined osimage objects:

    $ lsdef -t osimage
    

    Display the detailed attributes of the osimage named rhels7.1-x86_64-install-compute:

    $ lsdef -t osimage rhels7.1-x86_64-install-compute
    Object name: rhels7.1-x86_64-install-compute
        imagetype=linux
        osarch=x86_64
        osdistroname=rhels7.1-x86_64
        osname=Linux
        osvers=rhels7.1
        otherpkgdir=/install/post/otherpkgs/rhels7.1/x86_64
        pkgdir=/install/rhels7.1/x86_64
        pkglist=/opt/xcat/share/xcat/install/rh/compute.rhels7.pkglist
        profile=compute
        provmethod=install
        synclists=/root/syncfiles.list
        template=/opt/xcat/share/xcat/install/rh/compute.rhels7.tmpl
    

    This osimage represents a Linux rhels7.1 Operating System. The package repository is in /install/rhels7.1/x86_64 and the packages which will be installed are listed in the file /opt/xcat/share/xcat/install/rh/compute.rhels7.pkglist

    I can bind the osimage to the node when I want to deploy osimage rhels7.1-x86_64-install-compute on my node cn1:

    $ nodeset cn1 osimage=rhels7.1-x86_64-install-compute
    

    Then on the next network boot, the node cn1 will start to deploy rhels7.1.

  • Manipulating Objects

    You have already seen the commands mkdef, lsdef, and chdef used to manipulate objects. xCAT has 4 object management commands to manage all the xCAT objects.

    • mkdef : create object definitions
    • chdef : modify object definitions
    • lsdef : list object definitions
    • rmdef : remove object definitions

    To get the detailed usage of these commands, refer to their man pages, e.g. man mkdef.
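
    For example, if the node cn2 were later retired, its definition could be removed with rmdef (a simple illustration):

    $ rmdef -t node cn2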

Get Into the Detail of the xCAT Objects:

node
Description

The definition of physical units in the cluster, such as lpar, virtual machine, frame, cec, hmc, switch.

Key Attributes
  • os:

    The operating system deployed on this node. Valid values: AIX, rhels*, rhelc*, rhas*, centos*, SL*, fedora*, sles* (where * is the version #)

  • arch:

    The hardware architecture of this node. Valid values: x86_64, ppc64, x86, ia64.

  • groups:

    Usually, there is a set of nodes with some attributes in common; the xCAT admin can define a node group containing these nodes, so that management tasks can be issued against the group instead of individual nodes. A node can be a member of multiple groups, so the value of this attribute is a comma-delimited list of groups. At least one group is required to create a node. Newly created group names should not be prefixed with “__” as this token is reserved for internal group names.

  • mgt:

    The method to do general hardware management of the node. This attribute can be determined by the machine type of the node. Valid values: ipmi, blade, hmc, ivm, fsp, bpa, kvm, esx, rhevm.

  • mac:

    The mac address of the network card on the node, which is connected with the installation server and can be used as the network installation device.

  • ip:

    The IP address of the node.

  • netboot:

    The type of network boot method for this node, determined by the OS to provision, the architecture and machine type of the node. Valid values:

    Arch and Machine Type      OS                       valid netboot options
    x86, x86_64                ALL                      pxe, xnba
    ppc64                      <=rhel6, <=sles11.3      yaboot
    ppc64                      >=rhels7, >=sles11.4     grub2, grub2-http, grub2-tftp
    ppc64le NonVirtualized     ALL                      petitboot
    ppc64le PowerKVM Guest     ALL                      grub2, grub2-http, grub2-tftp
  • postscripts:

    Comma-separated list of scripts that should be run on this node after diskful installation or diskless boot to finish system configuration and maintenance work. For installation of RedHat, CentOS, and Fedora, the scripts will be run before the reboot. For installation of SLES, the scripts will be run after the reboot but before the init.d process.

  • postbootscripts:

    Comma-separated list of scripts that should be run on this node as a SysV init job on the 1st reboot after installation or diskless boot, to finish system configuration and maintenance work.

  • provmethod:

    The provisioning method for node deployment. Usually, this attribute is an osimage object name.

  • status:

    The current status of the node, which is updated by xCAT. This value can be used to monitor the provision process. Valid values: powering-off, installing, booting/netbooting, booted.

Use Cases
  • Case 1: There is a ppc64le node named “cn1”, the mac of the installation NIC is “ca:68:d3:ae:db:03”, the ip assigned is “10.0.0.100”, the network boot method is “grub2”, and it should be placed into the group “all”. Use the following command

    mkdef -t node -o cn1 arch=ppc64le mac="ca:68:d3:ae:db:03" ip="10.0.0.100" netboot="grub2" groups="all"
    
  • Case 2:

    List all the node objects

    nodels
    

    This can also be done with

    lsdef -t node
    
  • Case 3: List the mac of object “cn1”

    lsdef -t node -o cn1 -i mac
    
  • Case 4: There is a node definition “cn1”, modify its network boot method to “yaboot”

    chdef -t node -o cn1 netboot=yaboot
    
  • Case 5: There is a node definition “cn1”; create a node definition “cn2” with the same attributes as “cn1”, except for the mac address (ca:68:d3:ae:db:04) and IP address (10.0.0.101)

    step 1: write the definition of “cn1” to a stanza file named “cn.stanza”

    lsdef -z cn1 > /tmp/cn.stanza
    

    The content of “/tmp/cn.stanza” will look like

    # <xCAT data object stanza file>
    cn1:
        objtype=node
        groups=all
        ip=10.0.0.100
        mac=ca:68:d3:ae:db:03
        netboot=grub2
    

    step 2: modify the “/tmp/cn.stanza” according to the “cn2” attributes

    # <xCAT data object stanza file>
    cn2:
        objtype=node
        groups=all
        ip=10.0.0.101
        mac=ca:68:d3:ae:db:04
        netboot=grub2
    

    step 3: create “cn2” definition with “cn.stanza”

    cat /tmp/cn.stanza | mkdef -z
    
group

xCAT supports both static and dynamic groups. A static group is defined to contain a specific set of cluster nodes. A dynamic node group is one that has its members determined by specifying a selection criteria for node attributes. If a node's attribute values match the selection criteria, then it is dynamically included as a member of the group. The actual group membership will change over time as nodes have attributes set or unset. This provides flexible control over group membership by defining the attributes that define the group, rather than the specific node names that belong to the group. The selection criteria is a list of attr<operator>val pairs that can be used to determine the members of a group (see below).

Note : Dynamic node group support is available in xCAT version 2.3 and later.

In xCAT, the definition of a static group has been extended to include additional attributes that would normally be assigned to individual nodes. When a node is part of a static group definition, it can inherit the attributes assigned to the group. This feature can make it easier to define and manage cluster nodes in that you can generally assign nodes to the appropriate group and then just manage the group definition instead of multiple node definitions. This feature is not supported for dynamic groups.

To list all the attributes that may be set for a group definition you can run

lsdef -t group -h

When a node is included in one or more static groups, a particular node attribute could actually be stored in several different object definitions. It could be in the node definition itself or it could be in one or more static group definitions. The precedence for determining which value to use is to choose the attribute value specified in the node definition if it is provided. If not, then each static group that the node belongs to will be checked to see if the attribute is set. The first value that is found is the value that is used. The static groups are checked in the order that they are specified in the groups attribute of the node definition.
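
To see which static groups a node belongs to, and therefore the order in which they will be searched, list the node's groups attribute (a small check, assuming a node named node01):

lsdef node01 -i groups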

NOTE : In a large cluster environment it is recommended to focus on group definitions as much as possible and avoid setting the attribute values in the individual node definition. (Of course some attribute values, such as a MAC addresses etc., are only appropriate for individual nodes.) Care must be taken to avoid confusion over which values will be inherited by the nodes.

Group definitions can be created using the mkdef command, changed using the chdef command, listed using the lsdef command and removed using the rmdef command.

Creating a static node group

There are two basic ways to create xCAT static node groups. You can either set the groups attribute of the node definition or you can create a group definition directly.

You can set the groups attribute of the node definition when you are defining the node with the mkdef or nodeadd command, or you can modify the attribute later using the chdef or nodech command. For example, if you want a set of nodes to be added to the group “aixnodes”, you could run chdef or nodech as follows

chdef -t node -p -o node01,node02,node03 groups=aixnodes

or

nodech node01,node02,node03 groups=aixnodes

The -p (plus) option specifies that “aixnodes” be added to any existing value for the groups attribute. The -p (plus) option is not supported by the nodech command.

The second option would be to create a new group definition directly using the mkdef command as follows

mkdef -t group -o aixnodes members="node01,node02,node03"

These two options will result in exactly the same definitions and attribute values being created in the xCAT database.

Creating a dynamic node group

The selection criteria for a dynamic node group is specified by providing a list of attr<operator>val pairs that can be used to determine the members of a group. The valid operators include: ==, !=, =~ and !~. The attr field can be any node definition attribute returned by the lsdef command. The val field in selection criteria can be a simple string or a regular expression. A regular expression can only be specified when using the =~ or !~ operators. See http://www.perl.com/doc/manual/html/pod/perlre.html for information on the format and syntax of regular expressions.

Operator descriptions

== Select nodes where the attribute value is exactly this value.
!= Select nodes where the attribute value is not this specific value.
=~ Select nodes where the attribute value matches this regular expression.
!~ Select nodes where the attribute value does not match this regular expression.

The selection criteria can be specified using one or more -w attr<operator>val options on the command line.

If the val field includes spaces or any other characters that will be parsed by shell then the attr<operator>val needs to be quoted.

For example, to create a dynamic node group called “mygroup”, where the hardware control point is “hmc01” and the partition profile is not set to service

mkdef -t group -o mygroup -d -w hcp==hmc01 -w pprofile!=service

To create a dynamic node group called “pslesnodes”, where the operating system name includes “sles” and the architecture includes “ppc”

mkdef -t group -o pslesnodes -d -w os=~sles[0-9]+ -w arch=~ppc

To create a dynamic node group called nonpbladenodes where the node hardware management method is not set to blade and the architecture does not include ppc

mkdef -t group -o nonpbladenodes -d -w mgt!=blade -w 'arch!~ppc'
osimage
Description

A logical definition of an image which can be used to provision nodes.

Key Attributes
  • imagetype:
    The type of operating system this definition represents (linux, AIX).
  • osarch:
    The hardware architecture of the nodes this image supports. Valid values: x86_64, ppc64, ppc64le.
  • osvers:
    The Linux distribution name and release number of the image. Valid values: rhels*, rhelc*, rhas*, centos*, SL*, fedora*, sles* (where * is the version #).
  • pkgdir:
    The name of the directory where the copied OS distro content is stored.
  • pkglist:
    The fully qualified name of a file containing the list of packages, shipped in the Linux distribution ISO, that will be installed on the node.
  • otherpkgdir
    When the xCAT user needs to install additional packages not shipped in the Linux distribution ISO, those packages can be placed in the directory specified in this attribute. The xCAT user should take care of any dependency problems themselves, by also placing dependency packages not shipped in the Linux distribution ISO in this directory and creating a repository in this directory (see the createrepo sketch after this list).
  • otherpkglist:
    The fully qualified name of a file containing the list of user-specified additional packages, not shipped in the Linux distribution ISO, that will be installed on the node.
  • template:
    The fully qualified name of the template file that will be used to create the OS installer configuration file for stateful installation (e.g. kickstart for RedHat, autoyast for SLES and preseed for Ubuntu).
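
As mentioned for otherpkgdir above, after placing additional packages in that directory you would typically create a package repository there so they can be resolved at install time. A minimal sketch for an rpm-based distribution (the path matches the otherpkgdir used elsewhere in this document; createrepo must be available on the management node):

createrepo /install/post/otherpkgs/rhels7.1/x86_64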
Use Cases
  • Case 1:

List all the osimage objects

lsdef -t osimage
  • Case 2:

Create an osimage definition “customized-rhels7-ppc64-install-compute” based on the existing osimage “rhels7-ppc64-install-compute”. The new osimage will inherit all the attributes of “rhels7-ppc64-install-compute” and, in addition, install the extra packages specified in the file “/tmp/otherpkg.list”:

step 1 : write the osimage definition “rhels7-ppc64-install-compute” to a stanza file “osimage.stanza”

lsdef -z -t osimage -o rhels7-ppc64-install-compute > /tmp/osimage.stanza

The content will look like

# <xCAT data object stanza file>

rhels7-ppc64-install-compute:
    objtype=osimage
    imagetype=linux
    osarch=ppc64
    osdistroname=rhels7-ppc64
    osname=Linux
    osvers=rhels7
    otherpkgdir=/install/post/otherpkgs/rhels7/ppc64
    pkgdir=/install/rhels7/ppc64
    pkglist=/opt/xcat/share/xcat/install/rh/compute.rhels7.pkglist
    profile=compute
    provmethod=install
    template=/opt/xcat/share/xcat/install/rh/compute.rhels7.tmpl

step 2 : modify the stanza file according to the attributes of “customized-rhels7-ppc64-install-compute”

# <xCAT data object stanza file>

customized-rhels7-ppc64-install-compute:
    objtype=osimage
    imagetype=linux
    osarch=ppc64
    osdistroname=rhels7-ppc64
    osname=Linux
    osvers=rhels7
    otherpkglist=/tmp/otherpkg.list
    otherpkgdir=/install/post/otherpkgs/rhels7/ppc64
    pkgdir=/install/rhels7/ppc64
    pkglist=/opt/xcat/share/xcat/install/rh/compute.rhels7.pkglist
    profile=compute
    provmethod=install
    template=/opt/xcat/share/xcat/install/rh/compute.rhels7.tmpl

step 3 : create the osimage “customized-rhels7-ppc64-install-compute” from the stanza file

cat /tmp/osimage.stanza | mkdef -z

xCAT Database

All of the xCAT objects and configuration data are stored in the xCAT database. By default, xCAT uses SQLite, a simple self-contained database engine. More powerful open source database engines like MySQL, MariaDB, and PostgreSQL are also supported for large clusters.

xCAT defines about 70 tables to store different data. You can get the xCAT database definition from the file /opt/xcat/lib/perl/xCAT/Schema.pm.

You can run the tabdump command to list all the xCAT database tables, or run tabdump -d <tablename> or man <tablename> to get detailed information on the columns and table definitions.

$ tabdump
$ tabdump site
$ tabdump -d site
$ man site

For a complete reference, see the man page for xcatdb: man xcatdb.

The tables in xCAT:

  • site table

    Global settings for the whole cluster. This table is different from the other tables. Each entry in site table is a key=>value pair. Refer to the Global Configuration page for the major global attributes or run man site to get all global attributes.

  • policy table

    Controls who has authority to run specific xCAT operations. It is the Access Control List (ACL) in xCAT.

  • passwd table

    Contains default userids and passwords for xCAT to access cluster components. In most cases, xCAT will also set the userid/password in the relevant component (generally for SPs like the BMC and FSP) when it is being configured or installed. The default userids/passwords in the passwd table for specific cluster components can be overridden by the columns in other tables, e.g. mpa, ipmi, ppchcp, etc.

  • networks table

    Contains the network definitions in the cluster.

    You can manipulate the networks through the *def commands against the network object.

    $ lsdef -t network
    

Manipulate xCAT Database Tables

xCAT offers 5 commands to manipulate the database tables:

  • tabdump

    Displays the header and all the rows of the specified table in CSV (comma separated values) format.

  • tabedit

    Opens the specified table in the user’s editor, allows them to edit any text, and then writes changes back to the database table. The table is flattened into a CSV (comma separated values) format file before giving it to the editor. After the editor is exited, the CSV file will be translated back into the database format.

  • tabgrep

    List table names in which an entry for the given node appears.

  • dumpxCATdb

    Dumps all the xCAT db tables to CSV files under the specified directory, often used to back up the xCAT database for xCAT reinstallation or management node migration.

  • restorexCATdb

    Restore the xCAT db tables from the CSV files under the specified directory.
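
For example (an illustrative session; the backup directory is arbitrary):

# which tables contain an entry for node cn1?
tabgrep cn1

# back up the whole database to CSV files, and restore it later
dumpxCATdb -p /tmp/xcatdb.backup
restorexCATdb -p /tmp/xcatdb.backup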

Advanced Topic: How to use Regular Expression in xCAT tables:

Groups and Regular Expressions in Tables
Using Regular Expressions in the xCAT Tables

The xCAT database has a number of tables, some with rows that are keyed by node name (such as noderes and nodehm ) and others that are not keyed by node name (for example, the policy table). The tables that are keyed by node name have some extra features that enable a more template-based style to be used:

Any group name can be used in lieu of a node name in the node field, and that row will then provide “default” attribute values for any node in that group. A row with a specific node name can then override one or more attribute values for that specific node. For example, if the nodehm table contains

#node,power,mgt,cons,termserver,termport,conserver,serialport,serialspeed,serialflow,getmac,cmdmapping,comments,disable
"mygroup",,"ipmi",,,,,,"19200",,,,,
"node1",,,,,,,,"115200",,,,,

In the above example, the node group called “mygroup” sets mgt=ipmi and serialspeed=19200. Any nodes that are in this group will have those attribute values, unless overridden. For example, if “node2” is a member of “mygroup”, it will automatically inherit these attribute values (even though it is not explicitly listed in this table). In the case of “node1” above, it inherits mgt=ipmi, but overrides the serialspeed to be 115200, instead of 19200. A useful, typical way to use this capability is to create a node group for your nodes and for all the attribute values that are the same for every node, set them at the group level. Then you only have to set attributes for each node that vary from node to node.
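
To confirm which value a node actually resolves to, query it with lsdef (a small check against the example rows above):

# node1 should report serialspeed=115200; other members of mygroup report 19200
lsdef node1 -i serialspeed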

xCAT extends the group capability so that it can also be used for attribute values that vary from node to node in a very regular pattern. For example, if in the ipmi table you want the bmc attribute to be set to whatever the nodename is with “-bmc” appended to the end of it, then use this in the ipmi table

#node,bmc,bmcport,taggedvlan,bmcid,username,password,comments,disable
"compute","/\z/-bmc/",,,,,,,

In this example, “compute” is a node group that contains all of the compute nodes. The 2nd attribute (bmc) is a regular expression that is similar to a substitution pattern. The 1st part \z matches the end of the node name and substitutes -bmc, effectively appending it to the node name.

Another example is if “node1” is assigned the IP address “10.0.0.1”, node2 is assigned the IP address “10.0.0.2”, etc., then this could be represented in the hosts table with the single row

#node,ip,hostnames,otherinterfaces,comments,disable
"compute","|node(\d+)|10.0.0.($1+0)|",,,,

In this example, the regular expression in the ip attribute uses | to separate the 1st and 2nd part. This means that xCAT will allow arithmetic operations in the 2nd part. In the 1st part, (\d+), will match the number part of the node name and put that in a variable called $1. The 2nd part is what value to give the ip attribute. In this case it will set it to the string “10.0.0.” and the number that is in $1. (Zero is added to $1 just to remove any leading zeros.)

A more involved example is with the vm table. If your kvm nodes have node names c01f01x01v01, c01f02x03v04, etc., and the kvm host names are c01f01x01, c01f02x03, etc., then you might have a vm table like

 #node,mgr,host,migrationdest,storage,storagemodel,storagecache,storageformat,cfgstore,memory,cpus,nics,nicmodel,bootorder,clockoffset,virtflags,master,vncport,textconsole,powerstate,beacon,datacenter,cluster,guestostype,othersettings,physlots,vidmodel,vidproto,vidpassword,comments,disable
"kvms",,"|\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)|c($1)f($2)x($3)|",,"|\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)|dir:///install/vms/vm($4+0)|",,,,,"3072","2","virbr2","virtio",,,,,,,,,,,,,,,,,,

Before you panic, let me explain each column:

kvms

This is a group name. In this example, we are assuming that all of your kvm nodes belong to this group. Each time the xCAT software accesses the vm table to get the kvm host vmhost and storage file vmstorage of a specific kvm node (e.g. c01f02x03v04), this row will match (because c01f02x03v04 is in the kvms group). Once this row is matched for c01f02x03v04, then the processing described in the following items will take place.

|\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)|c($1)f($2)x($3)|

This is a perl substitution pattern that will produce the value for the 3rd column of the table (the kvm host). The text \D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+) between the 1st two vertical bars is a regular expression that matches the node name that was searched for in this table (in this example c01f02x03v04). The text that matches within the 1st set of parentheses is set to $1, the 2nd set of parentheses is set to $2, the 3rd set of parentheses is set to $3, and so on. In our case, the \D+ matches the non-numeric part of the name (“c”, “f”, “x”, “v”) and the \d+ matches the numeric part (“01”, “02”, “03”, “04”). So $1 is set to “01”, $2 is set to “02”, $3 is set to “03”, and $4 is set to “04”. The text c($1)f($2)x($3) between the 2nd and 3rd vertical bars produces the string that should be used as the value for the host attribute for c01f02x03v04, i.e. “c01f02x03”.

|\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)|dir:///install/vms/vm($4+0)|

This item is similar to the one above. This substitution pattern will produce the value for the 5th column (a list of storage files or devices to be used). Because this row was the match for “c01f02x03v04”, the produced value is “dir:///install/vms/vm4”.

Just as explained above, when the node definition “c01f02x03v04” is created with

# mkdef -t node -o c01f02x03v04 groups=kvms
1 object definitions have been created or modified.

The generated node definition is

# lsdef c01f02x03v04
Object name: c01f02x03v04
    groups=kvms
    postbootscripts=otherpkgs
    postscripts=syslog,remoteshell,syncfiles
    vmcpus=2
    vmhost=c01f02x03
    vmmemory=3072
    vmnicnicmodel=virtio
    vmnics=virbr2
    vmstorage=dir:///install/vms/vm4

See perlre for more information on perl regular expressions.

Easy Regular expressions

As of xCAT 2.8.1, you can use a modified version of the regular expression support described in the previous section. You do not need to enter the node information (the 1st part of the expression); it will be derived from the input nodename. You only need to supply the 2nd part of the expression to determine the value to give the attribute.

For example:

If node1 is assigned the IP address 10.0.0.1, node2 is assigned the IP address 10.0.0.2, etc., then this could be represented in the hosts table with the single row:

Using full regular expression support you would put this in the hosts table.

chdef -t group compute ip="|node(\d+)|10.0.0.($1+0)|"
tabdump hosts
#node,ip,hostnames,otherinterfaces,comments,disable
"compute","|node(\d+)|10.0.0.($1+0)|",,,,

Using easy regular expression support you would put this in the hosts table.

chdef -t group compute ip="|10.0.0.($1+0)|"
tabdump hosts
#node,ip,hostnames,otherinterfaces,comments,disable
"compute","|10.0.0.($1+0)|",,,,

In the easy regex example, the expression only has the 2nd part of the expression from the previous example. xCAT will evaluate the node name, matching the number part of the node name, and create the 1st part of the expression. The 2nd part supplied is the value to give the ip attribute. The resulting output is the same.

Regular Expression Helper Functions

xCAT provides several functions that can simplify regular expressions.

a2idx ASCII Character to Index

Usage: a2idx(character)

Turns a single character into a 1-indexed index. ‘a’ maps to 1 and ‘z’ maps to 26.

a2zidx ASCII Character to 0-Index

Usage: a2zidx(character)

Turns a single character into a 0-indexed index. ‘a’ maps to 0 and ‘z’ maps to 25.

dim2idx Dimensions to Index

Usage: dim2idx(value, [count, value...])

Converts dimensions (such as row, column, chassis, etc) into an index. An example system consists of 8 racks, two rows with four columns each.

row1-col1 row1-col2 row1-col3 row1-col4
row2-col1 row2-col2 row2-col3 row2-col4

To obtain the rack index, use |row(\d+)-col(\d+)|(dim2idx($1, 4, $2))|. This maps the racks to:

1 2 3 4
5 6 7 8

Note that the size of the highest dimension (2 rows) is not needed, and all values are one-indexed.

If each rack contains 20 nodes, use |row(\d+)-col(\d+)-node(\d+)|(dim2idx($1, 4, $2, 20, $3))| to determine a node index (useful for determining IP addresses).

skip Skip indices

Usage: skip(index, skiplist)

Return an index with certain values skipped. The skip list uses the format start[:count][,start[:count]...]. Using the example above, to skip racks 3 and 4, use:

|row(\d+)-col(\d+)|(skip(dim2idx($1, 4, $2),'3:2'))|

The result would be:

1 2    
3 4 5 6
ipadd Add to an IP address

Usage: ipadd(octet1, octet2, octet3, octet4, toadd, skipstart, skipend)

This function is useful when you need to cross octets. Optionally skip addresses at the start and end of octets (like .0 or .255; technically those are valid IP addresses, but some software makes poor assumptions about broadcast and gateway addresses).
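
Based only on the usage signature above, a hypothetical sketch of how ipadd might be embedded in a table expression (the octet values, the added amount $1, and the skip bounds are all illustrative) could look like:

|node(\d+)|(ipadd(10, 0, 0, 0, $1, 1, 254))|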

Verify your regular expression

After you create a table entry with a regular expression, make sure it evaluates as you expect.

lsdef node1 | grep ip
  ip=10.0.0.1

Global Configuration

All the xCAT global configurations are stored in the site table; the xCAT admin can adjust the configuration by modifying the site attributes with chdef or tabedit.

This section only presents some key global configurations; for the complete reference on the xCAT global configurations, run tabdump -d site.
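
For example, to check and then adjust a single global setting (the attribute and value here are just an illustration):

# show the current value
tabdump site | grep timezone

# change it
chdef -t site timezone="America/New_York"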

Database Attributes
  • excludenodes: A set of comma-separated nodes and/or groups that will automatically be subtracted from any noderange; it can be used to exclude failed nodes from any xCAT command. See noderange for details on supported formats.
  • nodestatus: If set to n, the nodelist.status column will not be updated during the node deployment, node discovery and power operations. The default is to update.
DHCP Attributes
  • dhcpinterfaces: The network interfaces DHCP should listen on. If it is the same for all nodes, use a simple comma-separated list of NICs. To specify different NICs for different nodes

    xcatmn|eth1,eth2;service|bond0.
    

    In this example xcatmn is the name of the xCAT MN, and DHCP there should listen on eth1 and eth2. On all of the nodes in group service DHCP should listen on the bond0 nic.

  • dhcplease: The lease time for the dhcp client. The default value is 43200.

  • managedaddressmode: The mode of networking configuration during node provision. If set to static, the network configuration will be configured in static mode based on the node and network definition on MN. If set to dhcp, the network will be configured with dhcp protocol. The default is dhcp.

DNS Attributes
  • domain: The DNS domain name used for the cluster.

  • forwarders: The DNS servers at your site that can provide names outside of the cluster. The makedns command will configure the DNS on the management node to forward requests it does not know to these servers. Note that the DNS servers on the service nodes will ignore this value and always be configured to forward requests to the management node.

  • master: The hostname of the xCAT management node, as known by the nodes.

  • nameservers: A comma delimited list of DNS servers that each node in the cluster should use. This value will end up in the nameserver settings of the /etc/resolv.conf on each node. It is common (but not required) to set this attribute value to the IP addr of the xCAT management node, if you have set up the DNS on the management node by running makedns. In a hierarchical cluster, you can also set this attribute to <xcatmaster> to mean the DNS server for each node should be the node that is managing it (either its service node or the management node).

  • dnsinterfaces: The network interfaces DNS server should listen on. If it is the same for all nodes, use a simple comma-separated list of NICs. To specify different NICs for different nodes

    xcatmn|eth1,eth2;service|bond0.
    

    In this example xcatmn is the name of the xCAT MN, and DNS there should listen on eth1 and eth2. On all of the nodes in group service DNS should listen on the bond0 nic.

    NOTE: if using this attribute to block certain interfaces, make sure the ip that maps to your hostname of xCAT MN is not blocked since xCAT needs to use this ip to communicate with the local DNS server on MN.

Install/Deployment Attributes
  • installdir: The local directory name used to hold the node deployment packages.

  • runbootscripts: If set to yes the scripts listed in the postbootscripts attribute in the osimage and postscripts tables will be run during each reboot of stateful (diskful) nodes. This attribute has no effect on stateless nodes. Run the following command after you change the value of this attribute

    updatenode <nodes> -P setuppostbootscripts
    
  • precreatemypostscripts: (yes/1 or no/0). Default is no. If yes, it will instruct xCAT at nodeset and updatenode time to query the db once for all of the nodes passed into the cmd and create the mypostscript file for each node, and put them in a directory of tftpdir(such as: /tftpboot). If no, it will not generate the mypostscript file in the tftpdir.

  • xcatdebugmode: the xCAT debug level. xCAT provides a batch of techniques to help user debug problems while using xCAT, especially on OS provision, such as collecting logs of the whole installation process and accessing the installing system via ssh, etc. These techniques will be enabled according to different xCAT debug levels specified by ‘xcatdebugmode’, currently supported values:

    '0':  disable debug mode
    '1':  enable basic debug mode
    '2':  enable expert debug mode
    

    For the details on ‘basic debug mode’ and ‘expert debug mode’, refer to xCAT documentation.

Remoteshell Attributes
  • sshbetweennodes: Comma separated list of groups of compute nodes to enable passwordless root ssh during install, or xdsh -K. Default is ALLGROUPS. Set to NOGROUPS if you do not wish to enable it for any group of compute nodes. If using the zone table, this attribute is not used.
Services Attributes
  • consoleondemand: When set to yes, conserver connects and creates the console output only when the user opens the console. Default is no on Linux, yes on AIX.
  • timezone: The timezone for all the nodes in the cluster (e.g. America/New_York).
  • tftpdir: tftp directory path. Default is /tftpboot.
  • tftpflags: The flags used to start tftpd. Default is -v -l -s /tftpboot -m /etc/tftpmapfile4xcat.conf if tftpflags is not set.
Virtualization Attributes
  • persistkvmguests: Keep the kvm definition on the kvm hypervisor when you power off the kvm guest node. This is useful if you want to manually change the kvm xml definition file in virsh for debugging. Setting this attribute to any value enables the behavior.
xCAT Daemon attributes
  • xcatdport: The port used by xcatd daemon for client/server communication.
  • xcatiport: The port used by xcatd to receive installation status updates from nodes.
  • xcatlport: The port used by xcatd command log writer process to collect command output.
  • xcatsslversion: The SSL version used by xcatd. Default is SSLv3.
  • xcatsslciphers: The SSL ciphers used by xcatd. Default is 3DES.

Network Planning

For a cluster, several networks are necessary to enable the cluster management and production.

  • Management network

    This network is used by the management node to install and manage the OS of the nodes. The MN and in-band NIC of the nodes are connected to this network. If you have a large cluster with service nodes, sometimes this network is segregated into separate VLANs for each service node.

    The following network services need to be set up on this network to support OS deployment and application install/configuration:

    • DNS(Domain Name Service)

      The dns server, usually the management node or service node, provides the domain name service for the entire cluster.

    • HTTP(HyperText Transfer Protocol)

      The http server, usually the management node or service node, acts as the download server for the initrd and kernel, the configuration file for the installer, and the repository for online installation.

    • DHCP(Dynamic Host Configuration Protocol)

      The dhcp server, usually the management node or service node, provides the dhcp service for the entire cluster.

    • TFTP(Trivial File Transfer Protocol)

      The tftp server, usually the management node or service node, acts as the download server for bootloader binaries, bootloader configuration file, initrd and kernel.

    • NFS(Network File System)

      The NFS server, usually the management node or service node, provides the file system sharing between the management node and service node, or persistent file system support for the stateless node.

    • NTP(Network Time Protocol)

      The NTP server, usually the management node or service node, provides the network time service for the entire cluster.

  • Service network

    This network is used by the management node to control the nodes out of band via the SP like BMC, FSP. If the BMCs are configured in shared mode [1], then this network can be combined with the management network.

  • Application network

    This network is used by the applications on the compute nodes. Usually an IB network for HPC cluster.

  • Site (Public) network

    This network is used to access the management node and sometimes for the compute nodes to provide services to the site.

From the system management perspective, the Management network and Service network are necessary to perform the hardware control and OS deployment.

xCAT Network Planning for a New Cluster:

xCAT Network Planning

Before setting up your cluster, there are a few things that are important to think through first, because it is much easier to go in the direction you want right from the beginning, instead of changing course midway through.

Do You Need Hierarchy in Your Cluster?
Service Nodes

For very large clusters, xCAT has the ability to distribute the management operations to service nodes. This allows the management node to delegate all management responsibilities for a set of compute or storage nodes to a service node so that the management node doesn’t get overloaded. Although xCAT automates a lot of the aspects of deploying and configuring the services, it still adds complexity to your cluster. So the question is: at what size cluster do you need to start using service nodes? The exact answer depends on a lot of factors (mgmt node size, network speed, node type, OS, frequency of node deployment, etc.), but here are some general guidelines for how many nodes a single management node (or single service node) can handle:

  • [Linux]:
    • Stateful or Stateless: 500 nodes
    • Statelite: 250 nodes
  • [AIX]:
    150 nodes

These numbers can be higher (approximately double) if you are willing to “stage” the more intensive operations, like node deployment.

Of course, there are some reasons to use service nodes that are not related to scale, for example, if some of your nodes are far away (network-wise) from the mgmt node.

Network Hierarchy

For large clusters, you may want to divide the management network into separate subnets to limit the broadcast domains. (Service nodes and subnets don’t have to coincide, although they often do.) xCAT clusters as large as 3500 nodes have used a single broadcast domain.

Some cluster administrators also choose to sub-divide the application interconnect to limit the network contention between separate parallel jobs.

Design an xCAT Cluster for High Availability

Everyone wants their cluster to be as reliable and available as possible, but there are multiple ways to achieve that end goal. Availability and complexity are inversely proportional. You should choose an approach that balances these 2 in a way that fits your environment the best. Here are a few choices, in order of least complex to most complex.

Service Node Pools With No HA Software

Service node pools is an xCAT approach in which more than one service node (SN) is in the broadcast domain for a set of nodes. When each node netboots, it chooses an available SN by accepting whichever one responds to its DHCP request first. When services are set up on the node (e.g. DNS), xCAT configures the services to use that SN and one other SN in the pool. That way, if one SN goes down, the node can keep running, and the next time it netboots it will automatically choose another SN.

This approach is most often used with stateless nodes because that environment is more dynamic. It can possibly be used with stateful nodes (with a little more effort), but that type of node doesn’t netboot nearly as often, so a more manual operation (snmove) is needed in that case to move a node to a different SN.

It is best to have the SNs be as robust as possible, for example, if they are diskful, configure them with at least 2 disks that are RAID’ed together.

In smaller clusters, the management node (MN) can be part of the SN pool with one other SN.

In larger clusters, if the network topology dictates that the MN is only for managing the SNs (not the compute nodes), then you need a plan for what to do if the MN fails. Since the cluster can continue to run if the MN is down temporarily, the plan could be as simple as having a backup MN without any disks. If the primary MN fails, move its RAID’ed disks to the backup MN and power it on.

HA Management Node

If you want to use HA software on your management node to synchronize data and fail over services to a backup MN, see [TODO Highly_Available_Management_Node], which discusses the different options and the pros and cons.

It is important to note that some HA-related software like DRBD, Pacemaker, and Corosync is not officially supported by IBM, meaning that if you have a problem specifically with that software, you will have to go to the open source community or another vendor to get a fix.

HA Service Nodes

When you have NFS-based diskless (statelite) nodes, there is sometimes the motivation to make the NFS serving highly available among all of the service nodes. This is not recommended because it is a very complex configuration. In our opinion, the complexity of this setup can nullify much of the availability you hope to gain. If you need your compute nodes to be highly available, you should strongly consider stateful or stateless nodes.

If you still have reasons to pursue HA service nodes:

  • For [AIX] , see [TODO XCAT_HASN_with_GPFS]
  • For [Linux], a couple of prototype clusters have been set up in which the NFS service on the SNs is provided by GPFS CNFS (Clustered NFS). A howto is being written to describe the setup as an example. Stay tuned.
[1]shared mode: In “Shared” mode, the BMC network interface and the in-band network interface will share the same network port.

xCAT Cluster OS Running Type

Whether a node is a physical server or a virtual machine, it needs to run an Operating System to support user applications. Generally, the OS is installed on the hard disk of the compute node, but xCAT also supports running the OS in RAM.

This section gives the pros and cons of each OS running type, and describes the cluster characteristics that each choice will impact.

Stateful (diskful)

Traditional cluster with OS on each node’s local disk.

  • Main advantage

    This approach is familiar to most admins, and they typically have many years of experience with it.

  • Main disadvantage

    The admin has to manage all of the individual OS copies and has to deal with hard disk failures. For applications which require all the compute nodes to have exactly the same state, keeping the diskful nodes in sync is also a challenge for the admin.

Stateless (diskless)

Nodes boot from a RAMdisk OS image downloaded from the xCAT mgmt node or service node at boot time.

  • Main advantage

    Central management of the OS image, while nodes are not tethered to the mgmt node or service node they booted from. Whenever you need a new OS for a node, just reboot the node.

  • Main disadvantage

    You can’t use a large image with many different applications in the image for varied users, because it uses too much of the node’s memory to store the ramdisk. (To mitigate this disadvantage, you can put your large application binaries and libraries in shared storage to reduce the ramdisk size. This requires some manual configuration of the image).

    Each node can also have a local “scratch” disk for swap, /tmp, /var, log files, dumps, etc. The purpose of the scratch disk is to provide a location for files that are written to by the node that can become quite large or for files that you don’t want to disappear when the node reboots. There should be nothing put on the scratch disk that represents the node’s “state”, so that if the disk fails you can simply replace it and reboot the node. A scratch disk would typically be used for situations like: job scheduling preemption is required (which needs a lot of swap space), the applications write large temp files, or you want to keep gpfs log or trace files persistently. (As a partial alternative to using the scratch disk, customers can choose to put /tmp, /var/tmp, and log files (except GPFS log files) in GPFS, but must be willing to accept the dependency on GPFS). This can be done by enabling the ‘localdisk’ support. For the details, refer to the section [TODO Enabling the localdisk Option].

OSimage Definition

The attribute provmethod is used to identify whether the osimage is diskful or diskless:

$ lsdef -t osimage rhels7.1-x86_64-install-compute -i provmethod
Object name: rhels7.1-x86_64-install-compute
    provmethod=install
install: Diskful
netboot: Diskless
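
To list only the diskless (netboot) images, the osimage definitions can be filtered on this attribute (a small illustration; image names vary by cluster):

lsdef -t osimage -w provmethod==netboot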

Manage Clusters

The following provides detailed information to help start managing your cluster using xCAT.

The sections are organized based on hardware architecture.

IBM POWER LE / OpenPOWER

Most of the content is general information for xCAT; the focus and examples are for management of IBM OpenPOWER servers.

IBM OpenPOWER Servers
  • Servers based on POWER8 processor technology are IPMI managed
  • Servers based on POWER9 processor technology are OpenBMC managed [Alpha]
Configure xCAT

After installing xCAT onto the management node, configure some basic attributes for your cluster into xCAT.

Set attributes in the site table
  1. Verify the following attributes have been correctly set in the xCAT site table.

    • domain
    • forwarders
    • master [1]
    • nameservers

    For more information on the keywords, see the DHCP ATTRIBUTES in the site table.

    If the fields are not set or need to be changed, use the xCAT chdef command:

    chdef -t site domain="domain_string"
    chdef -t site forwarders="forwarders"
    chdef -t site master="xcat_master_ip"
    chdef -t site nameservers="nameserver1,nameserver2,etc"
    
[1]The value of the master attribute in the site table should be set as the IP address of the management node responsible for the compute node.
Initialize DNS services
  1. Initialize the DNS [2] services on the xCAT Management Node:

    makedns -n
    

    Verify DNS is working by running nslookup against your Management Node:

    nslookup <management_node_hostname>
    

    For more information on DNS, refer to Cluster Name Resolution

[2]Setting up name resolution and the ability to have hostname resolved to IP addresses is required for xCAT.
Set attributes in the networks table
  1. Display the network settings defined in the xCAT networks table using: tabdump networks

    #netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,
    dynamicrange,staticrange,staticrangeincrement,nodehostname,ddnsdomain,vlanid,domain,mtu,
    comments,disable
    "10_0_0_0-255_0_0_0","10.0.0.0","255.0.0.0","eth0","10.0.0.101",,"10.4.27.5",,,,,,,,,,,,,
    

    A default network is created for the detected primary network using the same netmask and gateway. There may be additional network entries in the table for each network present on the management node where xCAT is installed.

  2. To define additional networks, use one of the following options:

    • [Recommended] Use mkdef to create/update an entry into networks table.

      To create a network entry for 192.168.X.X/16 with a gateway of 192.168.1.254:

      mkdef -t network -o net1 net=192.168.0.0 mask=255.255.0.0 gateway=192.168.1.254
      
    • Use the tabedit command to modify the networks table directly in an editor:

      tabedit networks
      
    • Use the makenetworks command to automatically generate an entry in the networks table:

      makenetworks
      
  3. Verify the network statements

    Domain and nameserver attributes must be configured in the networks table or in the site table for xCAT to function properly.

Initialize DHCP services

Configure DHCP to listen on different network interfaces [Optional]

The default behavior of xCAT is to configure DHCP to listen on all interfaces defined in the networks table.

The dhcpinterfaces keyword in the site table allows administrators to limit the interfaces that DHCP will listen over. If the management node has 4 interfaces, (eth0, eth1, eth2, and eth3), and you want DHCP to listen only on “eth1” and “eth3”, set dhcpinterfaces using:

chdef -t site dhcpinterfaces="eth1,eth3"

To set “eth1” and “eth3” on the management node and “bond0” on all nodes in the nodegroup=”service”, set dhcpinterfaces using:

chdef -t site dhcpinterfaces="eth1,eth3;service|bond0"

or, to explicitly identify the management node with hostname xcatmn:

chdef -t site dhcpinterfaces="xcatmn|eth1,eth3;service|bond0"
noboot

For the IBM OpenPOWER S822LC for HPC (“Minsky”) nodes, the BMC and compute “eth0” share the left-side integrated ethernet port and compute “eth1” is the right-side integrated ethernet port. For these servers, it is recommended to use two physical cables, allowing the BMC port to be dedicated and “eth1” to be used by the OS. When an open range is configured on both networks, the xCAT Genesis kernel will be sent to the BMC interface and cause problems during hardware discovery. To support this scenario, on the xCAT management node, if “eth1” is connected to the BMC network and “eth3” is connected to the compute network, disable genesis boot for the BMC network by setting :noboot in dhcpinterfaces using:

chdef -t site dhcpinterfaces="eth1:noboot,eth3"

# run the mknb command to remove the genesis
# configuration file for the specified network
mknb ppc64

For more information, see dhcpinterfaces keyword in the site table.

After making any DHCP changes, create a new DHCP configuration file with the networks defined using the makedhcp command.

makedhcp -n
Configure passwords
  1. Configure the system password for the root user on the compute nodes.

    • Set using the chtab command:

      chtab key=system passwd.username=root passwd.password=abc123
      

      To encrypt the password using openssl, use the following command:

      chtab key=system passwd.username=root passwd.password=`openssl passwd -1 abc123`
      
  2. Configure the passwords for Management modules of the compute nodes.

    • For OpenBMC managed systems:

      chtab key=openbmc passwd.username=root passwd.password=0penBmc
      
    • For IPMI/BMC managed systems:

      chtab key=ipmi passwd.username=ADMIN passwd.password=admin
      
    • For HMC managed systems:

      chtab key=hmc passwd.username=hscroot passwd.password=abc123
      

      If the username/password is different for multiple HMCs, set the username and password attribute for each HMC node object in xCAT

    • For Blade managed systems:

      chtab key=blade passwd.username=USERID passwd.password=PASSW0RD
      
    • For FSP/BPA (Flexible Service Processor/Bulk Power Assembly) the factory default passwords must be changed before running commands against them.

      rspconfig frame general_passwd=general,<newpassword>
      rspconfig frame admin_passwd=admin,<newpassword>
      rspconfig frame HMC_passwd=,<newpassword>
      
  3. If using the xCAT REST API

    1. Create a non-root user that will be used to make the REST API calls.

      useradd xcatws
      passwd xcatws # set the password
      
    2. Create an entry for the user into the xCAT passwd table.

      chtab key=xcat passwd.username=xcatws passwd.password=<xcatws_password>
      
    3. Set a policy in the xCAT policy table to allow the user to make calls against xCAT.

      mkdef -t policy 6 name=xcatws rule=allow
      

    When making calls to the xCAT REST API, pass in the credentials using the following attributes: userName and userPW
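
    For example, a quick sanity check of the REST API from the management node might look like this (a sketch; the management node hostname and password are placeholders):

    curl -X GET -k "https://<xcat_mn>/xcatws/nodes?userName=xcatws&userPW=<xcatws_password>&pretty=1"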

Hardware Discovery & Define Node

In order to manage machines using xCAT, the machines need to be defined as xCAT node objects in the database. The xCAT Objects documentation describes the process for manually creating node objects one by one using the xCAT mkdef command. This is fine when managing a small cluster but can be error prone and cumbersome when managing large clusters.

xCAT provides several automatic hardware discovery methods that simplify the process of detecting service processors (SP) and collecting various server information. The following are the methods that xCAT supports:

MTMS-based Discovery

MTMS stands for Machine Type/Model and Serial. This is one way to uniquely identify each physical server.

MTMS-based hardware discovery assumes the administrator has the model type and serial number information for the physical servers and a plan for mapping the servers to intended hostname/IP addresses.

Overview

  1. Automatically search and collect MTMS information from the servers
  2. Write discovered-bmc-nodes to xCAT (recommended to set different BMC IP address)
  3. Create predefined-compute-nodes to xCAT providing additional properties
  4. Power on the nodes which triggers xCAT hardware discovery engine

Pros

  • Limited effort to get servers defined using xCAT hardware discovery engine

Cons

  • When compared to switch-based discovery, the administrator needs to create the predefined-compute-nodes for each of the discovered-bmc-nodes. This could become difficult for a large number of servers.
Verification

Before starting hardware discovery, ensure the following is configured to make the discovery process as smooth as possible.

Password Table

In order to communicate with IPMI-based hardware (with BMCs), verify that the xCAT passwd table contains an entry for ipmi which defines the default username and password to communicate with the IPMI-based servers.

tabdump passwd | grep ipmi

If not configured, use the following command to set username=ADMIN and password=admin.

chtab key=ipmi passwd.username=ADMIN passwd.password=admin
Genesis Package

The xCAT-genesis packages provide the utility to create the genesis network boot rootimage used by xCAT when doing hardware discovery. They should be installed during the xCAT install; hardware discovery will have problems if they are missing.

Verify that the genesis-scripts and genesis-base packages are installed:

  • [RHEL/SLES]:

    rpm -qa | grep -i genesis
    
  • [Ubuntu]:

    dpkg -l | grep -i genesis
    

If missing:

  1. Install them from the xcat-dep repository using the operating system specific package manager (yum, zypper, apt-get, etc)

    • [RHEL]:

      yum install xCAT-genesis
      
    • [SLES]:

      zypper install xCAT-genesis
      
    • [Ubuntu]:

      apt-get install xCAT-genesis
      
  2. Create the network boot rootimage with the following command: mknb ppc64.

    The genesis kernel should be copied to /tftpboot/xcat.

Discovery

When the IPMI-based servers are connected to power, the BMCs will boot up and attempt to obtain an IP address from an open range dhcp server on your network. For xCAT managed networks, xCAT should be configured to serve an open range of dhcp IP addresses using the dynamicrange attribute in the networks table.

When the BMCs have an IP address and are pingable from the xCAT management node, administrators can discover the BMCs using xCAT’s bmcdiscover command and obtain basic information to start the hardware discovery process.

xCAT hardware discovery uses the xCAT genesis kernel (diskless) to discover additional attributes of the compute node and automatically populate the node definitions in xCAT.

Set static BMC IP using dhcp provided IP address

The following example outlines the MTMS based hardware discovery for a single IPMI-based compute node.

Compute Node Information    Value
Model Type                  8247-22l
Serial Number               10112CA
Hostname                    cn01
IP address                  10.0.101.1

The BMC IP address is obtained from the open range dhcp server, and the plan is to keep the same IP address but change it to be static in the BMC.

BMC Information             Value
IP address - dhcp           50.0.100.1
IP address - static         50.0.100.1
  1. Pre-define the compute nodes:

    Use the bmcdiscover command to help discover the nodes over an IP range and easily create a starting file to define the compute nodes into xCAT.

    To discover the compute nodes for the BMCs with an IP address of 50.0.100.1, use the command:

    bmcdiscover --range 50.0.100.1 -z > predefined.stanzas
    

    The discovered nodes have the naming convention: node-<model-type>-<serial-number>

    # cat predefined.stanzas
    node-8247-22l-10112ca:
      objtype=node
      groups=all
      bmc=50.0.100.1
      cons=ipmi
      mgt=ipmi
      mtm=8247-22L
      serial=10112CA
    
  2. Edit the predefined.stanzas file and change the discovered nodes to the intended hostname and IP address.

    1. Edit the predefined.stanzas file:

      vi predefined.stanzas
      
    2. Rename the discovered object names to their intended compute node hostnames based on the MTMS mapping:

      node-8247-22l-10112ca ==> cn01
      
    3. Add an ip attribute and give it the compute node IP address:

      ip=10.0.101.1
      
    4. Repeat for additional nodes in the predefined.stanzas file based on the MTMS mapping.

    In this example, the predefined.stanzas file now looks like the following:

    # cat predefined.stanzas
    cn01:
      objtype=node
      groups=all
      bmc=50.0.100.1
      cons=ipmi
      mgt=ipmi
      mtm=8247-22L
      serial=10112CA
      ip=10.0.101.1
    
  3. Define the compute nodes into xCAT:

    cat predefined.stanzas | mkdef -z
    
  4. Set the chain table to run the bmcsetup script; this will set the BMC IP address to static.

    chdef cn01 chain="runcmd=bmcsetup"
    
  5. [Optional] If more operations are planned after hardware discovery is done, the ondiscover option can be used.

    For example, to configure the console, copy the SSH key for OpenBMC, and then disable powersupplyredundancy:

    chdef cn01 -p chain="ondiscover=makegocons|rspconfig:sshcfg|rspconfig:powersupplyredundancy=disabled"
    

    Note: | is used to separate commands, and : is used to separate a command from its options.

  6. Set the target osimage into the chain table to automatically provision the operating system after the node discovery is complete.

    chdef cn01 -p chain="osimage=<osimage_name>"
    
  7. Add the compute node IP information to /etc/hosts:

    makehosts cn01
    
  8. Refresh the DNS configuration for the new hosts:

    makedns -n
    
  9. [Optional] Monitor the node discovery process using rcons

    Configure the goconserver for the predefined node to watch the discovery process using rcons:

    makegocons cn01
    

    In another terminal window, open the remote console:

    rcons cn01
    
  10. Start the discovery process by booting the predefined node definition:

    rsetboot cn01 net
    rpower cn01 on
    
  11. The discovery process will network boot the machine into the diskless xCAT genesis kernel and perform the discovery. When the discovery is complete, running lsdef on the compute node should show the discovered attributes for the machine. The important mac attribute should be discovered, which is necessary for xCAT to perform OS provisioning.
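
    For example, to check the key discovered attributes of cn01 (the -i option limits the output to the listed attributes):

    lsdef cn01 -i mac,mtm,serial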

Switch-based Discovery

For switch-based hardware discovery, the servers are identified through the switches and switch ports they are directly connected to.

In this document, the following configuration is used in the examples:

Management Node info:

MN Hostname: xcat1
MN NIC info for Management Network(Host network): eth1, 10.0.1.1/16
MN NIC info for Service Network(FSP/BMC network): eth2, 50.0.1.1/16
Dynamic IP range for Hosts: 10.0.100.1-10.0.100.100
Dynamic IP range for FSP/BMC: 50.0.100.1-50.0.100.100

Compute Node info:

CN Hostname: cn1
Machine type/model: 8247-22L
Serial: 10112CA
IP Address: 10.0.101.1
Root Password: cluster
Desired FSP/BMC IP Address: 50.0.101.1
DHCP assigned FSP/BMC IP Address: 50.0.100.1
FSP/BMC username: ADMIN
FSP/BMC Password: admin

Switch info:

Switch name: switch1
Switch username: xcat
Switch password: passw0rd
Switch IP Address: 10.0.201.1
Switch port for Compute Node: port0
Configure xCAT
Configure network table

Normally, there will be at least two entries in the networks table after xCAT is installed, one for each of the two subnets on the MN:

#tabdump networks
#netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,staticrange,staticrangeincrement,nodehostname,ddnsdomain,vlanid,domain,mtu,comments,disable
"10_0_0_0-255_255_0_0","10.0.0.0","255.255.0.0","eth1","<xcatmaster>",,"10.0.1.1",,,,,,,,,,,,,
"50_0_0_0-255_255_0_0","50.0.0.0","255.255.0.0","eth2","<xcatmaster>",,"50.0.1.1",,,,,,,,,,,,,

Run the following command to add networks in networks table if there are no entries in it:

makenetworks
Setup DHCP

Set the correct NICs from which the DHCP server provides service:

chdef -t site dhcpinterfaces=eth1,eth2

Add dynamic ranges for assigning temporary IP addresses to FSP/BMCs and hosts:

chdef -t network 10_0_0_0-255_255_0_0 dynamicrange="10.0.100.1-10.0.100.100"
chdef -t network 50_0_0_0-255_255_0_0 dynamicrange="50.0.100.1-50.0.100.100"

Update DHCP configuration file:

makedhcp -n
makedhcp -a
Config passwd table

Set required passwords for xCAT to do hardware management and/or OS provisioning by adding entries to the xCAT passwd table:

# tabedit passwd
# key,username,password,cryptmethod,authdomain,comments,disable

For hardware management with ipmi, add the following line:

"ipmi","ADMIN","admin",,,,
Verify the genesis packages

The xcat-genesis packages should have been installed when xCAT was installed; problems will occur if they are missing. The xcat-genesis packages are required to create the genesis root image used for hardware discovery, and the genesis kernel sits in /tftpboot/xcat/. Verify that the genesis-scripts and genesis-base packages are installed:

  • [RHEL/SLES]: rpm -qa | grep -i genesis
  • [Ubuntu]: dpkg -l | grep -i genesis

If missing, install them from the xcat-deps package and run mknb ppc64 to create the genesis network boot root image.

Predefined Nodes

In order to differentiate one node from another, the admin needs to predefine nodes in the xCAT database based on the switch information. This consists of two parts:

  1. Predefine Switches
  2. Predefine Server Node

Predefine Switches

The predefined switches represent the devices that the physical servers are connected to. xCAT needs to access those switches to get server-related information through SNMP v3.

So the admin needs to make sure those switches are configured correctly with SNMP v3 enabled. <TODO: link to the Configure Ethernet Switches document>

Then, define switch info into xCAT:

nodeadd switch1 groups=switch,all
chdef switch1 ip=10.0.201.1
tabch switch=switch1 switches.snmpversion=3 switches.username=xcat switches.password=passw0rd switches.auth=sha

Add the switch into DNS using the following commands:

makehosts switch1
makedns -n

Predefine Server node

After switches are defined, the server node can be predefined with the following commands:

nodeadd cn1 groups=powerLE,all
chdef cn1 mgt=ipmi cons=ipmi ip=10.0.101.1 bmc=50.0.101.1 netboot=petitboot installnic=mac primarynic=mac
chdef cn1 switch=switch1 switchport=0

[Optional] If more configuration is planned for the BMC, the following commands are also needed.

chdef cn1 bmcvlantag=<vlanid>                 # tag VLAN ID for BMC
chdef cn1 bmcusername=<desired_username>
chdef cn1 bmcpassword=<desired_password>

In order to do BMC configuration during the discovery process, set runcmd=bmcsetup.

chdef cn1 chain="runcmd=bmcsetup"

[Optional] If more operations are planned after hardware discovery is done, the ondiscover option can be used.

For example, to configure the console, copy the SSH key for OpenBMC, and then disable powersupplyredundancy:

chdef cn1 -p chain="ondiscover=makegocons|rspconfig:sshcfg|rspconfig:powersupplyredundancy=disabled"

Note: | is used to separate commands, and : is used to separate a command from its options.

Set the target osimage into the chain table to automatically provision the operating system after the node discovery is complete.

chdef cn1 -p chain="osimage=<osimage_name>"

For more information about chain, refer to Chain

Add cn1 into DNS:

makehosts cn1
makedns -n
Discover server and define

After the environment is ready and the server is connected to power, we can start the server discovery process. The first thing to do is discover the FSP/BMC of the server; it is powered on automatically when the physical server is connected to power.

Use the bmcdiscover command to discover the BMCs responding over an IP range and write the output into the xCAT database. This discovered BMC node is used to control the physical server during hardware discovery and will be deleted after the correct server node object is matched to a pre-defined node. You must use the -w option to write the output into the xCAT database.

To discover the BMC with an IP address range of 50.0.100.1-100:

bmcdiscover --range 50.0.100.1-100 -z -w

The discovered nodes will be written to the xCAT database. The discovered BMC nodes are in the form node-model_type-serial. To view the discovered nodes:

lsdef /node-.*

Note: The bmcdiscover command will use the username/password from the passwd table entry corresponding to key=ipmi. To override with a different username/password, use the -u and -p options of bmcdiscover.
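
For example (with illustrative placeholder credentials):

bmcdiscover --range 50.0.100.1-100 -z -w -u <username> -p <password>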

Start discovery process

To start the discovery process, power on the discovered BMC node remotely with the following command; the discovery process will start automatically after the host is powered on:

rpower node-8247-22l-10112ca on

[Optional] If you’d like to monitor the discovery process, you can use:

makegocons node-8247-22l-10112ca
rcons node-8247-22l-10112ca
Verify node definition

The following is an example of the server node definition after hardware discovery:

#lsdef cn1
Object name: cn1
    arch=ppc64
    bmc=50.0.101.1
    cons=ipmi
    cpucount=192
    cputype=POWER8E (raw), altivec supported
    groups=powerLE,all
    installnic=mac
    ip=10.0.101.1
    mac=6c:ae:8b:02:12:50
    memory=65118MB
    mgt=ipmi
    mtm=8247-22L
    netboot=petitboot
    postbootscripts=otherpkgs
    postscripts=syslog,remoteshell,syncfiles
    primarynic=mac
    serial=10112CA
    supportedarchs=ppc64
    switch=switch1
    switchport=0
Sequential-based Discovery

When the physical location of the server is not important, sequential-based hardware discovery can be used to simplify the discovery work. The idea is: given a node pool, each node in the pool is assigned an IP address for the host and an IP address for the FSP/BMC; the first physical server discovery request is then matched to the first free node in the pool, and that node's host and FSP/BMC IP addresses are assigned to the physical server.

In this document, the following configuration is used in the examples:

Management Node info:

MN Hostname: xcat1
MN NIC info for Management Network(Host network): eth1, 10.0.1.1/16
MN NIC info for Service Network(FSP/BMC network): eth2, 50.0.1.1/16
Dynamic IP range for Hosts: 10.0.100.1-10.0.100.100
Dynamic IP range for FSP/BMC: 50.0.100.1-50.0.100.100

Compute Node info:

CN Hostname: cn1
Machine type/model: 8247-22L
Serial: 10112CA
IP Address: 10.0.101.1
Root Password: cluster
Desired FSP/BMC IP Address: 50.0.101.1
DHCP assigned FSP/BMC IP Address: 50.0.100.1
FSP/BMC username: ADMIN
FSP/BMC Password: admin
Configure xCAT
Configure network table

Normally, there will be at least two entries in the networks table after xCAT is installed, one for each of the two subnets on the MN:

#tabdump networks
#netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,staticrange,staticrangeincrement,nodehostname,ddnsdomain,vlanid,domain,mtu,comments,disable
"10_0_0_0-255_255_0_0","10.0.0.0","255.255.0.0","eth1","<xcatmaster>",,"10.0.1.1",,,,,,,,,,,,,
"50_0_0_0-255_255_0_0","50.0.0.0","255.255.0.0","eth2","<xcatmaster>",,"50.0.1.1",,,,,,,,,,,,,

Run the following command to add networks in networks table if there are no entries in it:

makenetworks
Setup DHCP

Set the correct NICs from which the DHCP server provides service:

chdef -t site dhcpinterfaces=eth1,eth2

Add dynamic ranges for assigning temporary IP addresses to FSP/BMCs and hosts:

chdef -t network 10_0_0_0-255_255_0_0 dynamicrange="10.0.100.1-10.0.100.100"
chdef -t network 50_0_0_0-255_255_0_0 dynamicrange="50.0.100.1-50.0.100.100"

Update DHCP configuration file:

makedhcp -n
makedhcp -a
Config passwd table

Set required passwords for xCAT to do hardware management and/or OS provisioning by adding entries to the xCAT passwd table:

# tabedit passwd
# key,username,password,cryptmethod,authdomain,comments,disable

For hardware management with ipmi, add the following line:

"ipmi","ADMIN","admin",,,,
Verify the genesis packages

The xcat-genesis packages should have been installed when xCAT was installed; problems will occur if they are missing. The xcat-genesis packages are required to create the genesis root image used for hardware discovery, and the genesis kernel sits in /tftpboot/xcat/. Verify that the genesis-scripts and genesis-base packages are installed:

  • [RHEL/SLES]: rpm -qa | grep -i genesis
  • [Ubuntu]: dpkg -l | grep -i genesis

If missing, install them from the xcat-deps package and run mknb ppc64 to create the genesis network boot root image.

Prepare node pool

To prepare the node pool, predefine the nodes first, then initialize the discovery process with the predefined nodes.

Predefine nodes

Predefine a group of nodes with the desired host and FSP/BMC IP addresses:

nodeadd cn1 groups=powerLE,all
chdef cn1 mgt=ipmi cons=ipmi ip=10.0.101.1 bmc=50.0.101.1 netboot=petitboot installnic=mac primarynic=mac

[Optional] If more configuration is planned for the BMC, the following commands are also needed.

chdef cn1 bmcvlantag=<vlanid>                 # tag VLAN ID for BMC
chdef cn1 bmcusername=<desired_username>
chdef cn1 bmcpassword=<desired_password>

In order to do BMC configuration during the discovery process, set runcmd=bmcsetup.

chdef cn1 chain="runcmd=bmcsetup"

[Optional] If more operations are planned after hardware discovery is done, the ondiscover option can be used.

For example, to configure the console, copy the SSH key for OpenBMC, and then disable powersupplyredundancy:

chdef cn1 -p chain="ondiscover=makegocons|rspconfig:sshcfg|rspconfig:powersupplyredundancy=disabled"

Note: | is used to separate commands, and : is used to separate a command from its options.

Set the target osimage into the chain table to automatically provision the operating system after the node discovery is complete.

chdef cn1 -p chain="osimage=<osimage_name>"

For more information about chain, refer to Chain

Initialize the discovery process

Specify the predefined nodes to the nodediscoverstart command to initialize the discovery process:

nodediscoverstart noderange=cn1

See nodediscoverstart for more information.

Display information about the discovery process

There are additional nodediscover* commands you can run during the discovery process. See the man pages for more details.

Verify the status of discovery using nodediscoverstatus:

nodediscoverstatus

Show the nodes that have been discovered using nodediscoverls:

nodediscoverls -t seq -l

Stop the current sequential discovery process using nodediscoverstop:

nodediscoverstop

Note: The sequential discovery process will stop automatically when all of the node names in the pool are consumed.

Start discovery process

To start the discovery process, the system administrator needs to power on the servers one by one manually. Then the hardware discovery process will start automatically.

Verify Node Definition

After discovery of the node, properties of the server will be added to the xCAT node definition.

Display the node definition and verify that the MAC address has been populated.
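
For example, for the cn1 node predefined above (the -i option limits the output to the listed attributes):

lsdef cn1 -i mac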

Manually Define Nodes

Manually defining a node means the admin knows the detailed information of the physical server and manually defines it in the xCAT database with the mkdef command.

In this document, the following configuration is used in the examples:

Management Node info:

MN Hostname: xcat1
MN NIC info for Management Network(Host network): eth1, 10.0.1.1/16
MN NIC info for Service Network(FSP/BMC network): eth2, 50.0.1.1/16
Dynamic IP range for Hosts: 10.0.100.1-10.0.100.100
Dynamic IP range for FSP/BMC: 50.0.100.1-50.0.100.100

Compute Node info:

CN Hostname: cn1
Machine type/model: 8247-22L
Serial: 10112CA
IP Address: 10.0.101.1
Root Password: cluster
Desired FSP/BMC IP Address: 50.0.101.1
DHCP assigned FSP/BMC IP Address: 50.0.100.1
FSP/BMC username: ADMIN
FSP/BMC Password: admin
Manually Define Node

Execute mkdef command to define the node:

mkdef -t node cn1 groups=powerLE,all mgt=ipmi cons=ipmi ip=10.0.101.1 netboot=petitboot bmc=50.0.101.1 bmcusername=ADMIN bmcpassword=admin installnic=mac primarynic=mac mac=6c:ae:8b:6a:d4:e4

The manually defined node will look like this:

Object name: cn1
    bmc=50.0.101.1
    bmcpassword=admin
    bmcusername=ADMIN
    cons=ipmi
    groups=powerLE,all
    installnic=mac
    ip=10.0.101.1
    mac=6c:ae:8b:6a:d4:e4
    mgt=ipmi
    netboot=petitboot
    postbootscripts=otherpkgs
    postscripts=syslog,remoteshell,syncfiles
    primarynic=mac

mkdef --template can be used to easily create node definitions from the typical node definition templates or from existing node definitions. Some examples:

  • creating node definition “cn2” from an existing node definition “cn1”

    mkdef -t node -o cn2 --template cn1 mac=66:55:44:33:22:11 ip=172.12.139.2 bmc=172.11.139.2
    

    Except for the specified attributes (mac, ip, and bmc), the other attributes of the newly created node “cn2” inherit the values of the template node “cn1”.

  • creating a node definition “cn2” with the template “ppc64le-openbmc-template” (openbmc controlled ppc64le node) shipped by xCAT

    mkdef -t node -o cn2 --template ppc64le-openbmc-template mac=66:55:44:33:22:11 ip=172.12.139.2 bmc=172.11.139.2 bmcusername=root bmcpassword=0penBmc
    

    The unspecified attributes of the newly created node “cn2” will be assigned the default values from the template.

    To list all the node definition templates available in xCAT, run:

    lsdef -t node --template
    

    To display the full definition of the template “ppc64le-openbmc-template”, run:

    lsdef -t node --template ppc64le-openbmc-template
    

    The mandatory attributes, which must be specified when creating definitions with templates, are denoted with the value MANDATORY:<attribute description> in the template definition.

    The optional attributes, which may be specified, are denoted with the value OPTIONAL:<attribute description> in the template definition.

Manually Discover Nodes

If you have a few nodes which were not discovered by the automated hardware discovery process, you can find them in the discoverydata table using the nodediscoverls command. The undiscovered nodes are those that have a discovery method value of ‘undef’ in the discoverydata table.

Display the undefined nodes with the nodediscoverls command:

#nodediscoverls -t undef
UUID                                    NODE                METHOD         MTM       SERIAL
fa2cec8a-b724-4840-82c7-3313811788cd    undef               undef          8247-22L  10112CA

If you want to manually define an ‘undefined’ node to a specific free node name, use the nodediscoverdef command.

Before doing that, a node with the desired host and FSP/BMC IP addresses must be defined first:

nodeadd cn1 groups=powerLE,all
chdef cn1 mgt=ipmi cons=ipmi ip=10.0.101.1 bmc=50.0.101.1 netboot=petitboot installnic=mac primarynic=mac

For example, if you want to assign the undefined node whose uuid is fa2cec8a-b724-4840-82c7-3313811788cd to cn1, run:

nodediscoverdef -u fa2cec8a-b724-4840-82c7-3313811788cd -n cn1

After manually defining it, the ‘node name’ and ‘discovery method’ attributes of the node will be changed. You can display the changed attributes using the nodediscoverls command:

#nodediscoverls
UUID                                    NODE                METHOD         MTM       SERIAL
fa2cec8a-b724-4840-82c7-3313811788cd    cn1                manual          8247-22L  10112CA

The following are the brief characteristics and suitability of each discovery method; you can select the proper one according to your cluster size and other considerations.

  • Manually Define Nodes

    Manually collect information for target servers and manually define them to xCAT Node Object through mkdef command.

    This method is recommended for a small cluster with fewer than 10 nodes.

    • pros

      No specific configuration or procedure is required, and it is very easy to use.

    • cons

      It will take additional time to configure the SP (management modules like the BMC or FSP) and collect server information like the MTMS (Machine Type and Machine Serial) and the host MAC address for OS deployment …

      This method is inefficient and error-prone for a large number of servers.

  • MTMS-based Discovery

    Step1: Automatically search all the servers and collect server MTMS information.

    Step2: Define the discovered server as a Node Object automatically. In this case, the node name will be generated based on the MTMS string. The admin can rename the Node Object to a more meaningful name like r1u1 (meaning the physical location is Rack 1, Unit 1).

    Step3: Power on the nodes; the xCAT discovery engine will update additional information for the nodes, like the MAC address used for deployment.

    This method is recommended for a medium-scale cluster with fewer than 100 nodes.

    • pros

      Gets the benefit of automatic discovery with limited effort.

    • cons

      Compared to Switch-based Discovery, the admin needs to be involved to rename the automatically discovered nodes to reasonable names (optional). It is hard to rename nodes to location-based names for a large number of servers.

  • Switch-based Discovery

    Step1: Pre-define the Node Object for all the nodes in the cluster. Each pre-defined node must have the attributes switch and switchport defined to specify which switch and port the server is connected to. xCAT will use this switch and port information to map a discovered node to the corresponding pre-defined node.

    Step2: Power on the nodes; the xCAT discovery engine will discover node attributes and update them in the corresponding pre-defined node.

    • pros

      The whole discovery process is totally automatic.

      Since the node is physically identified by the switch and port the server is connected to, if a node fails and is replaced with a new one, xCAT will automatically discover the new one and assign it the original node name, since the switch and port do not change.

    • cons

      You need to plan the cluster with a switch and port mapping for each server and switch. All the switches need to be configured with SNMP v3 access for the xCAT management node.

  • Sequential-based Discovery

    Step1: Pre-define the Node Object for all the nodes in the cluster.

    Step2: Manually power on the nodes one by one. Each booted node will be discovered, and each newly discovered node will be assigned to one of the pre-defined nodes in sequence.

    • pros

      No special configuration is required as in Switch-based Discovery, and no manual node rename step is required as in MTMS-based Discovery.

    • cons

      You have to boot the nodes strictly in order if you want each node to get the expected name. Generally you have to wait for the discovery process to finish before powering on the next one.

Hardware Management
Basic Operations
rbeacon - Beacon Light

See rbeacon manpage for more information.

Most enterprise level servers have LEDs on their front and/or rear panels, one of which is a beacon light. If turned on, this light can assist the system administrator in locating one physical machine in the cluster.

Using xCAT, administrators can turn on and off the beacon light using: rbeacon <node> on|off
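
For example, to locate the illustrative node cn1 and then turn its beacon light off again:

rbeacon cn1 on
rbeacon cn1 off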

rpower - Remote Power Control

See rpower manpage for more information.

Use the rpower command to remotely power on and off a single server or a range of servers.

rpower <noderange> on
rpower <noderange> off

Other actions include:

  • To get the current power state of a server: rpower <noderange> state
  • To boot/reboot a server: rpower <noderange> boot
  • To hardware reset a server: rpower <noderange> reset
rcons - Remote Console

See rcons manpage for more information.

Most enterprise servers do not have video adapters installed and often do not provide a method for attaching a physical monitor/keyboard/mouse to get the display output. For this purpose, xCAT can help the system administrator view the console over a “Serial-over-LAN” (SOL) connection through the BMC.

Configure the correct console management by modifying the node definition:

  • For OpenPOWER, IPMI managed server:

    chdef -t node -o <noderange> cons=ipmi
    
  • For OpenPOWER, OpenBMC managed servers:

    chdef -t node -o <noderange> cons=openbmc
    

Open a console to compute1:

rcons compute1

Note

The keystroke ctrl+e c . will disconnect you from the console.

Troubleshooting
General

xCAT has been integrated with 3 kinds of console server services: conserver, goconserver, and confluent.

rcons command relies on one of them. The conserver and goconserver packages should have been installed with xCAT as they are part of the xCAT dependency packages. If you want to try confluent, see confluent server.

For systemd based systems, goconserver is used by default. If you are having problems seeing the console, try the following.

  1. Make sure goconserver is configured by running makegocons.

  2. Check if goconserver is up and running

    systemctl status goconserver.service

  3. If goconserver is not running, start the service using:

    systemctl start goconserver.service

  4. Try makegocons -q [<node>] to verify if the node has been registered.

  5. Invoke the console again: rcons <node>

For more details on goconserver, see the goconserver documentation.

[Deprecated] If conserver is used, try the following.

  1. Make sure conserver is configured by running makeconservercf.

  2. Check if conserver is up and running

    [sysvinit] service conserver status
    [systemd] systemctl status conserver.service
    
  3. If conserver is not running, start the service using:

    [sysvinit] service conserver start
    [systemd] systemctl start conserver.service
    
  4. Invoke the console again: rcons <node>

Advanced Operations
rinv - Remote Hardware Inventory

See rinv manpage for more information.

Use the rinv command to remotely obtain inventory information from a physical machine. This helps to distinguish one machine from another and aids in mapping the model type and/or serial number of a machine to its host name.

To get all the hardware information for node cn1:

rinv cn1 all

To get just the firmware information for cn1:

rinv cn1 firm
rvitals - Remote Hardware Vitals

See rvitals manpage for more information.

Collecting runtime information from a running physical machine is an important part of system administration. Data can be obtained from the service processor including temperature, voltage, cooling fans, etc.

Use the rvitals command to obtain this information.

rvitals <noderange> all

To only get the temperature information of machines in a particular noderange:

rvitals <noderange> temp
rflash - Remote Firmware Flashing

See rflash manpage for more information.

IPMI Firmware Update

The rflash command is provided to assist the system administrator in updating firmware.

To check the current firmware version on the node’s BMC and the HPM file:

rflash <noderange> -c /firmware/8335_810.1543.20151021b_update.hpm

To update the firmware on the node’s BMC to version in the HPM file:

rflash <noderange> /firmware/8335_810.1543.20151021b_update.hpm
OpenBMC Firmware Update
Manual Firmware Flash

The sequence of events that must happen to flash OpenBMC firmware is the following:

  1. Power off the Host
  2. Upload and Activate BMC
  3. Reboot the BMC (applies BMC)
  4. Upload and Activate Host
  5. Power on the Host (applies Host)
Power off Host

Use the rpower command to power off the host:

rpower <noderange> off
Upload and Activate BMC Firmware

Use the rflash command to upload and activate the BMC firmware:

rflash <noderange> -a /path/to/obmc-phosphor-image-witherspoon.ubi.mtd.tar

If running rflash in Hierarchy, the firmware files must be accessible on the Service Nodes.

Note: If a .tar file is provided, the -a option does an upload and activate in one step. If an ID is provided, the -a option only activates the specified firmware. After firmware is activated, use rflash <noderange> -l to view it. The rflash command shows (*) next to the active firmware and (+) next to the firmware that requires a reboot to become effective.

Reboot the BMC

Use the rpower command to reboot the BMC:

rpower <noderange> bmcreboot

The BMC will take 2-5 minutes to reboot. Check the status using rpower <noderange> bmcstate and wait for BMCReady to be returned.
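
A minimal way to wait for the BMC to come back (an illustrative shell sketch; cn1 is a placeholder node name):

while ! rpower cn1 bmcstate | grep -q BMCReady; do sleep 10; done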

Known Issue: On the first call to the BMC after a reboot, xCAT will return Error: BMC did not respond within 10 seconds, retry the command. Please retry.

Upload and Activate Host Firmware

Use the rflash command to upload and activate the Host firmware:

rflash <noderange> -a /path/to/witherspoon.pnor.squashfs.tar

If running rflash in Hierarchy, the firmware files must be accessible on the Service Nodes.

Note: The -a option does an upload and activate in one step. After firmware is activated, use rflash <noderange> -l to view it. The rflash command shows (*) next to the active firmware and (+) next to the firmware that requires a reboot to become effective.

Power on Host

Use the rpower command to power on the Host:

rpower <noderange> on
Validation

Use one of the following commands to validate firmware levels are in sync:

  • Use the rinv command to validate firmware level:

    rinv <noderange> firm -V | grep -i ibm | grep "\*" | xcoll
    
  • Use the rflash command to validate the firmware level:

    rflash <noderange> -l | grep "\*" | xcoll
    
Unattended Firmware Flash

Unattended flash of OpenBMC firmware performs the following steps:

  1. Upload both BMC firmware file and Host firmware file
  2. Activate both BMC firmware and Host firmware
  3. If the BMC firmware becomes active, reboot the BMC to apply the new BMC firmware; otherwise, rflash will exit
  4. If the BMC state is NotReady, rflash will exit
  5. If the BMC state is Ready, rflash will reboot the compute node to apply the Host firmware

Use the following command to flash the firmware unattended:

rflash <noderange> -d /path/to/directory

If there are errors encountered during the flash process, take a look at the manual steps to continue flashing the BMC.

Validation

Use one of the following commands to validate firmware levels are in sync:

  • Use the rinv command to validate firmware level:

    rinv <noderange> firm -V | grep -i ibm | grep "\*" | xcoll
    
  • Use the rflash command to validate the firmware level:

    rflash <noderange> -l | grep "\*" | xcoll
    
rspconfig - Remote Configuration of Service Processors

See rspconfig manpage for more information.

The rspconfig command can be used to configure the service processor, or Baseboard Management Controller (BMC), of a physical machine.

For example, to turn on SNMP alerts for node cn5:

rspconfig cn5 alert=on
reventlog - Remote Event Log of Service Processors

See reventlog manpage for more information.

The reventlog command can be used to display and clear event log information on the service processor, or Baseboard Management Controller (BMC), of a physical machine. OpenBMC based servers need the IBM OpenBMC tool to obtain more detailed logging messages.

For example, to display all event log entries for node cn5:

reventlog cn5

To clear all event log entries for node cn5:

reventlog cn5 clear
Diskful Installation
Select or Create an osimage Definition

Before creating an image with xCAT, the distro media should be prepared. That can be ISOs or DVDs.

xCAT uses the copycds command to create an image which will be available to install nodes. copycds will copy all contents of distribution DVDs/ISOs or service pack DVDs/ISOs to a destination directory, and create several relevant osimage definitions by default.

If using an ISO, copy it to (or NFS mount it on) the management node, and then run:

copycds <path>/<specific-distro>.iso

Note

Since SLE 15 contains an installer medium and a packages medium, copycds needs to copy all contents of DVD1 of the installer medium and DVD1 of the packages medium, for example:

copycds SLE-15-Installer-DVD-ppc64le-GM-DVD1.iso SLE-15-Packages-ppc64le-GM-DVD1.iso

If using a DVD, put it in the DVD drive of the management node and run:

copycds /dev/<dvd-drive-name>

To see the list of osimages:

lsdef -t osimage

To see the attributes of a particular osimage:

lsdef -t osimage <osimage-name>

Initially, some attributes of osimage are assigned default values by xCAT - they all can work correctly because the files or templates invoked by those attributes are shipped with xCAT by default. If you need to customize those attributes, refer to the next section Customize osimage

Below is an example of osimage definitions created by copycds:

# lsdef -t osimage
rhels7.2-ppc64le-install-compute  (osimage)
rhels7.2-ppc64le-install-service  (osimage)
rhels7.2-ppc64le-netboot-compute  (osimage)
rhels7.2-ppc64le-stateful-mgmtnode  (osimage)

In the osimage definitions shown above:

  • <os>-<arch>-install-compute is the default osimage definition used for diskful installation
  • <os>-<arch>-netboot-compute is the default osimage definition used for diskless installation
  • <os>-<arch>-install-service is the default osimage definition used for service node deployment, which is used in a hierarchical environment

Note

Additional steps are needed for ubuntu ppc64le osimages:

For Ubuntu for ppc64el versions prior to 16.04.02, the initrd.gz shipped with the ISO does not support network booting. In order to install Ubuntu with xCAT, you need to follow the steps to complete the osimage definition.

[Tips 1]

If this is the same distro version as what your management node uses, create a .repo file in /etc/yum.repos.d with contents similar to:

[local-<os>-<arch>]
name=xCAT local <os> <version>
baseurl=file:/install/<os>/<arch>
enabled=1
gpgcheck=0

This way, if you need to install additional RPMs on your MN later, you can simply install them with yum. Or if you are installing software on your MN that depends on some RPMs from this distro, those RPMs will be found and installed automatically.

[Tips 2]

You can easily create/modify an osimage definition from any existing osimage definition; the command is:

mkdef -t osimage -o <new osimage> --template <existing osimage> [<attribute>=<value>, ...]

Except for the specified attributes <attribute>, the attributes of <new osimage> will inherit the values of the template osimage <existing osimage>.

As an example, the following command creates a new osimage myosimage.rh7.compute.netboot based on the existing osimage rhels7.4-ppc64le-netboot-compute with some customized attributes

mkdef -t osimage -o myosimage.rh7.compute.netboot --template rhels7.4-ppc64le-netboot-compute synclists=/tmp/synclist otherpkgdir=/install/custom/osimage/myosimage.rh7.compute.netboot/3rdpkgs/ otherpkglist=/install/custom/osimage/myosimage.rh7.compute.netboot/3rd.pkglist
Customize osimage (Optional)

Optional means that none of the subitems on this page are necessary to finish an OS deployment. If you are new to xCAT, you can just jump to Initialize the Compute for Deployment.

Configure RAID before deploying the OS
Overview

xCAT provides a user interface, linuximage.partitionfile, to specify a customized partition script for diskful provisioning, and provides some default partition scripts.

Deploy Diskful Nodes with RAID1 Setup on RedHat

xCAT provides a partition script raid1_rh.sh which configures RAID1 across 2 disks on RHEL 7.x operating systems.

In most scenarios, the sample partitioning script is sufficient to create a basic RAID1 across two disks and is provided as a sample to build upon.

  1. Obtain the partition script:

    mkdir -p /install/custom/partition/
    wget https://raw.githubusercontent.com/xcat2/xcat-extensions/master/partition/raid1_rh.sh \
         -O /install/custom/partition/raid1_rh.sh
    
  2. Associate the partition script to the osimage:

    chdef -t osimage -o rhels7.3-ppc64le-install-compute \
          partitionfile="s:/install/custom/partition/raid1_rh.sh"
    
  3. Provision the node:

    rinstall cn1 osimage=rhels7.3-ppc64le-install-compute
    

After the diskful nodes are up and running, you can check the RAID1 settings with the following process:

The mount command shows that the /dev/mdx devices are mounted to various file systems; the /dev/mdx devices indicate that RAID is being used on this node.

# mount
...
/dev/md1 on / type xfs (rw,relatime,attr2,inode64,noquota)
/dev/md0 on /boot type xfs (rw,relatime,attr2,inode64,noquota)
/dev/md2 on /var type xfs (rw,relatime,attr2,inode64,noquota)

The file /proc/mdstat shows the status of the RAID devices on the system. Here is an example of /proc/mdstat in a non-multipath environment:

# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdk2[0] sdj2[1]
      1047552 blocks super 1.2 [2/2] [UU]
        resync=DELAYED
      bitmap: 1/1 pages [64KB], 65536KB chunk

md3 : active raid1 sdk3[0] sdj3[1]
      1047552 blocks super 1.2 [2/2] [UU]
        resync=DELAYED

md0 : active raid1 sdk5[0] sdj5[1]
      524224 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md1 : active raid1 sdk6[0] sdj6[1]
      973998080 blocks super 1.2 [2/2] [UU]
      [==>..................]  resync = 12.8% (125356224/973998080) finish=138.1min speed=102389K/sec
      bitmap: 1/1 pages [64KB], 65536KB chunk

unused devices: <none>

On a system with a multipath configuration, /proc/mdstat looks like:

# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 dm-11[0] dm-6[1]
      291703676 blocks super 1.1 [2/2] [UU]
      bitmap: 1/1 pages [64KB], 65536KB chunk

md1 : active raid1 dm-8[0] dm-3[1]
      1048568 blocks super 1.1 [2/2] [UU]

md0 : active raid1 dm-9[0] dm-4[1]
      204788 blocks super 1.0 [2/2] [UU]

unused devices: <none>

The command mdadm can query the detailed configuration for the RAID partitions:

mdadm --detail /dev/md2
Deploy Diskful Nodes with RAID1 Setup on SLES

xCAT provides one sample autoyast template file with RAID1 settings: /opt/xcat/share/xcat/install/sles/service.raid1.sles11.tmpl. You can customize the template file and put it under /install/custom/install/<platform>/ if the default one does not match your requirements.

Here is the RAID1 partitioning section in service.raid1.sles11.tmpl:

<partitioning config:type="list">
   <drive>
     <device>/dev/sda</device>
     <partitions config:type="list">
       <partition>
         <format config:type="boolean">false</format>
         <partition_id config:type="integer">65</partition_id>
         <partition_nr config:type="integer">1</partition_nr>
         <partition_type>primary</partition_type>
         <size>24M</size>
       </partition>
       <partition>
         <format config:type="boolean">false</format>
         <partition_id config:type="integer">253</partition_id>
         <partition_nr config:type="integer">2</partition_nr>
         <raid_name>/dev/md0</raid_name>
         <raid_type>raid</raid_type>
         <size>2G</size>
       </partition>
       <partition>
         <format config:type="boolean">false</format>
         <partition_id config:type="integer">253</partition_id>
         <partition_nr config:type="integer">3</partition_nr>
         <raid_name>/dev/md1</raid_name>
         <raid_type>raid</raid_type>
         <size>max</size>
       </partition>
     </partitions>
     <use>all</use>
   </drive>
   <drive>
     <device>/dev/sdb</device>
     <partitions config:type="list">
       <partition>
         <format config:type="boolean">false</format>
         <partition_id config:type="integer">131</partition_id>
         <partition_nr config:type="integer">1</partition_nr>
         <partition_type>primary</partition_type>
         <size>24M</size>
       </partition>
       <partition>
         <format config:type="boolean">false</format>
         <partition_id config:type="integer">253</partition_id>
         <partition_nr config:type="integer">2</partition_nr>
         <raid_name>/dev/md0</raid_name>
         <raid_type>raid</raid_type>
         <size>2G</size>
       </partition>
       <partition>
         <format config:type="boolean">false</format>
         <partition_id config:type="integer">253</partition_id>
         <partition_nr config:type="integer">3</partition_nr>
         <raid_name>/dev/md1</raid_name>
         <raid_type>raid</raid_type>
         <size>max</size>
       </partition>
     </partitions>
     <use>all</use>
   </drive>
  <drive>
    <device>/dev/md</device>
    <partitions config:type="list">
      <partition>
        <filesystem config:type="symbol">reiser</filesystem>
        <format config:type="boolean">true</format>
        <mount>swap</mount>
        <partition_id config:type="integer">131</partition_id>
        <partition_nr config:type="integer">0</partition_nr>
        <raid_options>
          <chunk_size>4</chunk_size>
          <parity_algorithm>left-asymmetric</parity_algorithm>
          <raid_type>raid1</raid_type>
        </raid_options>
      </partition>
      <partition>
        <filesystem config:type="symbol">reiser</filesystem>
        <format config:type="boolean">true</format>
        <mount>/</mount>
        <partition_id config:type="integer">131</partition_id>
        <partition_nr config:type="integer">1</partition_nr>
        <raid_options>
          <chunk_size>4</chunk_size>
          <parity_algorithm>left-asymmetric</parity_algorithm>
          <raid_type>raid1</raid_type>
        </raid_options>
      </partition>
    </partitions>
    <use>all</use>
  </drive>
</partitioning>

The sample above creates one 24MB PReP partition on each disk, one 2GB mirrored swap partition, and one mirrored / partition that uses all the remaining disk space. If you want to use a different partitioning scheme in your cluster, modify this RAID1 section in the autoyast template file accordingly.

Since the PReP partition cannot be mirrored between the two disks, some additional postinstall commands should be run to make the second disk bootable. Here are the commands needed:

# Set the second disk to be bootable for RAID1 setup
parted -s /dev/sdb mkfs 1 fat32
parted /dev/sdb set 1 type 6
parted /dev/sdb set 1 boot on
dd if=/dev/sda1 of=/dev/sdb1
bootlist -m normal sda sdb

The procedure listed above has been added to the file /opt/xcat/share/xcat/install/scripts/post.sles11.raid1 so it is automated. The autoyast template file service.raid1.sles11.tmpl includes the content of post.sles11.raid1, so no manual steps are needed here.

After the diskful nodes are up and running, you can check the RAID1 settings with the following commands:

The mount command shows that the /dev/mdx devices are mounted to various file systems; the /dev/mdx devices indicate that RAID is being used on this node.

server:~ # mount
/dev/md1 on / type reiserfs (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
debugfs on /sys/kernel/debug type debugfs (rw)
devtmpfs on /dev type devtmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,mode=1777)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)

The file /proc/mdstat shows the status of the RAID devices on the system. Here is an example of /proc/mdstat:

server:~ # cat /proc/mdstat
Personalities : [raid1] [raid0] [raid10] [raid6] [raid5] [raid4]
md0 : active (auto-read-only) raid1 sda2[0] sdb2[1]
      2104500 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 128KB chunk

md1 : active raid1 sda3[0] sdb3[1]
      18828108 blocks super 1.0 [2/2] [UU]
      bitmap: 0/9 pages [0KB], 64KB chunk

unused devices: <none>

The command mdadm can query the detailed configuration for the RAID partitions:

mdadm --detail /dev/md1
Disk Replacement Procedure

If any one disk fails in the RAID1 array, do not panic. Follow the procedure listed below to replace the failed disk.

Faulty disks should appear marked with an (F) if you look at /proc/mdstat:

# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 dm-11[0](F) dm-6[1]
      291703676 blocks super 1.1 [2/1] [_U]
      bitmap: 1/1 pages [64KB], 65536KB chunk

md1 : active raid1 dm-8[0](F) dm-3[1]
      1048568 blocks super 1.1 [2/1] [_U]

md0 : active raid1 dm-9[0](F) dm-4[1]
      204788 blocks super 1.0 [2/1] [_U]

unused devices: <none>

We can see that the first disk is broken because all the RAID partitions on this disk are marked as (F).

Remove the failed disk from RAID array

mdadm is the command that can be used to query and manage RAID arrays on Linux. To remove the failed disk from the RAID array, use the command:

mdadm --manage /dev/mdx --remove /dev/xxx

Here /dev/mdx is one of the RAID partitions listed in the /proc/mdstat file, such as md0, md1, or md2; /dev/xxx is the backend device, like dm-11, dm-8, or dm-9 in the multipath configuration, and sda5, sda3, or sda2 in the non-multipath configuration.

Here is an example of removing the failed disk from the RAID1 array in the non-multipath configuration:

mdadm --manage /dev/md0 --remove /dev/sda3
mdadm --manage /dev/md1 --remove /dev/sda2
mdadm --manage /dev/md2 --remove /dev/sda5

Here is an example of removing the failed disk from the RAID1 array in the multipath configuration:

mdadm --manage /dev/md0 --remove /dev/dm-9
mdadm --manage /dev/md1 --remove /dev/dm-8
mdadm --manage /dev/md2 --remove /dev/dm-11

After the failed disk is removed from the RAID1 array, the partitions on the failed disk will be removed from /proc/mdstat and from the mdadm --detail output as well.

# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 dm-6[1]
      291703676 blocks super 1.1 [2/1] [_U]
      bitmap: 1/1 pages [64KB], 65536KB chunk

md1 : active raid1 dm-3[1]
      1048568 blocks super 1.1 [2/1] [_U]

md0 : active raid1 dm-4[1]
      204788 blocks super 1.0 [2/1] [_U]

unused devices: <none>

# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.0
  Creation Time : Tue Jul 19 02:39:03 2011
     Raid Level : raid1
     Array Size : 204788 (200.02 MiB 209.70 MB)
  Used Dev Size : 204788 (200.02 MiB 209.70 MB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Wed Jul 20 02:00:04 2011
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : c250f17c01ap01:0  (local to host c250f17c01ap01)
           UUID : eba4d8ad:8f08f231:3c60e20f:1f929144
         Events : 26

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1     253        4        1      active sync   /dev/dm-4
Replace the disk

Depending on the hot swap capability, you may simply unplug the disk and replace it with a new one if hot swap is supported; otherwise, you will need to power off the machine, replace the disk, and power the machine back on.

Create partitions on the new disk

The first thing we must do now is to create the exact same partitioning on the new disk as on the good disk. We can do this with one simple command:

sfdisk -d /dev/<good_disk> | sfdisk /dev/<new_disk>

For the non-multipath configuration, here is an example:

sfdisk -d /dev/sdb | sfdisk /dev/sda

For the multipath configuration, here is an example:

sfdisk -d /dev/dm-1 | sfdisk /dev/dm-0

If you get the error message “sfdisk: I don’t like these partitions - nothing changed.”, you can add the --force option to the sfdisk command:

sfdisk -d /dev/sdb | sfdisk /dev/sda --force

You can run:

fdisk -l

to check whether both hard drives now have the same partitioning.

Add the new disk into the RAID1 array

After the partitions are created on the new disk, you can use the command:

mdadm --manage /dev/mdx --add /dev/xxx

to add the new disk to the RAID1 array. Here /dev/mdx is one of the RAID partitions, like md0, md1, or md2; /dev/xxx is the backend device, like dm-11, dm-8, or dm-9 in the multipath configuration, and sda5, sda3, or sda2 in the non-multipath configuration.

Here is an example for the non-multipath configuration:

mdadm --manage /dev/md0 --add /dev/sda3
mdadm --manage /dev/md1 --add /dev/sda2
mdadm --manage /dev/md2 --add /dev/sda5

Here is an example for the multipath configuration:

mdadm --manage /dev/md0 --add /dev/dm-9
mdadm --manage /dev/md1 --add /dev/dm-8
mdadm --manage /dev/md2 --add /dev/dm-11

All done! You can have a cup of coffee to watch the fully automatic reconstruction running…

While the RAID1 array is reconstructing, you will see some progress information in /proc/mdstat:

# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 dm-11[0] dm-6[1]
      291703676 blocks super 1.1 [2/1] [_U]
      [>....................]  recovery =  0.7% (2103744/291703676) finish=86.2min speed=55960K/sec
      bitmap: 1/1 pages [64KB], 65536KB chunk

md1 : active raid1 dm-8[0] dm-3[1]
      1048568 blocks super 1.1 [2/1] [_U]
      [=============>.......]  recovery = 65.1% (683904/1048568) finish=0.1min speed=48850K/sec

md0 : active raid1 dm-9[0] dm-4[1]
      204788 blocks super 1.0 [2/1] [_U]
      [===================>.]  recovery = 96.5% (198016/204788) finish=0.0min speed=14144K/sec

unused devices: <none>

After the reconstruction is done, /proc/mdstat looks like this:

# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 dm-11[0] dm-6[1]
      291703676 blocks super 1.1 [2/2] [UU]
      bitmap: 1/1 pages [64KB], 65536KB chunk

md1 : active raid1 dm-8[0] dm-3[1]
      1048568 blocks super 1.1 [2/2] [UU]

md0 : active raid1 dm-9[0] dm-4[1]
      204788 blocks super 1.0 [2/2] [UU]

unused devices: <none>
Make the new disk bootable

If the new disk does not have a PReP partition, or the PReP partition has a problem, the disk will not be bootable. Here is an example of how to make the new disk bootable; you may need to substitute the device names with your own values.

  • [RHEL]:

    mkofboot .b /dev/sda
    bootlist -m normal sda sdb
    
  • [SLES]:

    parted -s /dev/sda mkfs 1 fat32
    parted /dev/sda set 1 type 6
    parted /dev/sda set 1 boot on
    dd if=/dev/sdb1 of=/dev/sda1
    bootlist -m normal sda sdb
    
Load Additional Drivers
Overview

During the installation or netboot of a node, the drivers in the initrd are used to drive devices like network cards and I/O devices to perform the installation/netboot tasks. But sometimes the drivers for new devices are not included in the default initrd shipped by Red Hat or SUSE. A solution is to inject the new drivers into the initrd so the new devices can be driven during the installation/netboot process.

Generally there are two approaches to inject the new drivers: Driver Update Disk and Driver RPM Package.

A “Driver Update Disk” is media which contains the drivers, firmware and related configuration files for certain devices. The driver update disk is always supplied by the vendor of the device. One driver update disk can contain multiple drivers for different OS releases and different hardware architectures. Red Hat and Suse have different driver update disk formats.

The ‘Driver RPM Package’ is an rpm package that includes the drivers and firmware for specific devices. It is shipped by the vendor of the device for a new device or a new kernel version.

xCAT supports both, but the ‘Driver RPM Package’ approach is only supported in xCAT 2.8 and later.

No matter which approach is chosen, there are two steps to make new drivers work: one is to locate the new drivers, the other is to inject the new drivers into the initrd.

Locate the New Drivers
For Driver Update Disk

There are two approaches for xCAT to find the driver disk (pick one):

  1. Specify the location of the driver disk in the osimage object (This is ONLY supported in xCAT 2.8 and later)

The value for the ‘driverupdatesrc’ attribute is a comma separated driver disk list. The tag ‘dud’ must be specified before the full path of ‘driver update disk’ to specify the type of the file:

chdef -t osimage <osimagename> driverupdatesrc=dud:<full path of driver disk>
  2. Put the driver update disk in the directory <installroot>/driverdisk/<os>/<arch> (example: /install/driverdisk/sles11.1/x86_64).

    During the running of the genimage, geninitrd, or nodeset commands, xCAT will look for driver update disks in the directory <installroot>/driverdisk/<os>/<arch>.

For Driver RPM Packages

The Driver RPM packages must be specified in the osimage object.

Three attributes of osimage object can be used to specify the Driver RPM location and Driver names. If you want to load new drivers in the initrd, the ‘netdrivers’ attribute must be set. And one or both of the ‘driverupdatesrc’ and ‘osupdatename’ attributes must be set. If both of ‘driverupdatesrc’ and ‘osupdatename’ are set, the drivers in the ‘driverupdatesrc’ have higher priority.

  • netdrivers - comma separated driver names that need to be injected into the initrd. The postfix ‘.ko’ can be ignored.

The ‘netdrivers’ attribute must be set to specify the new driver list. If you want to load all the drivers from the driver rpms, use the keyword allupdate. Another keyword for the netdrivers attribute is updateonly, which means only the drivers located in the original initrd will be added to the newly built initrd from the driver rpms. This is useful to reduce the size of the new built initrd when the distro is updated, since there are many more drivers in the new kernel rpm than in the original initrd. Examples:

chdef -t osimage <osimagename> netdrivers=megaraid_sas.ko,igb.ko
chdef -t osimage <osimagename> netdrivers=allupdate
chdef -t osimage <osimagename> netdrivers=updateonly,igb.ko,new.ko
  • driverupdatesrc - comma separated driver rpm packages (full path should be specified)

A tag named ‘rpm’ can be specified before the full path of the rpm to specify the file type. The tag is optional since the default format is ‘rpm’ if no tag is specified. Example:

chdef -t osimage <osimagename> driverupdatesrc=rpm:<full path of driver disk1>,rpm:<full path of driver disk2>
  • osupdatename - comma separated ‘osdistroupdate’ objects. Each ‘osdistroupdate’ object specifies a Linux distro update.

When geninitrd is run, kernel-*.rpm will be searched in the osdistroupdate.dirpath to get all the rpm packages and then those rpms will be searched for drivers. Example:

mkdef -t osdistroupdate update1 dirpath=/install/<os>/<arch>
chdef -t osimage <osimagename> osupdatename=update1

If ‘osupdatename’ is specified, the kernel shipped with the ‘osupdatename’ will be used to load the newly built initrd, and only the drivers matching the new kernel will be kept in the newly built initrd. If using ‘osupdatename’, ‘allupdate’ or ‘updateonly’ should be added to the ‘netdrivers’ attribute, or all the necessary driver names for the new kernel need to be added to the ‘netdrivers’ attribute. Otherwise the new drivers for the new kernel will be missing from the newly built initrd.

Inject the Drivers into the initrd
For Driver Update Disk
  • If specifying the driver disk location in the osimage, there are two ways to inject drivers:

    1. Using nodeset command only:

      nodeset <noderange> osimage=<osimagename>
      
    2. Using geninitrd with nodeset command:

      geninitrd <osimagename>
      nodeset <noderange> osimage=<osimagename> --noupdateinitrd
      

Note

‘geninitrd’ + ‘nodeset --noupdateinitrd’ is useful when you need to run nodeset frequently for a diskful node. ‘geninitrd’ only needs to be run once to rebuild the initrd, and ‘nodeset --noupdateinitrd’ will not touch the initrd and kernel in /tftpboot/xcat/osimage/<osimage name>/.

  • If putting the driver disk in <installroot>/driverdisk/<os>/<arch>:

Running ‘nodeset <noderange>’ in any way will load the driver disk

For Driver RPM Packages

There are two ways to inject drivers:

  1. Using nodeset command only:

    nodeset <noderange> osimage=<osimagename> [--ignorekernelchk]
    
  2. Using geninitrd with nodeset command:

    geninitrd <osimagename> [--ignorekernelchk]
    nodeset <noderange> osimage=<osimagename> --noupdateinitrd
    

Note

‘geninitrd’ + ‘nodeset --noupdateinitrd’ is useful when you need to run nodeset frequently for diskful nodes. ‘geninitrd’ only needs to be run once to rebuild the initrd, and ‘nodeset --noupdateinitrd’ will not touch the initrd and kernel in /tftpboot/xcat/osimage/<osimage name>/.

The option ‘--ignorekernelchk’ is used to skip the kernel version checking when injecting drivers from osimage.driverupdatesrc. To use this flag, you should make sure the drivers in the driver rpms are usable for the target kernel.

Notes
  • If the drivers from the driver disk or driver rpm are not already part of the installed or booted system, it’s necessary to add the rpm packages for the drivers to the .pkglist or .otherpkglist of the osimage object to install them in the system.
  • If a driver rpm needs to be loaded, the osimage object must be used for the ‘nodeset’ and ‘genimage’ command, instead of the older style profile approach.
  • Both a Driver disk and a Driver rpm can be loaded in one ‘nodeset’ or ‘genimage’ invocation.
Configure Disk Partition

By default, xCAT will attempt to determine the first physical disk and use a generic default partition scheme for the operating system. You may require a more customized disk partitioning scheme and can accomplish this in one of the following methods:

  • partition definition file
  • partition definition script

Note

A partition definition file can be used for RedHat, SLES, and Ubuntu. However, disk configuration for Ubuntu differs from RedHat/SLES, so some special sections may be required for Ubuntu.

Warning

A partition definition script has only been verified on RedHat and Ubuntu; use it at your own risk on SLES.

Partition Definition File

The following steps are required for this method:

  1. Create a partition file
  2. Associate the partition file with an xCAT osimage

The nodeset command will then insert the contents of this partition file into the generated autoinst config file that will be used by the operating system installer.

Create Partition File

The partition file must follow the partitioning syntax of the respective installer:

  • Redhat: Kickstart documentation
    • The file /root/anaconda-ks.cfg is a sample kickstart file created by the RedHat installer during the installation process, based on the options that you selected.
    • system-config-kickstart is a tool with graphical interface for creating kickstart files
  • SLES: Autoyast documentation
    • Use yast2 autoyast in GUI or CLI mode to customize the installation options and create an autoyast file
    • Use yast2 clone_system to create autoyast configuration file /root/autoinst.xml to clone an existing system
  • Ubuntu: Preseed documentation
    • For detailed information see the files partman-auto-recipe.txt and partman-auto-raid-recipe.txt included in the debian-installer package. Both files are also available from the debian-installer source repository.

Note

Supported functionality may change between releases of the operating system; always refer to the latest documentation provided by the operating system.

Here is a partition definition file example for a standard Ubuntu partition layout on ppc64le machines:

ubuntu-boot ::
8 1 1 prep
        $primary{ } $bootable{ } method{ prep }
        .
500 10000 1000000000 ext4
        method{ format } format{ } use_filesystem{ } filesystem{ ext4 } mountpoint{ / }
        .
2048 512 300% linux-swap
        method{ swap } format{ }
        .
Associate Partition File with Osimage

If your custom partition file is located at: /install/custom/my-partitions, run the following command to associate the partition file with an osimage:

chdef -t osimage <osimagename> partitionfile=/install/custom/my-partitions

To generate the configuration, run the nodeset command:

nodeset <nodename> osimage=<osimagename>

Note

RedHat: Running nodeset will generate the /install/autoinst file for the node. It will replace the #XCAT_PARTITION_START# and #XCAT_PARTITION_END# directives with the contents of your custom partition file.

Note

SLES: Running nodeset will generate the /install/autoinst file for the node. It will replace the #XCAT-PARTITION-START# and #XCAT-PARTITION-END# directives with the contents of your custom partition file. Do not include <partitioning config:type="list"> and </partitioning> tags, they will be added by xCAT.

Note

Ubuntu: Running nodeset will generate the /install/autoinst file for the node. It will write the partition file to /tmp/partitionfile and replace the #XCA_PARTMAN_RECIPE_SCRIPT# directive in /install/autoinst/<node>.pre with the contents of your custom partition file.

Partitioning disk file (For Ubuntu only)

The disk file contains the names of the disks to partition, in traditional non-devfs format, delimited with a space " ". For example:

/dev/sda /dev/sdb

If not specified, the default value will be used.

Associate partition disk file with osimage

chdef -t osimage <osimagename> -p partitionfile='d:/install/custom/partitiondisk'
nodeset <nodename> osimage=<osimage>
  • the d: preceding the filename tells nodeset that this is a partition disk file.
  • For Ubuntu, when nodeset runs and generates the /install/autoinst file for a node, it will generate a script that writes the content of the partition disk file to /tmp/install_disk; the invocation of this script will replace the #XCA_PARTMAN_DISK_SCRIPT# directive in /install/autoinst/<node>.pre.
Additional preseed configuration file (For Ubuntu only)

To support other specific partition methods such as RAID or LVM in Ubuntu, some additional preseed configuration entries should be specified.

If using the file method, c:<the absolute path of the additional preseed config file>, the additional preseed config file contains the additional preseed entries in d-i ... syntax. When nodeset runs, the #XCA_PARTMAN_ADDITIONAL_CFG# directive in /install/autoinst/<node> will be replaced with the content of the config file. For example:

d-i partman-auto/method string raid
d-i partman-md/confirm boolean true

If not specified, the default value will be used.

Partition Definition Script

Create a shell script that will be run on the node during the install process to dynamically create the disk partitioning definition. This script runs during the OS installer %pre script execution on RedHat, or during preseed/early_command execution on Ubuntu, and must write the correct partitioning definition into the file /tmp/partitionfile on the node.

Create Partition Script

The purpose of the partition script is to create the /tmp/partitionfile that will be inserted into the kickstart/autoyast/preseed template. The script can include complex logic, such as selecting which disk to install to, or even configuring RAID.

Note

The partition script feature is not thoroughly tested on SLES; there might be problems, so use this feature on SLES at your own risk.

Here is an example of the partition script for RedHat and SLES; the partitioning script is /install/custom/my-partitions.sh:

instdisk="/dev/sda"

modprobe ext4 >& /dev/null
modprobe ext4dev >& /dev/null
if grep ext4dev /proc/filesystems > /dev/null; then
    FSTYPE=ext3
elif grep ext4 /proc/filesystems > /dev/null; then
    FSTYPE=ext4
else
    FSTYPE=ext3
fi
BOOTFSTYPE=ext4
EFIFSTYPE=vfat
if uname -r|grep ^3.*el7 > /dev/null; then
    FSTYPE=xfs
    BOOTFSTYPE=xfs
    EFIFSTYPE=efi
fi

if [ `uname -m` = "ppc64" ]; then
    echo 'part None --fstype "PPC PReP Boot" --ondisk '$instdisk' --size 8' >> /tmp/partitionfile
fi
if [ -d /sys/firmware/efi ]; then
    echo 'bootloader --driveorder='$instdisk >> /tmp/partitionfile
    echo 'part /boot/efi --size 50 --ondisk '$instdisk' --fstype '$EFIFSTYPE >> /tmp/partitionfile
else
    echo 'bootloader' >> /tmp/partitionfile
fi

echo "part /boot --size 512 --fstype $BOOTFSTYPE --ondisk $instdisk" >> /tmp/partitionfile
echo "part swap --recommended --ondisk $instdisk" >> /tmp/partitionfile
echo "part / --size 1 --grow --ondisk $instdisk --fstype $FSTYPE" >> /tmp/partitionfile

The following is an example of the partition script for Ubuntu; the partitioning script is /install/custom/my-partitions.sh:

if [ -d /sys/firmware/efi ]; then
        echo "ubuntu-efi ::" > /tmp/partitionfile
        echo "    512 512 1024 fat32" >> /tmp/partitionfile
        echo '    $iflabel{ gpt } $reusemethod{ } method{ efi } format{ }' >> /tmp/partitionfile
        echo "    ." >> /tmp/partitionfile
else
        echo "ubuntu-boot ::" > /tmp/partitionfile
        echo "100 50 100 ext4" >> /tmp/partitionfile
        echo '    $primary{ } $bootable{ } method{ format } format{ } use_filesystem{ } filesystem{ ext4 } mountpoint{ /boot }' >> /tmp/partitionfile
        echo "    ." >> /tmp/partitionfile
fi
echo "500 10000 1000000000 ext4" >> /tmp/partitionfile
echo "    method{ format } format{ } use_filesystem{ } filesystem{ ext4 } mountpoint{ / }" >> /tmp/partitionfile
echo "    ." >> /tmp/partitionfile
echo "2048 512 300% linux-swap" >> /tmp/partitionfile
echo "    method{ swap } format{ }" >> /tmp/partitionfile
echo "    ." >> /tmp/partitionfile
Associate partition script with osimage

Run the following commands to associate the partition script with the osimage:

chdef -t osimage <osimagename> partitionfile='s:/install/custom/my-partitions.sh'
nodeset <nodename> osimage=<osimage>
  • The s: preceding the filename tells nodeset that this is a script.
  • For RedHat, when nodeset runs and generates the /install/autoinst file for a node, it will add the execution of the contents of this script to the %pre section of that file. The nodeset command will then replace the #XCAT_PARTITION_START#...#XCAT_PARTITION_END# directives from the osimage template file with %include /tmp/partitionfile to dynamically include the tmp definition file your script created.
  • For Ubuntu, when nodeset runs and generates the /install/autoinst file for a node, it will replace the #XCA_PARTMAN_RECIPE_SCRIPT# directive and add the execution of the contents of this script to /install/autoinst/<node>.pre; the /install/autoinst/<node>.pre script runs in the preseed/early_command.
Partitioning disk script (For Ubuntu only)

The disk script contains a script that generates a partitioning disk file named /tmp/install_disk. For example:

rm /tmp/devs-with-boot 2>/dev/null || true;
for d in $(list-devices partition); do
    mkdir -p /tmp/mymount;
    rc=0;
    mount $d /tmp/mymount || rc=$?;
    if [[ $rc -eq 0 ]]; then
        [[ -d /tmp/mymount/boot ]] && echo $d >>/tmp/devs-with-boot;
        umount /tmp/mymount;
    fi
done;
if [[ -e /tmp/devs-with-boot ]]; then
    head -n1 /tmp/devs-with-boot | egrep  -o '\S+[^0-9]' > /tmp/install_disk;
    rm /tmp/devs-with-boot 2>/dev/null || true;
else
    DEV=`ls /dev/disk/by-path/* -l | egrep -o '/dev.*[s|h|v]d[^0-9]$' | sort -t : -k 1 -k 2 -k 3 -k 4 -k 5 -k 6 -k 7 -k 8 -g | head -n1 | egrep -o '[s|h|v]d.*$'`;
    if [[ "$DEV" == "" ]]; then DEV="sda"; fi;
    echo "/dev/$DEV" > /tmp/install_disk;
fi;

If not specified, the default value will be used.

Associate partition disk script with osimage

chdef -t osimage <osimagename> -p partitionfile='s:d:/install/custom/partitiondiskscript'
nodeset <nodename> osimage=<osimage>
  • the s: prefix tells nodeset that this is a script; the s:d: preceding the filename tells nodeset that this is a script to generate the partition disk file.
  • For Ubuntu, when nodeset runs and generates the /install/autoinst file for a node, the invocation of this script will replace the #XCA_PARTMAN_DISK_SCRIPT# directive in /install/autoinst/<node>.pre.
Additional preseed configuration script (For Ubuntu only)

To support other specific partition methods such as RAID or LVM in Ubuntu, some additional preseed configuration entries should be specified.

If using the script method, ‘s:c:<the absolute path of the additional preseed config script>’, the additional preseed config script sets the preseed values with “debconf-set”. When nodeset runs, the #XCA_PARTMAN_ADDITIONAL_CONFIG_SCRIPT# directive in /install/autoinst/<node>.pre will be replaced with the content of the script. For example:

debconf-set partman-auto/method string raid
debconf-set partman-md/confirm boolean true

If not specified, the default value will be used.

Prescripts and Postscripts
Using Prescript

The prescript table will allow you to run scripts before the install process. This can be helpful for performing advanced actions such as manipulating system services or configurations before beginning to install a node, or to prepare application servers for the addition of new nodes. Check the man page for more information.

man prescripts

The scripts will be run as root on the MASTER for the node. If there is a service node for the node, then the scripts will be run on the service node.

Identify the scripts to be run for each node by adding entries to the prescripts table:

tabedit prescripts
Or:
chdef -t node -o <noderange> prescripts-begin=<beginscripts> prescripts-end=<endscripts>
Or:
chdef -t group -o <nodegroup> prescripts-begin=<beginscripts> prescripts-end=<endscripts>

tabdump prescripts
#node,begin,end,comments,disable

begin or prescripts-begin - This attribute lists the scripts to be run at the beginning of the nodeset.
end or prescripts-end - This attribute lists the scripts to be run at the end of the nodeset.
Format for naming prescripts

The general format for the prescripts-begin or prescripts-end attribute is:

[action1:]s1,s2...[|action2:s3,s4,s5...]

where:

- action1 and action2 are the nodeset actions ('install', 'netboot', etc.) specified in the command.

- s1 and s2 are the scripts to run for action1, in order.

- s3, s4, and s5 are the scripts to run for action2.

If actions are omitted, the scripts apply to all actions.

Examples:

  • myscript1,myscript2 - run scripts for all supported commands
  • install:myscript1,myscript2|netboot:myscript3 - run scripts myscript1 and myscript2 for nodeset(install), and run myscript3 for nodeset(netboot).

All the scripts should be copied to the /install/prescripts directory and made executable by root and world readable (for mounting). If you have service nodes in your cluster with a local /install directory (i.e. /install is not mounted from the xCAT management node to the service nodes), you will need to synchronize your /install/prescripts directory to your service nodes any time you create new scripts or change existing ones.

The following two environment variables will be passed to each script:

  • NODES - a comma separated list of node names on which to run the script
  • ACTION - current nodeset action.

By default, the script will be invoked once for all nodes. However, if #xCAT setting:MAX_INSTANCE=<number> is specified in the script, the script will be invoked for each node in parallel, but no more than the number of instances specified in <number> will run at a time.
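
A minimal prescript sketch that uses these variables; the script name prepnodes and the work it does are hypothetical:

#!/bin/bash
# /install/prescripts/prepnodes  (hypothetical example prescript)
#xCAT setting:MAX_INSTANCE=4    # optional: invoke per node, at most 4 instances in parallel
logger -t xcat "prescript prepnodes: action=$ACTION nodes=$NODES"
for node in $(echo "$NODES" | tr ',' ' '); do
    # place per-node preparation work here
    echo "preparing $node for nodeset $ACTION"
done
exit 0    # nonzero exit values are handled as described in the next section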

Exit values for prescripts

If there is no error, a prescript should return 0. If an error occurs, it should put the error message on stdout and exit with 1 or any other nonzero value. The command that runs prescripts (nodeset, for example) can be divided into 3 sections.

  1. run begin prescripts
  2. run other code
  3. run end prescripts

If one of the prescripts returns 1, the command will finish the rest of the prescripts in that section and then exit with value 1. For example, suppose a node has three begin prescripts s1, s2, and s3, and three end prescripts s4, s5, and s6. If s2 returns 1, the prescript s3 will still be executed, but the other code and the end prescripts will not be executed by the command.

If one of the prescripts returns 2 or greater, then the command will exit out immediately. This only applies to the scripts that do not have #xCAT setting:MAX_INSTANCE=<number>.

Using Postscript
Postscript Execution Order Summary
Diskful

Stage            Scripts                                Execute Order
N/A              postinstall                            Does not execute for diskful install
Install/Create   postscripts (execute before reboot)    1. postscripts.xcatdefaults
                                                        2. osimage
                                                        3. node
Boot/Reboot      postbootscripts                        4. postscripts.xcatdefaults
                                                        5. osimage
                                                        6. node

xCAT automatically runs a few postscripts and postbootscripts that are delivered with xCAT to set up the nodes. You can also add your own scripts to further customize the nodes.

Types of scripts

There are two types of scripts in the postscripts table (postscripts and postbootscripts). The types are based on when in the install process they will be executed. Run the following for more information:

man postscripts
  • postscripts attribute - List of scripts that should be run on this node after diskful installation or diskless boot.

    • [RHEL]

    Postscripts will be run before the reboot.

    • [SLES]

    Postscripts will be run after the reboot but before the init.d process. For Linux diskless deployment, the postscripts will be run at init.d time, and xCAT will automatically add the list of postscripts from the postbootscripts attribute to run after the postscripts list.

  • postbootscripts attribute - list of postbootscripts that should be run on this Linux node at init.d time after the diskful installation reboot or diskless boot.

  • By default, for diskful installs, xCAT only runs the postbootscripts during the install and not on reboot. The site table attribute runbootscripts is available to change this default behavior: if set to yes, the postbootscripts will be run on install and on reboot.

Note

xCAT automatically adds the postscripts from the xcatdefaults entry of the postscripts table to run first on the nodes after install or diskless boot.

Adding your own postscripts

To add your own script, place it in /install/postscripts on the management node. Make sure it is executable and world readable. Then add it to the postscripts table for the group of nodes you want it to be run on (or the all group if you want it run on all nodes).
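
For example, a minimal custom postscript sketch; the script name setsysctl and the setting it applies are hypothetical, and it logs through logger so messages are forwarded to the Management Node syslog:

#!/bin/bash
# /install/postscripts/setsysctl  (hypothetical example postscript)
logger -t xcat "running setsysctl on $NODE"
echo "vm.swappiness = 10" >> /etc/sysctl.conf
sysctl -p >/dev/null 2>&1 || logger -t xcat "setsysctl: sysctl -p failed on $NODE"

Then make it executable and add it to the node or group definition:

chmod 755 /install/postscripts/setsysctl
chdef <noderange> -p postscripts=setsysctl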

To check what scripts will be run on your node during installation:

lsdef node1 | grep scripts
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles

You can pass parameters to the postscripts. For example:

script1 p1 p2,script2,....

p1 p2 are the parameters to script1.

Postscripts can be placed in subdirectories of /install/postscripts on the management node; specify subdir/postscriptname in the postscripts table to run postscripts from those subdirectories. This feature can be used to categorize the postscripts for different purposes. For example:

mkdir -p /install/postscripts/subdir1
mkdir -p /install/postscripts/subdir2
cp postscript1 /install/postscripts/subdir1/
cp postscript2 /install/postscripts/subdir2/
chdef node1 -p postscripts=subdir1/postscript1,subdir2/postscript2
updatenode node1 -P

If some of your postscripts affect the network communication between the management node and the compute node, such as restarting the network or configuring a bond, the postscript execution might not finish successfully because of network connection problems. Even if such a postscript is placed last in the list, xCAT still may not be able to update the node status to booted. The recommendation is to use the Linux at mechanism to schedule this network-killing postscript to run at a later time. For example:

Suppose the user needs to add a postscript to customize the NIC bonding setup, and that bonding setup will break the network between the management node and the compute node. The user can use at to run this NIC bonding postscript after all the postscript processing has finished.

Write a script, /install/postscripts/nicbondscript, which simply calls confignicsbond using at:

[root@xcatmn ~]#cat /install/postscripts/nicbondscript
#!/bin/bash
at -f ./confignicsbond now + 1 minute
[root@xcatmn ~]#

Then

chdef <nodename> -p postbootscripts=nicbondscript
PostScript/PostbootScript execution

When your script is executed on the node, all the attributes in the site table are exported as variables for your scripts to use. You can add extra attributes for yourself. See the sample mypostscript file below.

To run the postscripts, a script is built so that the exported variables above can be provided as input. You can usually find that script in /xcatpost on the node; in the Linux case it is called mypostscript. A good way to debug problems is to go to the node, run mypostscript, and watch for errors. You can also check the syslog on the Management Node for errors.
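
For example, a quick debugging pass on the node might look like this (the node name is a placeholder):

ssh <nodename>
cd /xcatpost
./mypostscript
tail /var/log/xcat/xcat.log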

When writing your postscripts, it is good to follow the example of the current postscripts and write errors to syslog as well as to standard output. See Suggestions for writing scripts.

All attributes in the site table are exported and available to the postscript/postbootscript during execution. See the mypostscript file, which is generated and executed on the nodes to run the postscripts.

Example of mypostscript

#subroutine used to run postscripts
run_ps () {
logdir="/var/log/xcat"
mkdir -p $logdir
logfile="/var/log/xcat/xcat.log"
if [ -f $1 ]; then
 echo "Running postscript: $@" | tee -a $logfile
 ./$@ 2>&1 | tee -a $logfile
else
 echo "Postscript $1 does NOT exist." | tee -a $logfile
fi
}
# subroutine end
AUDITSKIPCMDS='tabdump,nodels'
export AUDITSKIPCMDS
TEST='test'
export TEST
NAMESERVERS='7.114.8.1'
export NAMESERVERS
NTPSERVERS='7.113.47.250'
export NTPSERVERS
INSTALLLOC='/install'
export INSTALLLOC
DEFSERIALPORT='0'
export DEFSERIALPORT
DEFSERIALSPEED='19200'
export DEFSERIALSPEED
DHCPINTERFACES="'xcat20RRmn|eth0;rra000-m|eth1'"
export DHCPINTERFACES
FORWARDERS='7.113.8.1,7.114.8.2'
export FORWARDERS
NAMESERVER='7.113.8.1,7.114.47.250'
export NAMESERVER
DB='postg'
export DB
BLADEMAXP='64'
export BLADEMAXP
FSPTIMEOUT='0'
export FSPTIMEOUT
INSTALLDIR='/install'
export INSTALLDIR
IPMIMAXP='64'
export IPMIMAXP
IPMIRETRIES='3'
export IPMIRETRIES
IPMITIMEOUT='2'
export IPMITIMEOUT
CONSOLEONDEMAND='no'
export CONSOLEONDEMAND
SITEMASTER=7.113.47.250
export SITEMASTER
MASTER=7.113.47.250
export MASTER
MAXSSH='8'
export MAXSSH
PPCMAXP='64'
export PPCMAXP
PPCRETRY='3'
export PPCRETRY
PPCTIMEOUT='0'
export PPCTIMEOUT
SHAREDTFTP='1'
export SHAREDTFTP
SNSYNCFILEDIR='/var/xcat/syncfiles'
export SNSYNCFILEDIR
TFTPDIR='/tftpboot'
export TFTPDIR
XCATDPORT='3001'
export XCATDPORT
XCATIPORT='3002'
export XCATIPORT
XCATCONFDIR='/etc/xcat'
export XCATCONFDIR
TIMEZONE='America/New_York'
export TIMEZONE
USENMAPFROMMN='no'
export USENMAPFROMMN
DOMAIN='cluster.net'
export DOMAIN
USESSHONAIX='no'
export USESSHONAIX
NODE=rra000-m
export NODE
NFSSERVER=7.113.47.250
export NFSSERVER
INSTALLNIC=eth0
export INSTALLNIC
PRIMARYNIC=eth1
OSVER=fedora9
export OSVER
ARCH=x86_64
export ARCH
PROFILE=service
export PROFILE
PATH=`dirname $0`:$PATH
export PATH
NODESETSTATE='netboot'
export NODESETSTATE
UPDATENODE=1
export UPDATENODE
NTYPE=service
export NTYPE
MACADDRESS='00:14:5E:5B:51:FA'
export MACADDRESS
MONSERVER=7.113.47.250
export MONSERVER
MONMASTER=7.113.47.250
export MONMASTER
OSPKGS=bash,openssl,dhclient,kernel,openssh-server,openssh-clients,busybox-anaconda,vim-minimal,rpm,bind,bind-utils,ksh,nfs-utils,dhcp,bzip2,rootfiles,vixie-cron,wget,vsftpd,ntp,rsync
OTHERPKGS1=xCATsn,xCAT-rmc,rsct/rsct.core,rsct/rsct.core.utils,rsct/src,yaboot-xcat
export OTHERPKGS1
OTHERPKGS_INDEX=1
export OTHERPKGS_INDEX
export NOSYNCFILES
# postscripts-start-here\n
run_ps ospkgs
run_ps script1 p1 p2
run_ps script2
# postscripts-end-here\n

The mypostscript file is generated according to the mypostscript.tmpl file.

Using the mypostscript template

xCAT provides a way for the admin to customize the information that will be provided to the postscripts/postbootscripts when they run on the node. This is done by editing the mypostscript.tmpl file. The attributes that are provided in the shipped mypostscript.tmpl file should not be removed. They are needed by the default xCAT postscripts.

The mypostscript.tmpl file is shipped in the /opt/xcat/share/xcat/mypostscript directory.

If the admin customizes the mypostscript.tmpl, they should copy it to /install/postscripts/mypostscript.tmpl and then edit that copy. The mypostscript for each node will be named mypostscript.<nodename>. The generated mypostscript.<nodename> will be put in the /tftpboot/mypostscripts directory.
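
For example, the copy can be done as follows before editing:

cp /opt/xcat/share/xcat/mypostscript/mypostscript.tmpl /install/postscripts/mypostscript.tmpl
vi /install/postscripts/mypostscript.tmpl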

site table precreatemypostscripts attribute

If the site table precreatemypostscripts attribute is set to 1 or yes, it will instruct xCAT at nodeset and updatenode time to query the db once for all of the nodes passed into the command, create the mypostscript file for each node, and put them in a directory in $TFTPDIR (for example /tftpboot). The created mypostscript.<nodename> file in the /tftpboot/mypostscripts directory will not be regenerated unless another nodeset or updatenode command is run for that node. This should be used when the system definition has stabilized. It saves time on updatenode or reboot by not regenerating the mypostscript file.

If the precreatemypostscripts attribute is yes, and a database change is made or xCAT code is upgraded, then you should run a new nodeset or updatenode to regenerate the /tftpboot/mypostscripts/mypostscript.<nodename> file to pick up the latest database settings. The default for precreatemypostscripts is no/0.

When you run nodeset or updatenode, it will search for /install/postscripts/mypostscript.tmpl first. If /install/postscripts/mypostscript.tmpl exists, it will use that template to generate the mypostscript for each node. Otherwise, it will use /opt/xcat/share/xcat/mypostscript/mypostscript.tmpl.

Content of the template for mypostscript

Note

The attributes that are defined in the shipped mypostscript.tmpl file should not be removed. The xCAT default postscripts rely on that information to run successfully.

The following will explain the entries in the mypostscript.tmpl file.

The SITE_TABLE_ALL_ATTRIBS_EXPORT line in the file directs the code to export all attributes defined in the site table. The attributes are not always defined exactly as in the site table to avoid conflict with other table attributes of the same name. For example, the site table master attribute is named SITEMASTER in the generated mypostscript file.

#SITE_TABLE_ALL_ATTRIBS_EXPORT#

The following line exports ENABLESSHBETWEENNODES by running the internal xCAT routine (enablesshbetweennodes).

ENABLESSHBETWEENNODES=#Subroutine:xCAT::Template::enablesshbetweennodes:$NODE#
export ENABLESSHBETWEENNODES

tabdump(<TABLENAME>) is used to get all the information in the <TABLENAME> table:

tabdump(networks)

These lines export the node name based on its definition in the database.

NODE=$NODE
export NODE

These lines get a comma separated list of the groups to which the node belongs.

GROUP=#TABLE:nodelist:$NODE:groups#
export GROUP

These lines read the noderes table for the given attributes (nfsserver, installnic, primarynic, xcatmaster, routenames) of the node ($NODE) and export them.

NFSSERVER=#TABLE:noderes:$NODE:nfsserver#
export NFSSERVER
INSTALLNIC=#TABLE:noderes:$NODE:installnic#
export INSTALLNIC
PRIMARYNIC=#TABLE:noderes:$NODE:primarynic#
export PRIMARYNIC
MASTER=#TABLE:noderes:$NODE:xcatmaster#
export MASTER
NODEROUTENAMES=#TABLE:noderes:$NODE:routenames#
export NODEROUTENAMES

The following entry exports multiple variables from the routes table. Not always set.

#ROUTES_VARS_EXPORT#

The following lines export nodetype table attributes.

OSVER=#TABLE:nodetype:$NODE:os#
export OSVER
ARCH=#TABLE:nodetype:$NODE:arch#
export ARCH
PROFILE=#TABLE:nodetype:$NODE:profile#
export PROFILE
PROVMETHOD=#TABLE:nodetype:$NODE:provmethod#
export PROVMETHOD

The following adds the current directory to the path for the postscripts.

PATH=`dirname $0`:$PATH
export PATH

The following sets the NODESETSTATE by running the internal xCAT getnodesetstate script.

NODESETSTATE=#Subroutine:xCAT::Postage::getnodesetstate:$NODE#
export NODESETSTATE

The following indicates that the postscripts are not being run as a result of updatenode. (This is changed to 1 when updatenode runs.)

UPDATENODE=0
export UPDATENODE

The following sets the NTYPE to compute, service or MN.

NTYPE=$NTYPE
export NTYPE

The following sets the mac address.

MACADDRESS=#TABLE:mac:$NODE:mac#
export MACADDRESS

If vlan is set up, then the #VLAN_VARS_EXPORT# line will provide the following exports:

VMNODE='YES'
export VMNODE
VLANID=vlan1...
export VLANID
VLANHOSTNAME=..
  ..
#VLAN_VARS_EXPORT#

If monitoring is set up, then the #MONITORING_VARS_EXPORT# line will provide:

MONSERVER=11.10.34.108
export MONSERVER
MONMASTER=11.10.34.108
export MONMASTER
#MONITORING_VARS_EXPORT#

The #OSIMAGE_VARS_EXPORT# line will provide, for example:

OSPKGDIR=/install/<os>/<arch>
export OSPKGDIR
OSPKGS='bash,nfs-utils,openssl,dhclient,kernel,openssh-server,openssh-clients,busybox,wget,rsyslog,dash,vim-minimal,ntp,rsyslog,rpm,rsync,
  ppc64-utils,iputils,dracut,dracut-network,e2fsprogs,bc,lsvpd,irqbalance,procps,yum'
export OSPKGS

#OSIMAGE_VARS_EXPORT#

The #NETWORK_FOR_DISKLESS_EXPORT# line will provide the diskless network information, if defined.

NETMASK=255.255.255.0
export NETMASK
GATEWAY=8.112.34.108
export GATEWAY
..
#NETWORK_FOR_DISKLESS_EXPORT#

Note

The #INCLUDE_POSTSCRIPTS_LIST# and the #INCLUDE_POSTBOOTSCRIPTS_LIST# sections in /tftpboot/mypostscript(mypostbootscripts) on the Management Node will contain all the postscripts and postbootscripts defined for the node. When running an updatenode command for only some of the scripts, you will see in the /xcatpost/mypostscript file on the node that the list has been redefined during the execution of updatenode to run only the requested scripts, for example if you run updatenode <nodename> -P syslog.

The #INCLUDE_POSTSCRIPTS_LIST# flag provides a list of postscripts defined for this $NODE.

#INCLUDE_POSTSCRIPTS_LIST#

For example, you will see in the generated file the following stanzas:

# postscripts-start-here
# defaults-postscripts-start-here
syslog
remoteshell
# defaults-postscripts-end-here
# node-postscripts-start-here
syncfiles
# node-postscripts-end-here

The #INCLUDE_POSTBOOTSCRIPTS_LIST# provides a list of postbootscripts defined for this $NODE.

#INCLUDE_POSTBOOTSCRIPTS_LIST#

For example, you will see in the generated file the following stanzas:

# postbootscripts-start-here
# defaults-postbootscripts-start-here
otherpkgs
# defaults-postbootscripts-end-here
# node-postbootscripts-end-here
# postbootscripts-end-here
Kinds of variables in the template

Type 1: For a simple variable, the syntax is as follows. The mypostscript.tmpl has several examples of this. $NODE is filled in by the code. UPDATENODE is changed to 1 when the postscripts are run by updatenode. $NTYPE is filled in as either compute, service, or MN.

NODE=$NODE
export NODE
UPDATENODE=0
export UPDATENODE
NTYPE=$NTYPE
export NTYPE

Type 2: This is the syntax to get the value of one attribute from <tablename>, where the key is $NODE. It does not support tables with two keys (for example litefile, prodkey, deps, monsetting, mpa, networks), nor tables whose key is something other than $NODE (for example passwd, rack, token).

VARNAME=#TABLE:tablename:$NODE:attribute#

For example, to get the new updatestatus attribute from the nodelist table:

UPDATESTATUS=#TABLE:nodelist:$NODE:updatestatus#
export UPDATESTATUS

Type 3: The syntax is as follows:

VARNAME=#Subroutine:modulename::subroutinename:$NODE#
or
VARNAME=#Subroutine:modulename::subroutinename#

Examples in the mypostscript.tmpl are the following:

NODESETSTATE=#Subroutine:xCAT::Postage::getnodesetstate:$NODE#
export NODESETSTATE
ENABLESSHBETWEENNODES=#Subroutine:xCAT::Template::enablesshbetweennodes:$NODE#
export ENABLESSHBETWEENNODES

Note

Type 3 is not an open interface to add extensions to the template.

Type 4: The syntax is #FLAG#. When parsing the template, the code generates all entries defined by #FLAG#, if they are defined in the database. For example, to export all values of all attributes from the site table, the tag is

#SITE_TABLE_ALL_ATTRIBS_EXPORT#

For the #SITE_TABLE_ALL_ATTRIBS_EXPORT# flag, the related subroutine gets the attributes’ values and handles the special cases. For example, site.master is exported as "SITEMASTER", and if noderes.xcatmaster exists it is exported as "MASTER"; otherwise site.master is also exported as "MASTER".

Other examples are:

#VLAN_VARS_EXPORT#  - gets all vlan related items
#MONITORING_VARS_EXPORT#  - gets all monitoring configuration and setup data
#OSIMAGE_VARS_EXPORT# - gets osimage related variables, such as ospkgdir, ospkgs ...
#NETWORK_FOR_DISKLESS_EXPORT# - gets diskless network information
#INCLUDE_POSTSCRIPTS_LIST# - includes the list of all postscripts for the node
#INCLUDE_POSTBOOTSCRIPTS_LIST# - includes the list of all postbootscripts for the node

Note

Type 4 is not an open interface to add extensions to the template.

Type 5: Get all the data from the specified table. The <TABLENAME> should not be a node table, like nodelist; that should be handled with the Type 2 syntax to get specific attributes for the $NODE, because tabdump would return too much data for a node table. For the same reason, the auditlog and eventlog tables should not be used with tabdump either. The site table should not be specified; it is already provided with the #SITE_TABLE_ALL_ATTRIBS_EXPORT# flag. This syntax can be used to get the data from tables with two keys (like switch). The syntax is:

tabdump(<TABLENAME>)
Edit mypostscript.tmpl

Add new attributes into mypostscript.tmpl

When you add new attributes into the template, you should edit /install/postscripts/mypostscript.tmpl, which you created by copying /opt/xcat/share/xcat/mypostscript/mypostscript.tmpl. Make all additions before the # postscripts-start-here section. xCAT will first look for /install/postscripts/mypostscript.tmpl and then, if not found, will use the one in /opt/xcat/share/xcat/mypostscript/mypostscript.tmpl.

For example:

UPDATESTATUS=#TABLE:nodelist:$NODE:updatestatus#
export UPDATESTATUS
...
# postscripts-start-here
#INCLUDE_POSTSCRIPTS_LIST#
## The following flag postscripts-end-here must not be deleted.
# postscripts-end-here

Note

If you have a hierarchical cluster, you must copy your new mypostscript.tmpl to /install/postscripts/mypostscript.tmpl on the service nodes, unless the /install/postscripts directory is mounted from the MN to the service nodes.

Remove attribute from mypostscript.tmpl

If you want to remove an attribute that you have added, you should remove all the related lines or comment them out with ##. For example, comment out the added lines.

##UPDATESTATUS=#TABLE:nodelist:$NODE:updatestatus#
##export UPDATESTATUS
Test the new template

There are two quick ways to test the template.

  1. If the node is up

    updatenode <nodename> -P syslog
    

Check your generated mypostscript on the compute node:

vi /xcatpost/mypostscript
  2. Set the precreatemypostscripts option

    chdef -t site -o clustersite precreatemypostscripts=1
    

Then run

nodeset <nodename> ....

Check your generated mypostscript

vi /tftpboot/mypostscripts/mypostscript.<nodename>
Sample /xcatpost/mypostscript

This is an example of the generated postscript for a servicenode install. It is found in /xcatpost/mypostscript on the node.

# global value to store the running status of the postbootscripts,the value
#is non-zero if one postbootscript failed
return_value=0
# subroutine used to run postscripts
run_ps () {
 local ret_local=0
 logdir="/var/log/xcat"
 mkdir -p $logdir
 logfile="/var/log/xcat/xcat.log"
 if [ -f $1 ]; then
  echo "`date` Running postscript: $@" | tee -a $logfile
  #./$@ 2>&1 1> /tmp/tmp4xcatlog
  #cat /tmp/tmp4xcatlog | tee -a $logfile
  ./$@ 2>&1 | tee -a $logfile
  ret_local=${PIPESTATUS[0]}
  if [ "$ret_local" -ne "0" ]; then
    return_value=$ret_local
  fi
  echo "Postscript: $@ exited with code $ret_local"
 else
  echo "`date` Postscript $1 does NOT exist." | tee -a $logfile
  return_value=-1
 fi
 return 0
}
# subroutine end
SHAREDTFTP='1'
export SHAREDTFTP
TFTPDIR='/tftpboot'
export TFTPDIR
CONSOLEONDEMAND='yes'
export CONSOLEONDEMAND
PPCTIMEOUT='300'
export PPCTIMEOUT
VSFTP='y'
export VSFTP
DOMAIN='cluster.com'
export DOMAIN
XCATIPORT='3002'
export XCATIPORT
DHCPINTERFACES="'xcatmn2|eth1;service|eth1'"
export DHCPINTERFACES
MAXSSH='10'
export MAXSSH
SITEMASTER=10.2.0.100
export SITEMASTER
TIMEZONE='America/New_York'
export TIMEZONE
INSTALLDIR='/install'
export INSTALLDIR
NTPSERVERS='xcatmn2'
export NTPSERVERS
EA_PRIMARY_HMC='c76v2hmc01'
export EA_PRIMARY_HMC
NAMESERVERS='10.2.0.100'
export NAMESERVERS
SNSYNCFILEDIR='/var/xcat/syncfiles'
export SNSYNCFILEDIR
DISJOINTDHCPS='0'
export DISJOINTDHCPS
FORWARDERS='8.112.8.1,8.112.8.2'
export FORWARDERS
VLANNETS='|(\d+)|10.10.($1+0).0|'
export VLANNETS
XCATDPORT='3001'
export XCATDPORT
USENMAPFROMMN='no'
export USENMAPFROMMN
DNSHANDLER='ddns'
export DNSHANDLER
ROUTENAMES='r1,r2'
export ROUTENAMES
INSTALLLOC='/install'
export INSTALLLOC
ENABLESSHBETWEENNODES=YES
export ENABLESSHBETWEENNODES
NETWORKS_LINES=4
 export NETWORKS_LINES
NETWORKS_LINE1='netname=public_net||net=8.112.154.64||mask=255.255.255.192||mgtifname=eth0||gateway=8.112.154.126||dhcpserver=||tftpserver=8.112.154.69||nameservers=8.112.8.1||ntpservers=||logservers=||dynamicrange=||staticrange=||staticrangeincrement=||nodehostname=||ddnsdomain=||vlanid=||domain=||mtu=||disable=||comments='
export NETWORKS_LINE1
NETWORKS_LINE3='netname=sn21_net||net=10.2.1.0||mask=255.255.255.0||mgtifname=eth1||gateway=<xcatmaster>||dhcpserver=||tftpserver=||nameservers=10.2.1.100,10.2.1.101||ntpservers=||logservers=||dynamicrange=||staticrange=||staticrangeincrement=||nodehostname=||ddnsdomain=||vlanid=||domain=||mtu=||disable=||comments='
export NETWORKS_LINE3
NETWORKS_LINE4='netname=sn22_net||net=10.2.2.0||mask=255.255.255.0||mgtifname=eth1||gateway=10.2.2.100||dhcpserver=10.2.2.100||tftpserver=10.2.2.100||nameservers=10.2.2.100||ntpservers=||logservers=||dynamicrange=10.2.2.120-10.2.2.250||staticrange=||staticrangeincrement=||nodehostname=||ddnsdomain=||vlanid=||domain=||mtu=||disable=||comments='
export NETWORKS_LINE4
NODE=xcatsn23
export NODE
NFSSERVER=10.2.0.100
export NFSSERVER
INSTALLNIC=eth0
export INSTALLNIC
PRIMARYNIC=eth0
export PRIMARYNIC
MASTER=10.2.0.100
export MASTER
OSVER=sles11
export OSVER
ARCH=ppc64
export ARCH
PROFILE=service-xcattest
export PROFILE
PROVMETHOD=netboot
export PROVMETHOD
PATH=`dirname $0`:$PATH
export PATH
NODESETSTATE=netboot
export NODESETSTATE
UPDATENODE=1
export UPDATENODE
NTYPE=service
export NTYPE
MACADDRESS=16:3d:05:fa:4a:02
export MACADDRESS
NODEID=EA163d05fa4a02EA
export NODEID
MONSERVER=8.112.154.69
export MONSERVER
MONMASTER=10.2.0.100
export MONMASTER
MS_NODEID=0360238fe61815e6
export MS_NODEID
OSPKGS='kernel-ppc64,udev,sysconfig,aaa_base,klogd,device-mapper,bash,openssl,nfs-utils,ksh,syslog-ng,openssh,openssh-askpass,busybox,vim,rpm,bind,bind-utils,dhcp,dhcpcd,dhcp-server,dhcp-client,dhcp-relay,bzip2,cron,wget,vsftpd,util-linux,module-init-tools,mkinitrd,apache2,apache2-prefork,perl-Bootloader,psmisc,procps,dbus-1,hal,timezone,rsync,powerpc-utils,bc,iputils,uuid-runtime,unixODBC,gcc,zypper,tar'
export OSPKGS
OTHERPKGS1='xcat/xcat-core/xCAT-rmc,xcat/xcat-core/xCATsn,xcat/xcat-dep/sles11/ppc64/conserver,perl-DBD-mysql,nagios/nagios-nsca-client,nagios/nagios,nagios/nagios-plugins-nrpe,nagios/nagios-nrpe'
export OTHERPKGS1
OTHERPKGS_INDEX=1
export OTHERPKGS_INDEX
## get the diskless networks information. There may be no information.
NETMASK=255.255.255.0
export NETMASK
GATEWAY=10.2.0.100
export GATEWAY
# NIC related attributes for the node for confignetwork postscript
NICIPS=""
export NICIPS
NICHOSTNAMESUFFIXES=""
export NICHOSTNAMESUFFIXES
NICTYPES=""
export NICTYPES
NICCUSTOMSCRIPTS=""
export NICCUSTOMSCRIPTS
NICNETWORKS=""
export NICNETWORKS
NICCOMMENTS=
export NICCOMMENTS
# postscripts-start-here
# defaults-postscripts-start-here
run_ps test1
run_ps syslog
run_ps remoteshell
run_ps syncfiles
run_ps confNagios
run_ps configrmcnode
# defaults-postscripts-end-here
# node-postscripts-start-here
run_ps servicenode
run_ps configeth_new
# node-postscripts-end-here
run_ps setbootfromnet
# postscripts-end-here
# postbootscripts-start-here
# defaults-postbootscripts-start-here
run_ps otherpkgs
# defaults-postbootscripts-end-here
# node-postbootscripts-start-here
run_ps test
# The following line node-postbootscripts-end-here must not be deleted.
# node-postbootscripts-end-here
# postbootscripts-end-here
exit $return_value
Suggestions
For writing scripts
  • Some compute node profiles exclude perl to keep the image as small as possible. If this is your case, your postscripts should obviously be written in another shell language, e.g. bash or ksh.
  • If a postscript is specific for an os, name your postscript mypostscript.osname.
  • Add logger statements to send errors back to the Management Node. By default, xCAT configures the syslog service on compute nodes to forward all syslog messages to the Management Node. This will help debug.
Using Hierarchical Clusters

If you are running a hierarchical cluster (one with Service Nodes) and your /install/postscripts directory is not mounted on the Service Node, you will need to sync or copy the postscripts that you added or changed in /install/postscripts on the MN to the SN before running them on the compute nodes. To do this easily, use the xdcp command and copy the entire /install/postscripts directory to the service nodes (usually into /xcatpost).

xdcp service -R /install/postscripts/* /xcatpost
or
prsync /install/postscripts service:/xcatpost

If your /install/postscripts is not mounted on the Service Node, you should also:

xdcp service -R /install/postscripts/* /install
or
prsync /install/postscripts service:/install
Synchronizing Files
Overview

Synchronizing (sync) files to the nodes is a feature of xCAT used to distribute specific files from the management node to the newly-deploying or deployed nodes.

This function is supported for diskful or RAMdisk-based diskless nodes. Generally, the specific files are system configuration files for the nodes in the /etc directory, like /etc/hosts and /etc/resolv.conf, but they can also be application configuration files for the nodes. The advantages of this function are: it can sync files in parallel to installed nodes or node groups, and it can automatically sync files to a newly installing node after the installation. Additionally, this feature supports a flexible format for defining the files to be synced in a configuration file, called a “synclist”.

The synclist file can be a common one for a group of nodes using the same profile or osimage, or it can be unique for each particular node. Since xCAT uses the location of the synclist file to find it, the common synclist should be put in a given location for Linux nodes or specified by the osimage.

The xdcp command supplies the basic Syncing File function. If the -F synclist option is specified in the xdcp command, it syncs files configured in the synclist to the nodes. If the -i <install image path> option is specified with -F synclist, it syncs files to the root image located in the <install image path> directory.

Note

The -i <install image path> option is only supported for Linux nodes

xdcp supports hierarchy where service nodes are used. If a node is serviced by a service node, xdcp will sync the files to the service node first, then sync the files from service node to the compute node. The files are placed in an intermediate directory on the service node defined by the SNsyncfiledir attribute in the site table. The default is /var/xcat/syncfiles.

Since updatenode -F calls the xdcp to handle the Syncing File function, the updatenode -F also supports the hierarchy.

For newly installing nodes, the Syncing File action will be triggered when running the postscripts for the nodes. A special postscript named syncfiles is used to initiate the Syncing File process.

The postscript syncfiles is located in /install/postscripts/. When running, it sends a message to xcatd on the management node or service node; xcatd then figures out the corresponding synclist file for the node and calls the xdcp command to sync the files in the synclist to the node.

If installing nodes in a hierarchical configuration, you must sync the service nodes first to make sure they are updated. The compute nodes will be synced from their service nodes. You can use the updatenode <computenodes> -f command to sync all the service nodes for the range of compute nodes provided.

For installed nodes, the Syncing File action happens when running the updatenode -F or xdcp -F synclist command to update the nodes. When performing updatenode -F, xCAT figures out the location of the synclist files for all the nodes, groups the nodes which use the same synclist file, and then calls xdcp -F synclist to sync files to the nodes.

The synclist file
The Format of synclist file

The synclist file contains the configuration entries that specify where the files should be synced to. In the synclist file, each line is an entry which describes the location of the source files and the destination location of files on the target node.

The basic entry format looks like the following:

path_of_src_file1 -> path_of_dst_file1
path_of_src_file1 -> path_of_dst_directory
path_of_src_file1 path_of_src_file2 ... -> path_of_dst_directory

The path_of_src_file* should be the full path of the source file on the Management Node.

The path_of_dst_file* should be the full path of the destination file on the target node. Make sure path_of_dst_file* is not an existing directory on the target node; otherwise, the file sync with updatenode -r /usr/bin/scp or xdcp -r /usr/bin/scp will fail.

The path_of_dst_directory should be the full path of the destination directory. Make sure path_of_dst_directory is not an existing file on the target node; otherwise, the file sync with updatenode -r /usr/bin/scp or xdcp -r /usr/bin/scp will fail.

If no target node is specified, the files will be synced to all nodes in the cluster. See “Support nodes in synclist file” below for how to specify a noderange.

The following synclist formats are supported:

sync file /etc/file2 to the file /etc/file2 on the node (with same file name)

/etc/file2 -> /etc/file2

sync file /etc/file2 to the file /etc/file3 on the node (with different file name)

/etc/file2 -> /etc/file3

sync file /etc/file4 to the file /etc/tmp/file5 on the node (different file name and directory). The directory will be automatically created for you.

/etc/file4 -> /etc/tmp/file5

sync the multiple files /etc/file1, /etc/file2, /etc/file3, … to the directory /tmp/etc (/tmp/etc must be a directory when multiple files are synced at one time). If the directory does not exist, it will be created.

/etc/file1 /etc/file2 /etc/file3 -> /tmp/etc

sync file /etc/file2 into the directory /etc/ on the node (the file keeps the same name)

/etc/file2 -> /etc/

sync all files, including subdirectories, in /home/mikev to directory /home/mikev on the node

/home/mikev/* -> /home/mikev/
           or
/home/mikev -> /home/mikev/

Note

Don’t try to sync files to the read only directory on the target node.

An example of synclist file

Sync the file /etc/common_hosts to two places on the target node: one copy to /etc/hosts and the other to /tmp/etc/hosts. The following configuration entries should be added:

/etc/common_hosts -> /etc/hosts
/etc/common_hosts -> /tmp/etc/hosts

Sync the files in the directory /tmp/prog1 to the directory /prog1 on the target node, removing the .tmpl suffix on the target node (the directory /tmp/prog1/ contains two files: conf1.tmpl and conf2.tmpl). The following configuration entries should be added:

/tmp/prog1/conf1.tmpl -> /prog1/conf1
/tmp/prog1/conf2.tmpl -> /prog1/conf2

Sync the files in the directory /tmp/prog2 to the directory /prog2 with the same names on the target node (the directory /tmp/prog2 contains two files: conf1 and conf2). The following configuration entries should be added:

/tmp/prog2/conf1 /tmp/prog2/conf2 -> /prog2

Sample synclist file

/etc/common_hosts -> /etc/hosts
/etc/common_hosts -> /tmp/etc/hosts
/tmp/prog1/conf1.tmpl -> /prog1/conf1
/tmp/prog1/conf2.tmpl -> /prog1/conf2
/tmp/prog2/conf1 /tmp/prog2/conf2 -> /prog2
/tmp/* -> /tmp/
/etc/testfile -> /etc/

If the above syncfile is used by the updatenode/xdcp commands, or used in a node installation process, the following files will exist on the target node with the following contents.

/etc/hosts (same content as /etc/common_hosts on the MN)
/tmp/etc/hosts (same content as /etc/common_hosts on the MN)
/prog1/conf1 (same content as /tmp/prog1/conf1.tmpl on the MN)
/prog1/conf2 (same content as /tmp/prog1/conf2.tmpl on the MN)
/prog2/conf1 (same content as /tmp/prog2/conf1 on the MN)
/prog2/conf2 (same content as /tmp/prog2/conf2 on the MN)
Support nodes in synclist file

Starting with xCAT 2.9.2 on AIX and with xCAT 2.12 on Linux, xCAT supports a new format for syncfile. The new format is

file -> (noderange for permitted nodes) file

The noderange can have several formats. The following examples show the /etc/hosts file being synced to the nodes specified before the file name:

/etc/hosts -> (node1,node2) /etc/hosts            # The /etc/hosts file is synced to node1 and node2
/etc/hosts -> (node1-node4) /etc/hosts            # The /etc/hosts file is synced to node1,node2,node3 and node4
/etc/hosts -> (node[1-4]) /etc/hosts              # The /etc/hosts file is synced to node1, node2, node3 and node4
/etc/hosts -> (node1,node[2-3],node4) /etc/hosts  # The /etc/hosts file is synced to node1, node2, node3 and node4
/etc/hosts -> (group1) /etc/hosts                 # The /etc/hosts file is synced to nodes in group1
/etc/hosts -> (group1,group2) /etc/hosts          # The /etc/hosts file is synced to nodes in group1 and group2
Advanced synclist file features

EXECUTE

The EXECUTE clause is used to list all the postsync scripts (<filename>.post) you would like to run after the files are synced, but only if the corresponding file <filename> was actually updated. For hierarchical clusters, the postsync files in this list must also be added to the list of files to sync; this is optional for non-hierarchical clusters. If a noderange is used in the synclist for the file listed in the EXECUTE clause, the postsync script will only be executed on the nodes in that noderange. The EXECUTE clause is not supported if the -r /usr/bin/scp option is used with the xdcp or updatenode command.

EXECUTEALWAYS

The EXECUTEALWAYS clause is used to list all the postsync scripts you would like to run after the files are synced, whether or not any file is actually updated. The files in this list must be added to the list of files to sync. If a noderange is used in the synclist for the file listed in the EXECUTEALWAYS clause, the script will only be executed on the nodes in that noderange.

Note

The path to the file to EXECUTE or EXECUTEALWAYS is the location of the file on the MN.

For example, your syncfile may look like this:

/tmp/share/file2  -> /tmp/file2
/tmp/share/file2.post -> /tmp/file2.post (required for hierarchical clusters)
/tmp/share/file3 -> /tmp/filex
/tmp/share/file3.post -> /tmp/file3.post (required for hierarchical clusters)
/tmp/myscript1 -> /tmp/myscript1
/tmp/myscript2 -> /tmp/myscript2
# Postscripts
EXECUTE:
/tmp/share/file2.post
/tmp/share/file3.post
EXECUTEALWAYS:
/tmp/myscript1
/tmp/myscript2

If the sync updates /tmp/file2 on the node, then /tmp/file2.post is automatically executed on that node. If the sync updates /tmp/filex on the node, then /tmp/file3.post is automatically executed on that node.

APPEND

The APPEND clause is used to append the contents of the input file to an existing file on the node. The file to be appended must already exist on the node and not be part of the synclist that contains the APPEND clause.

For example, your synclist file may look like this:

/tmp/share/file2  ->  /tmp/file2
/tmp/share/file2.post -> /tmp/file2.post
/tmp/share/file3  ->  /tmp/filex
/tmp/share/file3.post -> /tmp/file3.post
/tmp/myscript -> /tmp/myscript
# Postscripts
EXECUTE:
/tmp/share/file2.post
/tmp/share/file3.post
EXECUTEALWAYS:
/tmp/myscript
APPEND:
/etc/myappenddir/appendfile -> /etc/mysetup/setup
/etc/myappenddir/appendfile2 -> /etc/mysetup/setup2

When you use the APPEND clause, the source file to the left of the arrow is appended to the file to the right of the arrow. In this example, /etc/myappenddir/appendfile is appended to /etc/mysetup/setup file, which must already exist on the node. The /opt/xcat/share/xcat/scripts/xdcpappend.sh is used to accomplish this.

The script creates a backup of the original file on the node in the directory defined by the site table nodesyncfiledir attribute, which is /var/xcat/node/syncfiles by default. To update the original file when using this function, you need to sync a new original file to the node and remove the old original from the /var/xcat/node/syncfiles/org directory. If you want to clean up all the files for the append function on the node, you can use the xdsh -c command. See the man page for xdsh.

MERGE (supported on Linux only).

The MERGE clause is used to append the contents of the input file to either the /etc/passwd, /etc/shadow or /etc/group files. They are the only supported files. You must not put the /etc/passwd, /etc/shadow, /etc/group files in an APPEND clause if using a MERGE clause. For these three files you should use the MERGE clause. The APPEND will add the information to the end of the file. The MERGE will add or replace the information and ensure that there are no duplicate entries in these files.

For example, your synclist file may look like this

/tmp/share/file2  ->  /tmp/file2
/tmp/share/file2.post -> /tmp/file2.post
/tmp/share/file3  ->  /tmp/filex
/tmp/share/file3.post -> /tmp/file3.post
/tmp/myscript -> /tmp/myscript
# Postscripts
EXECUTE:
/tmp/share/file2.post
/tmp/share/file3.post
EXECUTEALWAYS:
/tmp/myscript
MERGE:
/etc/mydir/mergepasswd -> /etc/passwd
/etc/mydir/mergeshadow -> /etc/shadow
/etc/mydir/mergegroup -> /etc/group

When you use the MERGE clause, the source file to the left of the arrow is merged into the file to the right of the arrow. It will replace any common userids found in those files and add new userids. The /opt/xcat/share/xcat/scripts/xdcpmerge.sh is used to accomplish this.

Note

No order of execution may be assumed from the order of the EXECUTE, EXECUTEALWAYS, APPEND, and MERGE clauses in the synclist file.

The location of synclist file for updatenode and install process

In the installation process or updatenode process, xCAT needs to figure out the location of the synclist file automatically, so the synclist should be put into the specified place with the proper name.

If the provisioning method for the node is an osimage name, then the path to the synclist will be read from the osimage definition synclists attribute. You can display this information by running the following command, supplying your osimage name.

lsdef -t osimage -l <os>-<arch>-netboot-compute

Object name: <os>-<arch>-netboot-compute
exlist=/opt/xcat/share/xcat/netboot/<os>/compute.exlist
imagetype=linux
osarch=<arch>
osname=Linux
osvers=<os>
otherpkgdir=/install/post/otherpkgs/<os>/<arch>
pkgdir=/install/<os>/<arch>
pkglist=/opt/xcat/share/xcat/netboot/<os>/compute.pkglist
profile=compute
provmethod=netboot
rootimgdir=/install/netboot/<os>/<arch>/compute
**synclists=/install/custom/netboot/compute.synclist**

You can set the synclist path using the following command

chdef -t osimage -o <os>-<arch>-netboot-compute synclists="/install/custom/netboot/compute.synclist"

If the provisioning method for the node is install or netboot, then the path to the synclist should be in the following format:

/install/custom/<inst_type>/<distro>/<profile>.<os>.<arch>.synclist

<inst_type>: "install", "netboot"
<distro>:    "rh", "centos", "fedora", "sles"
<profile>, <os> and <arch> are what you set for the node

For example, the location of the synclist file for the diskful installation of RedHat 7.5 with compute as the profile:

/install/custom/install/rh/compute.rhels7.5.synclist

The location of the synclist file for the diskless netboot of SLES 12.3 with service as the profile:

/install/custom/netboot/sles/service.sles12.3.synclist
Run xdcp command to perform Syncing File action

The xdcp command supplies three options, -F, -s, and -i, to support the Syncing File function.

  • -F | --File <synclist input file>

Specifies the full path to the synclist file

  • -s

Specifies to sync to the service nodes only for the input compute noderange.

  • -i | --rootimg <install image for Linux>

Specifies the full path to the install image on the local node.

By default, if the -F option is specified, the rsync command is used to perform the syncing file function, and only the ssh remote shell is supported for rsync. xdcp uses -Lpotz as the default flags when calling the rsync command. More rsync flags can be specified by adding the -o flag to the xdcp call.

For example, you can use the xdcp -F option to sync the files listed in the /install/custom/commonsyncfiles/<profile>.synclist file to the node group named compute. If the node group compute is serviced by service nodes, the files will be automatically staged to the correct service nodes and then synced to the compute nodes from those service nodes. The files will be stored in the /var/xcat/syncfiles directory on the service nodes by default, or in the directory indicated by the site.SNsyncfiledir attribute. See the -s option below.

xdcp compute -F /install/custom/commonsyncfiles/<profile>.synclist

For Linux nodes, you can use the xdcp -i option with -F to sync the files specified in the /install/custom/<inst_type>/<os>/<profile>.synclist file to the osimage in the directory /install/<inst_type>/<os>/<arch>/<profile>/rootimg:

xdcp -i /install/<inst_type>/<os>/<arch>/<profile>/rootimg -F /install/custom/<inst_type>/<os>/<profile>.synclist

You can use the xdcp -s option to sync the files only to the service nodes for the node group named compute. The files will be placed in the default /var/xcat/syncfiles directory or in the directory indicated by the site.SNsyncfiledir attribute. If you want the files synced to the same directory on the service node that they come from on the Management Node, set the site.SNsyncfiledir attribute to /. This can be set up before a node install so the files are available to be synced during the install:

xdcp compute -s -F /install/custom/<inst_type>/<os>/<profile>.synclist
Synchronizing Files during the installation process

The policy table must have an entry allowing the syncfiles postscript to access the Management Node. Make sure this entry is in your policy table:

#priority,name,host,commands,noderange,parameters,time,rule,comments,disable
.
.
"4.6",,,"syncfiles",,,,"allow",,
.
.
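
For example, one way to add this entry (a hedged sketch using chtab; you could also use tabedit policy):

chtab priority=4.6 policy.commands=syncfiles policy.rule=allow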
Hierarchy and Service Nodes

If using service nodes to manage your nodes, you should make sure that the service nodes have been synchronized with the latest files from the Management Node before installing. If you have a group of compute nodes compute that are going to be installed and are serviced by SN1, then run the following before the install to sync the current files to SN1:

updatenode compute -f

Note

updatenode will figure out which service nodes need updating.

Diskful installation

The syncfiles postscript is in the defaults section of the postscripts table. To enable the syncfiles postscript to sync files to the nodes during install, the user needs to do the following:

Make sure your postscripts table has the syncfiles postscript listed:

#node,postscripts,postbootscripts,comments,disable
"xcatdefaults","syslog,remoteshell,syncfiles","otherpkgs",,
Diskless Installation

For the synchronizing files operation, the diskless boot is similar to the diskful installation, except that the packimage command syncs the files into the root directory of the image during image creation.

Create the synclist file following the steps in the Diskful installation section; the files will then be synced into the OS image when the packimage and mkdsklsnode commands run.

The files will also always be re-synced when the diskless node boots.

Run the Syncing File action in the diskless image creation process

Different approaches are used to create the diskless image, and the Syncing File action differs accordingly.

The packimage command is used to prepare the root image files and package the root image. The Syncing File action is performed here.

Steps to make the Syncing File action work in the packimage command:

  1. Prepare the synclist file and put it into the appropriate location as described above (refer to The location of synclist file for updatenode and install process).
  2. Run packimage as is normally done.
Run the Syncing File action in the updatenode process

Running the updatenode command with the -F option will sync the files configured in the synclist file to the nodes. updatenode does not sync images; use the xdcp -i -F command to sync images.

updatenode can be used to sync files to diskful or diskless nodes. updatenode cannot be used to sync files to statelite nodes.

Steps to make the Syncing File action work in the updatenode -F command:

  1. Create the synclist file with the entries indicating which files should be synced. (refer to The Format of synclist file)
  2. Put the synclist into the proper location (refer to The location of synclist file for updatenode and install process).
  3. Run the updatenode <noderange> -F command to initiate the Syncing File action, as shown in the example below.
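
For example, assuming a node group named compute:

updatenode compute -F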

Note

Since the Syncing File action can be initiated by updatenode -F, updatenode -P does NOT support re-running the syncfiles postscript, even if you specify the syncfiles postscript on the updatenode command line or set syncfiles in the postscripts.postscripts attribute.

Run the Syncing File action periodically

If the admin wants to run the Syncing File action automatically or periodically, the xdcp -F, xdcp -i -F, and updatenode -F commands can be used in a script, crontab, or FAM directly.

For example:

Use the cron daemon to sync the files in /install/custom/<inst_type>/<distro>/<profile>.<os>.synclist to the nodegroup compute every 10 minutes with the xdcp command by adding this to crontab:

*/10 * * * * root /opt/xcat/bin/xdcp compute -F /install/custom/<inst_type>/<distro>/<profile>.<distro>.synclist

Use the cron daemon to sync files for the nodegroup compute every 10 minutes with the updatenode command:

*/10 * * * * root /opt/xcat/bin/updatenode compute -F
Add Additional Software Packages
Overview

The names of the packages that will be installed on the node are stored in package list files. There are two kinds of package list files:

  • One package list file contains the names of the packages that come from the OS distro. They are stored in a .pkglist file.
  • The other package list file contains the names of the packages that do NOT come from the OS distro. They are stored in a .otherpkgs.pkglist file.

The path to the package lists will be read from the osimage definition. Which osimage a node is using is specified by the provmethod attribute. To display this value for a node:

lsdef node1 -i provmethod
Object name: node1
provmethod=<osimagename>

You can display the details of this osimage by running the following command, supplying your osimage name:

lsdef -t osimage <osimagename>
Object name: <osimagename>
exlist=/opt/xcat/share/xcat/<inst_type>/<os>/<profile>.exlist
imagetype=linux
osarch=<arch>
osname=Linux
osvers=<os>
otherpkgdir=/install/post/otherpkgs/<os>/<arch>
otherpkglist=/install/custom/<inst_type>/<distro>/<profile>.otherpkgs.pkglist
pkgdir=/install/<os>/<arch>
pkglist=/opt/xcat/share/xcat/<inst_type>/<os>/<profile>.pkglist
postinstall=/opt/xcat/share/xcat/<inst_type>/<distro>/<profile>.<os>.<arch>.postinstall
profile=<profile>
provmethod=<profile>
rootimgdir=/install/<inst_type>/<os>/<arch>/<profile>
synclists=/install/custom/<inst_type>/<profile>.synclist

You can set the pkglist and otherpkglist using the following command:

chdef -t osimage <osimagename> pkglist=/opt/xcat/share/xcat/<inst_type>/<distro>/<profile>.pkglist\
                                         otherpkglist=/install/custom/<inst_type>/<distro>/my.otherpkgs.pkglist
Install Additional OS Packages for RHEL and SLES
Install Additional Packages using OS Packages steps

For rpms from the OS distro, add the new rpm names (without the version number) in the .pkglist file. For example, file /install/custom/<inst_type>/<os>/<profile>.pkglist will look like this after adding perl-DBI:

bash
nfs-utils
openssl
dhcpcd
kernel-smp
openssh
procps
psmisc
resmgr
wget
rsync
timezone
perl-DBI

For the format of the .pkglist file, see File Format for .ospkgs.pkglist File

If you have newer updates to some of your operating system packages that you would like to apply to your OS image, you can place them in another directory, and add that directory to your osimage pkgdir attribute. For example, with the osimage defined above, if you have a new openssl package that you need to update for security fixes, you could place it in a directory, create repository data, and add that directory to your pkgdir:

mkdir -p /install/osupdates/<os>/<arch>
cd /install/osupdates/<os>/<arch>
cp <your new openssl rpm>  .
createrepo .
chdef -t osimage <os>-<arch>-<inst_type>-<profile> pkgdir=/install/<os>/<arch>,/install/osupdates/<os>/<arch>

Note: If the target node was not installed by xCAT, make sure the osimage pkgdir attribute is set correctly so that the correct repository data can be found.

File Format for .ospkgs.pkglist File

The .pkglist file is used to specify the rpm and group/pattern names from the OS distro that will be installed on the nodes. It can contain the following types of entries:

* rpm name without version numbers
* group/pattern name marked with a '@' (for full install only)
* rpms to be removed after the installation, marked with a "-" (for full install only)

These are described in more detail in the following sections.

RPM Names

A simple .pkglist file just contains the names of the rpms without the version numbers.

For example:

openssl
xntp
rsync
glibc-devel.i686
Include pkglist Files

The #INCLUDE statement is supported in the pkglist file.

You can group some rpms in a file and include that file in the pkglist file using #INCLUDE:<file># format.

openssl
xntp
rsync
glibc-devel.i686
#INCLUDE:/install/post/custom/<distro>/myotherlist#

where /install/post/custom/<distro>/myotherlist is another package list file that follows the same format.

Note the trailing “#” character at the end of the line. It is important to specify this character for correct pkglist parsing.

Group/Pattern Names

It is only supported for stateful deployment.

In Linux, a group of rpms can be packaged together into one package, called a group on RedHat, CentOS, Fedora, and Scientific Linux (and a pattern on SLES). To get a list of available groups or patterns, run:

  • [RHEL]

    yum grouplist
    
  • [SLES]

    zypper se -t pattern
    

You can specify group/pattern names in this file by adding a ‘@’ and a space before the group/pattern name. For example:

@ base
Remove RPMs After Installing

It is only supported for stateful deployment.

You can specify in this file that certain rpms be removed after installing the new software. This is done by adding ‘-’ before the rpm names you want to remove. For example:

-ntp
Install Additional Other Packages for RHEL and SLES
Install Additional Other Packages Steps

If you have additional rpms (rpms not in the distro) that you also want installed, make a directory to hold them, create a list of the rpms you want installed, and add that information to the osimage definition:

  • Create a directory to hold the additional rpms:

    mkdir -p /install/post/otherpkgs/<distro>/<arch>
    cd /install/post/otherpkgs/<distro>/<arch>
    cp /myrpms/* .
    createrepo .
    
  • Create a file that lists the additional rpms that should be installed. For example, in /install/custom/<inst_type>/<distro>/<profile>.otherpkgs.pkglist put:

    myrpm1
    myrpm2
    myrpm3
    
  • Add both the directory and the file to the osimage definition:

    chdef -t osimage mycomputeimage otherpkgdir=/install/post/otherpkgs/<os>/<arch> otherpkglist=/install/custom/<inst_type>/<os>/<profile>.otherpkgs.pkglist
    

If you add more rpms at a later time, you must run createrepo again. The createrepo command is in the createrepo rpm, which for RHEL is in the 1st DVD, but for SLES is in the SDK DVD.

If you have multiple sets of rpms that you want to keep separate to keep them organized, you can put them in separate sub-directories in the otherpkgdir. If you do this, you need to do the following extra things, in addition to the steps above:

  • Run createrepo in each sub-directory

  • In your otherpkgs.pkglist, list at least 1 file from each sub-directory. (During installation, xCAT will define a yum or zypper repository for each directory you reference in your otherpkgs.pkglist.) For example:

    xcat/xcat-core/xCATsn
    xcat/xcat-dep/<os>/<arch>/conserver-xcat
    

There are some examples of otherpkgs.pkglist in /opt/xcat/share/xcat/<inst_type>/<distro>/<profile>.*.otherpkgs.pkglist that show the format.

Note: the otherpkgs postbootscript should by default be associated with every node. Use lsdef to check:

lsdef node1 -i postbootscripts

If it is not, you need to add it. For example, add it for all of the nodes in the “compute” group:

chdef -p -t group compute postbootscripts=otherpkgs

For the format of the .otherpkgs.pkglist file, see File Format for .otherpkgs.pkglist File

File Format for .otherpkgs.pkglist File

The otherpkgs.pkglist file can contain the following types of entries:

* rpm name without version numbers
* otherpkgs subdirectory plus rpm name
* blank lines
* comment lines starting with #
* #INCLUDE: <full file path># to include other pkglist files
* #NEW_INSTALL_LIST# to signify that the following rpms will be installed with a new rpm install command (zypper, yum, or rpm as determined by the function using this file)
* #ENV:<variable list># to specify environment variable(s) for a separate rpm install command
* rpms to be removed before installing, marked with a "-"
* rpms to be removed after installing, marked with a "--"

These are described in more detail in the following sections.

RPM Names

A simple otherpkgs.pkglist file just contains the names of the rpms without the version numbers.

For example, if you put the following three rpms under /install/post/otherpkgs/<os>/<arch>/ directory,

rsct.core-2.5.3.1-09120.ppc.rpm
rsct.core.utils-2.5.3.1-09118.ppc.rpm
src-1.3.0.4-09118.ppc.rpm

The otherpkgs.pkglist file will look like this:

src
rsct.core
rsct.core.utils
RPM Names with otherpkgs Subdirectories

If you create a subdirectory under /install/post/otherpkgs/<os>/<arch>/, say rsct, the otherpkgs.pkglist file will look like this:

rsct/src
rsct/rsct.core
rsct/rsct.core.utils
Include Other pkglist Files

You can group some rpms in a file and include that file in the otherpkgs.pkglist file using #INCLUDE:<file># format.

rsct/src
rsct/rsct.core
rsct/rsct.core.utils
#INCLUDE:/install/post/otherpkgs/myotherlist#

where /install/post/otherpkgs/myotherlist is another package list file that follows the same format.

Note the trailing “#” character at the end of the line. It is important to specify this character for correct pkglist parsing.

Multiple Install Lists

You can specify that separate calls should be made to the rpm install program (zypper, yum, rpm) for groups of rpms by specifying the entry #NEW_INSTALL_LIST# on a line by itself as a separator in your pkglist file. All rpms listed up to this separator will be installed together. You can have as many separators as you wish in your pkglist file, and each sublist will be installed separately in the order they appear in the file.

For example:

compilers/vacpp.rte
compilers/vac.lib
compilers/vacpp.lib
compilers/vacpp.rte.lnk
#NEW_INSTALL_LIST#
pe/IBM_pe_license
Environment Variable List

You can specify environment variable(s) for each rpm install call with an “#ENV:<variable list>#” entry. The environment variables also apply to the rpm remove call if any rpms need to be removed in the sublist.

For example:

#ENV:INUCLIENTS=1 INUBOSTYPE=1#
rsct/rsct.core
rsct/rsct.core.utils
rsct/src

This is the same as:

#ENV:INUCLIENTS=1#
#ENV:INUBOSTYPE=1#
rsct/rsct.core
rsct/rsct.core.utils
rsct/src
Remove RPMs Before Installing

You can also specify in this file that certain rpms be removed before installing the new software. This is done by adding ‘-’ before the rpm names you want to remove. For example:

rsct/src
rsct/rsct.core
rsct/rsct.core.utils
#INCLUDE:/install/post/otherpkgs/myotherlist#
-perl-doc

If you have #NEW_INSTALL_LIST# separators in your pkglist file, the rpms will be removed before the install of the sublist that the "-<rpmname>" appears in.

Remove RPMs After Installing

You can also specify in this file that certain rpms be removed after installing the new software. This is done by adding -- before the rpm names you want to remove. For example:

pe/IBM_pe_license
--ibm-java2-ppc64-jre

If you have #NEW_INSTALL_LIST# separators in your pkglist file, the rpms will be removed after the install of the sublist that the "--<rpmname>" appears in.

Install Additional Other Packages with Ubuntu official mirror

The Ubuntu ISO used to install the compute nodes only includes packages to run a minimal base operating system; it is likely that users will want to install additional Ubuntu packages from the internet Ubuntu repositories or from local repositories. This section describes how to install additional Ubuntu packages.

Compute nodes can access the internet
  1. Specify the repository

    Define the otherpkgdir attribute in the osimage to use the internet repository directly:

    chdef -t osimage <osimage name> otherpkgdir="http://us.archive.ubuntu.com/ubuntu/ \
    $(lsb_release -sc) main,http://us.archive.ubuntu.com/ubuntu/ $(lsb_release -sc)-updates main"
    
  2. Define the otherpkglist file

    Create an otherpkglist file, /install/custom/install/ubuntu/compute.otherpkgs.pkglist, add the package names into this file, and modify the otherpkglist attribute in the osimage:

    chdef -t osimage <osimage name> otherpkglist=/install/custom/install/ubuntu/compute.otherpkgs.pkglist
    
  3. Run updatenode <noderange> -S or updatenode <noderange> -P otherpkgs

    Run updatenode -S to install/update the packages on the compute nodes

    updatenode <noderange> -S
    

    Run updatenode -P otherpkgs to install/update the packages on the compute nodes

    updatenode <noderange> -P otherpkgs
    
Compute nodes cannot access the internet

If compute nodes cannot access the internet, there are two ways to install additional packages

Use new kernel patch

This procedure assumes the kernel RPMs are in /tmp; we take the osimage rhels7.3-ppc64le-install-compute as an example. The RPM names below are only examples; substitute your specific level and architecture.

  • [RHEL]
  1. The RPM kernel package is usually named kernel-<kernelver>.rpm. Append the new kernel packages directory to the osimage pkgdir:

    mkdir -p /install/kernels/<kernelver>
    cp /tmp/kernel-*.rpm /install/kernels/<kernelver>
    createrepo /install/kernels/<kernelver>/
    chdef -t osimage rhels7.3-ppc64le-install-compute -p pkgdir=/install/kernels/<kernelver>
    
  2. Inject the drivers from the new kernel RPMs into the initrd

    mkdef -t osdistroupdate kernelupdate dirpath=/install/kernels/<kernelver>/
    chdef -t osimage rhels7.3-ppc64le-install-compute osupdatename=kernelupdate
    chdef -t osimage rhels7.3-ppc64le-install-compute netdrivers=updateonly
    geninitrd rhels7.3-ppc64le-install-compute --ignorekernelchk
    nodeset <CN> osimage=rhels7.3-ppc64le-install-compute --noupdateinitrd
    
  3. Boot CN from net normally.

Customize network adapter

This section describes how to configure network adapters with persistent configuration using xCAT. The confignetwork postscript can be used to configure the network interfaces on the compute nodes to support Ethernet adapters, VLAN, BONDs, and BRIDGES.

Configure Additional Network Interfaces - confignetwork

The confignetwork postscript can be used to configure the network interfaces on the compute nodes to support Ethernet adapters, VLAN, BONDs, and BRIDGES. confignetwork can be used in the postscripts during OS provisioning, and it can also be executed with updatenode. The way the confignetwork postscript decides what IP address to give the secondary adapter is by checking the nics table, in which the NIC configuration information is stored. In order for the confignetwork postscript to run successfully, the following attributes must be configured for the node in the nics table:

  • nicips
  • nictypes
  • nicnetworks

If configuring VLANs, BONDs, or BRIDGEs, nicdevices in the nics table must be configured. VLAN, BOND, and BRIDGE configuration is only supported on RHEL.

  • nicdevices - resolves the relationship among the physical network interface devices

The following scenarios are examples of configuring Ethernet adapters/BOND/VLAN/Bridge.
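
For instance, a minimal hedged sketch of configuring a static IP on a secondary Ethernet adapter, assuming a node cn1, an interface eth1, an IP address 11.1.89.7, and an xCAT network object net11 (all hypothetical):

chdef cn1 nictypes.eth1=Ethernet nicips.eth1=11.1.89.7 nicnetworks.eth1=net11
updatenode cn1 -P confignetwork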

Configure routes

There are two ways to configure OS routes in xCAT:

  • makeroutes: command to add or delete routes on the management node or any given nodes.
  • setroute: script to replace/add the routes on the node; it can be used in postscripts/postbootscripts.

makeroutes or setroute modifies the temporary OS route, and also modifies the persistent route in the /etc/sysconfig/static-routes file.

Before using makeroutes or setroute to configure OS routes, details of the route data such as routename, subnet, netmask, and gateway should be stored in the routes table.

Note: the gateway in the networks table assigns the gateway from DHCP to the compute node, so when using makeroutes or setroute to configure a static OS route for a compute node, make sure there is no gateway for the specific network in the networks table.

Configure routes table
  1. Store default route data in routes table:

    chdef -t route defaultroute net=default mask=255.0.0.0 gateway=10.0.0.101
    
  2. Store additional route data in routes table:

    chdef -t route 20net net=20.0.0.0 mask=255.0.0.0 gateway=0.0.0.0 ifname=eth1
    
  3. Check data in routes table:

    tabdump routes
    #routename,net,mask,gateway,ifname,comments,disable
    "30net","30.0.0.0","255.0.0.0","0.0.0.0","eth2",,
    "20net","20.0.0.0","255.0.0.0","0.0.0.0","eth1",,
    "defaultroute","default","255.0.0.0","10.0.0.101",,,
    
Use makeroutes to configure OS route on xCAT management node
  1. define the names of the routes to be set up on the management node in the site table:

    chdef -t site mnroutenames="defaultroute,20net"
    lsdef -t site clustersite -i mnroutenames
        Object name: clustersite
            mnroutenames=defaultroute,20net
    
  2. add all routes from the mnroutenames to the OS route table for the management node:

    makeroutes
    
  3. add route 20net and 30net to the OS route table for the management node:

    makeroutes -r 20net,30net
    
  4. delete route 20net from the OS route table for the management node:

    makeroutes -d -r 20net
    
Use makeroutes to configure OS route for compute node
  1. define the names of the routes to be set up on the compute node:

    chdef cn1 routenames="defaultroute,20net"
    
  2. add all routes from the routenames to the OS route table for the compute node:

    makeroutes cn1
    
  3. add route 20net and 30net to the OS route table for the compute node:

    makeroutes cn1 -r 20net,30net
    
  4. delete route 20net from the OS route table for the compute node:

    makeroutes cn1,cn2 -d -r 20net
    
Use setroute to configure OS route for compute node
  1. define the names of the routes to be set up on the compute node:

    chdef cn1 routenames="defaultroute,20net"
    
  2. If adding setroute [replace | add] to the node’s postscripts list, setroute will be executed during OS deployment on the compute node to replace/add the routes from routenames:

    chdef cn1 -p postscripts="setroute replace"
    
  3. Or if the compute node is already running, use the updatenode command to run the setroute [replace | add] postscript:

    updatenode cn1 -P "setroute replace"
    
Check result
  1. Use the route command on the xCAT management node to check the OS route table.
  2. Use xdsh cn1 route to check the compute node OS route table.
Initialize the Compute for Deployment

xCAT uses the nodeset command to associate a specific image with a node; the node will then be installed with this image.

nodeset <nodename> osimage=<osimage>

There are additional nodeset attributes used for specific purposes or specific machines, for example:

  • runimage: If you would like to run a task after deployment, you can define that task with this attribute.
  • runcmd: This instructs the node to boot to the xCAT nbfs environment and proceed to configure BMC for basic remote access. This causes the IP, netmask, gateway, username, and password to be programmed according to the configuration table.
  • shell: This instructs the node to boot to the xCAT genesis environment, and present a shell prompt on console. The node will also be able to be sshed into and have utilities such as wget, tftp, scp, nfs, and cifs. It will have storage drivers available for many common systems.

Choose additional nodeset attributes according to your requirements. For more information about nodeset, refer to its man page.
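
For example, hedged sketches of the shell and runcmd usages, assuming a node named cn1:

nodeset cn1 shell
nodeset cn1 runcmd=bmcsetup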

Start the OS Deployment

Starting the deployment involves two key operations: first, specify the boot device of the next boot to be network; then, reboot the node.

For Power servers, those two operations can be completed by one command, rnetboot:

rnetboot <node>

For x86_64 servers, those two operations need two independent commands.

  1. Set the next boot device to be the network:

    rsetboot <node> net
    
  2. Reboot the xSeries server:

    rpower <node> reset
    
Diskless Installation
Select or Create an osimage Definition

Before creating an image with xCAT, the distro media (ISOs or DVDs) should be prepared.

xCAT uses the copycds command to create an image that will be available to install nodes. copycds copies all contents of distribution DVDs/ISOs or service pack DVDs/ISOs to a destination directory, and creates several relevant osimage definitions by default.

If using an ISO, copy it to (or NFS mount it on) the management node, and then run:

copycds <path>/<specific-distro>.iso

Note

Since SLE 15 contains an installer medium and a packages medium, copycds needs to copy all contents of DVD1 of the installer medium and DVD1 of the packages medium, for example:

copycds SLE-15-Installer-DVD-ppc64le-GM-DVD1.iso SLE-15-Packages-ppc64le-GM-DVD1.iso

If using a DVD, put it in the DVD drive of the management node and run:

copycds /dev/<dvd-drive-name>

To see the list of osimages:

lsdef -t osimage

To see the attributes of a particular osimage:

lsdef -t osimage <osimage-name>

Initially, some attributes of osimage are assigned default values by xCAT - they all can work correctly because the files or templates invoked by those attributes are shipped with xCAT by default. If you need to customize those attributes, refer to the next section Customize osimage

Below is an example of osimage definitions created by copycds:

# lsdef -t osimage
rhels7.2-ppc64le-install-compute  (osimage)
rhels7.2-ppc64le-install-service  (osimage)
rhels7.2-ppc64le-netboot-compute  (osimage)
rhels7.2-ppc64le-stateful-mgmtnode  (osimage)

In the osimage definitions shown above:

  • <os>-<arch>-install-compute is the default osimage definition used for diskful installation
  • <os>-<arch>-netboot-compute is the default osimage definition used for diskless installation
  • <os>-<arch>-install-service is the default osimage definition used for service node deployment in a hierarchical environment

Note

Additional steps are needed for Ubuntu ppc64le osimages:

For pre-16.04.02 versions of Ubuntu for ppc64el, the initrd.gz shipped with the ISO does not support network booting. In order to install Ubuntu with xCAT, you need to follow additional steps to complete the osimage definition.

[Tips 1]

If this is the same distro version as what your management node uses, create a .repo file in /etc/yum.repos.d with contents similar to:

[local-<os>-<arch>]
name=xCAT local <os> <version>
baseurl=file:/install/<os>/<arch>
enabled=1
gpgcheck=0

This way, if you need to install some additional RPMs on your MN later, you can simply install them with yum. Or if you are installing software on your MN that depends on some RPMs from this distro, those RPMs will be found and installed automatically.

[Tips 2]

You can easily create or modify an osimage definition based on any existing osimage definition; the command is:

mkdef -t osimage -o <new osimage> --template <existing osimage> [<attribute>=<value>, ...]

Except for the specified attributes <attribute>, the attributes of <new osimage> will inherit the values of the template osimage <existing osimage>.

As an example, the following command creates a new osimage myosimage.rh7.compute.netboot based on the existing osimage rhels7.4-ppc64le-netboot-compute with some customized attributes:

mkdef -t osimage -o myosimage.rh7.compute.netboot --template rhels7.4-ppc64le-netboot-compute synclists=/tmp/synclist otherpkgdir=/install/custom/osimage/myosimage.rh7.compute.netboot/3rdpkgs/ otherpkglist=/install/custom/osimage/myosimage.rh7.compute.netboot/3rd.pkglist
Customize osimage (Optional)

Optional means that none of the subitems on this page are necessary to finish an OS deployment. If you are new to xCAT, you can just jump to Initialize the Compute for Deployment.

Load Additional Drivers
Overview

During the installation or netbooting of a node, the drivers in the initrd are used to drive devices like network cards and IO devices to perform the installation/netbooting tasks. But sometimes the drivers for new devices are not included in the default initrd shipped by Red Hat or SUSE. A solution is to inject the new drivers into the initrd so they can drive the new devices during the installation/netbooting process.

Generally there are two approaches to injecting the new drivers: Driver Update Disk and Driver RPM Package.

A “Driver Update Disk” is media which contains the drivers, firmware and related configuration files for certain devices. The driver update disk is always supplied by the vendor of the device. One driver update disk can contain multiple drivers for different OS releases and different hardware architectures. Red Hat and Suse have different driver update disk formats.

The ‘Driver RPM Package’ is the rpm package which includes the drivers and firmware for specific devices. It is shipped by the vendor of the device for a new device or a new kernel version.

xCAT supports both, but the ‘Driver RPM Package’ approach is only supported in xCAT 2.8 and later.

No matter which approach is chosen, there are two steps to make new drivers work: one is to locate the new driver’s path, the other is to inject the new drivers into the initrd.

Locate the New Drivers
For Driver Update Disk

There are two approaches for xCAT to find the driver disk (pick one):

  1. Specify the location of the driver disk in the osimage object (This is ONLY supported in xCAT 2.8 and later)

The value for the ‘driverupdatesrc’ attribute is a comma separated driver disk list. The tag ‘dud’ must be specified before the full path of ‘driver update disk’ to specify the type of the file:

chdef -t osimage <osimagename> driverupdatesrc=dud:<full path of driver disk>
  2. Put the driver update disk in the directory <installroot>/driverdisk/<os>/<arch> (example: /install/driverdisk/sles11.1/x86_64).

    During the running of the genimage, geninitrd, or nodeset commands, xCAT will look for driver update disks in the directory <installroot>/driverdisk/<os>/<arch>.

For Driver RPM Packages

The Driver RPM packages must be specified in the osimage object.

Three attributes of the osimage object can be used to specify the Driver RPM location and driver names. If you want to load new drivers into the initrd, the ‘netdrivers’ attribute must be set, and one or both of the ‘driverupdatesrc’ and ‘osupdatename’ attributes must be set. If both ‘driverupdatesrc’ and ‘osupdatename’ are set, the drivers in ‘driverupdatesrc’ have higher priority.

  • netdrivers - comma separated driver names that need to be injected into the initrd. The postfix ‘.ko’ can be ignored.

The ‘netdrivers’ attribute must be set to specify the new driver list. If you want to load all the drivers from the driver rpms, use the keyword allupdate. Another keyword for the netdrivers attribute is updateonly, which means only the drivers located in the original initrd will be added to the newly built initrd from the driver rpms. This is useful to reduce the size of the new built initrd when the distro is updated, since there are many more drivers in the new kernel rpm than in the original initrd. Examples:

chdef -t osimage <osimagename> netdrivers=megaraid_sas.ko,igb.ko
chdef -t osimage <osimagename> netdrivers=allupdate
chdef -t osimage <osimagename> netdrivers=updateonly,igb.ko,new.ko
  • driverupdatesrc - comma separated driver rpm packages (full path should be specified)

A tag named ‘rpm’ can be specified before the full path of the rpm to specify the file type. The tag is optional since the default format is ‘rpm’ if no tag is specified. Example:

chdef -t osimage <osimagename> driverupdatesrc=rpm:<full path of driver disk1>,rpm:<full path of driver disk2>
  • osupdatename - comma separated ‘osdistroupdate’ objects. Each ‘osdistroupdate’ object specifies a Linux distro update.

When geninitrd is run, kernel-*.rpm will be searched in the osdistroupdate.dirpath to get all the rpm packages and then those rpms will be searched for drivers. Example:

mkdef -t osdistroupdate update1 dirpath=/install/<os>/<arch>
chdef -t osimage <osimagename> osupdatename=update1

If ‘osupdatename’ is specified, the kernel shipped with the ‘osupdatename’ will be used to load the newly built initrd, and only the drivers matching the new kernel will be kept in the newly built initrd. When using ‘osupdatename’, either ‘allupdate’ or ‘updateonly’ should be added to the ‘netdrivers’ attribute, or all the necessary driver names for the new kernel need to be added to the ‘netdrivers’ attribute. Otherwise the new drivers for the new kernel will be missing from the newly built initrd.

Inject the Drivers into the initrd
For Driver Update Disk
  • If specifying the driver disk location in the osimage

Run the following command:

genimage <osimagename>
  • If putting the driver disk in <installroot>/driverdisk/<os>/<arch>:

Running genimage as usual will load the driver disk.

For Driver RPM Packages

Run the following command:

genimage <osimagename> [--ignorekernelchk]

The ‘--ignorekernelchk’ option is used to skip the kernel version checking when injecting drivers from osimage.driverupdatesrc. To use this flag, you should make sure the drivers in the driver rpms are usable for the target kernel.

Notes
  • If the drivers from the driver disk or driver rpm are not already part of the installed or booted system, it’s necessary to add the rpm packages for the drivers to the .pkglist or .otherpkglist of the osimage object to install them in the system.
  • If a driver rpm needs to be loaded, the osimage object must be used for the ‘nodeset’ and ‘genimage’ command, instead of the older style profile approach.
  • Both a Driver disk and a Driver rpm can be loaded in one ‘nodeset’ or ‘genimage’ invocation.
Prescripts and Postscripts
Using Postscript
Postscript Execution Order Summary
Diskless

Stage            Scripts           Execute Order
Install/Create   postinstall       genimage, after packages are installed
Boot/Reboot      postscripts       1. postscripts.xcatdefaults
                                   2. osimage
                                   3. node
                 postbootscripts   4. postscripts.xcatdefaults
                                   5. osimage
                                   6. node

xCAT automatically runs a few postscripts and postbootscripts that are delivered with xCAT to set up the nodes. You can also add your own scripts to further customize the nodes.

Types of scripts

There are two types of scripts in the postscripts table (postscripts and postbootscripts). The types are based on when in the install process they will be executed. Run the following for more information:

man postscripts
  • postscripts attribute - List of scripts that should be run on this node after diskful installation or diskless boot.

    • [RHEL]

    Postscripts will be run before the reboot.

    • [SLES]

    Postscripts will be run after the reboot but before the init.d process. For Linux diskless deployment, the postscripts will be run at the init.d time, and xCAT will automatically add the list of postscripts from the postbootscripts attribute to run after the postscripts list.

  • postbootscripts attribute - list of postbootscripts that should be run on this Linux node at the init.d time after diskful installation reboot or diskless boot

  • By default, for diskful installs, xCAT only runs the postbootscripts on the install and not on reboot. The site table attribute runbootscripts is available to change this default behavior: if set to yes, the postbootscripts will be run on install and on reboot (see the example below).
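
For example, a hedged sketch of enabling this behavior, assuming the site object is named clustersite (the usual default):

chdef -t site -o clustersite runbootscripts=yes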

Note

xCAT automatically adds the postscripts from the xcatdefaults row (the xcatdefaults.postscripts attribute) of the postscripts table to run first on the nodes after install or diskless boot.
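
To see those defaults, one quick check (a hedged example) is to dump the postscripts table and look at the xcatdefaults row:

tabdump postscripts | grep xcatdefaults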

Adding your own postscripts

To add your own script, place it in /install/postscripts on the management node. Make sure it is executable and world readable. Then add it to the postscripts table for the group of nodes you want it to be run on (or the all group if you want it run on all nodes).
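
For example, a hedged sketch using a hypothetical script named myconfig and the compute group:

cp myconfig /install/postscripts/
chmod 755 /install/postscripts/myconfig
chdef -p -t group compute postscripts=myconfig
updatenode compute -P myconfig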

To check what scripts will be run on your node during installation:

lsdef node1 | grep scripts
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles

You can pass parameters to the postscripts. For example:

script1 p1 p2,script2,....

p1 p2 are the parameters to script1.

Postscripts can be placed in subdirectories of /install/postscripts on the management node; specify subdir/postscriptname in the postscripts table to run the postscripts in the subdirectories. This feature can be used to categorize the postscripts for different purposes. For example:

mkdir -p /install/postscripts/subdir1
mkdir -p /install/postscripts/subdir2
cp postscript1 /install/postscripts/subdir1/
cp postscript2 /install/postscripts/subdir2/
chdef node1 -p postscripts=subdir1/postscript1,subdir2/postscript2
updatenode node1 -P

If some of your postscripts affect the network communication between the management node and the compute node, like restarting the network or configuring a bond, the postscript execution might not finish successfully because of the network connection problems. Even if this postscript is placed last in the list, xCAT still may not be able to update the node status to booted. The recommendation is to use the Linux at mechanism to schedule this network-killing postscript to run at a later time. For example:

The user needs to add a postscript to customize the NIC bonding setup, but the bonding setup will break the network between the management node and the compute node. The user can use at to run this NIC bonding postscript after all the other postscript processes have finished.

Write a script, /install/postscripts/nicbondscript, which simply calls confignicsbond using at:

[root@xcatmn ~]#cat /install/postscripts/nicbondscript
#!/bin/bash
at -f ./confignicsbond now + 1 minute
[root@xcatmn ~]#

Then

chdef <nodename> -p postbootscripts=nicbondscript
PostScript/PostbootScript execution

When your script is executed on the node, all the attributes in the site table are exported as variables for your scripts to use. You can add extra attributes for yourself. See the sample mypostscript file below.

To run the postscripts, a script is built so that the above exported variables can be passed in. You can usually find that script in /xcatpost on the node; in the Linux case it is called mypostscript. A good way to debug problems is to go to the node, just run mypostscript, and look at the errors. You can also check the syslog on the Management Node for errors.

When writing your postscripts, it is good to follow the example of the current postscripts and write errors to syslog and to the shell. See Suggestions for writing scripts.

All attributes in the site table are exported and available to the postscript/postbootscript during execution. See the mypostscript file, which is generated and executed on the nodes to run the postscripts.

Example of mypostscript

#subroutine used to run postscripts
run_ps () {
logdir="/var/log/xcat"
mkdir -p $logdir
logfile="/var/log/xcat/xcat.log"
if [ -f $1 ]; then
 echo "Running postscript: $@" | tee -a $logfile
 ./$@ 2>&1 | tee -a $logfile
else
 echo "Postscript $1 does NOT exist." | tee -a $logfile
fi
}
# subroutine end
AUDITSKIPCMDS='tabdump,nodels'
export AUDITSKIPCMDS
TEST='test'
export TEST
NAMESERVERS='7.114.8.1'
export NAMESERVERS
NTPSERVERS='7.113.47.250'
export NTPSERVERS
INSTALLLOC='/install'
export INSTALLLOC
DEFSERIALPORT='0'
export DEFSERIALPORT
DEFSERIALSPEED='19200'
export DEFSERIALSPEED
DHCPINTERFACES="'xcat20RRmn|eth0;rra000-m|eth1'"
export DHCPINTERFACES
FORWARDERS='7.113.8.1,7.114.8.2'
export FORWARDERS
NAMESERVER='7.113.8.1,7.114.47.250'
export NAMESERVER
DB='postg'
export DB
BLADEMAXP='64'
export BLADEMAXP
FSPTIMEOUT='0'
export FSPTIMEOUT
INSTALLDIR='/install'
export INSTALLDIR
IPMIMAXP='64'
export IPMIMAXP
IPMIRETRIES='3'
export IPMIRETRIES
IPMITIMEOUT='2'
export IPMITIMEOUT
CONSOLEONDEMAND='no'
export CONSOLEONDEMAND
SITEMASTER=7.113.47.250
export SITEMASTER
MASTER=7.113.47.250
export MASTER
MAXSSH='8'
export MAXSSH
PPCMAXP='64'
export PPCMAXP
PPCRETRY='3'
export PPCRETRY
PPCTIMEOUT='0'
export PPCTIMEOUT
SHAREDTFTP='1'
export SHAREDTFTP
SNSYNCFILEDIR='/var/xcat/syncfiles'
export SNSYNCFILEDIR
TFTPDIR='/tftpboot'
export TFTPDIR
XCATDPORT='3001'
export XCATDPORT
XCATIPORT='3002'
export XCATIPORT
XCATCONFDIR='/etc/xcat'
export XCATCONFDIR
TIMEZONE='America/New_York'
export TIMEZONE
USENMAPFROMMN='no'
export USENMAPFROMMN
DOMAIN='cluster.net'
export DOMAIN
USESSHONAIX='no'
export USESSHONAIX
NODE=rra000-m
export NODE
NFSSERVER=7.113.47.250
export NFSSERVER
INSTALLNIC=eth0
export INSTALLNIC
PRIMARYNIC=eth1
OSVER=fedora9
export OSVER
ARCH=x86_64
export ARCH
PROFILE=service
export PROFILE
PATH=`dirname $0`:$PATH
export PATH
NODESETSTATE='netboot'
export NODESETSTATE
UPDATENODE=1
export UPDATENODE
NTYPE=service
export NTYPE
MACADDRESS='00:14:5E:5B:51:FA'
export MACADDRESS
MONSERVER=7.113.47.250
export MONSERVER
MONMASTER=7.113.47.250
export MONMASTER
OSPKGS=bash,openssl,dhclient,kernel,openssh-server,openssh-clients,busybox-anaconda,vim-
minimal,rpm,bind,bind-utils,ksh,nfs-utils,dhcp,bzip2,rootfiles,vixie-cron,wget,vsftpd,ntp,rsync
OTHERPKGS1=xCATsn,xCAT-rmc,rsct/rsct.core,rsct/rsct.core.utils,rsct/src,yaboot-xcat
export OTHERPKGS1
OTHERPKGS_INDEX=1
export OTHERPKGS_INDEX
export NOSYNCFILES
# postscripts-start-here\n
run_ps ospkgs
run_ps script1 p1 p2
run_ps script2
# postscripts-end-here\n

The mypostscript file is generated according to the mypostscript.tmpl file.

Using the mypostscript template

xCAT provides a way for the admin to customize the information that will be provided to the postscripts/postbootscripts when they run on the node. This is done by editing the mypostscript.tmpl file. The attributes that are provided in the shipped mypostscript.tmpl file should not be removed. They are needed by the default xCAT postscripts.

The mypostscript.tmpl, is shipped in the /opt/xcat/share/xcat/mypostscript directory.

If the admin customizes the mypostscript.tmpl, they should copy the mypostscript.tmpl to /install/postscripts/mypostscript.tmpl, and then edit it. The mypostscript for each node will be named mypostscript.<nodename>. The generated mypostscript.<nodename> will be put in the /tftpboot/mypostscripts directory.
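
For example, a hedged sketch of the copy-and-edit workflow described above:

cp /opt/xcat/share/xcat/mypostscript/mypostscript.tmpl /install/postscripts/mypostscript.tmpl
vi /install/postscripts/mypostscript.tmpl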

site table precreatemypostscripts attribute

If the site table precreatemypostscripts attribute is set to 1 or yes, it will instruct xCAT at nodeset and updatenode time to query the db once for all of the nodes passed into the command, create the mypostscript file for each node, and put them in a directory in $TFTPDIR (for example /tftpboot). The created mypostscript.<nodename> file in the /tftpboot/mypostscripts directory will not be regenerated unless another nodeset or updatenode command is run to that node. This should be used when the system definition has stabilized. It saves time on the updatenode or reboot by not regenerating the mypostscript file.

If the precreatemypostscripts attribute is yes, and a database change is made or xCAT code is upgraded, then you should run a new nodeset or updatenode to regenerate the /tftpboot/mypostscripts/mypostscript.<nodename> file to pick up the latest database setting. The default for precreatemypostscripts is no/0.

When you run nodeset or updatenode, it will search the /install/postscripts/mypostscript.tmpl first. If the /install/postscripts/mypostscript.tmpl exists, it will use that template to generate the mypostscript for each node. Otherwise, it will use /opt/xcat/share/xcat/mypostscript/mypostscript.tmpl.

Content of the template for mypostscript

Note

The attributes that are defined in the shipped mypostscript.tmpl file should not be removed. The xCAT default postscripts rely on that information to run successfully.

The following will explain the entries in the mypostscript.tmpl file.

The SITE_TABLE_ALL_ATTRIBS_EXPORT line in the file directs the code to export all attributes defined in the site table. The attributes are not always defined exactly as in the site table to avoid conflict with other table attributes of the same name. For example, the site table master attribute is named SITEMASTER in the generated mypostscript file.

#SITE_TABLE_ALL_ATTRIBS_EXPORT#

The following line exports ENABLESSHBETWEENNODES by running the internal xCAT routine (enablesshbetweennodes).

ENABLESSHBETWEENNODES=#Subroutine:xCAT::Template::enablesshbetweennodes:$NODE#
export ENABLESSHBETWEENNODES

tabdump(<TABLENAME>) is used to get all the information in the <TABLENAME> table

tabdump(networks)

These lines export the node name based on its definition in the database.

NODE=$NODE
export NODE

These lines get a comma separated list of the groups to which the node belongs.

GROUP=#TABLE:nodelist:$NODE:groups#
export GROUP

These lines read the given attributes (nfsserver, installnic, primarynic, xcatmaster, routenames) from the noderes table for the node ($NODE) and export them.

NFSSERVER=#TABLE:noderes:$NODE:nfsserver#
export NFSSERVER
INSTALLNIC=#TABLE:noderes:$NODE:installnic#
export INSTALLNIC
PRIMARYNIC=#TABLE:noderes:$NODE:primarynic#
export PRIMARYNIC
MASTER=#TABLE:noderes:$NODE:xcatmaster#
export MASTER
NODEROUTENAMES=#TABLE:noderes:$NODE:routenames#
export NODEROUTENAMES

The following entry exports multiple variables from the routes table. Not always set.

#ROUTES_VARS_EXPORT#

The following lines export nodetype table attributes.

OSVER=#TABLE:nodetype:$NODE:os#
export OSVER
ARCH=#TABLE:nodetype:$NODE:arch#
export ARCH
PROFILE=#TABLE:nodetype:$NODE:profile#
export PROFILE
PROVMETHOD=#TABLE:nodetype:$NODE:provmethod#
export PROVMETHOD

The following adds the current directory to the path for the postscripts.

PATH=`dirname $0`:$PATH
export PATH

The following sets the NODESETSTATE by running the internal xCAT getnodesetstate script.

NODESETSTATE=#Subroutine:xCAT::Postage::getnodesetstate:$NODE#
export NODESETSTATE

The following says the postscripts are not being run as a result of updatenode. (This is changed to 1 when updatenode runs.)

UPDATENODE=0
export UPDATENODE

The following sets the NTYPE to compute, service or MN.

NTYPE=$NTYPE
export NTYPE

The following sets the mac address.

MACADDRESS=#TABLE:mac:$NODE:mac#
export MACADDRESS

If vlan is setup, then the #VLAN_VARS_EXPORT# line will provide the following exports:

VMNODE='YES'
export VMNODE
VLANID=vlan1...
export VLANID
VLANHOSTNAME=..
  ..
#VLAN_VARS_EXPORT#

If monitoring is setup, then the #MONITORING_VARS_EXPORT# line will provide:

MONSERVER=11.10.34.108
export MONSERVER
MONMASTER=11.10.34.108
export MONMASTER
#MONITORING_VARS_EXPORT#

The #OSIMAGE_VARS_EXPORT# line will provide, for example:

OSPKGDIR=/install/<os>/<arch>
export OSPKGDIR
OSPKGS='bash,nfs-utils,openssl,dhclient,kernel,openssh-server,openssh-clients,busybox,wget,rsyslog,dash,vim-minimal,ntp,rsyslog,rpm,rsync,
  ppc64-utils,iputils,dracut,dracut-network,e2fsprogs,bc,lsvpd,irqbalance,procps,yum'
export OSPKGS

#OSIMAGE_VARS_EXPORT#

The #NETWORK_FOR_DISKLESS_EXPORT# line will provide diskless network information, if defined.

NETMASK=255.255.255.0
export NETMASK
GATEWAY=8.112.34.108
export GATEWAY
..
#NETWORK_FOR_DISKLESS_EXPORT#

Note

The #INCLUDE_POSTSCRIPTS_LIST# and the #INCLUDE_POSTBOOTSCRIPTS_LIST# sections in /tftpboot/mypostscript(mypostbootscripts) on the Management Node will contain all the postscripts and postbootscripts defined for the node. When running an updatenode command for only some of the scripts, you will see in the /xcatpost/mypostscript file on the node that the list has been redefined during the execution of updatenode to only run the requested scripts, for example if you run updatenode <nodename> -P syslog.

The #INCLUDE_POSTSCRIPTS_LIST# flag provides a list of postscripts defined for this $NODE.

#INCLUDE_POSTSCRIPTS_LIST#

For example, you will see in the generated file the following stanzas:

# postscripts-start-here
# defaults-postscripts-start-here
syslog
remoteshell
# defaults-postscripts-end-here
# node-postscripts-start-here
syncfiles
# node-postscripts-end-here

The #INCLUDE_POSTBOOTSCRIPTS_LIST# provides a list of postbootscripts defined for this $NODE.

#INCLUDE_POSTBOOTSCRIPTS_LIST#

For example, you will see in the generated file the following stanzas:

# postbootscripts-start-here
# defaults-postbootscripts-start-here
otherpkgs
# defaults-postbootscripts-end-here
# node-postbootscripts-end-here
# postbootscripts-end-here
Kinds of variables in the template

Type 1: For the simple variable, the syntax is as follows. The mypostscript.tmpl has several examples of this. $NODE is filled in by the code. UPDATENODE is changed to 1 when the postscripts are run by updatenode. $NTYPE is filled in as either compute, service, or MN.

NODE=$NODE
export NODE
UPDATENODE=0
export UPDATENODE
NTYPE=$NTYPE
export NTYPE

Type 2: This is the syntax to get the value of one attribute from the <tablename> whose key is $NODE. It does not support tables with two keys. Some of the tables with two keys are litefile, prodkey, deps, monsetting, mpa, and networks. It does not support tables with keys other than $NODE. Some of the tables that do not use $NODE as the key are passwd, rack, and token.

VARNAME=#TABLE:tablename:$NODE:attribute#

For example, to get the new updatestatus attribute from the nodelist table:

UPDATESTATUS=#TABLE:nodelist:$NODE:updatestatus#
export UPDATESTATUS

Type 3: The syntax is as follows:

VARNAME=#Subroutine:modulename::subroutinename:$NODE#
or
VARNAME=#Subroutine:modulename::subroutinename#

Examples in the mypostscript.tmpl are the following:

NODESETSTATE=#Subroutine:xCAT::Postage::getnodesetstate:$NODE#
export NODESETSTATE
ENABLESSHBETWEENNODES=#Subroutine:xCAT::Template::enablesshbetweennodes:$NODE#
export ENABLESSHBETWEENNODES

Note

Type 3 is not an open interface to add extensions to the template.

Type 4: The syntax is #FLAG#. When parsing the template, the code generates all entries defined by #FLAG#, if they are defined in the database. For example: To export all values of all attributes from the site table. The tag is

#SITE_TABLE_ALL_ATTRIBS_EXPORT#

For the #SITE_TABLE_ALL_ATTRIBS_EXPORT# flag, the related subroutine gets the attributes’ values and handles special cases, such as: site.master is exported as "SITEMASTER", and if noderes.xcatmaster exists it is exported as "MASTER"; otherwise site.master is also exported as "MASTER".

Other examples are:

#VLAN_VARS_EXPORT#  - gets all vlan related items
#MONITORING_VARS_EXPORT#  - gets all monitoring configuration and setup data
#OSIMAGE_VARS_EXPORT# - gets osimage related variables, such as ospkgdir, ospkgs ...
#NETWORK_FOR_DISKLESS_EXPORT# - gets diskless network information
#INCLUDE_POSTSCRIPTS_LIST# - includes the list of all postscripts for the node
#INCLUDE_POSTBOOTSCRIPTS_LIST# - includes the list of all postbootscripts for the node

Note

Type 4 is not an open interface to add extensions to the template.

Type 5: Get all the data from the specified table. The <TABLENAME> should not be a node table, like nodelist; that should be handled with Type 2 syntax to get specific attributes for the $NODE, since tabdump would result in too much data for a node table. For the same reason, the auditlog and eventlog tables should not be used with tabdump. The site table should not be specified; it is already provided with the #SITE_TABLE_ALL_ATTRIBS_EXPORT# flag. This syntax can be used to get the data from tables with two keys (like switch). The syntax is:

tabdump(<TABLENAME>)
Edit mypostscript.tmpl

Add new attributes into mypostscript.tmpl

When you add new attributes into the template, you should edit the /install/postscripts/mypostscript.tmpl which you created by copying /opt/xcat/share/xcat/mypostscript/mypostscript.tmpl. Make all additions before the # postscripts-start-here section. xCAT will first look for /install/postscripts/mypostscript.tmpl and then, if not found, will use the one in /opt/xcat/share/xcat/mypostscript/mypostscript.tmpl.

For example:

UPDATESTATUS=#TABLE:nodelist:$NODE:updatestatus#
export UPDATESTATUS
...
# postscripts-start-here
#INCLUDE_POSTSCRIPTS_LIST#
## The following flag postscripts-end-here must not be deleted.
# postscripts-end-here

Note

If you have a hierarchical cluster, you must copy your new mypostscript.tmpl to /install/postscripts/mypostscript.tmpl on the service nodes, unless /install/postscripts directory is mounted from the MN to the service node.

Remove attribute from mypostscript.tmpl

If you want to remove an attribute that you have added, you should remove all the related lines or comment them out with ##. For example, comment out the added lines.

##UPDATESTATUS=#TABLE:nodelist:$NODE:updatestatus#
##export UPDATESTATUS
Test the new template

There are two quick ways to test the template.

  1. If the node is up

    updatenode <nodename> -P syslog
    

Check your generated mypostscript on the compute node:

vi /xcatpost/mypostscript
  2. Set the precreatemypostscripts option

    chdef -t site -o clustersite precreatemypostscripts=1
    

Then run

nodeset <nodename> ....

Check your generated mypostscript

vi /tftpboot/mypostscripts/mypostscript.<nodename>
Sample /xcatpost/mypostscript

This is an example of the generated postscript for a servicenode install. It is found in /xcatpost/mypostscript on the node.

# global value to store the running status of the postbootscripts,the value
#is non-zero if one postbootscript failed
return_value=0
# subroutine used to run postscripts
run_ps () {
 local ret_local=0
 logdir="/var/log/xcat"
 mkdir -p $logdir
 logfile="/var/log/xcat/xcat.log"
 if [ -f $1 ]; then
  echo "`date` Running postscript: $@" | tee -a $logfile
  #./$@ 2>&1 1> /tmp/tmp4xcatlog
  #cat /tmp/tmp4xcatlog | tee -a $logfile
  ./$@ 2>&1 | tee -a $logfile
  ret_local=${PIPESTATUS[0]}
  if [ "$ret_local" -ne "0" ]; then
    return_value=$ret_local
  fi
  echo "Postscript: $@ exited with code $ret_local"
 else
  echo "`date` Postscript $1 does NOT exist." | tee -a $logfile
  return_value=-1
 fi
 return 0
}
# subroutine end
SHAREDTFTP='1'
export SHAREDTFTP
TFTPDIR='/tftpboot'
export TFTPDIR
CONSOLEONDEMAND='yes'
export CONSOLEONDEMAND
PPCTIMEOUT='300'
export PPCTIMEOUT
VSFTP='y'
export VSFTP
DOMAIN='cluster.com'
export DOMAIN
XCATIPORT='3002'
export XCATIPORT
DHCPINTERFACES="'xcatmn2|eth1;service|eth1'"
export DHCPINTERFACES
MAXSSH='10'
export MAXSSH
SITEMASTER=10.2.0.100
export SITEMASTER
TIMEZONE='America/New_York'
export TIMEZONE
INSTALLDIR='/install'
export INSTALLDIR
NTPSERVERS='xcatmn2'
export NTPSERVERS
EA_PRIMARY_HMC='c76v2hmc01'
export EA_PRIMARY_HMC
NAMESERVERS='10.2.0.100'
export NAMESERVERS
SNSYNCFILEDIR='/var/xcat/syncfiles'
export SNSYNCFILEDIR
DISJOINTDHCPS='0'
export DISJOINTDHCPS
FORWARDERS='8.112.8.1,8.112.8.2'
export FORWARDERS
VLANNETS='|(\d+)|10.10.($1+0).0|'
export VLANNETS
XCATDPORT='3001'
export XCATDPORT
USENMAPFROMMN='no'
export USENMAPFROMMN
DNSHANDLER='ddns'
export DNSHANDLER
ROUTENAMES='r1,r2'
export ROUTENAMES
INSTALLLOC='/install'
export INSTALLLOC
ENABLESSHBETWEENNODES=YES
export ENABLESSHBETWEENNODES
NETWORKS_LINES=4
 export NETWORKS_LINES
NETWORKS_LINE1='netname=public_net||net=8.112.154.64||mask=255.255.255.192||mgtifname=eth0||gateway=8.112.154.126||dhcpserver=||tftpserver=8.112.154.69||nameservers=8.112.8.1||ntpservers=||logservers=||dynamicrange=||staticrange=||staticrangeincrement=||nodehostname=||ddnsdomain=||vlanid=||domain=||mtu=||disable=||comments='
export NETWORKS_LINE2
NETWORKS_LINE3='netname=sn21_net||net=10.2.1.0||mask=255.255.255.0||mgtifname=eth1||gateway=<xcatmaster>||dhcpserver=||tftpserver=||nameservers=10.2.1.100,10.2.1.101||ntpservers=||logservers=||dynamicrange=||staticrange=||staticrangeincrement=||nodehostname=||ddnsdomain=||vlanid=||domain=||mtu=||disable=||comments='
export NETWORKS_LINE3
NETWORKS_LINE4='netname=sn22_net||net=10.2.2.0||mask=255.255.255.0||mgtifname=eth1||gateway=10.2.2.100||dhcpserver=10.2.2.100||tftpserver=10.2.2.100||nameservers=10.2.2.100||ntpservers=||logservers=||dynamicrange=10.2.2.120-10.2.2.250||staticrange=||staticrangeincrement=||nodehostname=||ddnsdomain=||vlanid=||domain=||mtu=||disable=||comments='
export NETWORKS_LINE4
NODE=xcatsn23
export NODE
NFSSERVER=10.2.0.100
export NFSSERVER
INSTALLNIC=eth0
export INSTALLNIC
PRIMARYNIC=eth0
export PRIMARYNIC
MASTER=10.2.0.100
export MASTER
OSVER=sles11
export OSVER
ARCH=ppc64
export ARCH
PROFILE=service-xcattest
export PROFILE
PROVMETHOD=netboot
export PROVMETHOD
PATH=`dirname $0`:$PATH
export PATH
NODESETSTATE=netboot
export NODESETSTATE
UPDATENODE=1
export UPDATENODE
NTYPE=service
export NTYPE
MACADDRESS=16:3d:05:fa:4a:02
export MACADDRESS
NODEID=EA163d05fa4a02EA
export NODEID
MONSERVER=8.112.154.69
export MONSERVER
MONMASTER=10.2.0.100
export MONMASTER
MS_NODEID=0360238fe61815e6
export MS_NODEID
OSPKGS='kernel-ppc64,udev,sysconfig,aaa_base,klogd,device-mapper,bash,openssl,nfs- utils,ksh,syslog-ng,openssh,openssh-askpass,busybox,vim,rpm,bind,bind-utils,dhcp,dhcpcd,dhcp-server,dhcp-client,dhcp-relay,bzip2,cron,wget,vsftpd,util-linux,module-init-tools,mkinitrd,apache2,apache2-prefork,perl-Bootloader,psmisc,procps,dbus-1,hal,timezone,rsync,powerpc-utils,bc,iputils,uuid-runtime,unixODBC,gcc,zypper,tar'
export OSPKGS
OTHERPKGS1='xcat/xcat-core/xCAT-rmc,xcat/xcat-core/xCATsn,xcat/xcat-dep/sles11/ppc64/conserver,perl-DBD-mysql,nagios/nagios-nsca-client,nagios/nagios,nagios/nagios-plugins-nrpe,nagios/nagios-nrpe'
export OTHERPKGS1
OTHERPKGS_INDEX=1
export OTHERPKGS_INDEX
## get the diskless networks information. There may be no information.
NETMASK=255.255.255.0
export NETMASK
GATEWAY=10.2.0.100
export GATEWAY
# NIC related attributes for the node for confignetwork postscript
NICIPS=""
export NICIPS
NICHOSTNAMESUFFIXES=""
export NICHOSTNAMESUFFIXES
NICTYPES=""
export NICTYPES
NICCUSTOMSCRIPTS=""
export NICCUSTOMSCRIPTS
NICNETWORKS=""
export NICNETWORKS
NICCOMMENTS=
export NICCOMMENTS
# postscripts-start-here
# defaults-postscripts-start-here
run_ps test1
run_ps syslog
run_ps remoteshell
run_ps syncfiles
run_ps confNagios
run_ps configrmcnode
# defaults-postscripts-end-here
# node-postscripts-start-here
run_ps servicenode
run_ps configeth_new
# node-postscripts-end-here
run_ps setbootfromnet
# postscripts-end-here
# postbootscripts-start-here
# defaults-postbootscripts-start-here
run_ps otherpkgs
# defaults-postbootscripts-end-here
# node-postbootscripts-start-here
run_ps test
# The following line node-postbootscripts-end-here must not be deleted.
# node-postbootscripts-end-here
# postbootscripts-end-here
exit $return_value
Using postinstall scripts

While running genimage to generate diskless or statelite osimage, you may want to customize the root image after the package installation step. The postinstall attribute of the osimage definition provides a hook to run user specified script(s), in non-chroot mode, against the directory specified by rootimgdir attribute.

xCAT ships a default postinstall script for the diskless/statelite osimages that must be executed to ensure a successful provisioning of the OS:

lsdef -t osimage -o rhels7.3-ppc64le-netboot-compute -i postinstall
Object name: rhels7.3-ppc64le-netboot-compute
postinstall=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.ppc64le.postinstall

Customizing the postinstall script can be done by either of the methods below:

  • Append your own postinstall scripts

    chdef -t osimage -o <osimage> -p postinstall=/install/custom/postinstall/rh7/mypostscript
    
  • Create your own postinstall script based on the default postinstall script

    cp /opt/xcat/share/xcat/netboot/rh/compute.rhels7.ppc64le.postinstall /install/custom/postinstall/rh7/mypostscript
    # edit /install/custom/postinstall/rh7/mypostscript
    chdef -t osimage -o <osimage> postinstall=/install/custom/postinstall/rh7/mypostscript
    
Common questions about the usage of postinstall scripts:
When do postinstall scripts run?

High level flow of genimage process:

  1. install the packages specified by pkglist into rootimgdir directory
  2. customize the rootimgdir directory
  3. generate the initrd based on the rootimgdir directory

The postinstall scripts are executed in step 2.

Do postinstall scripts execute in chroot mode under rootimgdir directory?

No. Unlike postscripts and postbootscripts, the postinstall scripts run in a non-chroot environment, directly on the management node. In the postinstall scripts, all directory and file paths are relative to / of the management node. To reference a path inside the rootimgdir, use the $IMG_ROOTIMGDIR environment variable, exported by genimage.
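
For example, here is a minimal sketch of a custom postinstall script (the marker file path and its content are hypothetical, used only to illustrate referencing the image through $IMG_ROOTIMGDIR):

#!/bin/sh
# Runs on the management node (non-chroot); $IMG_ROOTIMGDIR points to the
# rootimgdir of the osimage being generated.
mkdir -p $IMG_ROOTIMGDIR/etc/myapp
echo "image built on $(hostname) at $(date)" > $IMG_ROOTIMGDIR/etc/myapp/build-info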

What are some of the environment variables available to my customized postinstall scripts?

The environment variables available to the postinstall scripts are listed in the postinstall attribute section of linuximage.

Synchronizing Files
Add Additional Software Packages
Customize network adapter

This section describes how to configure network adapters with persistent configuration using xCAT. The confignetwork postscript can be used to configure the network interfaces on the compute nodes to support Ethernet adapters, VLAN, BONDs, and BRIDGES.

Configure Additional Network Interfaces - confignetwork

The confignetwork postscript can be used to configure the network interfaces on the compute nodes to support Ethernet adapters, VLAN, BONDs, and BRIDGES. confignetwork can be used in postscripts during OS provisioning, and it can also be executed with updatenode. The confignetwork postscript decides what IP address to give the secondary adapter by checking the nics table, in which the nic configuration information is stored. In order for the confignetwork postscript to run successfully, the following attributes must be configured for the node in the nics table:

  • nicips
  • nictypes
  • nicnetworks

If configuring VLAN, BOND, or BRIDGES, nicdevices in the nics table must be configured. VLAN, BOND, and BRIDGES are only supported on RHEL.

  • nicdevices - resolves the relationship among the physical network interface devices

The following scenarios are examples to configure Ethernet adapters/BOND/VLAN/Bridge.
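
As a minimal sketch, assuming a compute node cn1 whose secondary adapter eth1 should get the (hypothetical) IP address 11.1.89.7 on an xCAT network named net11:

# store the NIC configuration in the nics table attributes of the node
chdef cn1 nicips.eth1=11.1.89.7 nictypes.eth1=Ethernet nicnetworks.eth1=net11
# apply the configuration on a running node (confignetwork can also be added to the postscripts list)
updatenode cn1 -P confignetwork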

Configure routes

There are 2 ways to configure OS route in xCAT:

  • makeroutes: command to add or delete routes on the management node or any given nodes.
  • setroute: script to replace/add the routes on the node; it can be used in postscripts/postbootscripts.

makeroutes or setroute modifies the current OS route table, and it also modifies the persistent routes in the /etc/sysconfig/static-routes file.

Before using makeroutes or setroute to configure OS routes, the route details such as routename, subnet, netmask, and gateway should be stored in the routes table.

Note: the gateway in the networks table is assigned to the compute node via DHCP, so if you use makeroutes or setroute to configure a static OS route for a compute node, make sure there is no gateway set for the specific network in the networks table.
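
For example, to verify (and, if necessary, clear) the gateway of a hypothetical network object net1:

lsdef -t network -o net1 -i gateway   # check whether a gateway is set
chdef -t network -o net1 gateway=     # clear it if a static route should be used instead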

Configure routes table
  1. Store default route data in routes table:

    chdef -t route defaultroute net=default mask=255.0.0.0 gateway=10.0.0.101
    
  2. Store additional route data in routes table:

    chdef -t route 20net net=20.0.0.0 mask=255.0.0.0 gateway=0.0.0.0 ifname=eth1
    
  3. Check data in routes table:

    tabdump routes
    #routename,net,mask,gateway,ifname,comments,disable
    "30net","30.0.0.0","255.0.0.0","0.0.0.0","eth2",,
    "20net","20.0.0.0","255.0.0.0","0.0.0.0","eth1",,
    "defaultroute","default","255.0.0.0","10.0.0.101",,,
    
Use makeroutes to configure OS route on xCAT management node
  1. define the names of the routes to be set up on the management node in the site table:

    chdef -t site mnroutenames="defaultroute,20net"
    lsdef -t site clustersite -i mnroutenames
        Object name: clustersite
            mnroutenames=defaultroute,20net
    
  2. add all routes from the mnroutenames to the OS route table for the management node:

    makeroutes
    
  3. add route 20net and 30net to the OS route table for the management node:

    makeroutes -r 20net,30net
    
  4. delete route 20net from the OS route table for the management node:

    makeroutes -d -r 20net
    
Use makeroutes to configure OS route for compute node
  1. define the names of the routes to be set up on the compute node:

    chdef -t node -o cn1 routenames="defaultroute,20net"
    
  2. add all routes from the routenames to the OS route table for the compute node:

    makeroutes cn1
    
  3. add route 20net and 30net to the OS route table for the compute node:

    makeroutes cn1 -r 20net,30net
    
  4. delete route 20net from the OS route table for the compute node:

    makeroutes cn1,cn2 -d -r 20net
    
Use setroute to configure OS route for compute node
  1. define the names of the routes to be set up on the compute node:

    chdef -t node -o cn1 routenames="defaultroute,20net"
    
  2. If adding setroute [replace | add] into the node’s postscripts list, setroute will be executed during OS deployment on compute node to replace/add routes from routenames:

    chdef cn1 -p postscripts="setroute replace"
    
  3. Or if the compute node is already running, use updatenode command to run setroute [replace | add] postscript:

    updatenode cn1 -P "setroute replace"
    
Check result
  1. Use route command in xCAT management node to check OS route table.
  2. Use xdsh cn1 route to check compute node OS route table.
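
For example:

route            # on the management node
xdsh cn1 route   # on the compute node
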
Enable kdump Over Ethernet
Overview

kdump is a feature of the Linux kernel that allows the system to be booted from the context of another kernel. This second kernel reserves a small amount of memory, and its only purpose is to capture the core dump in the event of a kernel crash. The ability to analyze the core dump helps to determine the causes of system failures.

xCAT Interface

The following attributes of an osimage should be modified to enable kdump:

  • pkglist
  • exlist
  • postinstall
  • dump
  • crashkernelsize
  • postscripts
Configure the pkglist file

The pkglist for the osimage needs to include the appropriate RPMs. The following list of RPMs is provided as a sample; always refer to the Operating System specific documentation to ensure the required packages for kdump support are present.

  • [RHELS]

    kexec-tools
    crash
    
  • [SLES]

    kdump
    kexec-tools
    makedumpfile
    
  • [Ubuntu]

    <TODO>
    
Modify the exlist file

The default diskless image created by copycds excludes the /boot directory in the exclude list file, but this directory is required for kdump.

Update the exlist for the target osimage and remove the line /boot:

./boot*  # <-- remove this line

Run packimage to update the diskless image with the changes.
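
A minimal sketch, assuming the osimage is rhels7.3-ppc64le-netboot-compute:

lsdef -t osimage -o rhels7.3-ppc64le-netboot-compute -i exlist   # locate the exlist file
vi <exlist_file>                                                 # delete the ./boot* line
packimage rhels7.3-ppc64le-netboot-compute                       # repack the image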

The postinstall file

kdump will create a new initrd which is used in the dumping stage. The /tmp or /var/tmp directory will be used as the temporary directory. These two directories are allocated only 10MB of space by default, which is not enough; modify the postinstall file to increase the /tmp space as shown below.

  • [RHELS]

    tmpfs   /var/tmp    tmpfs   defaults,size=500m   0 2
    
  • [SLES11]

    tmpfs   /tmp    tmpfs   defaults,size=500m       0 2
    
  • [Ubuntu]

    <TODO>
    
The dump attribute

To support kernel dumps, the dump attribute must be set in the osimage definition. If not set, kdump service will not be enabled. The dump attribute defines the NFS remote path where the crash information is to be stored.

Use the chdef command to set a value of the dump attribute:

chdef -t osimage <image name> dump=nfs://<nfs_server_ip>/<kdump_path>

If the NFS server is the Service Node or Management Node, the server can be left out:

chdef -t osimage <image name> dump=nfs:///<kdump_path>

Note

Only NFS is currently supported as a storage location. Make sure the NFS remote path (nfs://<nfs_server_ip>/<kdump_path>) is exported and is read-writable on the node where the kdump service is enabled.

The crashkernelsize attribute

To allow the Operating System to automatically reserve the appropriate amount of memory for the kdump kernel, set crashkernelsize=auto.

For setting specific sizes, use the following example:

  • For System X machines, set the crashkernelsize using this format:

    chdef -t osimage <image name> crashkernelsize=<size>M
    
  • For Power System AC922, set the crashkernelsize using this format:

    chdef -t osimage <image name> crashkernelsize=<size>M
    
  • For System P machines, set the crashkernelsize using this format:

    chdef -t osimage <image name> crashkernelsize=<size>@32M
    

Note

The value of the crashkernelsize depends on the total physical memory size of the machine. For more about the size, refer to the Appendix.

If starting kdump displays an error like this:

Your running kernel is using more than 70% of the amount of space you reserved for kdump, you should consider increasing your crashkernel

then the crashkernelsize is not large enough; increase it until the error message disappears.

The enablekdump postscript

xCAT provides a postscript enablekdump that can be added to the node definition to automatically start the kdump service when the node boots.

chdef -t node <node range> -p postscripts=enablekdump
Manually trigger a kernel panic on Linux

Normally, a kernel panic() will trigger booting into the capture kernel. Once the kernel panic is triggered, the node will reboot into the capture kernel, and a kernel dump (vmcore) will be automatically saved to the directory on the specified NFS server (<nfs_server_ip>).

Check your Operating System specific documentation for the path where the kernel dump is saved. For example:

  • [RHELS6]

    <kdump_path>/var/crash/<node_ip>-<time>/
    
  • [SLES11]

    <kdump_path>/<node hostname>/<date>
    

To trigger a dump, use the following commands:

echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger

This will force the Linux kernel to crash, and the address-YYYY-MM-DD-HH:MM:SS/vmcore file should be copied to the location you set on the NFS server.

Dump Analysis

Once the system has returned from recovering the crash, you can analyze the kernel dump using the crash tool.

  1. Locate the recent vmcore dump file.

  2. Locate the kernel file for the crash server. The kernel is under /tftpboot/xcat/netboot/<OS name>/<ARCH>/<profile>/kernel on the management node.

  3. Once you have located a vmcore dump file and kernel file, call crash:

    crash <vmcore_dump_file> <kernel_file>
    

Note

If crash cannot find any files, make sure you have the kernel-debuginfo package installed.

Installing a New Kernel in the Diskless Image

[TODO : Verify on ppc64le]

Note: This procedure assumes you are using xCAT 2.6.1 or later.

To add a new kernel, create a directory named <kernelver> under the /install/kernels directory, and genimage will pick it up from there.

The following examples assume you have the kernel RPM in /tmp and are using a new kernel in the directory /install/kernels/<kernelver>.

The RPM names below are only examples, substitute your specific level and architecture.

  • [RHEL]

The RPM kernel package is usually named: kernel-<kernelver>.rpm. For example, kernel-3.10.0-229.ael7b.ppc64le.rpm means kernelver=3.10.0-229.ael7b.ppc64le.

mkdir -p /install/kernels/3.10.0-229.ael7b.ppc64le
cp /tmp/kernel-3.10.0-229.ael7b.ppc64le.rpm /install/kernels/3.10.0-229.ael7b.ppc64le
createrepo /install/kernels/3.10.0-229.ael7b.ppc64le/

Append the kernel directory /install/kernels/<kernelver> to the pkgdir of the specific osimage.

chdef -t osimage <imagename> -p pkgdir=/install/kernels/3.10.0-229.ael7b.ppc64le/

Run genimage/packimage to update the image with the new kernel. Note: If downgrading the kernel, you may need to first remove the rootimg directory.

genimage <imagename> -k 3.10.0-229.ael7b.ppc64le
packimage <imagename>
  • [SLES]

The RPM kernel package is usually separated into two parts: kernel-<arch>-base and kernel-<arch>. For example, /tmp contains the following RPMs:

kernel-default-3.12.28-4.6.ppc64le.rpm
kernel-default-base-3.12.28-4.6.ppc64le.rpm
kernel-default-devel-3.12.28-4.6.ppc64le.rpm

3.12.28-4.6.ppc64le is NOT the kernel version; 3.12.28-4-ppc64le is the kernel version. The “4.6.ppc64le” part is replaced with “4-ppc64le”:

mkdir -p /install/kernels/3.12.28-4-ppc64le/
cp /tmp/kernel-default-3.12.28-4.6.ppc64le.rpm /install/kernels/3.12.28-4-ppc64le/
cp /tmp/kernel-default-base-3.12.28-4.6.ppc64le.rpm /install/kernels/3.12.28-4-ppc64le/
cp /tmp/kernel-default-devel-3.12.28-4.6.ppc64le.rpm /install/kernels/3.12.28-4-ppc64le/

Append the kernel directory /install/kernels/<kernelver> to the pkgdir of the specific osimage.

chdef -t osimage <imagename> -p pkgdir=/install/kernels/3.12.28-4-ppc64le/

Run genimage/packimage to update the image with the new kernel. Note: If downgrading the kernel, you may need to first remove the rootimg directory.

Since the kernel version name is different from the kernel rpm package name, the -k flag MUST be specified on the genimage command.

genimage <imagename> -k 3.12.28-4-ppc64le 3.12.28-4.6
packimage <imagename>
Installing New Kernel Drivers to Diskless Initrd

The kernel drivers in the diskless initrd are used for the devices during the netboot. If you are missing one or more kernel drivers for specific devices (especially for the network device), the netboot process will fail. xCAT offers two approaches to add additional drivers to the diskless initrd during the running of genimage.

Use the ‘-n’ flag to add new drivers to the diskless initrd:

genimage <imagename> -n <new driver list>

Generally, the genimage command has a default driver list which will be added to the initrd. But if you specify the ‘-n’ flag, the default driver list will be replaced with your <new driver list>. That means you need to include any drivers that you need from the default driver list into your <new driver list>.

The default driver list:

rh-x86:   tg3 bnx2 bnx2x e1000 e1000e igb mlx_en virtio_net be2net
rh-ppc:   e1000 e1000e igb ibmveth ehea
rh-ppcle: ext3 ext4
sles-x86: tg3 bnx2 bnx2x e1000 e1000e igb mlx_en be2net
sles-ppc: tg3 e1000 e1000e igb ibmveth ehea be2net
sles-ppcle: scsi_mod libata scsi_tgt jbd2 mbcache crc16 virtio virtio_ring libahci crc-t10dif scsi_transport_srp af_packet ext3 ext4 virtio_pci virtio_blk scsi_dh ahci megaraid_sas sd_mod ibmvscsi
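
For example, a minimal sketch assuming the osimage is rhels7.1-x86_64-netboot-compute and mlx5_core is the one extra driver needed (the other names are carried over from the default rh-x86 list above):

genimage rhels7.1-x86_64-netboot-compute -n tg3,bnx2,bnx2x,e1000,e1000e,igb,mlx_en,virtio_net,be2net,mlx5_core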

Note: With this approach, xCAT will search for the drivers in the rootimage. You need to make sure the drivers have been included in the rootimage before generating the initrd. You can install the drivers manually in an existing rootimage (using chroot) and run genimage again, or you can use a postinstall script to install drivers to the rootimage during your initial genimage run.

Use the driver rpm package to add new drivers from rpm packages to the diskless initrd. Refer to the Configure Additional Network Interfaces - confignetwork for details.

Accelerating the diskless initrd and rootimg generating

Generating a diskless initrd with genimage, and a compressed rootimg with packimage and liteimg, is a time-consuming process. It can be accelerated by enabling the parallel compression tool pigz on a management node with multiple processors and cores. See the Appendix for an example of packimage performance optimized with pigz enabled.

Enabling the pigz for diskless initrd and rootimg generating

The parallel compression tool pigz can be enabled by installing the pigz package on the management server or in the diskless rootimg. Depending on the method of generating the initrd and compressed rootimg, the steps differ between Linux distributions.

  • [RHEL]

    The package pigz is shipped in Extra Packages for Enterprise Linux (EPEL) instead of on the RedHat ISO, which involves some additional setup.

    Extra Packages for Enterprise Linux (or EPEL) is a Fedora Special Interest Group that creates, maintains, and manages a high quality set of additional packages for Enterprise Linux, including, but not limited to, Red Hat Enterprise Linux (RHEL), CentOS and Scientific Linux (SL), Oracle Linux (OL).

    EPEL has an epel-release package that includes gpg keys for package signing and repository information. Installing this package for your Enterprise Linux version should allow you to use normal tools such as yum to install packages and their dependencies.

    Refer to the http://fedoraproject.org/wiki/EPEL for more details on EPEL

    1. Enabling the pigz in genimage (only supported in RHEL 7 or above)

      pigz should be installed in the diskless rootimg. Download the pigz package from https://dl.fedoraproject.org/pub/epel/ , then customize the diskless osimage to install pigz as an additional package; see Install Additional Other Packages for more details.

    2. Enabling the pigz in packimage

      pigz should be installed on the management server. Download the pigz package from https://dl.fedoraproject.org/pub/epel/ , then install pigz with yum or rpm.

  • [UBUNTU]

    Make sure pigz is installed on the management node with the following command:

    dpkg -l|grep pigz
    

    If not, pigz can be installed with the following command:

    apt-get install pigz
    
  • [SLES]

    1. Enabling the pigz in genimage (only supported in SLES12 or above)

      pigz should be installed in the diskless rootimg. Since pigz is shipped on the SLES ISO, this can be done by adding pigz into the pkglist of the diskless osimage.

    2. Enabling the pigz in packimage

      Make sure pigz is installed on the management node with the following command:

      rpm -qa|grep pigz
      

      If not, pigz can be installed with the following command:

      zypper install pigz
      
Appendix: An example of packimage performance optimization with “pigz” enabled

This is an example of performance optimization with pigz enabled.

In this example, the xCAT command packimage rhels7-ppc64-netboot-compute is run on a POWER7 machine with 4 cores.

The system info:

# uname -a
Linux c910f03c01p03 3.10.0-123.el7.ppc64 #1 SMP Mon May 5 11:18:37 EDT 2014 ppc64 ppc64 ppc64 GNU/Linux

# cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.0 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.0"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.0 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.0:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.0
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION=7.0

The CPU info:

# cat /proc/cpuinfo
processor       : 0
cpu             : POWER7 (architected), altivec supported
clock           : 3550.000000MHz
revision        : 2.0 (pvr 003f 0200)

processor       : 1
cpu             : POWER7 (architected), altivec supported
clock           : 3550.000000MHz
revision        : 2.0 (pvr 003f 0200)

processor       : 2
cpu             : POWER7 (architected), altivec supported
clock           : 3550.000000MHz
revision        : 2.0 (pvr 003f 0200)

processor       : 3
cpu             : POWER7 (architected), altivec supported
clock           : 3550.000000MHz
revision        : 2.0 (pvr 003f 0200)

timebase        : 512000000
platform        : pSeries
model           : IBM,8233-E8B
machine         : CHRP IBM,8233-E8B

The time spent on packimage with gzip:

# time packimage rhels7-ppc64-netboot-compute
Packing contents of /install/netboot/rhels7/ppc64/compute/rootimg
compress method:gzip


real    1m14.896s
user    0m0.159s
sys     0m0.019s

The time spent on packimage with pigz:

# time packimage rhels7-ppc64-netboot-compute
Packing contents of /install/netboot/rhels7/ppc64/compute/rootimg
compress method:pigz

real    0m23.177s
user    0m0.176s
sys     0m0.016s
Trim diskless rootimg

To reduce the memory footprint and boot-up time of the diskless node, the initrd and rootimg.gz should be kept as compact as possible while still meeting the user's requirements.

Exclude list

xCAT provides an exlist attribute in the osimage object definition that allows the user to select files to exclude when building the rootimg.gz file for the diskless node.

Take the osimage sles12.1-ppc64le-netboot-compute for example:

# lsdef -t osimage -o sles12.1-ppc64le-netboot-compute -i exlist
Object name: sles12.1-ppc64le-netboot-compute
    exlist=/opt/xcat/share/xcat/netboot/sles/compute.sles12.ppc64le.exlist
Content of the Exclude List file

The file specified in linuximage.exlist contains the relative paths of the directories and files that will be excluded from the rootimg.gz generated by packimage. The relative paths assume the rootimg directory, /install/netboot/sles12.1/ppc64le/compute/rootimg here, to be the base directory. [1]

The following is a sample exlist file:

...
./usr/share/X11/locale/*
./usr/lib/perl[0-9]/[0-9.]*/ppc64le-linux-thread-multi/Encode/JP*
+./usr/share/X11/locale/C*
...

The content above demonstrates some of the syntax supported in the exlist file:

  • Exclude files:

    ./usr/share/X11/locale/*
    

    All the files and subdirectories under rootimg/usr/share/X11/locale/ will be excluded.

  • Exclude Files using Patterns [2]:

    ./usr/lib/perl[0-9]/[0-9.]*/ppc64le-linux-thread-multi/Encode/JP*
    

    Use regular expressions to exclude files by pattern. The above example will exclude any Perl library installed under /usr/lib/ matching ppc64le-linux-thread-multi/Encode/JP*

  • Include files:

    +./usr/share/locale/C*
    

    It is useful to include files following an exclude entry to quickly remove a larger set of files using a wildcard and then adding back the few necessary files using the + sign. In the above example, all the files and sub-directories matching the pattern /usr/share/locale/C* will be included in the rootimg.gz file.

Customize the exlist file and the osimage definition

Check the default exlist file and make sure:

  • all files and directories you do not want in the image will be excluded from the rootimg.
  • no file or directory you need will be excluded from the rootimg.

If you want to customize the osimage sles12.1-ppc64le-netboot-compute with your own exlist file, follow these steps:

#create a customized exlist file based on the default one
cp /opt/xcat/share/xcat/netboot/sles/compute.sles12.ppc64le.exlist /install/custom/netboot/sles/compute.sles12.ppc64le.exlist

#edit the newly created exlist file according to your need
vi /install/custom/netboot/sles/compute.sles12.ppc64le.exlist

#specify the newly created exlist file in the osimage definition
chdef -t osimage -o sles12.1-ppc64le-netboot-compute exlist=/install/custom/netboot/sles/compute.sles12.ppc64le.exlist
[1]The exlist file entry should not end with a slash /. For example, this entry will never match anything: ./usr/lib/perl[0-9]/[0-9.]*/ppc64le-linux-thread-multi/Encode/.
[2]The pattern match test applies to the whole file name, starting from one of the start points specified in the exlist file entry. The regex syntax should comply with the regex syntax of the system command find -path; refer to its documentation for details.
Enabling the localdisk option

Note

You can skip this section if not using the localdisk option in your litefile table.

Define how to partition the local disk

When a node is deployed, the local hard disk needs to be partitioned and formatted before it can be used. This section explains how to provide a configuration file that tells xCAT to partition a local disk and make it ready to use for the directories listed in the litefile table.

The configuration file needs to be specified in the partitionfile attribute of the osimage definition. The configuration file includes several sections:

  • Global parameters to control enabling or disabling the function
  • [disk] section to control the partitioning of the disk
  • [localspace] section to control which partition will be used to store the localdisk directories listed in the litefile table
  • [swapspace] section to control the enablement of the swap space for the node.

An example localdisk configuration file:

enable=yes
enablepart=no

[disk]
dev=/dev/sda
clear=yes
parts=10,20,30

[disk]
dev=/dev/sdb
clear=yes
parts=100M-200M,1G-2G

[disk]
dev=/dev/sdc
ptype=gpt
clear=yes
parts=10,20,30

[localspace]
dev=/dev/sda1
fstype=ext4

[swapspace]
dev=/dev/sda2

The two global parameters enable and enablepart can be used to control the enabling/disabling of the functions:

  • enable: The localdisk feature only works when enable is set to yes. If it is set to no, the localdisk configuration will not be run.
  • enablepart: The partition action (refer to the [disk] section) will be run only when enablepart=yes.

The [disk] section is used to configure how to partition a hard disk:

  • dev: The path of the device file.
  • clear: If set to yes it will clear all the existing partitions on this disk.
  • ptype: The partition table type of the disk, for example msdos or gpt; msdos is the default.
  • fstype: The file system type for the newly created partitions. ext4 is the default.
  • parts: A comma separated list of space ranges, one for each partition that will be created on the device. The valid format for each space range is <startpoint>-<endpoint> or <percentage of the disk>. For example, you could set it to 100M-10G or 50. If set to 50, 50% of the disk space will be assigned to that partition.

The [localspace] section is used to specify which partition will be used as local storage for the node.

  • dev: The path of the partition.
  • fstype: The file system type on the partition.

The [swapspace] section is used to configure the swap space for the statelite node.

  • dev: The path of the partition file which will be used as the swap space.

To enable the local disk capability, create the configuration file (for example in /install/custom) and set the path in the partitionfile attribute for the osimage:

chdef -t osimage <osimage> partitionfile=/install/custom/cfglocaldisk

Now all nodes that use this osimage (i.e. have their provmethod attribute set to this osimage definition name) will have their local disk configured.

Configure the files in the litefile table

For the files/directories to be stored on the local disk, add an entry in the litefile table:

"ALL","/tmp/","localdisk",,

Note

You do not need to specify the swap space in the litefile table. Putting it in the partitionfile configuration file is enough.

Add an entry in the policy table to permit the node to run the getpartition command:

chtab priority=7.1 policy.commands=getpartition policy.rule=allow

Run genimage and packimage for the osimage
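
For example, using the same osimage:

genimage <osimage>
packimage <osimage>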

Note

enablepart=yes in the partition file will partition the local disk on every boot. If you want to preserve the contents of the local disk on the next boot, change it to enablepart=no after the initial provision. The log file /.sllocal/log/localdisk.log on the target node can be used for debugging.

Generate Diskless Image

The copycds command copies the contents of the Linux media to /install/<os>/<arch> so that it will be available for installing nodes or creating diskless images. After executing copycds, there are several osimage definitions created by default. Run tabdump osimage to view these images:

tabdump osimage

The output should be similar to the following:

"rhels7.1-ppc64le-install-compute",,"compute","linux",,"install",,"rhels7.1-ppc64le",,,"Linux","rhels7.1","ppc64le",,,,,,,,
"rhels7.1-ppc64le-install-service",,"service","linux",,"install",,"rhels7.1-ppc64le",,,"Linux","rhels7.1","ppc64le",,,,,,,,
"rhels7.1-ppc64le-stateful-mgmtnode",,"compute","linux",,"install",,"rhels7.1-ppc64le",,,"Linux","rhels7.1","ppc64le",,,,,,,,
"rhels7.1-ppc64le-netboot-compute",,"compute","linux",,"netboot",,"rhels7.1-ppc64le",,,"Linux","rhels7.1","ppc64le",,,,,,,,

The netboot-compute definition is the default diskless osimage created for rhels7.1 ppc64le. Run genimage to generate a diskless image based on the “rhels7.1-ppc64le-netboot-compute” definition:

genimage rhels7.1-ppc64le-netboot-compute

Before packing the diskless image, you have the opportunity to change any files in the image by changing to the rootimgdir (e.g. /install/netboot/rhels7.1/ppc64le/compute/rootimg) and making modifications.

However, it is recommended that all changes to the image are made via postinstall scripts so that they are easily repeatable. Refer to Prescripts and Postscripts for more details.

Pack Diskless Image

After you run genimage to create the image, you can pack the image to create the ramdisk:

packimage rhels7.1-ppc64le-netboot-compute
Export and Import Image
Overview

Note: There was a restriction that exported xCAT 2.7 images could not be imported on xCAT 2.8 (https://sourceforge.net/p/xcat/bugs/3813/). This restriction no longer applies if you are running xCAT 2.8.3 or later.

We want to create a system for making xCAT images more portable so that they can be shared and people are not forced to reinvent the wheel. While every install is unique, there are some things that can be shared among different sites to make images more portable. In addition, a method like this allows us to create snapshots of images that we may find useful to revert to in different situations.

Image exporting and importing are supported for stateful (diskful) and stateless (diskless) clusters. The following documentation will show how to use imgexport to export images and imgimport to import images.

Exporting an image
1. The user has a working image, and the image is defined in the osimage table and linuximage table.

example:

lsdef -t osimage myimage
Object name: myimage
exlist=/install/custom/netboot/sles/compute1.exlist
imagetype=linux
netdrivers=e1000
osarch=ppc64le
osname=Linux
osvers=sles12
otherpkgdir=/install/post/otherpkgs/sles12/ppc64
otherpkglist=/install/custom/netboot/sles/compute1.otherpkgs.pkglist
pkgdir=/install/sles12/ppc64le
pkglist=/install/custom/netboot/sles/compute1.pkglist
postinstall=/install/custom/netboot/sles/compute1.postinstall
profile=compute1
provmethod=netboot
rootimgdir=/install/netboot/sles12/ppc64le/compute1
synclists=/install/custom/netboot/sles/compute1.list
2. The user runs the imgexport command.

example:

imgexport myimage -p node1 -e /install/postscripts/myscript1 -e /install/postscripts/myscript2
(-p and -e are optional)

A bundle file called myimage.tgz will be created under the current directory. The bundle file contains the ramdisk, boot kernel, the root image and all the configuration files for generating the image for a diskless cluster. For diskful, it contains the kickstart/autoyast configuration file. (see appendix). The -p flag puts the names of the postscripts for node1 into the image bundle. The -e flags put additional files into the bundle. In this case two postscripts myscript1 and myscript2 are included. This image can now be used on other systems.

Importing an image
  1. The user downloads an image bundle file from somewhere (Sumavi.com will be hosting many of these).
  2. User runs the imgimport command.

example:

imgimport myimage.tgz -p group1
(-p is optional)

This command fills out the osimage and linuximage tables, and populates the file directories with the appropriate files from the image bundle, such as the ramdisk, boot kernel, root image, and configuration files for diskless. Any additional files that come with the bundle file will also be put into the appropriate directories. If the -p flag is specified, the postscript names that come with the image will be put into the postscripts table for the given node or group.

Copy an image to a new image name on the MN

Very often, the user wants to make a copy of an existing image on the same xCAT MN as a starting point for modifications. In this case, run imgexport first as described above, then run imgimport with the -f flag to change the profile name of the image. That way the image will be copied into a different directory on the same xCAT MN.

example:

imgimport myimage.tgz -p group1 -f compute2
Modify an image (optional)

Skip this section if you want to use the image as is.

1. The user can modify the image to fit his/her own needs. The following can be modified:

  • Modify .pkglist file to add or remove packages that are from the os distro
  • Modify .otherpkgs.pkglist to add or remove packages from other sources. Refer to Using_Updatenode for details
  • For diskful, modify the .tmpl file to change the kickstart/autoyast configuration
  • Modify .synclist file to change the files that are going to be synchronized to the nodes
  • Modify the postscripts table for the nodes to be deployed
  • Modify the osimage and/or linuximage tables for the location of the source rpms and the rootimage location

2. Run genimage:

genimage image_name

3. Run packimage:

packimage image_name
Deploying nodes

You can change the provmethod of the node to the new image_name if different:

chdef <noderange> provmethod=<image_name>
nodeset <noderange> osimage=<image_name>

and the node is ready to deploy.

Appendix

You can only export/import one image at a time. Each tarball will have the following simple structure:

manifest.xml
<files>
extra/ (optional)

The manifest.xml will be analogous to an autoyast or windows unattend.xml file where it tells xCAT how to store the items. The following is an example for a diskless cluster:

manifest.xml:

<?xml version="1.0"?>
<xcatimage>
  <exlist>/install/custom/netboot/sles/compute1.exlist</exlist>
  <extra>
    <dest>/install/postscripts</dest>
    <src>/install/postscripts/myscript1</src>
  </extra>
  <imagename>myimage</imagename>
  <imagetype>linux</imagetype>
  <kernel>/install/netboot/sles12/ppc64le/compute1/kernel</kernel>
  <netdrivers>e1000</netdrivers>
  <osarch>ppc64le</osarch>
  <osname>Linux</osname>
  <osvers>sles12</osvers>
  <otherpkgdir>/install/post/otherpkgs/sles12/ppc64</otherpkgdir>
  <otherpkglist>/install/custom/netboot/sles/compute1.otherpkgs.pkglist</otherpkglist>
  <pkgdir>/install/sles12/ppc64le</pkgdir>
  <pkglist>/install/custom/netboot/sles/compute1.pkglist</pkglist>
  <postbootscripts>my4,otherpkgs,my3,my4</postbootscripts>
  <postinstall>/install/custom/netboot/sles/compute1.postinstall</postinstall>
  <postscripts>syslog,remoteshell,my1,configrmcnode,syncfiles,my1,my2</postscripts>
  <profile>compute1</profile>
  <provmethod>netboot</provmethod>
  <ramdisk>/install/netboot/sles12/ppc64le/compute1/initrd-diskless.gz</ramdisk>
  <rootimg>/install/netboot/sles12/ppc64le/compute1/rootimg.gz</rootimg>
  <rootimgdir>/install/netboot/sles12/ppc64le/compute1</rootimgdir>
  <synclists>/install/custom/netboot/sles/compute1.list</synclists>
</xcatimage>

In the above example, we have directives describing where the files came from and what needs to be processed.

Note that even though source and destination information is included, all standard files will be copied to the appropriate places where xCAT expects them to go.

Exported files

The following files will be exported, assuming x is the profile name:

For diskful:

x.pkglist
x.otherpkgs.pkglist
x.tmpl
x.synclist

For diskless:

kernel
initrd.gz
rootimg.gz
x.pkglist
x.otherpkgs.pkglist
x.synclist
x.postinstall
x.exlist

Note: Although the postscript names can be exported by using the -p flag, the postscripts themselves are not included in the bundle file by default. The user has to use the -e flag to include them one by one if needed.

Initialize the Compute for Deployment

xCAT uses the nodeset command to associate a specific image with a node, which will then be installed with this image.

nodeset <nodename> osimage=<osimage>

nodeset has additional attributes that are used for specific purposes or specific machines, for example:

  • runimage: If you would like to run a task after deployment, you can define that task with this attribute.
  • runcmd: This instructs the node to boot to the xCAT nbfs environment and proceed to configure BMC for basic remote access. This causes the IP, netmask, gateway, username, and password to be programmed according to the configuration table.
  • shell: This instructs the node to boot to the xCAT genesis environment, and present a shell prompt on console. The node will also be able to be sshed into and have utilities such as wget, tftp, scp, nfs, and cifs. It will have storage drivers available for many common systems.

Choose the additional nodeset attributes according to your requirements. For more information about nodeset, refer to its man page.
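
For example (the node and osimage names below are placeholders):

nodeset cn1 osimage=rhels7.3-ppc64le-netboot-compute   # associate an osimage with the node
nodeset cn1 runcmd=bmcsetup                            # boot to the genesis environment and configure the BMC
nodeset cn1 shell                                      # boot to the genesis shell environment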

Start the OS Deployment

Starting the deployment involves two key operations: first specify the boot device of the next boot to be network, then reboot the node:

For Power servers, those two operations can be completed by one command rnetboot:

rnetboot <node>

For x86_64 servers, those two operations need two independent commands.

  1. set the next boot device to be from the “network”

    rsetboot <node> net
    
  2. Reboot the x86_64 server:

    rpower <node> reset
    
Statelite Installation

Overview

This document details the design and setup for the statelite solution of xCAT. Statelite is an intermediate mode between diskful and diskless.

Statelite provides two kinds of efficient and flexible solutions: most of the OS image can be NFS mounted read-only, or the OS image can be placed in a ramdisk of type tmpfs. Unlike the stateless solution, statelite provides a configurable list of directories and files that can be read-write. These read-write directories and files can be configured to either persist or not persist across reboots.

Solutions

There are two solutions: NFSROOT-based and RAMdisk-based.

  1. NFSROOT-based (default):
    1. If rootfstype in the osimage xCAT data object is left blank or set to nfs, the NFSROOT-based statelite solution will be enabled.
    2. The ROOTFS is NFS mounted read-only.
  2. RAMdisk-based:
    1. If rootfstype in the osimage xCAT data object is set to ramdisk, the RAMdisk-based statelite solution will be enabled.
    2. One image file will be downloaded when the node boots up; the file will be extracted to the ramdisk and used as the ROOTFS.

Advantages

Statelite offers the following advantages over xCAT’s stateless (RAMdisk) implementation:

  1. Some files can be made persistent over reboot. This is useful for license files or database servers where some state is needed. However, you still get the advantage of only having to manage a single image.
  2. Changes to hundreds of machines can take place instantly, and automatically, by updating one main image. In most cases, machines do not need to reboot for these changes to take effect. This applies only to the NFSROOT-based solution.
  3. Ease of administration by being able to lock down an image. Many parts of the image can be read-only, so no modifications can transpire without updating the central image.
  4. Files can be managed in a hierarchical manner. For example: Suppose you have a machine that is in one lab in Tokyo and another in London. You could set table values for those machines in the xCAT database to allow machines to sync from different places based on their attributes. This allows you to have one base image with multiple sources of file overlay.
  5. Ideal for virtualization. In a virtual environment, you may not want a disk image (neither stateless nor stateful) on every virtual node as it consumes memory and disk. Virtualizing with the statelite approach allows for images to be smaller, easier to manage, use less disk, less memory, and more flexible.

Disadvantages

However, there’re still several disadvantages, especially for the NFSROOT-based solution.

  1. NFS Root requires more network traffic to run as the majority of the disk image runs over NFS. This may depend on your workload, but can be minimized. Since the bulk of the image is read-only, NFS caching on the server helps minimize the disk access on the server, and NFS caching on the client helps reduce the network traffic.
  2. NFS Root can be complex to set up. As more files are created in different places, there are greater chances for failures. This flexibility is also one of the great virtues of Statelite. The image can work in nearly any environment.
Configuration
Statelite configuration is done using the following tables in xCAT:
  • litefile
  • litetree
  • statelite
  • policy
  • noderes
litefile table

The litefile table specifies the directories and files on the statelite nodes that should be read/write, persistent, or read-only overlay. All other files in the statelite nodes come from the read-only statelite image.

  1. The first column in the litefile table is the image name this row applies to. It can be an exact osimage definition name, an osimage group (set in the groups attribute of osimages), or the keyword ALL.

  2. The second column in the litefile table is the full path of the directory or file on the node that you are setting options for.

  3. The third column in the litefile table specifies options for the directory or file:

    1. tmpfs - It provides a file or directory for the node to use when booting; its permissions will be the same as the original version on the server. In most cases it is read-write; however, on the next statelite boot the original version of the file or directory on the server will be used again, which means it is non-persistent. This option can be performed on files and directories.
    2. rw - Same as above. Its name “rw” does NOT mean it will always be read-write, even though in most cases it is read-write. Do not confuse it with the “rw” permission in the file system.
    3. persistent - It provides a mounted file or directory that is copied to the xCAT persistent location and then over-mounted on the local file or directory. Anything written to that file or directory is preserved. It means, if the file/directory does not exist at first, it will be copied to the persistent location. Next time the file/directory in the persistent location will be used. The file/directory will be persistent across reboots. Its permission will be the same as the original one in the statelite location. It requires the statelite table to be filled out with a spot for persistent statelite. This option can be performed on files and directories.
    4. con - The contents of the pathname are concatenated to the contents of the existing file. For this directive the searching in the litetree hierarchy does not stop when the first match is found. All files found in the hierarchy will be concatenated to the file when found. The permission of the file will be "-rw-r--r--", which means it is read-write for the root user, but read-only for the others. It is non-persistent: when the node reboots, all changes to the file will be lost. It can only be performed on files; do not use it for a directory.
    5. ro - The file/directory will be overmounted read-only on the local file/directory. It will be located in the directory hierarchy specified in the litetree table. Changes made to this file or directory on the server will be immediately seen in this file/directory on the node. This option requires that the file/directory to be mounted must be available in one of the entries in the litetree table. This option can be performed on files and directories.
    6. tmpfs,rw - Only for compatibility; it is used as the default option if you leave the options column blank. It has the same semantics as the link option, so when adding new items into the litefile table, the link option is recommended.
    7. link - It provides a file/directory for the node to use when booting; it is copied from the server and placed in tmpfs on the booted node. In the local file system of the booted node, it is a symbolic link to a file/directory in tmpfs. The permission of the symbolic link is “lrwxrwxrwx”, which is not the real permission of the file/directory on the node, so for applications sensitive to file permissions, using “link” as the option can be an issue; for example, “/root/.ssh/”, which is used for SSH, should NOT use “link” as its option. It is non-persistent: when the node is rebooted, all changes to the file/directory will be lost. This option can be performed on files and directories.
    8. link,ro - The file is readonly, and will be placed in tmpfs on the booted node. In the local file system of the booted node, it is one symbolic link to the tmpfs. It is non-persistent, when the node is rebooted, all changes to the file/directory will be lost. This option requires that the file/directory to be mounted must be available in one of the entries in the litetree table. The option can be performed on files and directories.
    9. link,con - Similar to the “con” option. All the files found in the litetree hierarchy will be concatenated to the file when found. The final file will be put to the tmpfs on the booted node. In the local file system of the booted node, it is one symbolic link to the file/directory in tmpfs. It is non-persistent, when the node is rebooted, all changes to the file will be lost. The option can only be performed on files.
    10. link,persistent - It provides a mounted file or directory that is copied to the xCAT persistent location and then over-mounted to the tmpfs on the booted node, and finally the symbolic link in the local file system will be linked to the over-mounted tmpfs file/directory on the booted node. The file/directory will be persistent across reboots. The permission of the file/directory where the symbolic link points to will be the same as the original one in the statelite location. It requires the statelite table to be filled out with a spot for persistent statelite. The option can be performed on files and directories.
    11. localdisk - The file or directory will be stored on the local disk of the statelite node. Refer to the section Enabling the localdisk option to enable localdisk support.

Currently, xCAT does not handle relative links very well. Relative links are commonly used by the system libraries; for example, under the /lib/ directory there will be one relative link matching one .so file. So, when you add a relative link to the litefile table (not recommended), make sure the real file is also included, or put its directory name into the litefile table.

Note

It is recommended that you specify at least the entries listed below in the litefile table, because most of these files need to be writable for the node to boot up successfully. When any changes are made to their options, make sure they won't affect the whole system. If you want to run a command like /bin/ping as a non-root user, add this command into the litefile table so that the root user can authorize the command for non-root users.

Sample Data for Redhat statelite setup

This is the minimal list of files needed; you can add additional files to the litefile table.

#image,file,options,comments,disable
"ALL","/etc/adjtime","tmpfs",,
"ALL","/etc/securetty","tmpfs",,
"ALL","/etc/lvm/","tmpfs",,
"ALL","/etc/ntp.conf","tmpfs",,
"ALL","/etc/rsyslog.conf","tmpfs",,
"ALL","/etc/rsyslog.conf.XCATORIG","tmpfs",,
"ALL","/etc/rsyslog.d/","tmpfs",,
"ALL","/etc/udev/","tmpfs",,
"ALL","/etc/ntp.conf.predhclient","tmpfs",,
"ALL","/etc/resolv.conf","tmpfs",,
"ALL","/etc/yp.conf","tmpfs",,
"ALL","/etc/resolv.conf.predhclient","tmpfs",,
"ALL","/etc/sysconfig/","tmpfs",,
"ALL","/etc/ssh/","tmpfs",,
"ALL","/etc/inittab","tmpfs",,
"ALL","/tmp/","tmpfs",,
"ALL","/var/","tmpfs",,
"ALL","/opt/xcat/","tmpfs",,
"ALL","/xcatpost/","tmpfs",,
"ALL","/etc/systemd/system/multi-user.target.wants/","tmpfs",,
"ALL","/root/.ssh/","tmpfs",,
"ALL","/etc/rc3.d/","tmpfs",,
"ALL","/etc/rc2.d/","tmpfs",,
"ALL","/etc/rc4.d/","tmpfs",,
"ALL","/etc/rc5.d/","tmpfs",,
Sample Data for SLES statelite setup

This is the minimal list of files needed; you can add additional files to the litefile table.

#image,file,options,comments,disable
"ALL","/etc/lvm/","tmpfs",,
"ALL","/etc/ntp.conf","tmpfs",,
"ALL","/etc/ntp.conf.org","tmpfs",,
"ALL","/etc/resolv.conf","tmpfs",,
"ALL","/etc/hostname","tmpfs",,
"ALL","/etc/ssh/","tmpfs",,
"ALL","/etc/sysconfig/","tmpfs",,
"ALL","/etc/syslog-ng/","tmpfs",,
"ALL","/etc/inittab","tmpfs",,
"ALL","/tmp/","tmpfs",,
"ALL","/etc/init.d/rc3.d/","tmpfs",,
"ALL","/etc/init.d/rc5.d/","tmpfs",,
"ALL","/var/","tmpfs",,
"ALL","/etc/yp.conf","tmpfs",,
"ALL","/etc/fstab","tmpfs",,
"ALL","/opt/xcat/","tmpfs",,
"ALL","/xcatpost/","tmpfs",,
"ALL","/root/.ssh/","tmpfs",,
"ALL","/etc/systemd/system/","tmpfs",,
"ALL","/etc/adjtime","tmpfs",,
litetree table

The litetree table controls where the initial content of the files in the litefile table comes from, and the long-term content of the ro files. When a node boots up in statelite mode, it will by default copy all of its tmpfs files from the .default directory of the root image, for example /install/netboot/rhels7.3/x86_64/compute/rootimg/.default, so it is not required to set up a litetree table. If you decide that you want some of the files pulled from different locations that are different per node, you can use this table.

You can choose to use the defaults and not set up a litetree table.

statelite table

The statelite table specifies the location on an NFS server where a node's persistent files are stored. This is done by entering the information into the statelite table.

In the statelite table, the node or nodegroup entries must be unique; that is, a node or group should appear only once in the first column of the table. This ensures that only one statelite image can be assigned to a node. An example would be:

"compute",,"<nfssvr_ip>:/gpfs/state",,

Any nodes in the compute node group will have their state stored in the /gpfs/state directory on the machine with <nfssvr_ip> as its IP address.

When the node boots up, the value of the statemnt attribute will be mounted to /.statelite/persistent. The code will then create the subdirectory /.statelite/persistent/<nodename> if there are persistent files that have been added in the litefile table. This directory will be the root of the image for this node's persistent files. By default, xCAT will do a hard NFS mount of the directory. You can change the mount options by setting the mntopts attribute in the statelite table.

Also, to set the statemnt attribute, you can use variables from xCAT database. It follows the same grammar as the litetree table. For example:

#node,image,statemnt,mntopts,comments,disable
"cn1",,"$noderes.nfsserver:/lite/state/$nodetype.profile","soft,timeo=30",,

Note: Do not name your persistent storage directory with the node name, as the node name will be added in the directory automatically. If you do, then a directory named /state/cn1 will have its state tree inside /state/cn1/cn1.

Policy

Ensure policies are set up correctly in the Policy Table. When a node boots up, it queries the xCAT database to get the litefile and litetree table information. In order for this to work, the commands (of the same name) must be set in the policy table to allow nodes to request it. This should happen automatically when xCAT is installed, but you may want to verify that the following lines are in the policy table:

chdef -t policy -o 4.7 commands=litefile rule=allow
chdef -t policy -o 4.8 commands=litetree rule=allow
noderes

The noderes.nfsserver attribute can be set to the NFSroot server. If it is not set, the default is the Management Node.

The noderes.nfsdir attribute can also be set. If it is not set, the default is /install.
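
For example, a minimal sketch assuming a node group compute and an NFS server at 10.1.0.1:

chdef compute nfsserver=10.1.0.1 nfsdir=/install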

Provision statelite
Show current provisioning method

To determine the current provisioning method of your node, execute:

lsdef <noderange> -i provmethod

Note: syncfiles is not currently supported for statelite nodes.

Generate default statelite image from distro media

In this example, we are going to create a new compute node osimage for rhels7.3 on ppc64le. We will set up a test directory structure that we can use to create our image. Later we can just move that into production.

Use the copycds command to copy the appropriate iso image into the /install directory for xCAT. The copycds command will copy the contents to /install/rhels7.3/<arch>. For example:

copycds RHEL-7.3-20161019.0-Server-ppc64le-dvd1.iso

The contents are copied into /install/rhels7.3/ppc64le/

The configuration files pointed to by the attributes are the defaults shipped with xCAT. We will want to copy them to the /install directory, in our example the /install/test directory, and modify them as needed.

Statelite Directory Structure

Each statelite image will have the following directories:

/.statelite/tmpfs/
/.default/
/etc/init.d/statelite

All files with link options, which are symbolic links, will link to /.statelite/tmpfs.

tmpfs files that are persistent link to /.statelite/persistent/<nodename>/; /.statelite/persistent/<nodename> is the directory where the node's individual storage will be mounted.

/.default is where default files are copied from the image to tmpfs if the files are not found in the litetree hierarchy.

Customize your statelite osimage
Create the osimage definition

Set up your osimage/linuximage tables with the new test image name, osvers, osarch, and paths to all the files for building and installing the node. Using the above generated rhels7.3-ppc64le-statelite-compute as an example, we are going to create our own image. The value for the provisioning method attribute is osimage in this example:

mkdef rhels7.3-custom-statelite -u profile=compute provmethod=statelite

Check your setup:

lsdef -t osimage rhels7.3-custom-statelite

Customize the paths to your pkglist, syncfile, etc. in the osimage definition as you require. Note: if you modify files shipped under the /opt/xcat/share/... path, copy them to the appropriate /install/custom/... path first. Remember all files must be under /install if using hierarchy (service nodes).

Copy the sample *list files and modify as needed:

mkdir -p /install/test/netboot/rh
cp -p /opt/xcat/share/xcat/netboot/rh/compute.rhels7.ppc64le.pkglist \
/install/test/netboot/rh/compute.rhels7.ppc64le.pkglist
cp -p /opt/xcat/share/xcat/netboot/rh/compute.exlist \
/install/test/netboot/rh/compute.exlist

chdef -t osimage -o rhels7.3-custom-statelite \
    pkgdir=/install/rhels7.3/ppc64le \
    pkglist=/install/test/netboot/rh/compute.rhels7.ppc64le.pkglist \
    exlist=/install/test/netboot/rh/compute.exlist \
    rootimgdir=/install/test/netboot/rh/ppc64le/compute
Setup pkglists

In the above example, you have defined your pkglist to be in /install/test/netboot/rh/compute.rhels7.ppc64le.pkglist.

Edit compute.rhels7.ppc64le.pkglist and compute.exlist as needed.

vi /install/test/netboot/rh/compute.rhels7.ppc64le.pkglist
vi /install/test/netboot/rh/compute.exlist

Make sure nothing is excluded in compute.exlist that you need.

Install other specific packages

Make the directory to hold additional rpms to install on the compute node.

mkdir -p /install/test/post/otherpkgs/rh/ppc64le

Now copy all the additional OS rpms you want to install into /install/test/post/otherpkgs/rh/ppc64le.

First, you need to create a text file which contains the complete list of files to include in the repository. The name of the text file is rpms.list and it must be in the /install/test/post/otherpkgs/rh/ppc64le directory. Create rpms.list:

cd /install/test/post/otherpkgs/rh/ppc64le
ls *.rpm > rpms.list

Then, run the following command to create the repodata for the newly-added packages:

createrepo -i rpms.list /install/test/post/otherpkgs/rh/ppc64le

The createrepo command with the -i rpms.list option will create the repository for the rpm packages listed in the rpms.list file. It won't destroy or affect the rpm packages that are in the same directory but have been included in another repository.

Or, if you create a sub-directory to contain the rpm packages, for example one named other in /install/test/post/otherpkgs/rh/ppc64le, run the following command to create the repodata for that directory:

createrepo /install/post/otherpkgs/<os>/<arch>/other

Note: Replace other with your real directory name.

Define the location of your otherpkgs in your osimage:

chdef -t osimage -o rhels7.3-custom-statelite \
otherpkgdir=/install/test/post/otherpkgs/rh/ppc64le \
otherpkglist=/install/test/netboot/rh/compute.otherpkgs.pkglist

There are examples of typical *otherpkgs.pkglist files under /opt/xcat/share/xcat/netboot/<platform> that can be used as an example of the format.

Set up Post scripts for statelite

The rules to create post install scripts for a statelite image are the same as the rules for stateless/diskless install images.

There’re two kinds of postscripts for statelite (also for stateless/diskless).

The first kind of postscript is executed at genimage time; it is executed against the image itself on the MN. It was set up in The postinstall file section before the image was generated.

The second kind of postscript is the script that runs on the node during node deployment time. During init.d timeframe, /etc/init.d/gettyset calls /opt/xcat/xcatdsklspost that is in the image. This script uses wget to get all the postscripts under mn:/install/postscripts and copy them to the /xcatpost directory on the node. It uses openssl or stunnel to connect to the xcatd on the mn to get all the postscript names for the node from the postscripts table. It then runs the postscripts for the node.

Setting up postinstall files (optional)

Using postinstall files is optional. There are some examples shipped in /opt/xcat/share/xcat/netboot/<platform>.

If you define a postinstall file to be used by genimage, then

chdef -t osimage -o rhels7.3-custom-statelite postinstall=<your postinstall file path>