In the end the only reply I got was from our Sun partner, Martin Pre_laber, and thankfully, through his several further suggestions, we found an answer.

To get a script into the cluster framework, specifically in our case one that starts and stops TSM's dsm scheduler, several steps were needed. The most critical for me was to stop following the TSM manual where it says that all scripts for starting and stopping the TSM scheduler, plus all configuration files, *must* be on shared storage. This simply doesn't work. The dsm.opt file for each TSM node (note that a TSM node is different to, and *not* the same thing as, a cluster node!) can and generally should be on shared storage, mainly for consistency. The scripts for starting, stopping and probing the TSM services, however, need to be local and present on every node at all times. This availability of the scripts is what the cluster framework needs in order to add the resource into the cluster. If the script wasn't available on all nodes when I tried to create the resource, cluster spat the dummy...

After setting up the scripts and manually testing the TSM client to make sure the configuration is correct on all nodes, it is possible to add a new resource of type SUNW.gds - a generic data service - to the cluster. To add the scripts as a gds resource into the cluster, the following command does the job:

# clrs create -g www-rg -t SUNW.gds \
    -p Start_command="/etc/init.d/dsm.scheduler.cluster.sh /zones/webdata/tsm/dsm.opt start" \
    -p Probe_command="/etc/init.d/dsm.scheduler.cluster.sh webdata probe" \
    -p Stop_command="/etc/init.d/dsm.scheduler.cluster.sh webdata stop" \
    -p Network_aware=false \
    webdata-backup-rs

So in this example, the script /etc/init.d/dsm.scheduler.cluster.sh is on local storage and is identical across all nodes; the script itself is further below. The file /zones/webdata/tsm/dsm.opt is on shared storage and switches between nodes in the event of a failover. When the rg starts on a different node, the script is run and the resource comes online. Curiously, the dsmcad daemon process doesn't need to be killed in the event of a failover; the cluster framework seems to take care of this, killing the process and allowing a clean failover. Also, making the resource not network aware removed the need for a logical hostname for the resource group.
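Once the resource exists, it can be enabled, checked and switched over like any other resource. The commands below are only my own minimal sketch of how I'd verify it, not something from the steps above; the resource and group names are the ones from the example, and the node name is a placeholder for whichever node the group is not currently running on:

# clrs enable webdata-backup-rs
# clrs status webdata-backup-rs
# clrg switch -n <other-node> www-rg     # confirm the backup resource comes online on the other node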
The script to start, stop, and probe the dsm client is below. It could definitely be done better, however it works. Also, from what I've noticed, it may be possible to directly start and stop the scheduler process, dsmc, using the script; I haven't tried this, however I'm sure it would work. Note that I include this script for informational purposes only, I don't promise that it will work for you ;-)

#!/bin/ksh
# Generally, we should start up with something like this:
# /opt/tivoli/tsm/client/ba/bin/dsmcad -optfile=/zstorage/build-test/tsm/dsm.opt

# Set the necessary environment variables so that TSM doesn't vomit.
LC_CTYPE="en_US"
export LC_CTYPE
LANG="en_US"
export LANG
LC_LANG="en_US"
export LC_LANG
LC_ALL="en_US"
export LC_ALL

# Work out which argument is the command and which the config file.
case "$1" in
'start'|'stop'|'probe')
        COMMAND=$1
        DSM_CONFIG=$2
        ;;
*)
        COMMAND=$2
        DSM_CONFIG=$1
esac

# Now check what we want to do.
case "$COMMAND" in
'start')
        # echo "starting"
        # There has to be a better way to do this test.......
        if test -f "$DSM_CONFIG" ; then
                true
        else
                echo "Config file $DSM_CONFIG does not exist, exiting."
                exit 1
        fi
        export DSM_CONFIG

        # Check if there is already a dsmcad process running; if so, ignore the start command.
        PS=`ps -ef | grep -v grep | grep -v vi | grep -v probe | grep -v zoneadmd | grep -v "dsm.scheduler.cluster.sh" | grep -c "$DSM_CONFIG"`
        if test "$PS" -eq "1" ; then
                echo "dsmcad is already started for $DSM_CONFIG, will not start another."
                ps -ef | grep -v grep | grep -v vi | grep -v probe | grep -v zoneadmd | grep -v "dsm.scheduler.cluster.sh" | grep "$DSM_CONFIG"
                exit 0
        elif test "$PS" -gt "1" ; then
                echo "Seems to be too many processes running for dsmcad for $DSM_CONFIG, please check it."
                exit 1
        fi

        /opt/tivoli/tsm/client/ba/bin/dsmcad -optfile=$DSM_CONFIG
        if test "$?" -ne "0" ; then
                echo "Failed to start the dsm scheduler, exiting"
                exit 1
        fi
        ;;
'stop')
        # echo "stopping"
        # For the most part, we ignore a stop command as the dsmcad should work out itself
        # that it has to stop its child process when the directory with its password
        # isn't available.
        exit 0
        ;;
'probe')
        # echo "probing"
        # WARNING: The following would produce a bug if "vi" is in the arguments...
        # So make sure you avoid it, OK?
        PS=`ps -ef | grep -v grep | grep -v vi | grep -v probe | grep -v zoneadmd | grep -c "$DSM_CONFIG"`
        if test "$PS" -gt "0" ; then
                # echo "Found $PS processes"
                exit 0
        else
                echo "Found no processes"
                exit 1
        fi
        ;;
*)
        # Otherwise an invalid command was received, vomit.
        echo "options { start | stop | probe }"
        exit 1
esac
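For the manual testing mentioned above, something along these lines should do on each node in turn. This is just my own quick sanity check rather than anything prescribed, and it assumes the same paths and arguments as the example command, plus a TSM password already stored for the node:

# dsmc query session -optfile=/zones/webdata/tsm/dsm.opt       # the TSM client config itself is good
# /etc/init.d/dsm.scheduler.cluster.sh /zones/webdata/tsm/dsm.opt start
# /etc/init.d/dsm.scheduler.cluster.sh webdata probe ; echo "probe exit: $?"   # 0 means dsmcad was found

Note that the stop case of the script is deliberately a no-op, so a dsmcad started by hand keeps running; the start case simply detects it and reports that it is already started.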
So I hope I've written something that is useful. If anyone has questions, feel free to contact me.

regards

On Thursday 14 August 2008, 17:07 Markus Mayer wrote:
> Hi all,
>
> I've been pulling my hair out on this one for a few days now; even with
> support from our Sun partner, we have not come up with a solution.
>
> I have a cluster, Sun Cluster 3.2 on two V445's, five resource groups,
> each containing its own zpool, and a number of zones. Each zpool and the
> zones are configured as a resource within the group, as is necessary for
> cluster. Each resource group is configured for failover operation. From
> the cluster view, everything works as it should.
>
> Enter the desire to make a backup with TSM. Backup services will be run
> from the global zone. According to the TSM manual (IBM TSM for unix and
> linux, backup-archive clients installation and user's guide, pages
> 543-549) we need to have a separate TSM server node for each shared disk
> resource to back up the shared resources. This is configured. Each TSM
> client node will back up the data only on the shared disks within each
> resource group.
>
> From the client side, cluster, we need a simple script that runs as a
> resource within the resource group. This script meets the requirements of
> cluster, having exit values of 0, 100 and 201 depending on circumstances,
> and the functions start, stop, and probe. As required by TSM, this script
> resides on shared storage that switches between nodes, in our case a
> dedicated zfs file system on the zpool. When a failover occurs, the
> script should be started (backup service/resource brought online) in the
> same way that any other resource within the group would be started or
> brought online.
>
> Therein lies the problem. How can I define a resource that is a simple
> shell script or program, which should then be added to an existing
> resource group in cluster? It sounds simple enough, but it would seem
> it's not so...
>
> Our Sun partner gave me the following link to follow, which I did.
> http://docs.sun.com/app/docs/doc/819-2972/gds-25?a=view
>
> In short, it says enable SUNW.gds (already done), create a resource group
> that will contain the resource and failover service itself, create a
> logical hostname, then the resource. This is where some confusion comes
> in for me.
>
> I already have resource groups defined, one being comms-rg containing two
> resources, comms-storage-rs and commssuite-zone-rs. The "backup"
> resource, named for example comms-backup-rs, from my point of view should
> then go into this resource group. If I try to add a logical hostname to
> this resource group, I get an error:
>
> # clreslogicalhostname create -g comms-rg commslhname
> clreslogicalhostname: commslhname cannot be mapped to an IP address.
>
> So as suggested by our Sun partner, I tried adding an IP address for the
> logical host name and putting it in the /etc/inet/hosts files on both
> nodes. The result was:
>
> # clreslogicalhostname create -g comms-rg commslhname
> clreslogicalhostname: specified hostname(s) cannot be hosted by any
> adapter on wallaby
> clreslogicalhostname: Hostname(s): commslhname
>
> getent returned valid information on both nodes:
> # getent hosts 172.16.241.54
> 172.16.241.54 commslhname commslhname.nowhere.nothing.invalid
>
> OK, so it seems that I have to define a new resource group especially for
> this one resource which contains one simple script, which makes no sense
> to me because I already have a resource group into which the resource
> should go. Why then can't I add this new script as a resource in an
> existing resource group? The problem here, too, is that I would need to
> define an additional resource group for every resource group that I have,
> currently five, meaning a total of ten resource groups, all of which need
> affinities in order to correctly fail over and start the resources.
> Additionally, the backup resource needs, according to the manual, to have
> network resources defined, and a port list defined, although it only
> needs to start a shell script.
>
> It seems much more complicated than it should be. I find nothing else in
> the documentation about this, but it has to be simple; I can't imagine
> that it could be so complicated....
>
> The alternative, should such a resource definition not be possible, is to
> have a TSM client in every zone, and one in the global zone of each node.
> This is however not what I'm looking for.
>
> Could it be that I'm barking up the wrong tree here? Does anyone have any
> suggestions as to how I can achieve this?
>
> Thanks
> Markus