NetWorker: Troubleshooting Guide for Red Hat Cluster Service Issues

Summary: This article outlines how to approach NetWorker service startup problems on a NetWorker server deployed in a Red Hat pcs (Pacemaker) cluster. It is intended for NetWorker backup administrators and NetWorker Support working to resolve these issues.



Instructions

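The NetWorker server can be deployed in a cluster failover configuration on Red Hat nodes using the Pacemaker (pcs) service. In this type of configuration, NetWorker is installed on two or more nodes, and the NetWorker server database resides on a shared storage location that moves between the nodes depending on which node is "active" in Pacemaker. The NetWorker server uses a shared cluster name and IP address so that its naming and addressing stay consistent regardless of which node hosts the service. For details on setting up NetWorker in a cluster, see the NetWorker Cluster Integration Guide, available on the Dell Support product page.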

Cluster topology:

This article uses an example cluster with the following NetWorker cluster topology:

Hostname	IP address	Function
lnx-node1.amer.lan	192.168.9.108	Physical node 1
lnx-node2.amer.lan	192.168.9.109	Physical node 2
lnx-nwcluster.amer.lan	192.168.9.110	Logical name used by NetWorker

Each node uses a symbolic link at /nsr to select which NetWorker directory is in use.

Active node:

On the active node, where the NetWorker server is started, /nsr is a symbolic link to the shared storage location:

root@lnx-node1:~# ls -l / | grep nsr
lrwxrwxrwx.   1 root root     14 Oct  5 10:49 nsr -> /nsr_share/nsr
drwxr-xr-x.  11 root root    116 Aug 31 17:20 nsr.NetWorker.local
drwxr-xr-x.   3 root root     17 Aug 31 17:23 nsr_share

Passive node:

On the "passive" node, /nsr is a symbolic link to /nsr.NetWorker.local:

root@lnx-node2:~# ls -l / | grep nsr
lrwxrwxrwx.   1 root root     20 Oct  3 17:08 nsr -> /nsr.NetWorker.local
drwxr-xr-x.  11 root root    116 Aug 31 17:19 nsr.NetWorker.local
drwxr-xr-x.   2 root root      6 Aug 31 17:18 nsr_share

When a node is passive, the nsrexecd (NetWorker client) software always runs using /nsr.NetWorker.local. Each physical node has its own client resource that uses the physical node's DNS-resolvable name and IP address. The NetWorker server runs only from the shared storage (/nsr_share) and uses the shared IP address and hostname. It can be active on only one node at a time.
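To check a node's role from the shell, you can compare the /nsr link target with the two layouts shown above. The following is a minimal sketch that assumes the default paths from this example (/nsr_share/nsr on the active node, /nsr.NetWorker.local on the passive node); nsr_role is a hypothetical helper name, not a NetWorker command.

```shell
# Hypothetical helper (not part of NetWorker): report a node's role from the
# target of its /nsr symbolic link. Paths assume the layout in this article.
nsr_role() {
  local link="${1:-/nsr}" target
  # readlink prints the link target and fails if $link is not a symbolic link.
  if ! target=$(readlink "$link"); then
    echo "not-a-symlink"
    return 1
  fi
  case "$target" in
    /nsr_share/nsr) echo "active" ;;          # server database on shared storage
    /nsr.NetWorker.local) echo "passive" ;;   # local client-only directory
    *) echo "unexpected:$target"; return 1 ;;
  esac
}
```

Run nsr_role on each node; in a healthy cluster, exactly one node should print active.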

Use the following Pacemaker (pcs) commands to get an overview of the Pacemaker configuration and status.

Cluster configuration and status:

pcs status

Example:

root@lnx-node1:~# pcs status 
Cluster name: rhelclus 
Status of pacemakerd: 'Pacemaker is running' (last updated 2023-10-05 10:59:19 -04:00) 
Cluster Summary: 
  * Stack: corosync 
  * Current DC: lnx-node1.amer.lan (version 2.1.5-9.3.el8_8-a3f44794f94) - partition with quorum 
  * Last updated: Thu Oct 5 10:59:20 2023 
  * Last change: Thu Oct 5 10:59:13 2023 by root via cibadmin on lnx-node1.amer.lan 
  * 2 nodes configured 
  * 3 resource instances configured 

Node List: 
  * Online: [ lnx-node1.amer.lan lnx-node2.amer.lan ] 

Full List of Resources: 
  * Resource Group: NW_group: 
    * fs (ocf::heartbeat:Filesystem): Started lnx-node1.amer.lan 
    * ip (ocf::heartbeat:IPaddr): Started lnx-node1.amer.lan 
    * nws (ocf::EMC_NetWorker:Server): Started lnx-node1.amer.lan 

Daemon Status: 
  corosync: active/enabled 
  pacemaker: active/enabled 
  pcsd: active/enabled

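In the output above, you can see how many nodes are in the cluster and whether any of them are offline or in standby. The output also shows which node hosts the shared file system (fs), the cluster IP address resource (ip), and the NetWorker service (nws). The resource names used here are the defaults from the NetWorker Cluster Integration Guide, but other names can be used. If your cluster uses different names, note them and substitute them wherever this article refers to fs, ip, or nws.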
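If you want to capture the resource-to-node mapping in a script, one option is to parse the resource lines of pcs status. This is a minimal sketch that assumes the line layout shown above (`* name (class:provider:type): State node`); the exact layout can differ between Pacemaker versions, and resource_hosts is a hypothetical helper name.

```shell
# Hypothetical helper: read `pcs status` (or `pcs resource`) output on stdin and
# print "resource state node" for each resource line, for example:
#     * fs (ocf::heartbeat:Filesystem): Started lnx-node1.amer.lan
resource_hosts() {
  # Field 3 is the "(class:provider:type):" column. A resource that is not
  # running anywhere has no node field, so "-" is printed in its place.
  awk '$1 == "*" && $3 ~ /^\(.*\):$/ { print $2, $4, ($5 == "" ? "-" : $5) }'
}
```

For example, `pcs status | resource_hosts` on the cluster above would list fs, ip, and nws together with the node that hosts each of them.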

Pacemaker resource configuration:

pcs resource config

Example:

root@lnx-node1:~# pcs resource config 
Group: NW_group 
  Resource: fs (class=ocf provider=heartbeat type=Filesystem)
    Attributes: fs-instance_attributes 
      device=/dev/sdb1 
      directory=/nsr_share 
      fstype=xfs 
    Operations: 
      monitor: fs-monitor-interval-20 
        interval=20 
        timeout=300 
      start: fs-start-interval-0s 
        interval=0s 
        timeout=60s 
      stop: fs-stop-interval-0s interval=0s timeout=60s 
  Resource: ip (class=ocf provider=heartbeat type=IPaddr) 
    Attributes: ip-instance_attributes 
      cidr_netmask=24 
      ip=192.1xx.9.1x0 
      nic=ens192 
    Operations: 
      monitor: ip-monitor-interval-15 
        interval=15 
        timeout=120 
      start: ip-start-interval-0s 
        interval=0s 
        timeout=20s 
      stop: ip-stop-interval-0s 
        interval=0s 
        timeout=20s 
  Resource: nws (class=ocf provider=EMC_NetWorker type=Server) 
    Meta Attributes: nws-meta_attributes 
      is-managed=true 
    Operations: 
      meta-data: nws-meta-data-interval-0 
        interval=0 
        timeout=10 
      migrate_from: nws-migrate_from-interval-0 
        interval=0 
        timeout=120
      migrate_to: nws-migrate_to-interval-0 
        interval=0 
        timeout=60 
      monitor: nws-monitor-interval-100 
        interval=100 
        timeout=1200 
      start: nws-start-interval-0 
        interval=0 
        timeout=600 
      stop: nws-stop-interval-0 
        interval=0 
        timeout=600 
      validate-all: nws-validate-all-interval-0 
        interval=0 
        timeout=10

The command above shows the detailed configuration of each pcs resource. Important items to note from this initial overview:

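FS resource "device=": The device that is mounted as the shared storage in the node's file system. This device must be the same on each node; this is covered later in this KB.
FS resource "directory=": The directory used by the shared NetWorker storage. This directory must be the mount point of the device in the "device=" field; this is covered later in this KB.
IP resource "ip=": The IP address associated with the logical (shared) hostname used by the NetWorker server. This IP address is hosted on the active node.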
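To pick these values out of the configuration without reading the whole listing, a minimal sketch is to scan the pcs resource config output for a `key=value` token. config_value is a hypothetical helper name; it assumes attributes are printed as `key=value` tokens, which covers both the one-per-line layout shown above and a single-line `Attributes: ...` layout that older pcs releases may print.

```shell
# Hypothetical helper: print the value of one attribute (for example device,
# directory, fstype, ip, or nic) from `pcs resource config` output on stdin.
config_value() {
  awk -v key="$1" '{
    for (i = 1; i <= NF; i++)
      if (index($i, key "=") == 1) {      # token starts with "key="
        print substr($i, length(key) + 2) # text after "key="
        exit                              # first match only
      }
  }'
}
```

For example: `pcs resource config fs | config_value device` and `pcs resource config ip | config_value ip`. Limiting the input to one resource avoids matching an attribute with the same name on another resource.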

Pacemaker visibility of the shared address and storage:

lcmap

Example:

root@lnx-node1:~# lcmap
type: NSR_CLU_TYPE;
clu_type: NSR_LC_TYPE;
interface version: 1.0;

type: NSR_CLU_VIRTHOST;
hostname: 192.168.9.110;
local: TRUE;
owned paths: /nsr_share;

clu_nodes: lnx-node1.amer.lan lnx-node2.amer.lan;

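Note: The hostname field must return the IP address that matches the "ip=" field of the pcs resource configuration. The owned paths field must match the "directory=" field of the pcs resource configuration. In some cases where a startup problem occurs, the lcmap command does not return the hostname, local, or owned paths fields. This indicates a problem.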
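The two comparisons in this note can be scripted. This is a sketch under assumptions: check_lcmap is a hypothetical helper, and it expects the `hostname: ...;` and `owned paths: ...;` lines in the format shown above. It treats missing fields as a failure, as the note describes.

```shell
# Hypothetical helper: compare lcmap output (stdin) with the pcs resource values.
#   $1 = expected address (the ip= value), $2 = expected path (the directory= value)
check_lcmap() {
  local want_ip="$1" want_dir="$2" out host paths
  out=$(cat)
  # "hostname: 192.168.9.110;" -> 192.168.9.110
  host=$(printf '%s\n' "$out" | sed -n 's/^hostname:[[:space:]]*\([^;]*\);.*/\1/p')
  # "owned paths: /nsr_share;" -> /nsr_share (the field may list several paths)
  paths=$(printf '%s\n' "$out" | sed -n 's/^owned paths:[[:space:]]*\([^;]*\);.*/\1/p')
  if [ -z "$host" ] || [ -z "$paths" ]; then
    echo "FAIL: lcmap did not report hostname and owned paths"
    return 1
  fi
  if [ "$host" != "$want_ip" ]; then
    echo "FAIL: hostname $host does not match ip=$want_ip"
    return 1
  fi
  case " $paths " in
    *" $want_dir "*) echo "OK" ;;
    *) echo "FAIL: owned paths '$paths' do not include $want_dir"; return 1 ;;
  esac
}
```

For example: `lcmap | check_lcmap 192.168.9.110 /nsr_share`, using the ip= and directory= values from your own pcs resource configuration.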

Initial diagnosis:

If the NetWorker service does not start, check the pcs resource status to determine which resource is failing.

pcs status

Example:

root@lnx-node1:~# pcs status 
... 
... 
Node List: 
  * Online: [ lnx-node1.amer.lan lnx-node2.amer.lan ] 

Full List of Resources: 
  * Resource Group: NW_group: 
    * fs    (ocf::heartbeat:Filesystem):   Started lnx-node1.amer.lan 
    * ip    (ocf::heartbeat:IPaddr):       Started lnx-node1.amer.lan 
    * nws   (ocf::EMC_NetWorker:Server):   Started lnx-node1.amer.lan 

Daemon Status: 
  corosync: active/enabled 
  pacemaker: active/enabled 
  pcsd: active/enabled

When an error occurs, a generic failure is reported, and the failed resource is marked FAILED.
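To list only the failed resources, a minimal sketch is to filter the resource lines of pcs status for the FAILED state. failed_resources is a hypothetical helper; it assumes the state column reads FAILED as described above, and it returns a non-zero status when at least one resource has failed so that it can be used in a shell condition.

```shell
# Hypothetical helper: print the names of resources that `pcs status` (stdin)
# marks FAILED. Exit status is 1 if any resource failed, 0 otherwise.
failed_resources() {
  awk '$1 == "*" && $3 ~ /^\(.*\):$/ && $4 == "FAILED" { print $2; n++ }
       END { exit (n > 0) }'
}
```

For example, `pcs status | failed_resources` prints fs, ip, or nws for each failed resource, which tells you which of the following items applies.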

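FS (Filesystem): If the file system resource is in a failed state, see the section on file system failures below.
IP (IPaddr): If the IPaddr resource is in a failed state, see the section on IPaddr failures below.
NWS (Server): If the NetWorker server resource is in a failed state, do the following: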

Review the NetWorker server's daemon.raw for error messages that appear during startup. The server's daemon.raw is on the shared storage path: /nsr_share/nsr/logs/daemon.raw. The physical node's client daemon log is /nsr.NetWorker.local/logs/daemon.raw. See the Dell article NetWorker: How to Use nsr_render_log.
If the default logging is not sufficient, enable debugging as follows:
1. Restart the "Server" (nws) resource by clearing its failed state:

pcs resource cleanup nws

2. Enable debugging on the nsrd process with dbgcommand:

dbgcommand -n nsrd Debug=#

Replace # with a debug level from 1 to 9. Monitor daemon.raw for additional messages that may point to the problem.

Review /var/log/pcsd/pcsd.log for errors.
Review /var/log/pacemaker/pacemaker.log for errors.
Review /var/log/messages for errors.

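Note: When reviewing the pcsd, Pacemaker, and messages logs, look for messages logged during the same period in which the NetWorker service attempted to start. Review any errors or failures that coincide with the service startup failure.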
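As a first pass over the three logs, a minimal sketch is to grep for error or failure lines that mention the failing resource name. scan_logs is a hypothetical helper; a matching line is only a candidate, and you still need to compare its timestamp with the time of the failed start as described in the note. Messages that do not name the resource are not shown, so a clean result does not rule out a related error.

```shell
# Hypothetical helper: print lines containing "error" or "fail" (any case) that
# also mention the given resource name, prefixed with the log file name.
#   $1 = resource name (for example nws), remaining arguments = log files
scan_logs() {
  local res="$1" f
  shift
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "skipping unreadable file: $f" >&2
      continue
    fi
    grep -iE 'error|fail' "$f" | grep -F -- "$res" | awk -v f="$f" '{ print f ": " $0 }'
  done
}
```

For example: `scan_logs nws /var/log/pcsd/pcsd.log /var/log/pacemaker/pacemaker.log /var/log/messages`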

File system failure:

Review the Pacemaker resources:

pcs resource

Review the Pacemaker resource configuration for the file system resource:

pcs resource config fs

Example:

Note the device path, directory path, and fstype.

root@lnx-node1:~# pcs resource
  * Resource Group: NW_group:
    * fs        (ocf::heartbeat:Filesystem):     Started lnx-node1.amer.lan
    * ip        (ocf::heartbeat:IPaddr):         Started lnx-node1.amer.lan
    * nws       (ocf::EMC_NetWorker:Server):     Started lnx-node1.amer.lan
root@lnx-node1:~# pcs resource config fs
Resource: fs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: fs-instance_attributes
    device=/dev/sdb1
    directory=/nsr_share
    fstype=xfs
  Operations:
    monitor: fs-monitor-interval-20
      interval=20
      timeout=300
    start: fs-start-interval-0s
      interval=0s
      timeout=60s
    stop: fs-stop-interval-0s
      interval=0s
      timeout=60s

Verify that the device is mounted on the file system directory:

df -h

Example:

root@lnx-node1:~# df -h | grep /nsr_share
/dev/sdb1                                     94G  1.5G   92G   2% /nsr_share
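The same check can be made against the device= and directory= values noted earlier. The sketch below assumes `df -P` output (the POSIX format keeps each file system on one line, whereas `df -h` can wrap long device names); check_fs_mount is a hypothetical helper. If device= refers to an alias such as a /dev/mapper path, df may report a different name for the same device, so confirm any mismatch with lsblk.

```shell
# Hypothetical helper: confirm that the fs resource's device is what is mounted
# on its directory. Reads `df -P` output on stdin.
#   $1 = device= value, $2 = directory= value
check_fs_mount() {
  local want_dev="$1" want_dir="$2" src
  # Skip the header; the last field is the mount point, the first is the source.
  src=$(awk -v dir="$want_dir" 'NR > 1 && $NF == dir { print $1; exit }')
  if [ -z "$src" ]; then
    echo "FAIL: nothing is mounted on $want_dir"
    return 1
  fi
  if [ "$src" != "$want_dev" ]; then
    echo "FAIL: $want_dir is mounted from $src, expected $want_dev"
    return 1
  fi
  echo "OK: $want_dev is mounted on $want_dir"
}
```

For example, on the active node: `df -P | check_fs_mount /dev/sdb1 /nsr_share`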

Verify that the mount point is configured correctly, that is, that the device maps to the expected path:

lsblk

Example:

root@lnx-node1:~# lsblk
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda             8:0    0   40G  0 disk
├─sda1          8:1    0  600M  0 part /boot/efi
├─sda2          8:2    0    1G  0 part /boot
└─sda3          8:3    0 38.4G  0 part
  ├─rhel-root 253:0    0 34.4G  0 lvm  /
  └─rhel-swap 253:1    0    4G  0 lvm  [SWAP]
sdb             8:16   0  100G  0 disk
└─sdb1          8:17   0 93.1G  0 part /nsr_share
sr0            11:0    1 1024M  0 rom

Verify that the file system type on the device is correct:

blkid

Example:

root@lnx-node1:~# blkid 
/dev/mapper/rhel-root: UUID="7cf2f957-18d8-45b8-bf8f-6361aadc3517" BLOCK_SIZE="512" TYPE="xfs" 
/dev/sda3: UUID="QpZ2hK-OuE2-igN0-Ryba-EwMN-uxq1-LE48hD" TYPE="LVM2_member" PARTUUID="1193db91-4b63-4b33-a4d4-03a22317e064" 
/dev/sda1: UUID="F243-AD41" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="6c81bd63-0249-4bdf-afdb-cdde72034162" 
/dev/sda2: UUID="7677ad6b-8191-4a45-8a8a-16cf7d00d72c" BLOCK_SIZE="512" TYPE="xfs" PARTUUID="57481b7a-83ec-4cd8-bf2d-bca09ac27040" 
/dev/sdb1: UUID="600bca60-dd5d-4162-bf77-0537daa3b1e5" BLOCK_SIZE="512" TYPE="xfs" PARTLABEL="networker" PARTUUID="769aaac2-764b-431d-be21-3b5753d6a5d3" 
/dev/mapper/rhel-swap: UUID="537962b6-07d4-4a40-9687-deab2e488936" TYPE="swap"
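To compare the blkid result with the fstype= attribute directly, a minimal sketch follows. check_fstype is a hypothetical helper; it reads blkid output and takes the TYPE= value for the given device, skipping other attributes such as PARTUUID=.

```shell
# Hypothetical helper: compare the TYPE reported by blkid (stdin) for a device
# with the fs resource's fstype= value.
#   $1 = device= value, $2 = fstype= value
check_fstype() {
  local want_dev="$1" want_type="$2" actual
  actual=$(awk -v dev="$want_dev:" '$1 == dev {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^TYPE="/) {            # exact TYPE= attribute only
          v = $i
          gsub(/^TYPE="|"$/, "", v)      # TYPE="xfs" -> xfs
          print v
          exit
        }
    }')
  if [ -z "$actual" ]; then
    echo "FAIL: blkid reports no file system type for $want_dev"
    return 1
  fi
  if [ "$actual" != "$want_type" ]; then
    echo "FAIL: $want_dev is $actual, expected $want_type"
    return 1
  fi
  echo "OK: $want_dev is $want_type"
}
```

For example: `blkid /dev/sdb1 | check_fstype /dev/sdb1 xfs`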

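If the fs (Filesystem) resource does not start, the problem lies outside NetWorker. The cluster's system administrator must review the cluster's file system configuration and confirm that the shared storage used by Pacemaker has no problems. Review the following system logs for failures related to the system or the device: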

/var/log/pcsd/pcsd.log
/var/log/pacemaker/pacemaker.log
/var/log/messages

IPaddr failure:

Review the Pacemaker resources:

pcs resource

Review the Pacemaker resource configuration for the IP resource:

pcs resource config ip

Example:

Note the IP address and NIC.

root@lnx-node1:~# pcs resource
  * Resource Group: NW_group:
    * fs (ocf::heartbeat:Filesystem): Started lnx-node1.amer.lan
    * ip (ocf::heartbeat:IPaddr): Started lnx-node1.amer.lan
    * nws (ocf::EMC_NetWorker:Server): Started lnx-node1.amer.lan
root@lnx-node1:~# pcs resource config ip
Resource: ip (class=ocf provider=heartbeat type=IPaddr)
  Attributes: ip-instance_attributes
    cidr_netmask=24
    ip=192.1xx.9.1x0
    nic=ens192
  Operations:
    monitor: ip-monitor-interval-15
      interval=15
      timeout=120
    start: ip-start-interval-0s
      interval=0s
      timeout=20s
    stop: ip-stop-interval-0s
      interval=0s
      timeout=20s

Verify that the NIC is available on the system:

ifconfig -a

Example:

root@lnx-node1:~# ifconfig -a 
ens192: flags=4163 mtu 1500
        inet 192.1xx.9.1x8 netmask 255.255.255.0 broadcast 192.1xx.9.255
        inet6 fe80::250:56ff:fea5:48e1 prefixlen 64 scopeid 0x20
        ether 00:50:56:a5:48:e1 txqueuelen 1000 (Ethernet)
        RX packets 953865 bytes 349705527 (333.5 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 1190983 bytes 179749786 (171.4 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 
lo: flags=73 mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0 
        inet6 ::1 prefixlen 128 scopeid 0x10
        loop txqueuelen 1000 (Local Loopback)
        RX packets 129798 bytes 13274289 (12.6 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0 
        TX packets 129798 bytes 13274289 (12.6 MiB) 
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

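The IP address that ifconfig shows corresponds to the physical node name. When the node is active, however, the clustered IP address is also reachable through this NIC, so make sure both nodes are configured to use the same NIC name.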
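Because the nic= attribute must name an interface that exists on every node, a quick check is to look for it in each node's interface list. The sketch below reads `ip -o link show` output from iproute2, which prints one interface per line; the ifconfig command shown above comes from the net-tools package, which may not be installed by default on recent RHEL releases. nic_in_ip_link is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the interface named by the ip resource's nic=
# value appears in `ip -o link show` output (stdin).
#   $1 = nic= value
nic_in_ip_link() {
  awk -v nic="$1" '{
      name = $2
      sub(/:$/, "", name)      # "ens192:" -> "ens192"
      sub(/@.*/, "", name)     # "eth0@if5" -> "eth0"
      if (name == nic) found = 1
    }
    END { exit !found }'
}
```

For example, run on each node: `ip -o link show | nic_in_ip_link ens192 && echo "ens192 found"`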

Does the IP address resolve to the correct (logical) hostname used by the NetWorker server?

nslookup ip 

nslookup logical_name_FQDN 

nslookup logical_name_short

Example:

root@lnx-node1:~# nslookup 192.1xx.9.1x0 
110.9.1xx.1x2.in-addr.arpa name = lnx-nwcluster.amer.lan. 

root@lnx-node1:~# nslookup lnx-nwcluster.amer.lan. 
Server: 192.1xx.9.1x0 
Address: 192.1xx.9.100#53 

Name: lnx-nwcluster.amer.lan 
Address: 192.1xx.9.1x0 

root@lnx-node1:~# nslookup lnx-nwcluster 
Server: 192.1xx.9.1x0 
Address: 192.1xx.9.100#53 

Name: lnx-nwcluster.amer.lan 
Address: 192.1xx.9.1x0

It is also recommended to repeat these lookups for each physical node's IP address, FQDN, and short name. See the Dell article on troubleshooting DNS and name resolution issues.
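When repeating these lookups for several names, it can help to extract just the answers. The two hypothetical helpers below parse nslookup output in the format shown above: forward_addrs prints the addresses that follow the Name: line (skipping the DNS server's own Address: line), and reverse_names prints the name from a reverse lookup without the trailing dot. The nslookup output format can vary between versions, so treat this as a sketch.

```shell
# Hypothetical helper: print the answer addresses from a forward nslookup
# (stdin). The "Address: <dns-server>#53" line before "Name:" is ignored.
forward_addrs() {
  awk '/^Name:/ { answer = 1; next } answer && /^Address:/ { print $2 }'
}

# Hypothetical helper: print the hostname from "... name = host." lines of a
# reverse nslookup (stdin), without trailing whitespace or the final dot.
reverse_names() {
  awk -F 'name = ' 'NF > 1 { n = $2; sub(/[ \t]+$/, "", n); sub(/\.$/, "", n); print n }'
}
```

For example, `nslookup lnx-nwcluster.amer.lan | forward_addrs` should print the ip= address, and `nslookup 192.168.9.110 | reverse_names` should print lnx-nwcluster.amer.lan.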

Can you reach the cluster IP address with ping?

ping -c 4 ip

Example:

root@lnx-node1:~# ping -c 4 192.1xx.9.1x0 
PING 192.1xx.9.1x0 (192.1xx.9.1x0) 56(84) bytes of data. 
64 bytes from 192.1xx.9.1x0: icmp_seq=1 ttl=64 time=0.051 ms 
64 bytes from 192.1xx.9.1x0: icmp_seq=2 ttl=64 time=0.043 ms 
64 bytes from 192.1xx.9.1x0: icmp_seq=3 ttl=64 time=0.033 ms 
64 bytes from 192.1xx.9.1x0: icmp_seq=4 ttl=64 time=0.034 ms 

--- 192.1xx.9.1x0 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3108ms
rtt min/avg/max/mdev = 0.033/0.040/0.051/0.008 ms
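In a script, the packet-loss figure from the summary line is usually what matters. ping_loss is a hypothetical helper that extracts it; note that iputils ping also exits with a non-zero status when it receives no reply at all, which may be enough on its own.

```shell
# Hypothetical helper: print the packet-loss percentage from the
# "... N% packet loss ..." summary line of `ping` output on stdin.
ping_loss() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}
```

For example, `ping -c 4 192.168.9.110 | ping_loss` prints 0 when all four replies arrive.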

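If the ip (IPaddr) resource does not start, the problem lies outside NetWorker. Engage the cluster's system administrator and network administrator to review the cluster's network configuration and confirm that it has no problems. Review the following system logs for failures related to the system or the device: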

/var/log/pcsd/pcsd.log
/var/log/pacemaker/pacemaker.log
/var/log/messages

Other pcs commands:

pcs version: pcs --version

Enable resource: pcs resource enable resource_name

Disable resource: pcs resource disable resource_name

Clean up a resource (clear its failure history so that Pacemaker retries it): pcs resource cleanup resource_name

Stop cluster: pcs cluster stop --force

Start cluster: pcs cluster start --all

Put the node in standby: pcs node standby node_name

Take node out of standby: pcs node unstandby node_name

Affected Products

NetWorker

Products

NetWorker Family, NetWorker Series

Article Number: 000218281

Article Type: How To

Last Modified: 06 May 2024

Version: 4
