17.1. Configuring Primary Backup DR systems

This document describes how you can setup a backup Trisul Network Analytics system as a DR (Disaster Recovery) node.

Terminology : In some places , we use DR node to refer to the Backup node, and DC node to refer to the Primary node. DC stands for Data Center.

17.1.1 Prerequisities

If you are configuring Disaster Recovery for a pre-existing Trisul install, ensure you copy the data over manually using scp first. By default, the sync intially only copies the latest 7 days during intialization.

Per context DR

The Primary-Backup configuration needs to be configured on a per-context basis. The instructions on this page applies to the default context. Multi tenant customers with multiple contexts need to configure for each context separately.

Service names

The systemd service names are

trisul-hub-replicator
Run on the PRIMARY (DC) node. Sets up the incremental replication processes from the primary. For contexts other than the default context this service is called trisul-hub-replicator-contextname
trisul-hub-primary-health-check
Runs on the BACKUP (DR) node. Checks the PING reachability of PRIMARY nodes. For contexts other than the default context this service is called trisul-hub-primary-health-check-contextname

17.1.2 Configuring DR on Primary and Backup sites

Create home directories for the trisul user

On both the Primary and Backup sites , Trisul processes run under the username trisul.trisul. This will be used by the replication process to synchonize the data.

Repeat the following process on both the primary and backup site.

Edit /etc/passwd
Assign a shell to the trisul user as shown below


trisul:x:999:999:Trisul Daemon User:/home/trisul:/bin/bash

Ensure home directory exists for the trisul user


root@ubuntuDR:~# mkhomedir_helper trisul

Create a password


root@Ubuntu:~# passwd trisul
Enter new UNIX password: 
Retype new UNIX password: 
passwd: password updated successfully


Create a SSH key pair for automatic ssh

Do these for both PRIMARY-BACKUP and BACKUP-PRIMARY directions.

Use ssh-keygen

Login as the trisul user and use ssh-keygen to create a key pair. Press ENTER to skip the passphrase.



su trisul

trisul@Ubuntu:~$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/trisul/.ssh/id_rsa): 

skip the passphrase
..

Use ssh-copy-id to setup login

Use ssh-copy-id to create an automatic login for the trisul user. THis will be used by rsync to transfer data to the backupsite.

On Primary :


ssh-copy-id trisul@backup-ip-address

On Backup :


ssh-copy-id trisul@primary

At this point , both sides should be able to login to each other without a password.

17.1.3 Configure the primary site replication

On the primary site run the install-replication-primary.sh script to create the replication service.



root@Ubuntu:/usr/local/share/trisul-hub# ./install-replication-primary.sh 
Configuring this node as a PRIMARY site in a PRIMARY-BACKUP (DR-DC) configuration

You will be asked a series of questions. Hit ENTER to accept defaults

Enter username   (default trisul) : 
Enter domain    (default domain0) : 
Enter hub          (default hub0) : 
Enter context  (default context0) : 
Enter IP of BACKUP site hub       : 192.168.2.141
Thanks ! Configuring scripts for replication, now testing connection
--------------------------------------------------------------------
PING 192.168.2.141 (192.168.2.141) 56(84) bytes of data.
64 bytes from 192.168.2.141: icmp_seq=1 ttl=64 time=0.197 ms

--- 192.168.2.141 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.197/0.197/0.197/0.000 ms

  * Creating configuration file /usr/local/etc/trisul-hub/domain0/hub0/context0/DCDRReplicationSettings.conf
  * Checking rsync user home directory for RSYNC
  * Installing systemd services into trisul-hub-replicator.service
  * systemctl daemon-reload 
  * Use [systemctl enable trisul-hub-replicator] to enable on reboot
  * Login as trisul on this site and run [ssh-copy-id trisul@192.168.2.141]  to create automatic login for RSYNC
  * Customize in config file /usr/local/etc/trisul-hub/domain0/hub0/context0/DCDRReplicationSettings.conf

Further customization for PRI-BAK replication

Further customization can be done by editing the DCDRReplicationSettings.conf file shown above.

17.1.4 Configure the backup node health check

The backup node service performs a health check on pre-defined IP addresses in the primary site. If ALL the IPs are unrechable for a predefinied period of time [default 7 minutes], the service declares the PRIMARY site to be down. Then starts the Trisul Probe Processes on the backup site and the service stops.

Use install-replication-backup.sh to configure the backup node.



root@ubuntuDR:/usr/local/share/trisul-hub# ./install-replication-backup.sh 
Configuring this node as a BACKUP site in a PRIMARY-BACKUP (DR-DC) configuration

You will be asked a series of questions. Hit ENTER to accept defaults

Enter username       (default trisul) : 
Enter domain        (default domain0) : 
Enter hub              (default hub0) : 
Enter context      (default context0) : 
IPs for health check (space separated): 192.168.2.140
Enter IP of paired  PRIMARY hub       : 192.168.2.140

  * Creating configuration file /usr/local/etc/trisul-hub/domain0/hub0/context0/DRDCHealthCheck.conf
  * Checking rsync user home directory for RSYNC
  * Installing systemd services into trisul-hub-primary-health-check.service
  * Use [systemctl enable trisul-hub-primary-health-check] to enable on reboot
  * Login as trisul on this site and run [ssh-copy-id trisul@]  for restore sync 
  * You can adjust timeouts , retries, advanced RSYNC options in the config file below
  * Customize in config file /usr/local/etc/trisul-hub/domain0/hub0/context0/DRDCHealthCheck.conf

Further customization for PRI-BAK replication

Further customization can be done by editing the DRDCHealthCheck.conf file shown above.

17.1.5 Start services on primary and backup nodes

On the PRIMARY node, the trisul-hub-replicator service pushes the incremental changes continuously to the backup site.

On the PRIMARY node.



systemctl start trisul-hub-replicator

On the BACKUP node.



systemctl start trisul-hub-primary-health-check 

View log files about the status of the processes

on primary
journalctl -fu trisul-hub-replicator

on backup
journalctl -fu trisul-hub-primary-health-check

17.1.6 Failover

The Failover process is automatic.

When the PRIMARY site fails, the predefined IP addresess become unreachable from the backup site. The trisul-hub-primary-health-check service on the backup site starts the Trisul processes on the backup site and stops.

Packet Mode
arrange the packets to be sent to the backup system
Netflow mode
the Netflow streams have to be sent to the backup probe addresses. This can be accomplished by a load balancer or other methods.

17.1.7 Restoration

The restoration process needs manual intervention. First ensure that the conditions that caused the failure have been corrected.

The process is :

  • on PRIMARY : Bring up the PRIMARY system

    Boot the backup system and ensure it is working. Login to the system and make ensure :

    • able to PING the backup node
    • able to SSH into the backup node
  • on PRIMARY : Stop any running Trisul Processes

    Ensure all processes are stopped on the primary system

    
    	trisulctl_hub stop context all 
    
  • on BACKUP : Copy the data to the PRIMARY

    Now we need to copy the data from the running backup system onto the primary. This will include all the data that was collected during the downtime of the primary.

    This step can take some time as the delta is copied from the BACKUP back to the PRIMARY node. We recommend you use a screen session to run this.
    
    
    # preferably use a screen session
    
    cd /usr/local/share/trisul-hub
    
    ./restore_sync_dr_dc.sh /usr/local/etc/trisul-hub/domain0/hub0/context0/DRDCHealthCheck.conf
    
    

    This will run for a while, copying all the changes back to the DC site.

    when the copying finishes you will see a message like the following.

    
    
    ..
    sent 37 bytes  received 12 bytes  32.67 bytes/sec
    total size is 0  speedup is 0.00
    BACKUP WEBTRISUL DB
    --- Successfully Finished Sync ---
    --- The PRIMARY/DC site data is now ---
    --- synchronized with the BACKUP/DR site ---
    
    --- Now you can stop the BACKUP/DR ---
    --- and start the PRIMARY/DC site ---
    Wed Sep 23 07:07:27 EDT 2020
    root@ubuntuDR:/usr/local/share/trisul-hub# 
    

    Now you are ready to

  • on BACKUP : Stop the probes

    You can manually stop the probes on the backup node. You should not stop the hub nodes however. Run the following command to stop the default context on probe0.

    
    trisulctl_hub stop context default@probe0
    
    
  • on BACKUP : Switch back to BACKUP mode

    Run this to go back to BACKUP mode.

    
    systemctl start trisul-hub-primary-health-check
    
  • on PRIMARY : Start the processes

    On the primary node start up the processes

    
    
    trisulctl_hub 
    start context default 
    
  • on PRIMARY : switch back to PRIMARY mode

    This step will put the system back into PRIMARY BACKUP configuration.

    
    
    systemctl start trisul-hub-replicator 
    

This completes the process.

17.1.8 Viewing status of PRI BAK processes

To view the status of the Replicator and Health Check processes, login as Administrator to either the primary or backup node. To view the status of the default context, do :

Select Context: default > Admin Tasks > DC DR Status

The following information is shown:

Mode Whether the current node is configured as a PRIMARY (DC) or a BACKUP (DR node
Replication status Whether the replication process is running, enabled on boot
Replication log Last few lines of the replication log, you can check the timestamps
Health check status On the Backup (DR) node, whether the health check processes are active
Health check log Last few lines of the health check process log