A shared-nothing cluster operates by creating incremental snapshots of datasets and then synchronising them between cluster nodes using ZFS send/receive over an ssh tunnel.
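To make the mechanism concrete, the following is a minimal sketch of one incremental synchronisation step done by hand. It is not the cluster software's own implementation; the function name sync_increment and the dataset, snapshot and host names passed to it are hypothetical, and it assumes the previous snapshot already exists on both nodes.

```shell
# Sketch only: sync_increment and all names passed to it are hypothetical.
# Usage: sync_increment DATASET PREV_SNAP NEW_SNAP REMOTE_HOST
sync_increment() {
    dataset=$1
    prev=$2
    new=$3
    remote=$4

    # Take a new point-in-time snapshot of the dataset.
    zfs snapshot "${dataset}@${new}" || return 1

    # Send only the changes between the two snapshots and apply
    # them on the remote node through ssh.
    zfs send -i "${dataset}@${prev}" "${dataset}@${new}" |
        ssh "root@${remote}" zfs receive "${dataset}"
}
```

For example, sync_increment pool1/data snap1 snap2 NodeB would send the changes made between snap1 and snap2 to NodeB. Because the pipeline runs unattended, any password prompt from ssh would stall it.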
The ssh tunnel used by the synchronisation process must be passwordless, so the two nodes need to be ssh-bound. To configure ssh binding, perform the following steps on each node:
Create your ssh keys as the root user (press return to accept the defaults for all prompts):
# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa
Your public key has been saved in /root/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:2dGrTFvaGz8QJbVeVGS5sFv/deJRngPvSOr6v1SaMXc root@NodeA
The key's randomart image is:
+---[RSA 3072]----+
| ...B|
| ....= |
| . .o+ o|
| o ..= +.|
| S o o+*=E|
| o *.oX+*|
| = =*oo=|
| ..=o..|
| .+oooo. |
+----[SHA256]-----+
Once ssh-keygen has been run, the public key is saved to /root/.ssh/id_rsa.pub. This public key now needs to be added to the file /root/.ssh/authorized_keys on the other node (if the authorized_keys file does not exist, simply create it).
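One possible way to add the key is sketched below. The helper append_pubkey is hypothetical (it is not part of OpenSSH), and it assumes the other node's id_rsa.pub has already been copied to the local node by some other means. It creates the authorized_keys file if needed, tightens permissions, and avoids adding the same key twice.

```shell
# Sketch only: append_pubkey is a hypothetical helper, not an OpenSSH tool.
# Usage: append_pubkey PUBKEY_FILE AUTHORIZED_KEYS_FILE
append_pubkey() {
    pubkey_file=$1
    auth_file=$2

    # Create the directory and file if they do not exist. sshd may
    # refuse to use them if their permissions are too open.
    mkdir -p "$(dirname "$auth_file")"
    chmod 700 "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"

    # Append the key only if that exact line is not already present.
    if ! grep -qxF -f "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}
```

On NodeB, with NodeA's public key copied to a hypothetical path such as /tmp/NodeA.pub, this would be called as append_pubkey /tmp/NodeA.pub /root/.ssh/authorized_keys, and likewise on NodeA with NodeB's key.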
Manually ssh from NodeA to NodeB, then from NodeB to NodeA, and accept the prompt to add each machine to the list of known hosts:
root@NodeA:~# ssh root@NodeB
The authenticity of host 'NodeB (10.10.10.2)' can't be established.
ED25519 key fingerprint is SHA256:EDmzS45TqKabZ53/35vXb4YyKTQuzJxNnbFuIwFj9UU.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'NodeB,10.10.10.2' (ED25519) to the list of known hosts.
Last login: Tue Sep 12 09:54:49 2023 from 10.10.10.1
Oracle Solaris 11.4.42.111.0 Assembled December 2021
root@NodeB:~#
Once this process has been completed you should be able to ssh between nodes without being prompted for a password.
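A quick way to confirm this without risking an interactive prompt is sketched below. The function name check_passwordless is hypothetical; BatchMode and ConnectTimeout are standard OpenSSH client options. Run it on each node against the other.

```shell
# Sketch only: check_passwordless is a hypothetical helper.
# Usage: check_passwordless HOST
check_passwordless() {
    # BatchMode=yes makes ssh fail instead of prompting for a password
    # or passphrase; ConnectTimeout bounds the wait for an unreachable node.
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "root@$1" true; then
        echo "passwordless login to $1: OK"
    else
        echo "passwordless login to $1: FAILED"
        return 1
    fi
}
```

For example, check_passwordless NodeB on NodeA and check_passwordless NodeA on NodeB. A failure here usually means the public key is missing from authorized_keys or the host has not yet been added to the list of known hosts.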
SSH login between nodes taking a long time
If an ssh login between nodes takes a long time, run ssh -v to see which step is causing the delay. A common cause is GSS/Kerberos authentication:
debug1: Next authentication method: gssapi-with-mic
debug1: Unspecified GSS failure. Minor code may provide more information
Credentials cache file '/tmp/krb5cc_1000' not found
debug1: Unspecified GSS failure. Minor code may provide more information
Credentials cache file '/tmp/krb5cc_1000' not found
GSSAPI authentication can be disabled on the nodes by editing /etc/ssh/ssh_config and setting every Host option that begins with GSS to no. For example:
Host *
# ForwardAgent no
# ForwardX11 no
# ForwardX11Trusted yes
# PasswordAuthentication yes
# HostbasedAuthentication no
GSSAPIAuthentication no
GSSAPIDelegateCredentials no
GSSAPIKeyExchange no
GSSAPITrustDNS no
# BatchMode no
# CheckHostIP yes
# AddressFamily any
# ConnectTimeout 0
# StrictHostKeyChecking ask
# IdentityFile ~/.ssh/id_rsa
# IdentityFile ~/.ssh/id_dsa
# IdentityFile ~/.ssh/id_ecdsa
# IdentityFile ~/.ssh/id_ed25519
# Port 22
# Ciphers aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc
# EscapeChar ~
# Tunnel no
# TunnelDevice any:any
# PermitLocalCommand no
# VisualHostKey no
# ProxyCommand ssh -q -W %h:%p gateway.example.com
# RekeyLimit 1G 1h
# UserKnownHostsFile ~/.ssh/known_hosts.d/%k
SendEnv LANG LC_*
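After editing the file, the effective client settings can be checked without opening a connection. The sketch below assumes an OpenSSH client recent enough to support the -G option; the function name gss_still_enabled is hypothetical.

```shell
# Sketch only: gss_still_enabled is a hypothetical helper.
# Usage: gss_still_enabled HOST
# Succeeds, and lists the options, if any GSSAPI option resolves to "yes".
gss_still_enabled() {
    # ssh -G prints the effective client configuration for HOST as
    # lowercase "option value" lines, without connecting to it.
    ssh -G "$1" | grep -i '^gssapi.* yes$'
}
```

For example, gss_still_enabled NodeB should print nothing and return non-zero once the GSS options are disabled. If it lists an option, a per-user ~/.ssh/config or another Host block may be overriding /etc/ssh/ssh_config.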