Set Up a New Cluster

Note: This section applies only to FaunaDB Enterprise.

FaunaDB Enterprise clusters are made up of at least one logical datacenter with at least one node. A logical datacenter is a grouping of nodes with data partitioned across them. The nodes within a logical datacenter will ask one another for data before asking a node that might be further away.

You will typically want to set up logical datacenters across regions so that the cluster remains available if one region goes down. You will also want to place them close to your customer bases so that clients can query local replicas rather than crossing the internet for data.

An Enterprise cluster can be used for entirely new data or as the target of a snapshot restore.

In this section we will walk through the steps of setting up a basic FaunaDB Enterprise cluster: three datacenters with a single node each. If you are setting up your Enterprise cluster in AWS EC2, read the Set Up a Cluster in AWS EC2 section below first.

Dependencies

Before we get started, verify that the user running the FaunaDB service has read/write access to the config file (faunadb.yml), log path (log_path variable in faunadb.yml), and storage paths (storage_data_path variable in faunadb.yml).
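A quick way to grant that access on Linux might look like the following, assuming the service runs as a dedicated faunadb user and that these log and data paths match the values in your faunadb.yml (both the user and the paths are assumptions; substitute your own):

# "faunadb" user and the /var/log/faunadb and /var/lib/faunadb paths are assumptions
$ sudo chown faunadb:faunadb /etc/faunadb.yml
$ sudo chown -R faunadb:faunadb /var/log/faunadb /var/lib/faunadb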

You will need the following installed and set up:

You will also need curl (or something equivalent) for inspecting the public API.

FaunaDB Enterprise uses the following ports by default:

  • HTTP port 8443 for API requests (network_coordinator_http_port)
  • HTTP port 8444 for admin requests (network_admin_http_port)
  • Storage ports 7001/7501 for cross-datacenter traffic (network_peer_port/network_peer_secure_port)
  • Port 8125 for stats output, if stats_host is set (change with stats_port)
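Before going further, it can be worth confirming that these ports are reachable between nodes. A minimal sketch using netcat, assuming nc is installed and 2.2.2.2 is another node's address (a placeholder):

# 2.2.2.2 is a placeholder for another node's broadcast address
$ nc -z 2.2.2.2 7001 && echo "peer port 7001 reachable"
$ nc -z 2.2.2.2 8443 && echo "API port 8443 reachable"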

Set Up a New Cluster

To set up a new Enterprise cluster, we will follow these steps:

  1. Determine your network topology.
  2. Set up the configuration file for your Enterprise cluster.
  3. Set up the distributed transaction log configuration for your Enterprise cluster.
  4. Start the first node and replication.
  5. Start the other nodes.
  6. Start the datacenter.
  7. Verify that it worked.

Determine Network Topology

Start by deciding on the basic network topology and logical datacenters that will make up your cluster. For a FaunaDB Enterprise cluster, you should have a minimum of three logical datacenters. Each logical datacenter will contain a full replica of the cluster’s dataset.

NOTE: To balance load and storage evenly across all nodes, each logical datacenter should contain the same number of nodes. Depending on your traffic expectations, you can add nodes to high-traffic datacenters as needed.

The rest of this section will assume three logical datacenters named replica_1, replica_2, and replica_3.
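For concreteness, the examples that follow assume this mapping of logical datacenters to node addresses (the IP addresses are placeholders):

replica_1: 1.1.1.1
replica_2: 2.2.2.2
replica_3: 3.3.3.3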

Set Up Configuration File

Create a config file for each node and put it in /etc/faunadb.yml. You can see an example faunadb.yml in your Enterprise package.

At minimum, you will need to specify:

  • network_datacenter_name: The name used to group this node with any additional nodes in the same datacenter; for example, replica_1.
  • network_broadcast_address: The node’s IP address.
  • network_listen_address: The interface address that FaunaDB binds to for incoming requests.
  • auth_root_key: The root admin key for the FaunaDB Query API.
  • storage_transaction_log_nodes: The array of addresses of the nodes in the transaction log (see Configure Distributed Transaction Log below).
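Putting these together, a minimal faunadb.yml sketch for the single node in replica_1 might look like the following (the addresses and root key are placeholder values; your deployment may require additional settings):

# Placeholder values; see the descriptions above for each setting.
auth_root_key: secret
network_datacenter_name: replica_1
network_broadcast_address: 1.1.1.1
network_listen_address: 1.1.1.1
storage_transaction_log_nodes:
  - [ 1.1.1.1, 2.2.2.2, 3.3.3.3 ]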

If your environment requires encrypted peer-to-peer communication, configure peer-to-peer encryption before moving on to the next step.

Configure Distributed Transaction Log

Choose a node to initialize the cluster from; we will call this node the first node. On this first node, open /etc/faunadb.yml and locate the storage_transaction_log_nodes setting.

In the storage_transaction_log_nodes setting, each line of the array specifies one log partition; for each partition, list one node from each datacenter by its network_broadcast_address (IP):

storage_transaction_log_nodes:
  - [ »Node 1 in Datacenter 1 IP«, »Node 1 in Datacenter 2 IP«, »Node 1 in Datacenter 3 IP« ]

In our example, we have three datacenters with a single node each. If we had larger datacenters with multiple nodes, it would look like:

storage_transaction_log_nodes:
  - [ »Node 1 in Datacenter 1 IP«, »Node 1 in Datacenter 2 IP«, »Node 1 in Datacenter 3 IP« ]
  - [ »Node 2 in Datacenter 1 IP«, »Node 2 in Datacenter 2 IP«, »Node 2 in Datacenter 3 IP« ]
  - [ »Node 3 in Datacenter 1 IP«, »Node 3 in Datacenter 2 IP«, »Node 3 in Datacenter 3 IP« ]

Each array of hosts represents the set of replicas for one log partition. You must specify at least one partition, and each partition must contain at least one host. A host cannot be part of more than one partition. While adding and removing hosts is straightforward, the number of partitions cannot be changed without stopping and restarting the entire cluster.

Yours will look something like this:

storage_transaction_log_nodes:
  - [ 1.1.1.1, 2.2.2.2, 3.3.3.3 ]

Start the First Node and Replication Configuration

Continuing on the first node, navigate to the FaunaDB install directory and start FaunaDB by running:

$ faunadb -c »Config filename«

For example:

$ faunadb -c /etc/faunadb.yml

Then initialize your new FaunaDB Enterprise cluster:

$ faunadb-admin init

Start Other Nodes

Once the first node is up, start FaunaDB on each of the other two nodes by running:

$ faunadb -c »Config filename«

For example:

$ faunadb -c /etc/faunadb.yml

Then join each node to the first node so that the nodes get the data they need:

$ faunadb-admin join »IP of first node«
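For example, if the first node’s broadcast address is 1.1.1.1 (as in the sample log configuration above):

$ faunadb-admin join 1.1.1.1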

Start the Datacenter

On the first node, set the cluster replication to include all initial datacenters:

$ faunadb-admin update-replication »Datacenter 1 name« »Datacenter 2 name« »Datacenter 3 name«

For example, if we use our three logical datacenters (replica_1, replica_2, and replica_3):

$ faunadb-admin update-replication replica_1 replica_2 replica_3

Use the status admin command to track the progress of replicating data to the new nodes:

$ faunadb-admin status

Verify

Verify the cluster is up and running by using the ping endpoint:

$ curl localhost:8443/ping

{ "resource": "Scope write is OK" }

If you do not receive a 200 OK response, get in touch with us to troubleshoot what went wrong.

Once your new FaunaDB Enterprise cluster has been set up, you should use your process manager of choice, rather than invoking faunadb directly, to run and manage your day-to-day operations.
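For instance, if systemd is your process manager, a minimal unit sketch might look like the following (the faunadb binary path, service user, and unit name are assumptions; adapt them to your installation):

[Unit]
Description=FaunaDB Enterprise
After=network.target

[Service]
# Assumed dedicated user and install path; the config path matches this section.
User=faunadb
ExecStart=/usr/local/bin/faunadb -c /etc/faunadb.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target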

Set Up a Cluster in AWS EC2

We recommend using EC2 instances with locally attached SSD storage. For production deployments, we recommend 8 cores and 30 GB of RAM on the M3 or C3 families, which corresponds to the largest instance sizes: m3.2xlarge or c3.2xlarge and above.

Since AWS is organized as multiple regions that are operationally independent of each other, communication between regions crosses the open internet; peer-to-peer encryption should therefore be used between regions.

For increased IO performance, you can choose to enable software RAID-0 if your instance has multiple local storage devices. Refer to your OS’s documentation for instructions on setting up software RAID.
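As one illustration on Linux using mdadm (the device names and mount point are placeholders; your OS documentation remains the authoritative reference):

# /dev/xvdb and /dev/xvdc are placeholders for your instance's local SSDs
$ sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdb /dev/xvdc
$ sudo mkfs.ext4 /dev/md0
$ sudo mkdir -p /var/lib/faunadb
$ sudo mount /dev/md0 /var/lib/faunadb    # mount over your storage_data_path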

Note: FaunaDB Enterprise is sensitive to I/O performance.

As with a standard FaunaDB Enterprise cluster, we recommend deploying FaunaDB to three regions. This allows the cluster to remain available to clients if any single region becomes unavailable.