Elasticsearch Cluster
This folder contains a Terraform module to deploy an Elasticsearch cluster in AWS on top of an Auto Scaling Group. The idea is to create an Amazon Machine Image (AMI) that has Elasticsearch installed using the install-elasticsearch module.
In a non-production setting, you can install Elasticsearch tools such as Kibana and ElastAlert on the same AMI. In a production setting, Elasticsearch should be the sole service running on each Elasticsearch node.
How do you connect to the Elasticsearch cluster?
Connecting to Elasticsearch via Official Elasticsearch Clients
The preferred way to connect to Elasticsearch is to use one of the official Elasticsearch clients. All official Elasticsearch clients are designed to discover multiple Elasticsearch nodes and distribute requests across those nodes.
Therefore, using a Load Balancer to talk to Elasticsearch APIs (e.g., via an SDK) is NOT recommended, so you will need to get the IPs of the individual nodes and connect to them directly. Since those nodes run in an Auto Scaling Group (ASG) where servers can be added/replaced/removed at any time, you can't get their IP addresses from Terraform. Instead, you'll need to look up the IPs using the AWS APIs.
The easiest way to do that is to use the AWS SDK to look up the servers using EC2 Tags. Each server deployed by the elasticsearch-cluster module has its Name and aws:autoscaling:groupName tags set to the value you pass in via the cluster_name parameter. You can also specify custom tags via the tags parameter. You can use the AWS SDK to find the IPs of all servers with those tags.
For example, using the AWS CLI, you can get the IPs for servers in us-east-1 with the tag Name=elasticsearch-example as follows:
aws ec2 describe-instances \
--region "us-east-1" \
--filter \
"Name=tag:Name,Values=elasticsearch-example" \
"Name=instance-state-name,Values=running"
This will return a bunch of JSON that contains the IPs of the servers. You can then use the Elasticsearch client for your programming language to connect to these IPs.
Connecting via the REST API
Elasticsearch exposes a RESTful API that you can access directly using curl or any other tool or library that makes HTTP requests.
What's included in this module?
This module creates the resources described in the sections below, and also calls out what is intentionally not included (and why).
Auto Scaling Group
This module runs Elasticsearch on top of an Auto Scaling Group (ASG). Typically, you should run the ASG with multiple Instances spread across multiple Availability Zones. Each of the EC2 Instances should be running an AMI that has Elasticsearch and optional Elasticsearch tools installed via the install-elasticsearch, install-elastalert, install-kibana, and install-logstash scripts. You pass in the ID of the AMI to run using the ami_id input parameter.
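To make this concrete, here is a minimal, hypothetical usage sketch. The module source path, AMI ID, VPC ID, and subnet IDs are placeholders; the input names are the ones documented in the Reference section below.

module "elasticsearch" {
  # Placeholder path: point this at wherever the elasticsearch-cluster module lives for you.
  source = "../../modules/elasticsearch-cluster"

  # The custom AMI built with the install-elasticsearch module (placeholder ID).
  ami_id = "ami-0123456789abcdef0"

  aws_region    = "us-east-1"
  cluster_name  = "elasticsearch-example"
  cluster_size  = 3
  instance_type = "t2.micro"

  # Placeholder network IDs.
  vpc_id     = "vpc-0123456789abcdef0"
  subnet_ids = ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"]
}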
Load Balancer
We use a Network Load Balancer (1) so that we can perform ongoing health checks on each Elasticsearch node, and (2) so that Kibana can be accessed via a single endpoint, which will forward each request to one of the live Kibana instances.
Note that we do not need a Load Balancer to distribute traffic to Elasticsearch because all the official Elasticsearch clients are designed to discover all Elasticsearch nodes and distribute requests across the cluster. Using a Load Balancer for this reason would duplicate functionality Elasticsearch clients already give us.
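As an illustration only, here is one way you might create such a Network Load Balancer with standard AWS provider resources and attach the cluster to it via the target_group_arns input. The resource names, VPC/subnet IDs, and the assumption that Kibana listens on its default port 5601 are placeholders, not part of this module.

resource "aws_lb" "kibana" {
  name               = "kibana-example"
  load_balancer_type = "network"
  internal           = true
  subnets            = ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"]
}

resource "aws_lb_target_group" "kibana" {
  name     = "kibana-example"
  port     = 5601            # Kibana's default port (assumption)
  protocol = "TCP"
  vpc_id   = "vpc-0123456789abcdef0"
}

resource "aws_lb_listener" "kibana" {
  load_balancer_arn = aws_lb.kibana.arn
  port              = 5601
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.kibana.arn
  }
}

module "elasticsearch" {
  # ... all other parameters as in the usage sketch above ...

  # Registers each node in the cluster with the Kibana target group.
  target_group_arns = [aws_lb_target_group.kibana.arn]
}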
Security Group
Each EC2 Instance in the ASG has a Security Group that allows minimal connectivity:
- All outbound requests
- Inbound SSH access from the CIDR blocks and security groups you specify
The ID of the security group is exported as an output variable, which you can use with the elasticsearch-security-group-rules, elastalert-security-group-rules, kibana-security-group-rules, and logstash-security-group-rules modules to open up all the ports necessary for Elasticsearch and the respective Elasticsearch tools.
Check out the Security section for more details.
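As a purely hypothetical sketch of that wiring (the security_group_id output and input names and the module path are assumptions; check those modules' own documentation for the real variable names):

module "elasticsearch_security_group_rules" {
  # Placeholder path to the elasticsearch-security-group-rules module.
  source = "../../modules/elasticsearch-security-group-rules"

  # Assumed name of the output exported by the elasticsearch-cluster module.
  security_group_id = module.elasticsearch.security_group_id

  # Assumed input: where Elasticsearch traffic should be allowed from.
  allowed_cidr_blocks = ["10.0.0.0/16"]
}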
IAM Role and Permissions
Each EC2 Instance in the ASG has an IAM Role attached. The IAM Role ARN and ID are exported as output variables if you need to add additional permissions.
EBS Volumes
Note that we do not use EBS Volumes, which are AWS's ultra-low-latency network-attached storage. Instead, per the Elasticsearch docs on AWS best practices, we exclusively use Instance Stores.
Instance Stores have the major disadvantage that they do not survive the termination of an EC2 Instance. That is, when an EC2 Instance dies, all the data on an Instance Store dies with it and is unrecoverable. But Elasticsearch already has built-in support for replica shards, so we already have redundancy available to us if an EC2 Instance should fail.
This lets us take advantage of the main benefit of Instance Stores: they are significantly faster because all I/O traffic is local. By contrast, I/O traffic with EBS Volumes must traverse the (admittedly ultra-low-latency) network and is therefore slower.
How do you roll out updates?
If you want to deploy a new version of Elasticsearch across the cluster, you can use one of the following approaches:
Rolling deploy:
- Build a new AMI.
- Set the ami_id parameter to the ID of the new AMI (see the sketch after this list).
- Run terraform apply.
- Because the elasticsearch-cluster module uses the Gruntwork server-group modules under the hood, running terraform apply will automatically perform a zero-downtime rolling deployment. Specifically, one EC2 Instance at a time will be terminated, a new EC2 Instance will spawn in its place, and only once the new EC2 Instance passes the Load Balancer health checks will the next EC2 Instance be rolled out.
- Note that there will be a brief period of time during which EC2 Instances based on both the old ami_id and the new ami_id will be running. The rolling upgrades docs suggest that this is acceptable for Elasticsearch version 5.6 and greater.
- TODO: Add support for automatically disabling shard allocation and performing a synced flush on an Elasticsearch node prior to terminating it (docs).
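As a minimal sketch (placeholder AMI ID; everything else stays as in your existing module block):

module "elasticsearch" {
  # ... all other parameters unchanged ...

  # Point at the newly built AMI; running `terraform apply` will then roll the
  # cluster one EC2 Instance at a time.
  ami_id = "ami-0fedcba9876543210"
}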
New cluster:
- Build a new AMI.
- Create a totally new ASG using the elasticsearch-cluster module with the ami_id set to the new AMI, but all other parameters the same as the old cluster.
- Wait for all the nodes in the new ASG to join the cluster and catch up on replication.
- Remove each of the nodes from the old cluster.
- Remove the old ASG by removing that elasticsearch-cluster module from your code.
Security
Here are some of the main security considerations to keep in mind when using this module:
Encryption in transit
Elasticsearch can encrypt all of its network traffic. TODO: Should we recommend using X-Pack (official solution, but paid), an Nginx Reverse Proxy, a custom Elasticsearch plugin, or something else?
Encryption at rest
EC2 Instance Storage
The EC2 Instances in the cluster store their data in an EC2 Instance Store, which does not have native support for encryption (unlike EBS Volume Encryption).
TODO: Should we implement encryption at rest using the technique described at https://aws.amazon.com/blogs/security/how-to-protect-data-at-rest-with-amazon-ec2-instance-store-encryption/?
Elasticsearch Keystore
Some Elasticsearch settings may contain secrets and should be encrypted. You can use the Elasticsearch Keystore for such settings. The elasticsearch.keystore is created automatically upon boot of each node, and is available for use as described in the docs.
Dedicated instances
If you wish to use dedicated instances, you can set the tenancy parameter to "dedicated" in this module.
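For example, a minimal sketch:

module "elasticsearch" {
  # ... other parameters as before ...

  # Run every node on single-tenant hardware.
  tenancy = "dedicated"
}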
Security groups
This module attaches a security group to each EC2 Instance that allows inbound requests as follows:
SSH: For the SSH port (default: 22), you can use the allowed_ssh_cidr_blocks parameter to control the list of CIDR blocks that will be allowed access. You can use the allowed_inbound_ssh_security_group_ids parameter to control the list of source Security Groups that will be allowed access.
The ID of the security group is exported as an output variable, which you can use with the elasticsearch-security-group-rules, elastalert-security-group-rules, kibana-security-group-rules, and logstash-security-group-rules modules to open up all the ports necessary for Elasticsearch and the respective Elasticsearch tools.
SSH access
You can associate an EC2 Key Pair with each of the EC2 Instances in this cluster by specifying the Key Pair's name in the ssh_key_name variable. If you don't want to associate a Key Pair with these servers, set ssh_key_name to an empty string.
Reference
- Inputs
- Outputs
Required

- ami_id (string): The AMI ID of our custom AMI with Elasticsearch installed.
- aws_region (string): The AWS region in which all resources will be created.
- cluster_size (number): The number of nodes this cluster should have.
- cluster_name (string): The name you want to give to this Elasticsearch cluster.
- instance_type (string): The instance type for each of the cluster members, e.g. t2.micro.
- subnet_ids (list(string)): The IDs of the subnets.
- vpc_id (string): The ID of the VPC into which we will deploy Elasticsearch.
Optional

- allow_api_from_security_group_ids (list(string), default: []): The IDs of security groups from which ES API connections will be allowed. If you update this variable, make sure to update num_api_security_group_ids too!
- allow_node_discovery_from_security_group_ids (list(string), default: []): The IDs of security groups from which ES node discovery connections will be allowed. If you update this variable, make sure to update num_node_discovery_security_group_ids too!
- allowed_cidr_blocks (list(string), default: []): The CIDR blocks from which we can connect to nodes of this cluster.
- allowed_ssh_security_group_ids (list(string), default: []): A list of security group IDs from which the EC2 Instances will allow SSH connections.
- alowable_ssh_cidr_blocks (list(string), default: []): The CIDR blocks from which SSH connections will be allowed.
- api_port (number, default: 9200): The port that is used to access Elasticsearch for user queries.
- backup_bucket_arn (string, default: "*"): A list of Amazon S3 bucket ARNs to grant the Elasticsearch instances access to.
- ebs_optimized (bool, default: false): If true, the launched EC2 instance will be EBS-optimized.
- ebs_volumes (list(object), default: []): A list that defines the EBS Volumes to create for each server. Each item in the list should be a map that contains the keys 'type' (one of standard, gp2, or io1), 'size' (in GB), and 'encrypted' (true or false). Each EBS Volume and server pair will get matching tags with a name of the format ebs-volume-xxx, where xxx is the index of the EBS Volume (e.g., ebs-volume-0, ebs-volume-1, etc.). These tags can be used by each server to find and mount its EBS Volume(s). The object type is:
  list(object({
    type      = string
    size      = number
    encrypted = bool
  }))
Example
default = [
{
type = "standard"
size = 100
encrypted = false
},
{
type = "gp2"
size = 300
encrypted = true
}
]
- key_name (string, default: null): The name of the Amazon EC2 Key Pair you wish to use for accessing this instance. See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html?icmpid=docs_ec2_console#having-ec2-create-your-key-pair
- node_discovery_port (number, default: 9300): The port that is used internally by Elasticsearch for cluster node discovery.
- num_api_security_group_ids (number, default: 0): The number of security group IDs in allow_api_from_security_group_ids. We should be able to compute this automatically, but due to a Terraform limitation, if there are any dynamic resources in allow_api_from_security_group_ids, then we won't be able to: https://github.com/hashicorp/terraform/pull/11482
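Example (a hypothetical sketch; aws_security_group.app is a placeholder for any dynamically created security group)

allow_api_from_security_group_ids = [aws_security_group.app.id]
# Terraform can't compute the length of a list that contains dynamic values, so set it explicitly:
num_api_security_group_ids = 1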
- num_enis_per_node (number, default: 1): The number of ENIs each node in this cluster should have.
- num_node_discovery_security_group_ids (number, default: 0): The number of security group IDs in allow_node_discovery_from_security_group_ids. We should be able to compute this automatically, but due to a Terraform limitation, if there are any dynamic resources in allow_node_discovery_from_security_group_ids, then we won't be able to: https://github.com/hashicorp/terraform/pull/11482
- (default: true): Whether the volume should be destroyed on instance termination.
- root_volume_size (number, default: 50): The size, in GB, of the root EBS volume.
- root_volume_type (string, default: "gp2"): The type of volume. Must be one of: standard, gp2, or io1.
- (default: false): If set to true, skip the rolling deployment, and destroy all the servers immediately. You should typically NOT enable this in prod, as it will cause downtime! The main use case for this flag is to make testing and cleanup easier. It can also be handy in case the rolling deployment code has a bug.
- tags (map(string), default: {}): A map of key value pairs that represent custom tags to propagate to the resources that correspond to this Elasticsearch cluster.
Example
default = {
foo = "bar"
}
- target_group_arns (list(string), default: []): A list of target group ARNs to associate with the Elasticsearch cluster.
- user_data (string, default: null): The User Data script to run on each server when it is booting.