Elasticsearch Cluster
This folder contains a Terraform module to deploy an Elasticsearch cluster in AWS on top of an Auto Scaling Group. The idea is to create an Amazon Machine Image (AMI) that has Elasticsearch installed using the install-elasticsearch module.
In a non-production setting, you can install Elasticsearch tools such as Kibana and ElastAlert on the same AMI. In a production setting, Elasticsearch should be the sole service running on each Elasticsearch node.
How do you connect to the Elasticsearch cluster?
Connecting to Elasticsearch via Official Elasticsearch Clients
The preferred way to connect to Elasticsearch is to use one of the official Elasticsearch clients. All official Elasticsearch clients are designed to discover multiple Elasticsearch nodes and distribute requests across those nodes.
Therefore, using a Load Balancer to talk to Elasticsearch APIs (e.g., via an SDK) is NOT recommended, so you will need to get the IPs of the individual nodes and connect to them directly. Since those nodes run in an Auto Scaling Group (ASG) where servers can be added/replaced/removed at any time, you can't get their IP addresses from Terraform. Instead, you'll need to look up the IPs using the AWS APIs.
The easiest way to do that is to use the AWS SDK to look up the servers using EC2 Tags. Each server deployed by the elasticsearch-cluster module has its Name and aws:autoscaling:groupName tags set to the value you pass in via the cluster_name parameter. You can also specify custom tags via the tags parameter. You can use the AWS SDK to find the IPs of all servers with those tags.
For example, using the AWS CLI, you can get the IPs for servers in us-east-1 with the tag Name=elasticsearch-example as follows:
aws ec2 describe-instances \
--region "us-east-1" \
--filter \
"Name=tag:Name,Values=elasticsearch-example" \
"Name=instance-state-name,Values=running"
This will return a bunch of JSON that contains the IPs of the servers. You can then use the Elasticsearch client for your programming language to connect to these IPs.
Connecting via the REST API
Elasticsearch exposes a RESTful API that you can access directly using curl or any other tool or library that makes HTTP requests.
What's included in this module?
This module creates the resources described in the sections below, and also calls out what is intentionally not included (and why).
Auto Scaling Group
This module runs Elasticsearch on top of an Auto Scaling Group (ASG). Typically, you should run the ASG with multiple Instances spread across multiple Availability Zones. Each of the EC2 Instances should be running an AMI that has Elasticsearch and optional Elasticsearch tools installed via the install-elasticsearch, install-elastalert, install-kibana, and install-logstash scripts. You pass in the ID of the AMI to run using the ami_id input parameter.
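To make this concrete, here is a minimal, hypothetical usage sketch. The module source path, AMI ID, VPC ID, and subnet IDs are placeholders; the input names are the ones documented in the Reference section below.

module "elasticsearch" {
  # Placeholder path: point this at wherever the elasticsearch-cluster module lives for you.
  source = "../../modules/elasticsearch-cluster"

  # The custom AMI built with the install-elasticsearch module (placeholder ID).
  ami_id = "ami-0123456789abcdef0"

  aws_region    = "us-east-1"
  cluster_name  = "elasticsearch-example"
  cluster_size  = 3
  instance_type = "t2.micro"

  # Placeholder network IDs.
  vpc_id     = "vpc-0123456789abcdef0"
  subnet_ids = ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"]
}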
Load Balancer
We use a Network Load Balancer (1) so that we can perform ongoing health checks on each Elasticsearch node, and (2) so that Kibana can be accessed via a single endpoint, which will forward each request to one of the live Kibana instances.
Note that we do not need a Load Balancer to distribute traffic to Elasticsearch because all the official Elasticsearch clients are designed to discover all Elasticsearch nodes and distribute requests across the cluster. Using a Load Balancer for this reason would duplicate functionality Elasticsearch clients already give us.
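As an illustration only, here is one way you might create such a Network Load Balancer with standard AWS provider resources and attach the cluster to it via the target_group_arns input. The resource names, VPC/subnet IDs, and the assumption that Kibana listens on its default port 5601 are placeholders, not part of this module.

resource "aws_lb" "kibana" {
  name               = "kibana-example"
  load_balancer_type = "network"
  internal           = true
  subnets            = ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"]
}

resource "aws_lb_target_group" "kibana" {
  name     = "kibana-example"
  port     = 5601            # Kibana's default port (assumption)
  protocol = "TCP"
  vpc_id   = "vpc-0123456789abcdef0"
}

resource "aws_lb_listener" "kibana" {
  load_balancer_arn = aws_lb.kibana.arn
  port              = 5601
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.kibana.arn
  }
}

module "elasticsearch" {
  # ... all other parameters as in the usage sketch above ...

  # Registers each node in the cluster with the Kibana target group.
  target_group_arns = [aws_lb_target_group.kibana.arn]
}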
Security Group
Each EC2 Instance in the ASG has a Security Group that allows minimal connectivity:
- All outbound requests
- Inbound SSH access from the CIDR blocks and security groups you specify
The ID of the security group is exported as an output variable, which you can use with the elasticsearch-security-group-rules, elastalert-security-group-rules, kibana-security-group-rules, and logstash-security-group-rules modules to open up all the ports necessary for Elasticsearch and the respective Elasticsearch tools.
Check out the Security section for more details.
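As a purely hypothetical sketch of that wiring (the security_group_id output and input names and the module path are assumptions; check those modules' own documentation for the real variable names):

module "elasticsearch_security_group_rules" {
  # Placeholder path to the elasticsearch-security-group-rules module.
  source = "../../modules/elasticsearch-security-group-rules"

  # Assumed name of the output exported by the elasticsearch-cluster module.
  security_group_id = module.elasticsearch.security_group_id

  # Assumed input: where Elasticsearch traffic should be allowed from.
  allowed_cidr_blocks = ["10.0.0.0/16"]
}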
IAM Role and Permissions
Each EC2 Instance in the ASG has an IAM Role attached. The IAM Role ARN and ID are exported as output variables if you need to add additional permissions.
EBS Volumes
Note that we do not use EBS Volumes, which are AWS's ultra-low-latency network-attached storage. Instead, per the Elasticsearch docs on AWS best practices, we exclusively use Instance Stores.
Instance Stores have the major disadvantage that they do not survive the termination of an EC2 Instance. That is, when an EC2 Instance dies, all the data on an Instance Store dies with it and is unrecoverable. But Elasticsearch already has built-in support for replica shards, so we already have redundancy available to us if an EC2 Instance should fail.
This lets us take advantage of the main benefit of Instance Stores: they are significantly faster because all I/O traffic is local. By contrast, I/O traffic with EBS Volumes must traverse the (admittedly ultra-low-latency) network and is therefore slower.
How do you roll out updates?
If you want to deploy a new version of Elasticsearch across the cluster, you can use one of the following approaches:
Rolling deploy:
- Build a new AMI.
- Set the ami_id parameter to the ID of the new AMI (see the sketch after this list).
- Run terraform apply.
- Because the elasticsearch-cluster module uses the Gruntwork server-group modules under the hood, running terraform apply will automatically perform a zero-downtime rolling deployment. Specifically, one EC2 Instance at a time will be terminated, a new EC2 Instance will spawn in its place, and only once the new EC2 Instance passes the Load Balancer health checks will the next EC2 Instance be rolled out.
- Note that there will be a brief period of time during which EC2 Instances based on both the old ami_id and the new ami_id will be running. The rolling upgrades docs suggest that this is acceptable for Elasticsearch version 5.6 and greater.
- TODO: Add support for automatically disabling shard allocation and performing a synced flush on an Elasticsearch node prior to terminating it (docs).
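As a minimal sketch (placeholder AMI ID; everything else stays as in your existing module block):

module "elasticsearch" {
  # ... all other parameters unchanged ...

  # Point at the newly built AMI; running `terraform apply` will then roll the
  # cluster one EC2 Instance at a time.
  ami_id = "ami-0fedcba9876543210"
}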
New cluster:
- Build a new AMI.
- Create a totally new ASG using the elasticsearch-cluster module with the ami_id set to the new AMI, but all other parameters the same as the old cluster.
- Wait for all the nodes in the new ASG to join the cluster and catch up on replication.
- Remove each of the nodes from the old cluster.
- Remove the old ASG by removing that elasticsearch-cluster module from your code.
Security
Here are some of the main security considerations to keep in mind when using this module:
Encryption in transit
Elasticsearch can encrypt all of its network traffic. TODO: Should we recommend using X-Pack (official solution, but paid), an Nginx Reverse Proxy, a custom Elasticsearch plugin, or something else?
Encryption at rest
EC2 Instance Storage
The EC2 Instances in the cluster store their data in an EC2 Instance Store, which does not have native support for encryption (unlike EBS Volume Encryption).
TODO: Should we implement encryption at rest using the technique described at https://aws.amazon.com/blogs/security/how-to-protect-data-at-rest-with-amazon-ec2-instance-store-encryption/?
Elasticsearch Keystore
Some Elasticsearch settings may contain secrets and should be encrypted. You can use the Elasticsearch Keystore for such settings. The elasticsearch.keystore is created automatically upon boot of each node, and is available for use as described in the docs.
Dedicated instances
If you wish to use dedicated instances, you can set the tenancy parameter to "dedicated" in this module.
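For example, a minimal sketch:

module "elasticsearch" {
  # ... other parameters as before ...

  # Run every node on single-tenant hardware.
  tenancy = "dedicated"
}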
Security groups
This module attaches a security group to each EC2 Instance that allows inbound requests as follows:
SSH: For the SSH port (default: 22), you can use the allowed_ssh_cidr_blocks parameter to control the list of CIDR blocks that will be allowed access. You can use the allowed_inbound_ssh_security_group_ids parameter to control the list of source Security Groups that will be allowed access.
The ID of the security group is exported as an output variable, which you can use with the elasticsearch-security-group-rules, elastalert-security-group-rules, kibana-security-group-rules, and logstash-security-group-rules modules to open up all the ports necessary for Elasticsearch and the respective Elasticsearch tools.
SSH access
You can associate an EC2 Key Pair with each of the EC2 Instances in this cluster by specifying the Key Pair's name in the ssh_key_name variable. If you don't want to associate a Key Pair with these servers, set ssh_key_name to an empty string.
Reference
- Inputs
- Outputs
Required

- ami_id (string): The AMI ID of our custom AMI with Elasticsearch installed.
- aws_region (string): The AWS region in which all resources will be created.
- cluster_size (number): The number of nodes this cluster should have.
- cluster_name (string): The name you want to give to this Elasticsearch cluster.
- instance_type (string): The instance type for each of the cluster members, e.g. t2.micro.
- subnet_ids (list(string)): The IDs of the subnets.
- vpc_id (string): The ID of the VPC into which we will deploy Elasticsearch.
Optional

- allow_api_from_security_group_ids (list(string), default: []): The IDs of security groups from which ES API connections will be allowed. If you update this variable, make sure to update num_api_security_group_ids too!
- allow_node_discovery_from_security_group_ids (list(string), default: []): The IDs of security groups from which ES node discovery connections will be allowed. If you update this variable, make sure to update num_node_discovery_security_group_ids too!
- allowed_cidr_blocks (list(string), default: []): The CIDR blocks from which we can connect to nodes of this cluster.
- allowed_ssh_security_group_ids (list(string), default: []): A list of security group IDs from which the EC2 Instances will allow SSH connections.
- alowable_ssh_cidr_blocks (list(string), default: []): The CIDR blocks from which SSH connections will be allowed.
- api_port (number, default: 9200): The port that is used to access Elasticsearch for user queries.
- backup_bucket_arn (string, default: "*"): A list of Amazon S3 bucket ARNs to grant the Elasticsearch instances access to.
- ebs_optimized (bool, default: false): If true, the launched EC2 instance will be EBS-optimized.
- ebs_volumes (list(object), default: []): A list that defines the EBS Volumes to create for each server. Each item in the list should be a map that contains the keys 'type' (one of standard, gp2, or io1), 'size' (in GB), and 'encrypted' (true or false). Each EBS Volume and server pair will get matching tags with a name of the format ebs-volume-xxx, where xxx is the index of the EBS Volume (e.g., ebs-volume-0, ebs-volume-1, etc.). These tags can be used by each server to find and mount its EBS Volume(s). The object type is:
  list(object({
    type      = string
    size      = number
    encrypted = bool
  }))
Example
default = [
{
type = "standard"
size = 100
encrypted = false
},
{
type = "gp2"
size = 300
encrypted = true
}
]
- key_name (string, default: null): The name of the Amazon EC2 Key Pair you wish to use for accessing this instance. See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html?icmpid=docs_ec2_console#having-ec2-create-your-key-pair
- node_discovery_port (number, default: 9300): The port that is used internally by Elasticsearch for cluster node discovery.
- num_api_security_group_ids (number, default: 0): The number of security group IDs in allow_api_from_security_group_ids. We should be able to compute this automatically, but due to a Terraform limitation, if there are any dynamic resources in allow_api_from_security_group_ids, then we won't be able to: https://github.com/hashicorp/terraform/pull/11482
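Example (a hypothetical sketch; aws_security_group.app is a placeholder for any dynamically created security group)

allow_api_from_security_group_ids = [aws_security_group.app.id]
# Terraform can't compute the length of a list that contains dynamic values, so set it explicitly:
num_api_security_group_ids = 1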
- num_enis_per_node (number, default: 1): The number of ENIs each node in this cluster should have.
- num_node_discovery_security_group_ids (number, default: 0): The number of security group IDs in allow_node_discovery_from_security_group_ids. We should be able to compute this automatically, but due to a Terraform limitation, if there are any dynamic resources in allow_node_discovery_from_security_group_ids, then we won't be able to: https://github.com/hashicorp/terraform/pull/11482
- (default: true): Whether the volume should be destroyed on instance termination.
- root_volume_size (number, default: 50): The size, in GB, of the root EBS volume.
- root_volume_type (string, default: "gp2"): The type of volume. Must be one of: standard, gp2, or io1.
- (default: false): If set to true, skip the rolling deployment, and destroy all the servers immediately. You should typically NOT enable this in prod, as it will cause downtime! The main use case for this flag is to make testing and cleanup easier. It can also be handy in case the rolling deployment code has a bug.
- tags (map(string), default: {}): A map of key value pairs that represent custom tags to propagate to the resources that correspond to this Elasticsearch cluster.
Example
default = {
foo = "bar"
}
- target_group_arns (list(string), default: []): A list of target group ARNs to associate with the Elasticsearch cluster.
- user_data (string, default: null): The User Data script to run on each server when it is booting.