Best Practices for Migrating Self-Hosted Elasticsearch to AWS Elasticsearch

The challenges that come with self-managing Elasticsearch are motivating many organizations to migrate their clusters to the Amazon Elasticsearch Service. While this is not a complex undertaking, it requires attention to detail to avoid accidental data corruption or deletion. The following steps and recommendations can help ensure a smooth migration to the Amazon service.

Planning and Preparation for Migration

Before migrating Elasticsearch to AWS, it is essential to assess the existing environment and define the objectives for the migration. These often include addressing costs as well as improving scalability and reliability. The outcome of this first step can help determine which applications to migrate to the cloud service.

The initial assessment and planning phase consists of the following steps.

1. Assess current cluster data and plan how many shards will be needed

Analyze the indices to find out how much data each one holds, then use that figure to determine how many shards are needed. Amazon recommends a shard size of between 10 and 50 GB. Keep in mind that larger shards may complicate Elasticsearch’s recovery from failures. Conversely, too many small shards can cause performance issues and memory errors, because every shard places load on the hardware that hosts it. Amazon advises keeping shards small enough for the underlying instances to handle them. Account for any indices missing from the analysis, and consider adding another shard if the data volume is expected to increase with the rollout of new applications.
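The shard sizing guidance above can be sketched as a small calculation. This is a minimal illustration, not an official Amazon formula: the function name, the 30 GB default target, and the growth_factor parameter are assumptions chosen for the example.

```python
import math


def primary_shard_count(index_size_gb, target_shard_gb=30.0, growth_factor=1.0):
    """Estimate how many primary shards keep each shard near target_shard_gb.

    target_shard_gb is an assumed midpoint of the 10-50 GB range.
    growth_factor is a hypothetical multiplier for expected data growth.
    """
    if index_size_gb <= 0:
        raise ValueError("index_size_gb must be positive")
    if not 10 <= target_shard_gb <= 50:
        raise ValueError("target_shard_gb should stay within 10-50 GB")
    projected_gb = index_size_gb * growth_factor
    # Always keep at least one shard, even for very small indices.
    return max(1, math.ceil(projected_gb / target_shard_gb))


if __name__ == "__main__":
    for size_gb in (5, 300, 1200):
        print(size_gb, "GB ->", primary_shard_count(size_gb), "primary shards")
```

Passing a growth_factor above 1.0 reserves headroom for the growth mentioned above, at the cost of slightly smaller shards today.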

2. Determine how many instances are needed in the cluster

If the cluster will have more than ten instances, enable dedicated master nodes during AWS ES cluster creation. With fewer than ten instances and no dedicated master, all nodes are master-eligible. Amazon recommends a minimum of three nodes to avoid split brain, a lapse in communication that can leave one cluster with two master nodes. With three dedicated master nodes, use a minimum of two data nodes so that replicas can be stored on a different node from their primary shards.

3. Calculate storage requirements

One of the most common causes of cluster instability is a lack of sufficient storage, so work out accurate numbers for instance types, instance counts, and storage volumes. Use the formula: Minimum Storage Requirement = Source Data * (1 + Number of Replicas) * 1.45. If the domain requires more than 1 PB, Amazon offers storage up to 2 PB. Verify the costs associated with domains of this size before proceeding.

While calculating storage, consider the type of Elasticsearch workload, which is based on either long-lived indices or rolling indices. Examples of long-lived indices include website, document, and ecommerce searches. For these workloads, code processes data into one or more indices, which are updated as the source data changes. Rolling indices receive continuously flowing data in temporary indices, each with an indexing period and a retention window. They apply to log analytics, time-series processing, and clickstream analytics.

The number of replicas needed must also be considered. Amazon recommends at least one replica, though more may be needed to improve search performance in read-heavy workloads. Other factors include Elasticsearch indexing overhead, operating system reserved space, and Amazon ES overhead, which the 1.45 multiplier in the formula approximates.
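The storage formula in this section can be worked through as follows. The function names are illustrative, and estimating source data for rolling indices as daily volume times retention days is an assumption added for the example.

```python
def minimum_storage_gb(source_data_gb, replicas=1, overhead=1.45):
    """Source Data * (1 + Number of Replicas) * 1.45, in GB."""
    if source_data_gb < 0 or replicas < 0:
        raise ValueError("source_data_gb and replicas must be non-negative")
    return source_data_gb * (1 + replicas) * overhead


def rolling_source_data_gb(daily_gb, retention_days):
    """Assumed estimate for rolling indices: data held at any one time."""
    return daily_gb * retention_days


if __name__ == "__main__":
    # Long-lived index: 100 GB of source data with one replica.
    print(round(minimum_storage_gb(100, replicas=1), 1), "GB")
    # Rolling indices: 5 GB per day kept for 14 days, one replica.
    source_gb = rolling_source_data_gb(5, 14)
    print(round(minimum_storage_gb(source_gb, replicas=1), 1), "GB")
```

For 100 GB of source data and one replica, the formula gives roughly 290 GB of minimum storage across the cluster.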

Migrating to AWS

Upon completion of the assessments above, the migration process can begin as outlined below.

1. Create the AWS domain

The AWS domain can be created using either the CLI or the console. In the “Create Elasticsearch domain” console wizard, enter the cluster name in step one. In the steps that follow, enter the instance information and storage size. This creates a new, empty AWS Elasticsearch cluster.
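For a scripted alternative to the console wizard, the domain settings could be assembled in Python and passed to the boto3 "es" client. The sketch below only builds the arguments; the helper name, domain name, version, instance type, and gp2 volume type are assumptions for illustration, and the ten-instance threshold for dedicated masters follows the planning guidance earlier in this article.

```python
def build_domain_config(domain_name, es_version, instance_type, data_nodes,
                        volume_gb, master_type=None):
    """Build keyword arguments for boto3's create_elasticsearch_domain call."""
    # Enable dedicated masters for clusters with more than ten instances.
    dedicated = data_nodes > 10
    cluster = {
        "InstanceType": instance_type,
        "InstanceCount": data_nodes,
        "DedicatedMasterEnabled": dedicated,
    }
    if dedicated:
        cluster["DedicatedMasterType"] = master_type or instance_type
        cluster["DedicatedMasterCount"] = 3
    return {
        "DomainName": domain_name,
        "ElasticsearchVersion": es_version,
        "ElasticsearchClusterConfig": cluster,
        # gp2 is an assumed volume type; choose what fits the workload.
        "EBSOptions": {"EBSEnabled": True, "VolumeType": "gp2",
                       "VolumeSize": volume_gb},
    }


if __name__ == "__main__":
    config = build_domain_config("migrated-logs", "7.10",
                                 "m5.large.elasticsearch", 12, 100)
    print(config)
    # With AWS credentials configured, the domain could then be created with:
    #   import boto3
    #   boto3.client("es").create_elasticsearch_domain(**config)
```

Keeping the configuration in a plain dictionary makes it easy to review the instance and storage numbers from the planning phase before any resources are created.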

2. Create an S3 bucket

Because data cannot be migrated by connecting the two Elasticsearch clusters directly, an AWS S3 bucket must be created before moving forward. In the access policy, grant a new user list, read, and write permissions on the bucket. The policies can be created using the console or more advanced command-line options. The bucket must be in the same region as the AWS Elasticsearch domain to facilitate the repository registration. Access to the bucket can be verified from the CLI using the user’s access_key and secret_key credentials.
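The list, read, and write permissions described above could be expressed as an IAM policy document along these lines. This is a sketch: the bucket name is hypothetical, and s3:DeleteObject is included as an assumption, because snapshot cleanup removes objects from the bucket.

```python
import json


def snapshot_bucket_policy(bucket_name):
    """Return an IAM policy granting list, read, and write on one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Read, write, and delete apply to objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(snapshot_bucket_policy("es-migration-snapshots"), indent=2))
```

Once the policy is attached to the new user, running "aws s3 ls" against the bucket with that user's credentials is a quick way to confirm access.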

3. Register the S3 bucket as a snapshot repository

To move data between the two clusters, both Elasticsearch instances must have the same S3 bucket registered as a snapshot repository. The repositories must be registered before snapshot and restore operations can be performed. The steps to accomplish this for the self-hosted instance can be found here. Registering the bucket with the AWS-hosted Elasticsearch domain requires a user, a role, and policies configured in AWS IAM. From there, the process is similar to that of the self-hosted Elasticsearch instance, except that the request must be signed and the Elasticsearch role must be specified in the message body. Tools such as Postman or Boto can also be used for this process.
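The difference between the two registration requests can be illustrated by building their bodies side by side. This sketch only constructs the method, path, and JSON body for the snapshot repository API; it does not send or sign anything. The repository name, bucket name, region, and role ARN are hypothetical placeholders.

```python
import json


def repository_request(repo_name, bucket, region=None, role_arn=None):
    """Build (method, path, body) for registering an S3 snapshot repository.

    Pass role_arn and region for the AWS-hosted domain; omit both for the
    self-hosted cluster, which uses its own configured S3 credentials.
    """
    settings = {"bucket": bucket}
    if role_arn is not None:
        if region is None:
            raise ValueError("region is required when role_arn is given")
        settings["region"] = region
        settings["role_arn"] = role_arn
    return "PUT", f"/_snapshot/{repo_name}", {"type": "s3", "settings": settings}


if __name__ == "__main__":
    self_hosted = repository_request("migration", "es-migration-snapshots")
    aws_hosted = repository_request(
        "migration", "es-migration-snapshots", region="us-west-2",
        role_arn="arn:aws:iam::123456789012:role/EsSnapshotRole")
    for method, path, body in (self_hosted, aws_hosted):
        print(method, path, json.dumps(body))
```

The request to the AWS-hosted domain still has to be signed with AWS Signature Version 4 before it is sent, which is the part typically delegated to Boto, Postman, or a signing library.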

4. Restore from S3 to AWS ES

After identifying the snapshots to migrate, restore them using either HTTP basic authentication with master user credentials or AWS authentication with IAM credentials. Before restoring, check that all snapshots from the self-hosted Elasticsearch cluster have been uploaded by listing the contents of the repository from the AWS domain.
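The verification and restore steps above can be sketched with two helpers. The first compares the snapshot listings returned by each cluster (the JSON from GET /_snapshot/<repository>/_all), and the second builds a restore request. The helper names, repository name, snapshot names, and index names are illustrative assumptions, and sending the requests with the chosen authentication method is left out.

```python
def successful_snapshots(list_response):
    """Extract names of completed snapshots from a snapshot listing."""
    return {
        snap["snapshot"]
        for snap in list_response.get("snapshots", [])
        if snap.get("state") == "SUCCESS"
    }


def missing_snapshots(source_response, target_response):
    """Snapshots visible on the self-hosted side but not from the AWS domain."""
    missing = successful_snapshots(source_response) - successful_snapshots(target_response)
    return sorted(missing)


def restore_request(repo_name, snapshot_name, indices):
    """Build (method, path, body) for restoring selected indices."""
    body = {
        "indices": ",".join(indices),
        # Cluster-wide state from the old cluster is not restored here.
        "include_global_state": False,
    }
    return "POST", f"/_snapshot/{repo_name}/{snapshot_name}/_restore", body


if __name__ == "__main__":
    source = {"snapshots": [{"snapshot": "snap-1", "state": "SUCCESS"},
                            {"snapshot": "snap-2", "state": "SUCCESS"}]}
    target = {"snapshots": [{"snapshot": "snap-1", "state": "SUCCESS"}]}
    print("Missing on AWS side:", missing_snapshots(source, target))
    print(restore_request("migration", "snap-1", ["logs-2020.03.01"]))
```

Restoring only named indices, rather than everything in the snapshot, avoids overwriting indices that the new domain manages itself.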

5. Finalize the migration

After verifying that all indices were restored as anticipated, reindex those that need to be changed. An example of this would be reindexing daily indices into monthly ones. Reindexing status can be checked by using the _tasks API. Other tasks include switching the Logstash Elasticsearch output so that it sends data to AWS ES and adjusting retention scripts as necessary. Finally, configure clients to use the new Amazon ES endpoint and configure new IAM roles.
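The daily-to-monthly example above could look like the following. This sketch assumes daily indices are named with a prefix followed by a yyyy.MM.dd date, as in the Logstash default; the function names are hypothetical, and the code only builds the body for the _reindex API without sending it.

```python
import re

# Assumed daily naming scheme: <prefix>-yyyy.MM.dd
DAILY_INDEX = re.compile(
    r"^(?P<prefix>.+)-(?P<year>\d{4})\.(?P<month>\d{2})\.(?P<day>\d{2})$")


def monthly_index_name(daily_name):
    """Map a daily index name to the monthly index that should absorb it."""
    match = DAILY_INDEX.match(daily_name)
    if match is None:
        raise ValueError(f"not a daily index name: {daily_name}")
    return "{}-{}.{}".format(
        match.group("prefix"), match.group("year"), match.group("month"))


def reindex_body(daily_name):
    """Build the JSON body for POST /_reindex for one daily index."""
    return {
        "source": {"index": daily_name},
        "dest": {"index": monthly_index_name(daily_name)},
    }


if __name__ == "__main__":
    for name in ("logstash-2020.03.14", "logstash-2020.03.15"):
        print(reindex_body(name))
```

Submitting the request with wait_for_completion=false returns a task ID, which can then be polled through the _tasks API mentioned above.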

Conclusion

Organizations migrating to Amazon’s Elasticsearch service stand to gain significant benefits over self-managed Elasticsearch. While Elasticsearch is a highly popular, open-source search and analytics engine, deploying and managing it can be tedious and time-consuming. By following the above guidelines, organizations can complete the migration to Amazon Elasticsearch efficiently and without data loss or corruption.

Agilisium is an AWS advanced consulting partner providing digital transformation expertise to businesses across several sectors. Contact us for more information on how our services and solutions can improve your business’ productivity.
