If you want to run analytics in a serverless cloud environment, Amazon Web Services believes it can help you while reducing your operating costs and simplifying deployments.
As usual for Amazon, the cloud giant previewed this Serverless EMR platform – EMR formerly standing for Elastic MapReduce – at its re:Invent conference in December, and only opened the service to the public this week.
AWS is no stranger to serverless with products like Lambda. However, its EMR offering specifically targets analytics workloads, such as those using Apache Spark, Hive, and Presto.
Amazon’s existing EMR platform already supported deployments to VPC clusters running in EC2, Kubernetes clusters in EKS, and on-premises deployments running on Outposts. And while this provided greater control over the application and compute resources, it also required the user to manually configure and manage the cluster.
Additionally, the compute and memory resources needed for many data analytics workloads are subject to change depending on the complexity and volume of data being processed, according to Amazon.
EMR Serverless promises to eliminate this complexity by automatically provisioning and scaling compute resources to meet the demands of open source workloads. As more or fewer resources are needed to accommodate changing data volumes, the platform automatically adds or removes workers. This, according to Amazon, ensures that compute resources are neither underutilized nor overcommitted. And clients are only charged for the time and number of workers needed to complete the job.
Customers can further control costs by specifying a minimum and maximum number of workers and the vCPUs and memory allocated to each worker. Each application is fully isolated and runs in a secure instance.
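To make the billing and capping idea concrete, here is a rough Python sketch. It assumes a simple model in which cost scales with worker count, per-worker vCPUs and memory, and runtime. The function names and the hourly rates are hypothetical placeholders, not Amazon's published pricing.

```python
def clamp_workers(requested: int, minimum: int, maximum: int) -> int:
    """Keep a requested worker count inside the customer's min/max bounds."""
    return max(minimum, min(requested, maximum))


def estimate_job_cost(
    workers: int,
    vcpus_per_worker: int,
    memory_gb_per_worker: int,
    runtime_seconds: float,
    vcpu_hour_rate: float,  # hypothetical rate, not real pricing
    gb_hour_rate: float,    # hypothetical rate, not real pricing
) -> float:
    """Estimate cost as (resources in use) x (time in use) x (rate)."""
    hours = runtime_seconds / 3600
    vcpu_cost = workers * vcpus_per_worker * hours * vcpu_hour_rate
    memory_cost = workers * memory_gb_per_worker * hours * gb_hour_rate
    return vcpu_cost + memory_cost


# A job that asks for 50 workers under a 2-to-20 worker cap runs on 20.
workers = clamp_workers(requested=50, minimum=2, maximum=20)
cost = estimate_job_cost(workers, 4, 16, 1800, 0.05, 0.005)
print(f"{workers} workers, estimated cost {cost:.2f}")
```

Under this model, lowering the maximum worker count bounds the hourly spend, at the price of longer runtimes for large jobs.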
According to Amazon, these capabilities make the platform well suited to data pipelines, shared clusters, and interactive data workloads.
By default, EMR Serverless workloads are configured to start when jobs are submitted and stop after the application has been idle for more than 15 minutes. However, customers can also pre-initialize workers to reduce the time it takes for a job to start.
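As an illustration of those defaults, the sketch below builds the kind of settings dictionary one might pass to the AWS SDK for Python when creating an application. The helper name, the release label, and the worker sizes are assumptions made for this example, and the field names reflect our reading of the EMR Serverless API, so check them against Amazon's reference before relying on them.

```python
def build_application_config(
    name: str,
    release_label: str = "emr-6.6.0",  # placeholder release label
    app_type: str = "SPARK",
    idle_timeout_minutes: int = 15,     # the default idle window described above
    warm_executors: int = 0,            # 0 means no pre-initialized workers
) -> dict:
    """Assemble application settings; field names are assumed, not verified."""
    config = {
        "name": name,
        "releaseLabel": release_label,
        "type": app_type,
        # Start the application automatically when a job is submitted.
        "autoStartConfiguration": {"enabled": True},
        # Stop it again after the idle window elapses.
        "autoStopConfiguration": {
            "enabled": True,
            "idleTimeoutMinutes": idle_timeout_minutes,
        },
    }
    if warm_executors > 0:
        # Pre-initialized workers trade idle cost for faster job start-up.
        config["initialCapacity"] = {
            "DRIVER": {
                "workerCount": 1,
                "workerConfiguration": {"cpu": "2vCPU", "memory": "4GB"},
            },
            "EXECUTOR": {
                "workerCount": warm_executors,
                "workerConfiguration": {"cpu": "4vCPU", "memory": "8GB"},
            },
        }
    return config


print(build_application_config("nightly-etl", warm_executors=4))
```

The dictionary is only built and printed here. Nothing is sent to AWS, so the sketch runs without credentials.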
EMR Serverless also supports shared applications using AWS Identity and Access Management (IAM) roles. This allows multiple tenants to submit jobs using a common pool of workers, the company explained in a statement.
At launch, EMR Serverless supports applications built using the Apache Spark and Hive frameworks.
Regardless of how the application is deployed, workloads are centrally managed from Amazon’s EMR Studio. The control plane allows customers to create new workloads, submit jobs, and view diagnostic data. The service also integrates with Amazon S3 object storage, allowing Spark and Hive logs to be saved for review.
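For a sense of how job submission, per-tenant IAM roles, and S3 logging might fit together, here is a hedged Python sketch of a Spark job request. The helper name, application ID, account number, role names, and bucket paths are all invented for illustration, and the field names follow our understanding of the EMR Serverless job-run API rather than anything Amazon stated in its announcement.

```python
def build_job_request(
    application_id: str,
    execution_role_arn: str,
    entry_point: str,
    log_uri: str,
) -> dict:
    """Assemble a Spark job request; field names are assumed, not verified."""
    return {
        "applicationId": application_id,
        # The IAM role the job runs as; each tenant can supply its own.
        "executionRoleArn": execution_role_arn,
        "jobDriver": {"sparkSubmit": {"entryPoint": entry_point}},
        "configurationOverrides": {
            "monitoringConfiguration": {
                # Where Spark logs are written for later review.
                "s3MonitoringConfiguration": {"logUri": log_uri}
            }
        },
    }


# Two tenants share one application but run under different roles.
shared_app = "example-app-id"  # hypothetical application ID
requests = [
    build_job_request(
        shared_app,
        f"arn:aws:iam::123456789012:role/{tenant}-job-role",  # made-up ARN
        f"s3://example-bucket/{tenant}/job.py",               # made-up script
        f"s3://example-bucket/{tenant}/logs/",                # made-up log path
    )
    for tenant in ("team-a", "team-b")
]
for request in requests:
    print(request["executionRoleArn"], "->", request["applicationId"])
```

As with any sketch of this kind, the requests are only constructed locally. Sending them would require an SDK client and valid credentials.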
EMR Serverless is now available in Amazon regions of Northern Virginia, Oregon, Ireland, and Tokyo. ®