The Amazon Web Services (AWS) cloud accelerates big data analytics. It provides instant scalability and elasticity, letting you focus on analytics instead of infrastructure. Whether you are indexing large data sets, analyzing massive amounts of scientific data, or processing clickstream logs, AWS provides a range of big data tools and services that you can use for virtually any data-intensive project.
Amazon Elastic MapReduce (EMR) is one such service. It provides a fully managed, hosted Hadoop framework on top of Amazon Elastic Compute Cloud (EC2). In this paper, we highlight best practices for moving data to AWS and for collecting and aggregating that data, and we discuss common architectural patterns for setting up and configuring Amazon EMR clusters for faster processing. We also discuss several performance and cost optimization techniques so that you can process and analyze massive amounts of data reliably, at high throughput and low cost.