AWS Tools and Approach
Infiniti uses AWS EMR (Elastic MapReduce) as the core of our big data tooling. EMR is a managed Hadoop and Spark ecosystem that includes tools such as Hive, Hue, Zeppelin, PySpark and others for highly efficient data processing. AWS S3 serves as the underlying file system that supports the overall data lake architecture, and AWS has made a number of enhancements to S3 specifically for big data processing. Infiniti also uses the latest AWS tools (released in November 2016), including Athena, for writing SQL directly against files in S3. This allows direct access to the data in those files using SQL, without the need to load it into any type of database first. The result is a highly effective data profiling tool. Infiniti is also using AWS QuickSight, which provides sophisticated data profiling at 1/10th of the cost of other tools. We are also starting to use AWS Glue, which is currently still in preview but will be generally available later in 2017.
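As a rough illustration of querying files in S3 with Athena, the sketch below assembles and submits a simple profiling query through the boto3 Athena client. It is only a minimal example under stated assumptions, not Infiniti's actual setup: the database name (datalake), the table name (sales_raw), the column name (customer_id), and the results bucket are all hypothetical, and the table is assumed to be already defined over files in S3.

```python
def build_athena_request(sql, database, output_location):
    """Assemble the keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results as files under this S3 prefix.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def submit_query(sql, database, output_location):
    """Submit the query to Athena and return its execution id.

    Requires boto3 and valid AWS credentials; imported lazily so the
    request-building logic can be used without either.
    """
    import boto3

    client = boto3.client("athena")
    response = client.start_query_execution(
        **build_athena_request(sql, database, output_location)
    )
    return response["QueryExecutionId"]


# Hypothetical profiling query: row count and distinct keys for one table.
PROFILE_SQL = (
    "SELECT COUNT(*) AS row_count, "
    "COUNT(DISTINCT customer_id) AS distinct_customers "
    "FROM sales_raw"
)

# Hypothetical database name and results location.
request = build_athena_request(
    PROFILE_SQL, "datalake", "s3://example-athena-results/profiling/"
)
```

Calling submit_query with the same three arguments would start the query. Athena runs queries asynchronously, so the results would then be read from the output location once the execution finishes.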