This is a talk I gave at Spark Summit East 2017 with my boss/mentor Robbie Strickland.

Parquet is a big data storage format that was integral to our analytics workflow. This talk details the ins and outs of the format in connection with Apache Spark.