A closer look: Exploratory Data Analysis with Spark

A data scientist's typical workflow involves some level of exploratory data analysis. If you use Python to work with your data, you are probably quite familiar with packages like pandas and matplotlib. But once you switch from pandas to Spark, how do you explore your data? How do you visualize it? In this talk, I'll take a dataset and explore it with Spark using IntelliJ IDEA and Apache Zeppelin.

Maria Khalusova

Unstructured.io

Maria Khalusova, a Staff Developer Advocate at Unstructured.io, currently works on the challenges of preprocessing complex unstructured data for Generative AI applications such as Retrieval Augmented Generation. Previously, she worked on the open-source team at Hugging Face. Maria has co-authored a course on building applications with open-source models and a course on Audio Transformer models, both of which are openly available.

Montreal 2020 sponsored by
