DevConf.CZ 2020 has ended
Sunday, January 26 • 12:00pm - 1:55pm
Data Engineering for AI Workloads

As data grows exponentially in organizations, there is an increasing need to consolidate silos of information into a single source of truth: a Data Lake that feeds hungry analytics and machine learning engines so they can gather insight at scale. In this workshop, we will detail how data engineers can process, manage, and explore large-scale data for data science initiatives using industry-standard open source solutions running on OpenShift with the Open Data Hub project.

In this lab, attendees will learn how to:
- Store data in Ceph object storage
- Optimize storage of big data with compressed columnar file formats
- Catalog data sets using Hive Metastore and Hue
- Access and process cataloged data using Spark
- Create a data processing workflow with conditional steps using Argo
- Monitor the query performance of Spark using Prometheus
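
As a rough illustration of the "compressed columnar" idea from the list above, here is a minimal sketch using only the Python standard library. It is not the workshop's material and not a real file format such as Apache Parquet or ORC; the sample records, field names, and helper functions (to_columns, compress_columns, read_column) are all hypothetical and exist only to show the layout.

```python
import json
import zlib

# Hypothetical sample records; the field names are made up for illustration.
rows = [
    {"sensor": "s%d" % (i % 3), "status": "ok", "reading": i % 10}
    for i in range(1000)
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {key: [] for key in records[0]}
    for record in records:
        for key, value in record.items():
            columns[key].append(value)
    return columns


def compress_columns(columns):
    """Compress each column independently so it can be read on its own."""
    return {
        name: zlib.compress(json.dumps(values).encode("utf-8"))
        for name, values in columns.items()
    }


def read_column(compressed, name):
    """Decompress and decode a single column without touching the others."""
    return json.loads(zlib.decompress(compressed[name]).decode("utf-8"))


columns = to_columns(rows)
compressed = compress_columns(columns)

# Sizes of the row-oriented and column-oriented encodings, for comparison.
row_bytes = len(zlib.compress(json.dumps(rows).encode("utf-8")))
col_bytes = sum(len(blob) for blob in compressed.values())

readings = read_column(compressed, "reading")
```

Because every column is stored and compressed separately, a query that needs only "reading" decodes one blob instead of the whole data set. Comparing row_bytes with col_bytes on your own data gives a rough sense of how the layout affects compressed size.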

Speakers
Ricardo Oliveira

JBUG:Brazil, Ansible Meetup, Red Hat Developers, Red Hat, Inc.
Ricardo has 10+ years of IT experience in both development and sysadmin roles. He works at Red Hat in the OpenShift xPaaS team, enabling all JBoss solutions to run in Dockerized environments and giving advice on how to use OpenShift at its best…
Anish Asthana

Senior Software Engineer, Red Hat, Inc.
Anish is an engineer at Red Hat in the AI Services Organization. He is primarily working on the Open Data Hub - a machine learning-as-a-service platform built with OpenShift at the core. His interests include monitoring, scalability, and reliability.


Sunday January 26, 2020 12:00pm - 1:55pm CET
Workshop Room A - A218, Faculty of Information Technology, Brno University of Technology, Božetěchova 1/2, 612 00 Brno, Czech Republic