Expedia Group Technology — Data

Unify Data Lakes Across Multi-Regions in the Cloud

How to manage a petabyte-scale multi-region cloud data platform with an open-source solution

Ferris wheel at dusk.

1. Background

Expedia Group™ brands (Source: Expedia Group Careers)

2. Challenges

https://github.com/ExpediaGroup/circus-train/blob/main/circus-train.png
Jetstream Infrastructure (source: author image)
Data Replication (source: author image)

3. Solution — Cross-region Data Lake Federation

Alluxio (source: alluxio.io)
https://github.com/ExpediaGroup/waggle-dance/blob/main/logo.png
Waggle Dance diagram (https://github.com/ExpediaGroup/waggle-dance/blob/main/system-diagram.png)
Cross region access with Alluxio (source: author image)
Data Catalog with Alluxio (source: author image)
Data Federation with Alluxio (source: author image)

4. Results

5. Data Mesh Vision and Next Steps

Jian Li

Focuses on data lakes and data warehousing, cloud infrastructure, and distributed ecosystems; also interested in CI/CD and automation solutions.