Data Pipeline


Abstract

One of the core values of the Aralia Data Pipeline is the ability to continuously and consistently deliver high-value data to users across the ecosystem.

To ensure that the data stays current, Data Planet administrators need to regularly convert external data sources and upload them to the platform.

This process is a typical ETL (Extract, Transform, Load) workflow: data is extracted from external sources, transformed into the platform's expected format, and loaded into the Data Planet.

To automate and sustain this process, Data Planet administrators can use Apache Airflow to manage and execute the ETL workflow.


Why Airflow

Apache Airflow is a workflow scheduling and orchestration platform widely used across the industry for data engineering and analytics pipelines.

Airflow provides the following key capabilities in Aralia's data pipeline:

  1. Customizable Scheduling
  2. Data Source Integration and Transformation Processing
  3. Process Observability and Error Tracking
  4. Version Control and Reproducibility

<aside> 📌

Simply put, Airflow acts as Aralia's automated scheduler and process coordinator: it keeps each Data Planet's ETL jobs on track and ensures that ecosystem users always have access to up-to-date, consistent data.

</aside>
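
To make the scheduling role concrete, below is a minimal sketch of an Airflow DAG that runs the three ETL stages once a day. The `aralia_etl` DAG id, the task bodies, and the daily schedule are illustrative assumptions, not Aralia's actual pipeline code.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Pull raw data from an external source (placeholder step).
    print("extracting ...")


def transform():
    # Convert raw data into the Data Planet's expected format.
    print("transforming ...")


def load():
    # Upload the transformed data to the Data Planet.
    print("loading ...")


with DAG(
    dag_id="aralia_etl",                 # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                   # customizable scheduling
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run the three ETL stages in order.
    extract_task >> transform_task >> load_task
```

Retries and per-task logs come with the operator for free, which is what gives the pipeline the observability and error tracking listed above.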


Transform Example (Python)

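Below is a minimal transform sketch using pandas. The column name `record_date` and the CSV file paths are illustrative assumptions; the actual schema depends on the external source being converted.

```python
import pandas as pd


def transform(source_csv: str, target_csv: str) -> None:
    """Clean a raw CSV extract into a Data-Planet-ready file (sketch)."""
    df = pd.read_csv(source_csv)

    # Normalize column names to snake_case.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

    # Drop rows missing the (hypothetical) required date field,
    # then normalize it to an ISO-8601 date string.
    df = df.dropna(subset=["record_date"])
    df["record_date"] = pd.to_datetime(df["record_date"]).dt.strftime("%Y-%m-%d")

    df.to_csv(target_csv, index=False)


if __name__ == "__main__":
    transform("raw_source.csv", "planet_ready.csv")
```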

Load Example

Aralia Data Planet provides an Open API for uploading data to the platform. See the link below for the available upload methods.

Aralia Data Planet Open API
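
The actual endpoints are defined by the Open API linked above; the sketch below only illustrates the general shape of an upload call. The base URL, route, dataset id, and token header are all placeholders, not the real API surface.

```python
import requests

API_BASE = "https://data-planet.example.com/api"  # placeholder base URL
TOKEN = "YOUR_API_TOKEN"                          # placeholder credential


def load(csv_path: str, dataset_id: str) -> None:
    """Upload a prepared CSV to the Data Planet (hypothetical route)."""
    with open(csv_path, "rb") as f:
        response = requests.post(
            f"{API_BASE}/datasets/{dataset_id}/upload",  # hypothetical endpoint
            headers={"Authorization": f"Bearer {TOKEN}"},
            files={"file": f},
        )
    response.raise_for_status()


if __name__ == "__main__":
    load("planet_ready.csv", "my-dataset-id")
```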


