📚 Learning Flow

How does the AI Agent retrieve insights?

You submit a complete question. The system uses an LLM to understand what you want, then calls five Agents in sequence:

Dataset Search → Analytics Planning → Filter Decision → Analytics Execution → Interpretation

This end-to-end flow is built on RAG (retrieve first, then generate) and vector-based semantic search.

The Five Agents (what they do & what they expect)

Dataset Search Agent
- Uses semantic retrieval across multiple Data Planets (e.g., SDG, UN) to find relevant datasets.
- Output: candidate datasets + field metadata.
Analytics Planning Agent
- Decides the x-axis, the y-axis metrics (with aggregations such as avg/sum), the filters (years/geo), and the field formats (e.g., year, admin_level_4).
Filter Decision Agent
- Finalizes years/geo and other conditions into a concrete, executable parameter set.
Analytics Execution Agent
- Queries the selected datasets, retrieves the data, and aligns time and geography.
Interpretation Agent
- Sends the aligned data to the LLM and produces a concise written insight (≈300 words) that states the conclusion and limitations.
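As a rough illustration of the sequence above, the five Agents can be pictured as functions that each read a shared state and add their own output to it. This is only a minimal sketch: the function names, the state keys, and every placeholder value are assumptions made for illustration, not the demo's actual code, and the real Agents call an LLM and query datasets rather than returning fixed values.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]  # shared state passed from one step to the next


def dataset_search(state: State) -> State:
    # Placeholder: a real agent would run semantic retrieval across Data Planets.
    return {**state, "datasets": ["<candidate dataset>"]}


def analytics_planning(state: State) -> State:
    # Placeholder plan: x-axis, y-axis metrics, and field formats.
    return {**state, "plan": {"x": "<x field>", "y": ["<aggregated metric>"]}}


def filter_decision(state: State) -> State:
    # Placeholder: concrete, executable filter parameters.
    return {**state, "filters": {"years": "<years>", "geo": "<geography>"}}


def analytics_execution(state: State) -> State:
    # Placeholder: a real agent would query the datasets and align time and geography.
    return {**state, "rows": []}


def interpretation(state: State) -> State:
    # Placeholder: a real agent would ask the LLM for a short written insight.
    return {**state, "insight": "<written insight>"}


# Hypothetical fixed order, mirroring the five steps listed above.
PIPELINE: List[Callable[[State], State]] = [
    dataset_search,
    analytics_planning,
    filter_decision,
    analytics_execution,
    interpretation,
]


def run_pipeline(question: str) -> State:
    """Run every step in order, threading the state through."""
    state: State = {"question": question}
    for step in PIPELINE:
        state = step(state)
    return state
```

Calling `run_pipeline("...")` returns a dictionary holding the question plus one key per step, which makes it easy to inspect what each stage contributed.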

Plain-English Glossary

LLM (Large Language Model): A “text assistant” that reads your question, understands context, and writes a response.
RAG (Retrieval-Augmented Generation): First retrieve documents/data, then let the LLM answer based on what was found.
Vectors / Semantic Search: Convert text into numerical vectors so the system can find items by meaning, not just keywords.
LangGraph: A way to model multi-step flows as nodes and edges—both visual and executable.
Data Planet: Aralia’s open data universe. Multiple Planets host datasets curated by different data providers, each sharing domain expertise.
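To make "find items by meaning" more concrete, here is a small sketch of ranking items by cosine similarity between vectors. The three-dimensional vectors and the dataset titles are made up for illustration; a real system would obtain much longer vectors from an embedding model, and nothing here reflects how the demo Agent actually stores or searches its data.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid division by zero for empty or all-zero vectors
    return dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values, chosen by hand for this example).
TOY_VECTORS: Dict[str, List[float]] = {
    "income inequality by state": [0.9, 0.1, 0.0],
    "GDP growth by state": [0.1, 0.9, 0.0],
    "rainfall by month": [0.0, 0.1, 0.9],
}


def rank_by_meaning(query_vector: List[float],
                    catalog: Dict[str, List[float]]) -> List[str]:
    """Return catalog titles ordered from most to least similar to the query."""
    scored = sorted(
        catalog.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [title for title, _ in scored]
```

With a query vector such as `[0.8, 0.2, 0.0]`, which points mostly in the "inequality" direction of this toy space, `rank_by_meaning` places the inequality dataset first even though no keywords are compared.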

Pro tip: Ask a complete question

The demo Agent will not ask follow-up questions, so a clearly scoped question is what lets it find the right insight.
When unsure about terminology, use well-known, precise metric names (e.g., Gini coefficient)—avoid vague abbreviations.
State up front how averages are defined and which points are being compared.

Completeness checklist (copy and tick)

Metric / Definition: e.g., average GDP growth at purchasers’ prices (say whether it is an average or a growth rate, and whether it is nominal or real)
Time: explicit year or range (e.g., 2021–2024; 2024)
Geography level: country / state / county / region + scope (e.g., Malaysia by state)
Relationship / Comparison: correlation, ranking, gap, grouped comparison, etc.

Question template (edit in place)

Is there a relationship between <Metric A> in <Time Range A> and <Metric B> in <Time/Year B> across <Geography Level/Scope>?

Example (ready to use):

Is there a relationship between the average GDP growth at purchasers' prices from 2021 to 2024 and the Gini coefficient of each state in Malaysia in 2024?
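If you prefer to fill the template programmatically, a small helper like the one below can do it. The function name `build_question` and its parameters are hypothetical and exist only for this sketch; the wording simply follows the template above.

```python
def build_question(metric_a: str, time_a: str,
                   metric_b: str, time_b: str,
                   geography: str) -> str:
    """Fill the question template, refusing any empty slot."""
    slots = {
        "metric_a": metric_a,
        "time_a": time_a,
        "metric_b": metric_b,
        "time_b": time_b,
        "geography": geography,
    }
    # A blank slot would produce an incomplete question, so fail early.
    empty = [name for name, value in slots.items() if not value.strip()]
    if empty:
        raise ValueError(f"Missing template fields: {', '.join(empty)}")
    return (
        f"Is there a relationship between {metric_a} in {time_a} "
        f"and {metric_b} in {time_b} across {geography}?"
    )


# Usage: values taken from the example question above.
question = build_question(
    metric_a="the average GDP growth at purchasers' prices",
    time_a="2021-2024",
    metric_b="the Gini coefficient",
    time_b="2024",
    geography="the states of Malaysia",
)
```

Raising an error on an empty slot mirrors the checklist: a question with a missing metric, time, or geography should not be sent at all.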

Non-examples (the Agent can’t answer these)

“Is there a relationship between GDP and Gini in Malaysia?” (missing time and geography level)
“Compare inequality.” (metric definition and years not specified)
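A rough way to catch questions like these before submitting them is a quick heuristic check for an explicit year and a geography-level word. The sketch below is an assumption-laden illustration, not part of the Agent: the keyword list and the year pattern are hypothetical choices, and the check cannot judge whether the metric itself is well defined.

```python
import re
from typing import List

# Four-digit years from 1900 to 2099 (an illustrative range).
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")

# Hypothetical keyword list based on the geography levels in the checklist.
GEO_LEVEL_WORDS = {
    "country", "countries", "state", "states",
    "county", "counties", "region", "regions",
}


def missing_parts(question: str) -> List[str]:
    """Return the checklist items that appear to be absent from the question."""
    missing = []
    if not YEAR_PATTERN.search(question):
        missing.append("time")
    words = set(re.findall(r"[a-z]+", question.lower()))
    if not words & GEO_LEVEL_WORDS:
        missing.append("geography level")
    return missing
```

For the ready-to-use example above, `missing_parts` returns an empty list, while both non-examples are flagged as missing time and geography level.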

In the next chapter, we’ll run the Agent hands-on and unpack how each step works.

P.S. You do not need to prepare your own environment—we’ll use Google Colab so you can run everything in the browser.
