From Semantic Layers to Conversations
Turning Cube.js analytics into a natural-language interface with RAG
Introduction
In one of our previous posts, we showed how dbt, Cube, and Superset can work together to automate and standardize the semantic layer. The response from the community was overwhelmingly positive, but it also raised a recurring question: why introduce Cube between dbt and Superset instead of connecting BI tools directly to the database?
The answer is consistency at scale. A single, shared semantic layer ensures that metrics, dimensions, and business logic are defined once and reused everywhere – across BI tools, APIs, and downstream applications – rather than being reimplemented in each consumer.
This naturally led us to a bigger idea. If Cube already exposes a governed semantic layer through SQL, REST, and GraphQL, why force natural-language analytics back into brittle English-to-SQL translations? What if users could converse directly with the semantic layer instead?
In this post, we explore how to turn Cube analytics into a natural-language interface using retrieval-augmented generation (RAG). Cube metrics and dimensions are embedded and stored in a vector store; for each user question, the most relevant ones are retrieved so that the question is grounded in the correct semantic context before any query is generated or executed. This keeps conversations aligned with predefined business logic while remaining flexible and intuitive for users.
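To make the grounding step concrete, here is a minimal sketch in Python. It is not the code from the repository: the schema entries and member names are made up for illustration, and a toy bag-of-words similarity stands in for a real embedding model and vector store. It only shows the shape of the flow, which is to retrieve the most relevant Cube members for a question and then place them in the prompt.

```python
import math
import re
from collections import Counter

# Hypothetical schema entries; a real setup would derive these from Cube metadata.
SCHEMA_DOCS = [
    "measure enrollments.count: total number of course enrollments",
    "measure engagement.avg_score: average student engagement score",
    "dimension courses.name: course name",
    "dimension enrollments.semester: semester of the enrollment",
]


def embed(text):
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, docs, k=2):
    """Return the k schema entries most similar to the question."""
    q = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, docs, k=2):
    """Ground the question in retrieved schema entries before calling an LLM."""
    context = "\n".join(retrieve(question, docs, k))
    return f"Use only these Cube members:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("What is the student engagement score?", SCHEMA_DOCS))
```

In a real pipeline the prompt would go to the LLM, which is then constrained to the retrieved members instead of guessing table and column names.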
Let’s walk through how it works.
👉 GitHub: https://github.com/ponderedw/dbt-to-cube
Quickstart
Clone the repository:
👉 https://github.com/ponderedw/dbt-to-cube
Copy .private.env.template to .private.env and configure the required secrets for your LLM provider.
Then start everything:
just all
Once everything is up and running:
Open:
http://localhost:8000/ and navigate to the cube_schemas collection.
You’ll see a set of vectors representing Cube metadata – dimensions, measures, and cubes.
These embeddings are generated using the cube-to-rag library (https://pypi.org/project/cube-to-rag/), which we use to extract and embed Cube schema metadata into the vector store.
http://localhost:8501/ – if you’ve set STREAMLIT_PASSWORD in .private.env, use it to log in.
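If you are curious what sits behind the vectors in cube_schemas, the sketch below shows one way schema metadata could be flattened into short text snippets before embedding. This is an assumption on our part, not the cube-to-rag API: the function name meta_to_documents is hypothetical, and the input dict is only loosely modeled on the kind of cubes / measures / dimensions structure Cube exposes as metadata.

```python
def meta_to_documents(meta):
    """Flatten a Cube-style metadata dict into one text snippet per cube and member."""
    docs = []
    for cube in meta.get("cubes", []):
        docs.append(f"cube {cube['name']}: {cube.get('title', cube['name'])}")
        for kind in ("measures", "dimensions"):
            for member in cube.get(kind, []):
                label = member.get("title") or member["name"]
                member_type = member.get("type", "unknown")
                # kind[:-1] turns "measures" into "measure", "dimensions" into "dimension"
                docs.append(f"{kind[:-1]} {member['name']} ({member_type}): {label}")
    return docs


# Hypothetical metadata for illustration only.
sample_meta = {
    "cubes": [
        {
            "name": "enrollments",
            "title": "Enrollments",
            "measures": [
                {"name": "enrollments.count", "title": "Enrollments Count", "type": "number"}
            ],
            "dimensions": [
                {"name": "enrollments.semester", "title": "Enrollments Semester", "type": "string"}
            ],
        }
    ]
}

if __name__ == "__main__":
    for doc in meta_to_documents(sample_meta):
        print(doc)
```

Each snippet would then be embedded and stored as one vector, which is roughly what you see when browsing the collection.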
You’re all set. Start asking questions and explore your Cube analytics through natural language.
Talking to Your Data
Let’s start simple:
“What metrics and dimensions do we have?”
Great. Now let’s explore the data itself:
“Show all available course names.”
Nice. Let’s narrow it down:
“What metrics are available for Linear Algebra?”
Now for a concrete business question:
“How many enrollments do we have across all courses?”
And finally, let’s push it a bit further with a more complex query involving filters and breakdowns:
“What is the student engagement score for the Data Structures and Algorithms course, broken down by semester?”
And just like that, everything works. Metrics stay consistent, dimensions are respected, and the conversation maps cleanly to the semantic layer.
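To illustrate what "maps cleanly to the semantic layer" can mean for the last question, here is a hedged sketch of the kind of structured query an assistant might produce for Cube's REST API instead of raw SQL. The measures / dimensions / filters layout follows Cube's JSON query format, but the member names (engagement.avg_score, enrollments.semester, courses.name) and the helper build_cube_query are hypothetical; the actual names depend on your Cube schema.

```python
import json


def build_cube_query(measure, dimensions=(), filters=None):
    """Assemble a Cube-style JSON query from already-resolved member names."""
    query = {"measures": [measure], "dimensions": list(dimensions)}
    if filters:
        query["filters"] = [
            {"member": member, "operator": "equals", "values": [str(v) for v in values]}
            for member, values in filters.items()
        ]
    return query


# Hypothetical member names; replace with members from your own schema.
query = build_cube_query(
    "engagement.avg_score",
    dimensions=["enrollments.semester"],
    filters={"courses.name": ["Data Structures and Algorithms"]},
)

if __name__ == "__main__":
    print(json.dumps(query, indent=2))
```

Because the model only picks members and filter values, the aggregation logic and joins stay defined in Cube rather than being regenerated for every question.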
Conclusion
This experiment shows one possible direction for moving from static semantic layers to truly conversational analytics. By grounding natural-language queries in Cube’s governed semantic layer using RAG, we avoid many of the pitfalls of naive text-to-SQL approaches: inconsistent metrics, broken joins, and logic that drifts away from business definitions. Instead, users can explore data conversationally while staying aligned with a single source of truth.
That said, this is not the solution – it’s our attempt at solving a real and growing problem. There are open questions around performance, UX, evaluation, security, and how this approach compares to alternatives such as agent-driven query planning, stricter semantic parsers, or hybrid BI experiences. We’re certain there are improvements to be made, edge cases we haven’t covered, and ideas we haven’t considered.
We’d love to hear what others think about this: What works? What doesn’t? What would you do differently? If you’re experimenting with similar approaches – or taking a completely different path – your feedback, suggestions, and alternatives are more than welcome.











