Text to SQL and its uncanny valley

April 26, 2024

If you’re building for technical people, good luck. If you’re building for non-technical people, you’re going to need a lot more than text-to-SQL. Transformative text-to-SQL is more like text-to-analysis and works at the company layer, empowering anyone, technical or not, to work better from their data’s insights.

Democratic access to data isn’t a hot take in any organization. No matter your role - analyst, intern, CEO - you should have (secure, monitored) access to your company’s database to help you do your job more effectively.

Problem is, not everyone can (or should) learn how to query it. SQL is one of the most approachable query languages, but it still has a learning curve. 

Do you really want the SVP of sales to spend her weekend on a Udemy course for SQL? She can, but shouldn’t have to. And there are only so many members of the data team who can field questions, day after day.

Enter text-to-SQL, a new capability unlocked by generative AI. It’s exactly what it sounds like - take a prompt (“Please show me a breakdown of sales by region for our Black Friday launch”), use AI to convert that to SQL, query the database, return the information, have AI understand it, and send it back to the frontend.
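Under the hood, that loop is only a few steps. Here’s a minimal sketch in Python, where `call_llm` is a placeholder for whatever model provider you use, and the SQLite database and schema string are stand-ins for your own warehouse:

```python
# Minimal text-to-SQL loop: question -> SQL -> rows -> plain-language answer.
import sqlite3

def call_llm(prompt: str) -> str:
    """Placeholder for your LLM provider call (OpenAI, Anthropic, local model, ...)."""
    raise NotImplementedError

def answer_question(question: str, db_path: str, schema: str) -> str:
    # 1. Ask the model to translate the question into SQL, given the schema.
    sql = call_llm(
        f"Given this schema:\n{schema}\n"
        f"Write a single SQLite query that answers: {question}\n"
        "Return only SQL."
    )

    # 2. Run the generated query against the database.
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(sql).fetchall()

    # 3. Ask the model to explain the result for the frontend.
    return call_llm(
        f"Question: {question}\nQuery: {sql}\nRows: {rows}\n"
        "Summarize the answer for a non-technical reader."
    )
```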

This capability is, at best, a nice-to-have for your data analytics team; it’s as easy as pushing a button to add a generative AI layer to their Snowflake or Databricks instance. If you’re not one of those platform companies, it’s hard to see the point in building that solution yourself. Snowflake owns the distribution channel, which means there’s no onboarding, no new contract, no implementation, etc.

That leaves non-technical people. Managers, directors, marketers, salespeople. People who can use data insights, but didn’t necessarily learn how to talk to a database. For them, the potential is transformative.

Let’s explore how to solve their problems.

The problems with moving from human to AI

Every other startup just wants to add a generative AI layer to their existing product, or otherwise throw generative AI at the problem. For customer support or copywriting, this might work. 

Data insights are a bit trickier.

Partner teams are used to talking to humans

Forget about dashboards. Most of the time an executive gets data, it’s pushed to them; they don’t have to ask. But when they want answers from the data team, it’s an ad-hoc request. An ad-hoc request to a human who understands the context. 

How do you think your CEO will feel when he asks your AI bot “How are sales this past week in CA?” and your bot returns…nothing? Oops, your database encodes sales locations by city AND state, not just state.
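The failure is usually that mundane. A toy illustration, assuming a hypothetical `sales` table that stores location as a “City, ST” string:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, location TEXT, amount REAL)")
conn.execute("INSERT INTO sales VALUES (1, 'Los Angeles, CA', 1200.0), (2, 'Austin, TX', 800.0)")

# What a naive text-to-SQL model might generate: an exact match on 'CA'.
naive = conn.execute("SELECT SUM(amount) FROM sales WHERE location = 'CA'").fetchone()
print(naive)    # (None,) -- "nothing", because location holds 'City, ST', not 'CA'

# What the query needs to be once you know how location is actually encoded.
correct = conn.execute("SELECT SUM(amount) FROM sales WHERE location LIKE '%, CA'").fetchone()
print(correct)  # (1200.0,)
```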

Let’s say the AI understands, runs the query, and returns a single value: $12,343. How’s that compared to last week? And the week before? Did we have any sales spikes during the week? We ran that brand campaign, did that have any impact?

Humans have (and give) context. AI doesn’t

Any good analyst would read the question asker’s intent, ask clarifying questions, and provide supplementary context when answering: not just how the data was pulled, but value in the form of logical reasoning about the data, advancing the analysis unprompted by generating and answering follow-up questions.

Humans understand the intent behind the requests - who’s asking, what additional information is valuable, and who the end audience will be. Jane in data has worked with Steve from accounting for three years, and knows that the CFO prefers his sales data as bar charts.

Macro context matters too! Was this question just asked last week by someone else? Can we just pull up that dashboard? Does your text-to-SQL tool keep a repository of all its conversations, with validated queries it can retrieve? Does it integrate with Tableau and, in a subprocess, check all the dashboards there as well?
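One way to give the tool that macro context is a store of previously asked, human-validated question/query pairs that it checks before generating anything new. A deliberately crude sketch (real matching would need to compare meaning, not normalized strings; the example query is made up):

```python
# Tiny cache of previously answered, human-validated questions.
validated_queries: dict[str, str] = {
    "weekly sales by region": "SELECT region, SUM(amount) FROM sales GROUP BY region",
}

def lookup_validated(question: str) -> str | None:
    """Return validated SQL for a question we've already answered, else None."""
    key = question.strip().lower().rstrip("?")
    return validated_queries.get(key)  # hit: reuse the validated SQL; miss: generate fresh
```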

Humans also deal with ambiguity. To handle the complexity of real-world data requests from executives with finesse, your bot should never return a raw error. Imagine if a CEO asked for a data pull and someone you just hired responded with the SQL error message.

Structuring queries misses the point

Even if your VP of marketing crafts the perfect query, it misses the point. 

A query like “Please generate a chart of hourly sales for the past 4 weeks and highlight any of the sales that have been attributed to Instagram” is approaching data scientist territory.

You’re closer to “development tool for data scientists” than you are to “extension of the data science team that can field questions from non-technical people.” The closer to the former you are, the more you’re competing against a free feature that Snowflake bakes into every customer’s contract.

The right way for organizations to capitalize on text to SQL

Generative AI handles the text-to-SQL part (mostly) well, but it’s still far from sitting in the same seat as a data scientist.

Fortunately, there are a few behaviors we can add to make that happen.

Solutions should be self-monitoring and self-evaluating

LLMs need to behave more like a system and less like a component. 

If they get a sales question, send the response, and then get five clarifying questions and follow-ups, that’s a data point that their initial response missed the mark.

LLMs that can monitor for this behavior and evaluate against success criteria can actually integrate into a team.
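One version of that signal is cheap to log: count the clarifying follow-ups each answer triggers and flag the ones that probably missed. A sketch, with the conversation structure and threshold as assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Exchange:
    question: str
    answer: str
    follow_ups: list[str] = field(default_factory=list)

def flag_missed_answers(history: list[Exchange], max_follow_ups: int = 2) -> list[Exchange]:
    """Answers that triggered a pile of follow-ups probably missed the mark;
    surface them so the system (or a human) can adjust prompts, schema docs, etc."""
    return [ex for ex in history if len(ex.follow_ups) > max_follow_ups]
```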

Once they self-evaluate, they should be able to correct their behavior

There’s not much point to evaluation if the end result is a bunch of fine-tuning on the data scientist’s part. 

Self-evaluation and adjusting the approach (from output to data ingestion) helps avoid repeat errors that will just cause the VP to talk to a human. Remember, data analysts don’t respond with an error code when someone asks them a question.
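Concretely, that means query errors get fed back to the model, not surfaced to the VP. A sketch of that retry loop, again with `call_llm` as a placeholder for your model call:

```python
import sqlite3

def run_with_self_correction(question: str, schema: str, conn: sqlite3.Connection,
                             call_llm, max_attempts: int = 3):
    """Generate SQL, and on failure feed the error back to the model instead of the user."""
    prompt = f"Schema:\n{schema}\nWrite a SQLite query answering: {question}\nReturn only SQL."
    for _ in range(max_attempts):
        sql = call_llm(prompt)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as err:
            # An analyst never forwards the stack trace; neither should the bot.
            prompt += f"\nYour previous query was:\n{sql}\nIt failed with: {err}\nFix it."
    return None  # escalate to a human analyst rather than showing an error code
```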

Let the LLM behave more like an analyst

A good human analyst has an entire internal dialogue. Vanilla GPT works on single thoughts.

We can’t expect an LLM to perform analyst-like operations without acting more like a human analyst. Things like: 

  1. Multiple generations chained together

  2. Progressively acquiring and analyzing new information

  3. Generating and evaluating hypotheses

  4. Critiquing and evaluating work

In a word, judgment.

From the CIA’s Psychology of Intelligence Analysis: https://www.ialeia.org/docs/Psychology_of_Intelligence_Analysis.pdf

Judgment is what analysts use to fill gaps in their knowledge. It entails going beyond the available information and is the principal means of coping with uncertainty. It always involves an analytical leap, from the known into the uncertain.

Judgment is an integral part of all intelligence analysis. While the optimal goal of intelligence collection is complete knowledge, this goal is seldom reached in practice. Almost by definition of the intelligence mission, intelligence issues involve considerable uncertainty. Thus, the analyst is commonly working with incomplete, ambiguous, and often contradictory data. The intelligence analyst’s function might be described as transcending the limits of incomplete information through the exercise of analytical judgment.

If this is what we expect from our AI Analyst, we’re going to need a whole lot more than stuffing a prompt with context and data and getting a result.
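As a rough picture of what “more than prompt stuffing” could look like, here’s a sketch of a chained, analyst-style loop: gather data, form hypotheses, test them, critique the result. The function names (`call_llm`, `run_sql`) are placeholders, not a real framework:

```python
def analyst_loop(question: str, call_llm, run_sql, max_rounds: int = 3) -> str:
    """Chain generations: plan, pull data, critique, repeat, then write up the analysis."""
    findings = []
    plan = call_llm(f"Question: {question}\nList the queries and hypotheses needed to answer it.")
    for _ in range(max_rounds):
        # Progressively acquire new information based on what has been learned so far.
        sql = call_llm(f"Plan:\n{plan}\nFindings so far:\n{findings}\nWrite the next SQLite query.")
        findings.append({"sql": sql, "rows": run_sql(sql)})

        # Critique the work: is the evidence sufficient, or is something missing?
        critique = call_llm(
            f"Question: {question}\nFindings:\n{findings}\n"
            "Is this enough to answer confidently? If yes, say DONE; if not, say what is missing."
        )
        if "DONE" in critique:
            break
    return call_llm(f"Question: {question}\nFindings:\n{findings}\nWrite the analysis, with caveats.")
```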

Dig into the technical implementation

We don’t have to stop at the first iteration of our design.

We can improve the performance of the LLM by giving it access to more context, like knowledge base docs about the dataset. This is called retrieval-augmented generation: generate embeddings of those docs, and the LLM has an accurate retrieval system it can rely on. This isn’t limited to documents, either - we can create functions around common tasks, which the LLM can call (instead of us giving it context each time).
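A bare-bones version of that retrieval step might look like the following, with `embed` standing in for whatever embedding model you use:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for any embedding model call."""
    raise NotImplementedError

def build_index(docs: list[str]) -> list[tuple[str, np.ndarray]]:
    # Embed the knowledge-base docs (schema notes, metric definitions, column glossaries) once.
    return [(doc, embed(doc)) for doc in docs]

def retrieve(question: str, index: list[tuple[str, np.ndarray]], k: int = 3) -> list[str]:
    # Pull the k docs most similar to the question, to prepend to the prompt.
    q = embed(question)
    scored = sorted(
        index,
        key=lambda pair: float(np.dot(q, pair[1]) / (np.linalg.norm(q) * np.linalg.norm(pair[1]))),
        reverse=True,
    )
    return [doc for doc, _ in scored[:k]]
```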

Fine-tuning the model on the dataset and the questions we’re asking it can improve performance, especially if new questions come in frequently. A technique called reinforcement learning from human feedback (RLHF) can be built into the interface. Ask a question, get back two answers, tell the model which one answers your question better. The model takes that input and learns what a good answer looks like.
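The interface side of that is mostly just collecting pairwise preferences; the fine-tuning itself happens offline on the records you accumulate. A sketch of the collection step (the record format and file path here are assumptions):

```python
import json
from datetime import datetime, timezone

def record_preference(question: str, answer_a: str, answer_b: str, chosen: str,
                      path: str = "preferences.jsonl") -> None:
    """Log which of two candidate answers the user preferred.
    These (prompt, chosen, rejected) records become training data for later fine-tuning."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "chosen": answer_a if chosen == "a" else answer_b,
        "rejected": answer_b if chosen == "a" else answer_a,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```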

Finally, a multi-agent approach can solve problems of increasing complexity by allowing several agents to work together, splitting up duties. This increases the scope of questions we can answer, and potentially cuts down on time, as we can multithread work. Even one meta-agent overseeing progress and making judgment calls can help guide the LLM.
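At its simplest, that’s one coordinating call that splits the request into subtasks, fans them out, and reviews the pieces. The sketch below runs the workers sequentially for clarity, and every role in it is an assumption rather than a prescribed architecture:

```python
def multi_agent_answer(question: str, call_llm, run_sql) -> str:
    """One meta-agent plans and reviews; worker agents each handle a subtask."""
    # Meta-agent breaks the request into smaller questions.
    subtasks = call_llm(
        f"Break this request into independent sub-questions, one per line:\n{question}"
    ).splitlines()

    # Worker agents each answer one sub-question (could run in threads/processes).
    partials = []
    for task in subtasks:
        sql = call_llm(f"Write a SQLite query for: {task}")
        partials.append({"task": task, "rows": run_sql(sql)})

    # Meta-agent reviews the pieces and assembles the final answer.
    return call_llm(
        f"Original request: {question}\nPartial results: {partials}\n"
        "Combine these into one coherent answer, flagging anything that looks inconsistent."
    )
```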

Conclusion

A generative AI layer to Databricks might work for your technical data scientists, but that’s only going to make their job (and the organization’s) marginally easier.

Transformative text-to-SQL is more like text-to-analysis and works at the company layer, empowering anyone, technical or not, to work better from their data’s insights.

With the right approach, it’s closer than you think.

If you’re interested in learning more, Patterns is building exactly that.

Analytics in natural language