Replacing a SQL analyst with 26 recursive GPT prompts
January 18, 2023
When I was at Square and the team was smaller, we had a dreaded “analytics on-call” rotation. It rotated strictly on a weekly basis, and if it was your turn you knew you would get very little “real” work done that week and would spend most of your time fielding ad-hoc questions from the various product and operations teams at the company (SQL monkeying, we called it). There was cutthroat competition for manager roles on the analytics team, and I think this was entirely the result of managers being exempted from this rotation -- no status prize could rival the carrot of not doing on-call work.
So, the idea that something like analytics on-call could be entirely replaced by a next-token optimizer like GPT-3 has been an idea I’ve been excited to try for a long time now. I finally got a chance to sit down and try it out this week.
We’re about to kick off our fundraising process, so to test the idea I built a free-form question-answering analytics bot on top of the Crunchbase data set of investors, companies, and fundraising rounds. I call it CrunchBot.
Read on to get the details, but here is a quick example of it getting a simple thing exactly right:
And an example of it getting something more complex exactly right:
And an example of it getting something complex terrifically wrong (watch your join conditions davinci-003!):
And something of medium complexity mostly right (it’s so close here, it needs an ilike instead of a like, which it messed up despite a specific prompt telling it to watch out for this exact mistake):
Overall I was dumbfounded by the quality of the results. LLMs are a continual source of blown minds, but this is shockingly close to replacing an entire role at companies with only a few hours of effort.
How I built it
My plan was to build it in Patterns Studio. There were four main parts:
Build the prompt from the schemas and sample data of the available tables
Run it through various GPT models and get 5+ completions of raw SQL
Execute the SQL against the relevant tables, pick the best result
Hook it up to a Slack bot
By the end I had a much more complex pipeline, as I kept quickly finding prompt improvements. Specifically, the SQL generation and execution part became a loop of the following steps (sketched in code after the list):
Generate a candidate query
Quality check the SQL with GPT itself, asking it to spot common errors (NULLS LAST, for instance) and produce a correct version of the query
Run the SQL against the tables
If there was an error or no result, ask GPT to fix the query to produce a correct result and repeat the loop
Otherwise, return the result
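Here’s a minimal sketch of that loop in Python, assuming the pre-1.0 openai client and a psycopg2 connection to the warehouse. The helper names (build_sql_prompt, build_double_check_prompt, build_fix_prompt) are illustrative stand-ins for the prompts shown in the later steps, not the actual code in the app:

```python
import openai
import psycopg2

conn = psycopg2.connect("postgresql://...")  # placeholder connection string

def complete(prompt: str) -> str:
    """One completion from text-davinci-003 (pre-1.0 openai client)."""
    resp = openai.Completion.create(
        model="text-davinci-003", prompt=prompt, max_tokens=512, temperature=0
    )
    return resp.choices[0].text.strip()

def generate_and_run(question: str, schema_summary: str, max_fix_attempts: int = 3):
    """One pass: candidate query -> GPT double-check -> execute -> GPT fix on failure."""
    sql = complete(build_sql_prompt(schema_summary, question))      # 1. candidate query
    sql = complete(build_double_check_prompt(sql))                  # 2. GPT reviews its own SQL
    for _ in range(max_fix_attempts):
        try:
            with conn.cursor() as cur:                              # 3. run it against Postgres
                cur.execute(sql)
                rows = cur.fetchall()
            if rows:
                return sql, rows                                    # 5. got a result, return it
            problem = "The query ran but returned no rows."
        except Exception as exc:
            conn.rollback()
            problem = str(exc)
        sql = complete(build_fix_prompt(sql, problem))              # 4. ask GPT to fix, retry
    return sql, None
```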
This led to completion chains of over 20 calls to GPT for a single user question. Each additional GPT call feels like it gives a logarithmic improvement -- you can continue to add more context and checks with every completion, exploring different versions and iterating on the result. This is the same process a junior analyst would use to arrive at the answer, except here it takes 15 seconds and costs $1 in credits, versus an hour and $50 for the analyst. You have a LOT of leeway to use even crazier prompt pipelines before the ROI gets bad.
Anyways, here’s the step-by-step of how I built it. You can follow along in this template app on Patterns that has all the code (no data in it, though -- Crunchbase is a proprietary data set; reach out to firstname.lastname@example.org if you are interested in exploring this specific data set).
1. Get the Crunchbase data
The full Crunchbase dataset includes 2.4m organizations and 510k funding rounds. We ingest the full CSV dump into our Postgres instance every 24 hours. For this analysis we restricted it to three tables: organizations, funding_rounds, and investments.
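The ingestion itself isn’t part of the pipeline code, but as a rough sketch of what that step implies, assuming psycopg2 and a local copy of the nightly CSV dump (table and file names here are placeholders):

```python
import psycopg2

conn = psycopg2.connect("postgresql://...")  # placeholder connection string

def load_table(table: str, csv_path: str) -> None:
    """Replace a table's contents with the latest CSV dump using Postgres COPY."""
    with conn.cursor() as cur, open(csv_path) as f:
        cur.execute(f"TRUNCATE {table}")
        cur.copy_expert(f"COPY {table} FROM STDIN WITH CSV HEADER", f)
    conn.commit()

for table in ("organizations", "funding_rounds", "investments"):
    load_table(table, f"{table}.csv")
```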
2. Build the initial prompt
The initial SQL generation prompt includes three basic elements: a summary of the tables and data available to query, the user’s question, and a prompt asking GPT to write a correct Postgres query. Here’s the exact template we used:
Which results in this full prompt, for example:
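(The verbatim template and rendered prompt live in the template app and aren’t reproduced in this excerpt; based on the three elements described above, a reconstruction of its shape -- my wording, not the app’s exact text -- might look like this:)

```python
SQL_PROMPT_TEMPLATE = """You are an expert analyst writing PostgreSQL.

Here are the tables you can query, with their schemas and a few sample rows:

{table_summaries}

Question: {question}

Write a single correct PostgreSQL query that answers the question.
Return only the SQL, with no explanation."""

def build_sql_prompt(table_summaries: str, question: str) -> str:
    """Fill the template with the table summaries and the user's question."""
    return SQL_PROMPT_TEMPLATE.format(table_summaries=table_summaries, question=question)
```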
3. Double check the query
I found GPT made some common mistakes over and over again (the same ones any analyst would make), so I gave it a specific prompt to review each query and fix any bugs before doing anything else:
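The review prompt itself isn’t included in this excerpt; a hedged reconstruction, calling out the kinds of mistakes mentioned elsewhere in this post (ILIKE vs. LIKE, NULL ordering, join conditions), could look like:

```python
DOUBLE_CHECK_TEMPLATE = """Review the following PostgreSQL query for common mistakes, including:
- case-sensitive string comparisons that should use ILIKE instead of LIKE
- missing NULLS LAST when ordering by a column that can be NULL
- joining on the wrong columns
- aggregations missing columns in GROUP BY

Query:
{sql}

Rewrite the query with any bugs fixed. Return only the corrected SQL."""

def build_double_check_prompt(sql: str) -> str:
    """Ask GPT to review and correct its own candidate query."""
    return DOUBLE_CHECK_TEMPLATE.format(sql=sql)
```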
4. Run the generated SQL against the database, fix any errors
Next we try to run the SQL against the database. If it produces a result, we store the result and query. If it produces no result or an error, we ask GPT to fix the SQL:
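A sketch of that fix-it prompt, feeding the database error (or the fact that nothing came back) into the next completion -- again illustrative wording, not the app’s exact prompt:

```python
FIX_TEMPLATE = """The following PostgreSQL query did not produce a usable result.

Query:
{sql}

Problem:
{problem}

Rewrite the query so that it runs correctly and answers the original question.
Return only the corrected SQL."""

def build_fix_prompt(sql: str, problem: str) -> str:
    """Hand GPT the failing query plus the error message and ask for a corrected version."""
    return FIX_TEMPLATE.format(sql=sql, problem=problem)
```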
Here’s an example of this step working well, with the following query. Can you spot the error?
Which was then fixed with this new query:
5. Repeat N times
Repeat the above completion loop as many times as we want to get a set of queries and results. I chose to stop this loop once I got a result that matched a previous result, or I hit my limit of max tries (5 in this case).
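In code, that stopping rule is a small wrapper around the single-pass function sketched earlier (same caveat: this is an illustration, not the app’s actual implementation):

```python
def answer_question(question: str, schema_summary: str, max_tries: int = 5):
    """Re-run the pipeline until two tries agree on a result, or we run out of tries."""
    previous = []
    for _ in range(max_tries):
        sql, rows = generate_and_run(question, schema_summary)
        if rows is None:
            continue                                   # this try never produced a result
        if any(rows == prev_rows for _, prev_rows in previous):
            return sql, rows                           # a repeated result is treated as confirmation
        previous.append((sql, rows))
    return previous[-1] if previous else (None, None)  # fall back to the last successful try
```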
6. Build the Slack bot
I used the off-the-shelf Patterns template for Slack bots and hooked it up. I added a second Slack node to acknowledge the original Slack mention quickly, since the full GPT pipeline can take 20+ seconds.
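The real wiring is the Patterns Slack template, so the snippet below is only a generic illustration of the ack-then-answer pattern using slack_bolt; the tokens, the SCHEMA_SUMMARY constant, and the answer_question function are placeholders from the sketches above:

```python
from slack_bolt import App

app = App(token="xoxb-...", signing_secret="...")  # placeholder credentials

@app.event("app_mention")
def handle_mention(event, say):
    # Acknowledge right away so the user isn't left waiting in silence
    say(text="On it -- this usually takes ~20 seconds.", thread_ts=event["ts"])
    # Then run the full GPT pipeline and reply in the same thread
    sql, rows = answer_question(event["text"], SCHEMA_SUMMARY)
    say(text=f"Query:\n{sql}\n\nResult:\n{rows}", thread_ts=event["ts"])

if __name__ == "__main__":
    app.start(port=3000)
```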
It seems like there’s almost no limit to how good GPT could get at this. The remaining failure modes were ones of either missing context or not enough iterative debugging (the kind that is the beating heart of any analytics endeavor). For instance, GPT had to guess the exact names and spellings of categories or company names when answering questions. If it were allowed to run some “research” queries first, it could have built enough context for a final prompt and query to get to the right answer. Playing around with GPT at this level, you get the feeling that “recursive GPT” is very close to AGI. You could even ask GPT to reinforcement-learn itself, adding new prompts based on fixes to previous questions. Of course, who knows what will happen to all this when GPT-4 drops.