These 12 prompts cover the work an analyst actually does: writing and debugging SQL, scoping an analysis, picking a statistical method, speccing charts and dashboards, turning a result into an executive summary, documenting your data, and prepping for the questions stakeholders will ask. Each one is annotated, uses [BRACKET] tokens for your inputs, and is ready to paste into Claude. Remember that Claude drafts and reasons; it does not run queries against your warehouse, and it cannot see your data unless you give it. Keep your judgment on every output.
You are a senior analytics engineer who writes clean, readable [DIALECT, e.g. BigQuery / Snowflake / Postgres] SQL. The question I need to answer: [PLAIN-ENGLISH QUESTION, e.g. "monthly active accounts by plan tier for the last 6 months"] Tables available (schema only, no real rows): [PASTE: table name, column names, types, and a one-line description of each table and its grain] Write the query: 1. Output the SQL with clear CTEs and comments explaining each step. 2. State the assumptions you made about grain, joins, date boundaries, and how you defined the key metric. List anything ambiguous you had to choose for me. 3. Note any column you assumed exists or any join that could fan out the row count, and tell me how to sanity-check it. 4. Suggest a quick validation query (a row count or a spot check) I can run first to confirm the logic before I trust the numbers. Do not invent table or column names. If something critical is missing from the schema, ask rather than guessing. I will run this myself; you are drafting, not executing.
Act as a SQL performance reviewer for [DIALECT]. Here is a query that is [returning wrong results / running slowly / costing too much]. Query: [PASTE SQL] What I expected vs. what happened: [DESCRIBE THE PROBLEM, e.g. "row count doubled" or "scans 4 TB"] Relevant schema and rough table sizes: [PASTE COLUMNS, TYPES, APPROX ROW COUNTS, PARTITION OR CLUSTER KEYS] Do this: 1. If the results look wrong, find the likely cause: a fan-out join, a missing GROUP BY column, a NULL-handling bug, a date boundary off by one, or a duplicated source row. Explain the bug in one sentence before fixing it. 2. Rewrite the query correctly and explain each change. 3. For performance, point to the specific costly steps (full scans, non-sargable filters, unnecessary DISTINCT, exploding joins) and give a faster version that returns the same result. 4. List what I should check to confirm the rewrite is equivalent (compare row counts and a few aggregates against the original). Show the before-and-after clearly. Do not change the meaning of the query while optimizing it; correctness first, speed second.
You are a staff data analyst helping me scope an exploratory analysis before I write any code. The question or hypothesis: [WHAT WE WANT TO LEARN, e.g. "why did signups drop in EMEA last quarter"] What I have to work with: [DATASETS / TABLES AVAILABLE, KEY COLUMNS, TIME RANGE, GRAIN] Decision this feeds: [WHAT SOMEONE WILL DO WITH THE ANSWER] Produce an analysis plan: 1. Restate the question sharply, and list the sub-questions I need to answer to address it. 2. Lay out a step-by-step plan: which slices to look at first, which segments and time windows, and the order that gets to signal fastest. 3. For each step, name the specific cut of data and what result would confirm or kill the hypothesis. 4. Call out the data-quality checks I should run before trusting anything (nulls, duplicates, definition changes, sampling). 5. Flag the traps: confounders, Simpson's paradox risk, seasonality, and any segment that logs differently. 6. Define "done": what evidence would let me confidently answer the question. Keep it practical and ordered by leverage. Do not assume facts about my data that I did not give you.
Act as a pragmatic statistician. Help me choose the right method for this question, then tell me how to apply it correctly. The question: [WHAT I WANT TO KNOW, e.g. "did the new onboarding change 30-day retention"] The data: [OUTCOME VARIABLE AND TYPE (binary / continuous / count), KEY INPUTS, SAMPLE SIZE, HOW IT WAS COLLECTED, ANY PAIRING OR TIME STRUCTURE] Constraints: [WHAT I CAN AND CANNOT RUN, TOOLS AVAILABLE] Do this: 1. Recommend the most appropriate method (t-test, chi-square, regression, logistic regression, survival analysis, nonparametric alternative, etc.) and explain in plain language why it fits this data and question. 2. State the assumptions the method makes and how I should check each one against my data. 3. Name a simpler fallback and a more robust alternative, and when I would switch. 4. Tell me what the result will and will not let me claim, especially correlation vs. causation. 5. Write the formula or the code skeleton (in [R / Python / SQL]) so I can run it myself. Be honest about uncertainty and the limits of the data. Do not produce any specific statistic or p-value, since you cannot see my data; I will compute it and can bring the output back for interpretation.
You are a data visualization expert. Help me design the right chart for this. The message the chart must land: [THE ONE TAKEAWAY, e.g. "churn is concentrated in the first 30 days"] The data behind it: [COLUMNS, GRAIN, NUMBER OF SERIES OR CATEGORIES, TIME RANGE] Audience and medium: [e.g. exec slide / dashboard tile / report PDF] Tool I will build it in: [Tableau / Looker / matplotlib / Excel / other] Spec the chart: 1. Recommend the chart type that communicates the message most directly, and say why it beats the obvious alternatives for this case. 2. Specify the encoding: what goes on each axis, color, size, and order; what to sort by; and what to leave off. 3. Recommend the title (state the takeaway, not the dimensions), axis labels, and any annotation or reference line that makes the point obvious. 4. Call out the ways this chart could mislead (truncated axis, dual axes, overplotting, misleading aggregation) and how to avoid them. 5. Give me the build steps or a code skeleton for my tool. Favor clarity over cleverness. One chart, one message. Do not suggest a chart type that hides the comparison the message depends on.
Act as an analytics product designer. Help me spec a dashboard before I build it in [Looker / Tableau / Power BI / other]. Who uses it and the decision it supports: [PRIMARY AUDIENCE, THE WEEKLY OR DAILY DECISION THEY MAKE] Questions it must answer: [LIST THE 3 TO 7 QUESTIONS] Data available: [TABLES, KEY METRICS, GRAIN, REFRESH FREQUENCY] Produce a dashboard spec: 1. A short list of the headline metrics that belong at the top, with a one-line definition of each so there is no ambiguity later. 2. A tile-by-tile layout: for each tile, the question it answers, the chart type, the metric and dimensions, and the underlying query in plain terms. 3. The filters and date controls the audience actually needs, and sensible defaults. 4. What to deliberately leave out so the dashboard stays focused on the decision. 5. Definitions and edge cases to nail down before building (how each metric handles nulls, late data, and timezone). Design for the decision, not for completeness. Flag any metric whose definition you would confirm with the business before committing it to the dashboard.
You are helping me turn a finished analysis into a tight executive summary. The audience reads the summary, not the SQL. The question I set out to answer: [QUESTION] What I found (paste the key results, already aggregated, not raw rows): [PASTE FINDINGS, NUMBERS, AND ANY CAVEATS] Who reads this and what they decide: [AUDIENCE AND DECISION] Write: 1. A 3 to 5 sentence summary that leads with the answer and the "so what," in plain language a busy executive skims in 30 seconds. 2. The 2 or 3 findings that actually matter, each with the number and one line of why it matters. 3. The recommendation or decision the data supports, stated plainly, and the confidence level. 4. The honest caveats: what the data does not tell us and where the analysis could be wrong. 5. The suggested next step. Rules: use only the numbers I gave you and do not invent any figures. Lead with the conclusion, not the methodology. Never use em dashes or en dashes. If a finding undercuts the headline, surface it rather than smoothing it over.
Help me write clear documentation for a dataset so the next person uses it correctly. Table or model: [NAME AND PURPOSE] Columns (paste: name, type, and whatever you know about each, even if rough): [PASTE COLUMN LIST] What I know about how it is built: [SOURCE, GRAIN, REFRESH, KNOWN QUIRKS] Produce: 1. A one-paragraph description of the table: what one row represents (the grain), where it comes from, and how often it updates. 2. A column dictionary: for each column, a plain-English definition, the type, allowed or expected values, and any gotcha (nullable, deprecated, units, timezone, how it is calculated). 3. A "read this first" section listing the traps an analyst would fall into: columns that look similar but are not, joins that fan out, statuses that are easy to misread. 4. 2 or 3 example questions this table answers well, and a couple it cannot answer on its own. 5. Suggested dbt or YAML-style tests worth adding (uniqueness, not-null, accepted values) based on what the grain implies. Only document what I told you. Where you are inferring a definition, mark it clearly as a guess to confirm rather than stating it as fact.
Act as a skeptical stakeholder reviewing my analysis. I want to rehearse the hard questions before the meeting. What I am presenting (paste the key numbers, the story I plan to tell, and any soft spots I already know about): [PASTE] Audience and the decision on the table: [WHO, AND WHAT THEY WILL DECIDE] Do this: 1. Generate the 10 hardest, most specific questions a sharp stakeholder would ask about this analysis: about the data, the method, the definitions, and the conclusion. 2. For each, give me a strong, honest answer, and flag where my current analysis does not actually support a clean answer. 3. Name the questions I am least prepared for and tell me exactly what to go check or recompute before the meeting. 4. Point out any place where my narrative gets ahead of the evidence, or where I am implying causation I have not earned. Be adversarial and concrete. Do not lob softballs. If a number invites a follow-up I would not want, surface it now so it does not surprise me in the room.
Act as an experimentation analyst. Help me design an A/B test, then later help me read it out correctly. The change being tested: [WHAT VARIES BETWEEN CONTROL AND TREATMENT] Primary metric and type: [e.g. signup conversion (binary), revenue per user (continuous)] Guardrail metrics I must not harm: [LIST] Traffic and constraints: [APPROX USERS PER WEEK, HOW LONG IT CAN RUN, UNIT OF RANDOMIZATION] For design, do this: 1. State the hypothesis and the primary success metric precisely. 2. Estimate the sample size and runtime I need for a [e.g. 5%] relative effect at [confidence and power], and show the inputs so I can adjust them. 3. Flag design risks: novelty effects, contamination between groups, the right randomization unit, and what would break the test. 4. List the guardrails to watch and the pre-registered decision rule (what result means ship, hold, or kill). When I come back with results, switch to readout mode: check whether we hit the required sample, interpret the effect and its uncertainty, warn me against peeking and against reading significance into guardrails, and tell me plainly whether the data supports shipping. Do not declare a winner the data does not support. I will provide the actual numbers; you will not invent them.
You are a retention analyst. Help me frame a cohort analysis correctly before I build it. What I want to understand: [e.g. "whether new-user retention is improving" or "which acquisition channel retains best"] The data: [EVENT OR ACTIVITY TABLE, KEY COLUMNS, USER OR ACCOUNT ID, TIMESTAMP, GRAIN] How we define an active or retained user: [DEFINITION, OR ASK ME TO DECIDE] Produce: 1. Recommend how to define the cohort (signup week or month, first-purchase date, channel) and the activity that counts as "retained," and explain the tradeoffs of each choice. 2. Lay out the cohort table structure: rows, columns, the metric in each cell, and whether to show counts or percentages. 3. Specify the SQL approach in plain terms (how to assign each user to a cohort and compute period-N retention) without assuming columns I did not give you. 4. Call out the traps: survivorship bias, unequal cohort maturity, late-arriving data, and changing definitions across the period. 5. Tell me what pattern would mean retention is genuinely improving vs. a mix-shift artifact. Be explicit about every definitional choice, because cohort results are easy to read wrong. Where a choice is mine to make, lay out the options rather than silently picking one.
Act as an analyst running a root-cause investigation on a metric that moved unexpectedly. The metric and the move: [WHAT METRIC, FROM WHAT TO WHAT, OVER WHAT PERIOD] What I know so far: [ANY CONTEXT: releases, marketing, seasonality, known incidents] Data I can query: [TABLES AND DIMENSIONS AVAILABLE TO SLICE BY] Give me an investigation plan, ordered to rule things out fast: 1. First, the "is it even real" checks: data pipeline lag, a logging or tracking change, a definition change, double-counting, or a broken join upstream. Tell me how to test each. 2. Then the decomposition: which dimensions to slice by (segment, platform, geo, channel, new vs. returning) to localize the change, and the order that isolates it quickest. 3. Distinguish a broad-based move from a spike concentrated in one segment, and what each pattern implies about the cause. 4. A short list of the most likely explanations given my context, ranked, each with the specific query or check that would confirm or rule it out. 5. How to tell a real change from noise or normal seasonality. Work from most-likely and cheapest-to-check first. Do not assume a cause; give me the checks that earn the conclusion. I will run the queries and can bring results back.
Drop these into a Claude Project loaded with your team's context: your warehouse schema and table descriptions, your metric definitions, your dbt models, your style guide for charts, and examples of analyses you are proud of. With those inputs, the prompts above produce drafts far closer to ready than running them in a blank Claude chat. The model writes better SQL when it can see your column names and better summaries when it knows your audience.
Read more guides on the Treetop blog, or start with an AI Audit to map where Claude fits your analytics workflow.