ChatGPT API on Shopify for Customer Support: Setup to Production (With the Gotchas I Hit)

The Problem: Your Shopify Support Queue Is a Mess and Hiring More Agents Isn’t the Answer

What You’re Actually Dealing With

Here’s what a messy Shopify support queue looks like in practice: 200 tickets come in on a Monday, and when you actually categorize them, about 160 of them are some variation of “where is my package” or “I need to return this.” Your agents are copying order numbers into the Shopify admin, reading back shipping status, and pasting a returns policy link — on repeat, for eight hours straight. That’s not support work. That’s data retrieval with a human in the middle, and it’s expensive, slow, and demoralizing for anyone who actually wants to help customers with complex problems.

The instinct is to hire another agent. I get it. But the math doesn’t work. You hire one agent, they handle maybe 40–60 tickets per day if the queue is simple. The volume grows because your store grows. You’re back in the same hole in three months, except now you have payroll to cover. The real fix is automating the 80% that’s mechanical so your human agents can focus on the 20% that actually requires judgment — chargebacks, damaged items, angry VIP customers, anything that needs a real conversation.

Shopify Inbox looks like the obvious starting point, and I tried it. The ceiling is low and you hit it fast. It has no native ability to look up a specific order for a specific customer mid-conversation. You can set up canned responses and FAQ flows, but the moment a customer says “my order #4821 hasn’t moved in five days,” Inbox can’t pull that order, check its fulfillment status, or give a meaningful answer. It also has zero memory between sessions — a customer who contacted you yesterday is a stranger again today. It’s fine for “here’s our return window” but useless for anything that requires actual data.

What we’re building instead is a webhook-driven bot with real access to your store data. The flow looks like this: customer sends a message, your Express server receives it, you pull the relevant order from the Shopify Admin REST API using the customer’s email or order ID, you feed that real data into a ChatGPT prompt as context, and GPT generates a response that’s actually grounded in what’s happening with their order. No hallucinated shipping dates, no generic “please allow 5–7 business days” non-answers — just a response built from live order data.

The stack I landed on after a couple of iterations:

  • Node.js 20 — native fetch, solid async/await, and the OpenAI SDK expects it
  • Express — lightweight enough that you’re not fighting the framework, good for webhook endpoints
  • OpenAI Node SDK v4 — the v4 rewrite changed the import structure significantly from v3, so if you’re following older tutorials you’ll hit breaking changes immediately
  • Shopify Admin REST API — specifically the Orders and Customers endpoints; GraphQL is available but REST is faster to prototype with for this use case
  • Railway for deployment — $5/month hobby tier handles this load easily, environment variable management is clean, and deploys from GitHub push without any config files to wrestle with

One thing that caught me off guard early: the OpenAI SDK v4 no longer exports a default client. You have to instantiate it explicitly with new OpenAI({ apiKey: process.env.OPENAI_API_KEY }). If you’ve got muscle memory from v3 or you’re adapting code from a tutorial written before mid-2023, that will break silently in ways that aren’t obvious until you actually hit the API call. The Shopify side has its own gotcha — API rate limits are 2 requests per second on the Basic plan, which sounds like plenty until you’re doing order lookups on every incoming message and you get a burst of 10 customers messaging simultaneously. We’ll handle that with a simple queue, but keep it in mind.

Architecture Before You Write a Line of Code

Skip the tutorials that start by explaining what an API is. Here’s the actual flow you need to build, drawn as a sequence you can hand to anyone on your team:

Customer message
  → Chat surface (widget, Shopify Inbox, Tidio, etc.)
  → Your Express server (the middleware you control)
  → Shopify Admin API (fetch order, customer, fulfillment data)
  → OpenAI Chat Completion API (with that data injected into the prompt)
  → Response back through your server to the customer

Every step in that chain matters. The thing that caught me off guard the first time I built this was how much of the “intelligence” has nothing to do with OpenAI — it’s about how cleanly you inject Shopify data into the prompt context. A well-structured system prompt with real order data will outperform a complex AI setup with no context every single time.

Why You Cannot Put the API Key in Your Shopify Theme

I’ve seen this attempted. Someone embeds a fetch call in a theme.liquid snippet, hardcodes the OpenAI key, and ships it. The key is visible in the browser within 30 seconds of anyone opening DevTools. OpenAI’s free tier will be gone by morning if a bot finds it, and if you’re on a paid plan, you’ll be looking at an unexpected bill. Beyond the security issue, Shopify’s Content Security Policy will likely block the outbound request to api.openai.com anyway. You need a server you own sitting in the middle — an Express app on Railway, Render, or a simple VPS. That server holds your secrets, rate-limits requests, validates that the customer exists in Shopify before you go paying for tokens, and gives you somewhere to log conversations. Budget about $7/month for a Render instance to start; that’s enough headroom until you’re doing serious volume.

Picking Your Chat Surface

You have three realistic options, and the right one depends on how much custom UI you want to own:

  • Shopify Inbox with webhooks: Zero UI work. Shopify handles the chat widget natively. You register a webhook endpoint on your server, receive conversation events, call OpenAI, and post the reply back via the Conversations API. The catch — Shopify Inbox’s webhook documentation is sparse, and the conversation_message.sent event payload has changed format at least twice without major announcement. I’d verify the current shape against a live test store before building around it.
  • Custom chat widget: A floating div with a WebSocket or SSE connection to your Express server. Full control. You can stream tokens back character-by-character using stream: true in the OpenAI call, which feels dramatically faster to users even when total latency is similar. The downside is you’re writing your own widget from scratch or stitching together something like a headless component — expect a few days of work before it looks decent on mobile.
  • Tidio or a similar third-party: Tidio has a bot builder with webhook actions, so you can hit your own server mid-conversation. Fastest to get live, but you’re paying Tidio’s pricing on top of OpenAI’s, and their webhook docs have gaps you’ll find at 11pm when something breaks. Use this if the business needs something deployed this week and the team has no bandwidth for custom UI.

My default recommendation for a serious integration: custom widget with SSE streaming. Shopify Inbox if you want to be live fast with minimal code. Avoid Tidio unless speed-to-market is the only constraint.
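If you take the custom-widget route, the SSE plumbing on the Express side is smaller than it sounds. Here’s a minimal sketch; sseFrame and streamToSSE are names I made up, and the async iterable it consumes matches the shape of the v4 SDK’s stream: true response (each chunk exposes chunk.choices[0]?.delta?.content):

```javascript
// Format one server-sent event frame carrying a chunk of assistant text
function sseFrame(text) {
  return `data: ${JSON.stringify({ delta: text })}\n\n`;
}

// Pipe any async iterable of text chunks to the response as SSE.
// With the OpenAI v4 SDK you'd feed it the `stream: true` result,
// mapped down to chunk.choices[0]?.delta?.content.
async function streamToSSE(chunks, res) {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  for await (const text of chunks) {
    if (text) res.write(sseFrame(text));
  }
  res.write('data: [DONE]\n\n'); // sentinel so the widget knows to stop listening
  res.end();
}
```

On the widget side, read the stream with a fetch ReadableStream reader rather than EventSource (EventSource only supports GET, and you’ll want to POST the message body) and append each delta to the message bubble as it arrives.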

Stateless vs. Stateful — This Decision Will Haunt You Later

OpenAI’s Chat Completion API is stateless. Each request is independent. If you want the AI to remember that the customer mentioned their order number three messages ago, you have to send the entire conversation history in the messages array on every single request. That means you need to store message history somewhere, and you need to decide this before you write your first route handler.

For a prototype, a JSON file keyed by session ID works fine:

// sessions/abc123.json
{
  "customerId": "gid://shopify/Customer/7891234",
  "messages": [
    { "role": "user", "content": "Where is my order?" },
    { "role": "assistant", "content": "I found order #4521..." }
  ]
}

This breaks immediately under any concurrent load and has no TTL. Switch to Redis the moment you’re past a single-user test. With ioredis, you store the message array as a JSON string, set an expiry (I use 2 hours — covers most support sessions without accumulating stale data), and retrieve it at the start of each request. One gotcha: OpenAI charges per token in and out, so a long conversation history inflates the cost of every single request, and on older or smaller models with 8K or 16K context windows it can blow the limit outright. I cap stored history at 20 messages and summarize older context into the system prompt if the conversation goes longer. Here’s the shape of a working Redis setup:

import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL);

async function getHistory(sessionId) {
  const raw = await redis.get(`chat:${sessionId}`);
  return raw ? JSON.parse(raw) : [];
}

async function saveHistory(sessionId, messages) {
  await redis.set(
    `chat:${sessionId}`,
    JSON.stringify(messages),
    "EX",
    7200 // 2 hours
  );
}

Redis on Railway is $5/month for the starter instance. That’ll handle hundreds of concurrent sessions without breaking a sweat. If you’re already on Upstash for something else, their free tier (10K commands/day) will cover you during development and light production traffic — just know the limit is per day, not per month, so a spike during a flash sale could cause problems.

Step 1: Get Your OpenAI and Shopify Credentials Wired Up

Start with the OpenAI Key — and Use Project-Scoped Keys, Not the Legacy Ones

The thing that caught me off guard when I first set this up was that OpenAI now has two types of API keys: the old user-level keys that live under your profile, and the newer project-scoped keys under Dashboard → API Keys → Create new secret key → Owned by: Project. Use the project-scoped ones. They let you set spending limits per project, restrict which models the key can hit, and if the key leaks, you nuke just that project’s access without rotating credentials across everything else you’ve built. Log into platform.openai.com, create a project called something like shopify-support-bot, then generate a key scoped to it. Copy it immediately — you won’t see it again. OpenAI’s free tier gives you $5 in credits on new accounts, but for any real volume you’ll want to add a payment method and watch your usage dashboard. The gpt-4o-mini model we’ll use later runs at $0.15 per million input tokens — cheap enough that even a busy support queue won’t wreck your budget.

Now for Shopify. You’re going to build a Custom App, not a public app. The difference matters: public apps go through app review and live in the App Store; custom apps are private to one store and skip that whole process. Go to your store’s Shopify admin → Settings → Apps and sales channels → Develop apps → Create an app. Name it whatever makes sense internally, like Support Bot Integration. Once it’s created, click Configure Admin API scopes and enable exactly these two:

  • read_orders — so the bot can pull order status, tracking info, and fulfillment state
  • read_customers — so it can look up a customer’s history by email or ID

Don’t enable write scopes unless you actually need them. More scope = larger blast radius if something goes wrong. After saving the scopes, hit Install app, confirm, and Shopify will show you the Admin API access token. That’s your SHOPIFY_ACCESS_TOKEN. Again — you see it once. If you miss it, you regenerate and invalidate the old one.

Here’s your .env file. No surprises, just the three values you’ll need for everything in this guide:

OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxxxxxx
SHOPIFY_SHOP_DOMAIN=your-store.myshopify.com
SHOPIFY_ACCESS_TOKEN=shpat_xxxxxxxxxxxxxxxxxxxx

A few things about that domain value: use the .myshopify.com domain, not a custom domain. The Admin API endpoint is always built from the myshopify subdomain — https://your-store.myshopify.com/admin/api/2024-01/orders.json — so custom domains will break your requests. I’ve seen junior devs spend two hours debugging a 404 that was just a wrong domain format.
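Since every request builds on that same base URL, I find it worth centralizing early. A small sketch — adminUrl is a hypothetical helper name, and the version string should match whichever API version you actually test against:

```javascript
// Build an Admin API URL from the env values above.
// Hypothetical helper; pin the version you test against.
const API_VERSION = '2024-01';

function adminUrl(path, params = {}) {
  const base = `https://${process.env.SHOPIFY_SHOP_DOMAIN}/admin/api/${API_VERSION}/${path}`;
  const qs = new URLSearchParams(params).toString();
  return qs ? `${base}?${qs}` : base;
}
```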

Never hardcode these values in source files. I know you know this, but I’m saying it again because the consequences here are specific and painful. OpenAI runs automated scanners on GitHub and will auto-rotate any key they find in a public repo — you’ll wake up to a broken bot and a 401. Shopify’s App Review team (relevant if you ever go public) will reject submissions that contain embedded credentials. Add .env to your .gitignore before your first commit, not after. Use dotenv in Node or python-dotenv in Python to load these at runtime:

# Node.js
npm install dotenv

# Python
pip install python-dotenv

Then at the top of your entry file:

// Node.js
require('dotenv').config();
const openaiKey = process.env.OPENAI_API_KEY;
const shopDomain = process.env.SHOPIFY_SHOP_DOMAIN;
const shopToken = process.env.SHOPIFY_ACCESS_TOKEN;

# Python
from dotenv import load_dotenv
import os
load_dotenv()
openai_key = os.getenv("OPENAI_API_KEY")
shop_domain = os.getenv("SHOPIFY_SHOP_DOMAIN")
shop_token = os.getenv("SHOPIFY_ACCESS_TOKEN")

If you’re deploying to Vercel, Railway, or Render, each of them has an environment variables UI in the project settings — paste the values there directly, never in a config file that gets committed. That’s your credentials layer sorted. Everything from here builds on top of these three values.

Step 2: Bootstrap the Express Server

Skip the project scaffolding ceremony — here’s the exact setup I use every time I build a ChatGPT-backed service. One command gets you a clean project, and I’ll walk through every decision so you’re not cargo-culting my structure.

mkdir shopify-support-bot && cd shopify-support-bot
npm init -y
npm install openai@^4.0.0 express dotenv axios

Pin openai to ^4.0.0 specifically. The jump from v3 to v4 was a complete API redesign — the client instantiation, streaming syntax, and error shapes all changed. Half the tutorials you’ll find on Google are still using new Configuration() and new OpenAIApi() from v3, which will throw immediately if you’re on v4. The new SDK uses new OpenAI() directly and it’s cleaner, but it will bite you if you mix docs. I’m also pulling in axios now because we’ll need it in Step 3 when we hit the Shopify Admin REST API — better to have it in the tree from day one.

Create a .env file at the root and stub out your keys before writing a single route:

OPENAI_API_KEY=sk-...
PORT=3001

Then your entry point, server.js:

require('dotenv').config();
const express = require('express');
const OpenAI = require('openai');

const app = express();
app.use(express.json());

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// In-memory conversation store — keyed by session ID
const sessions = {};

app.post('/chat', async (req, res) => {
  const { sessionId, message } = req.body;

  if (!sessionId || !message) {
    return res.status(400).json({ error: 'sessionId and message are required' });
  }

  if (!sessions[sessionId]) {
    sessions[sessionId] = [
      {
        role: 'system',
        content: 'You are a helpful customer support agent for a Shopify store. Answer questions about orders, returns, and products.'
      }
    ];
  }

  sessions[sessionId].push({ role: 'user', content: message });

  try {
    const completion = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: sessions[sessionId]
    });

    const reply = completion.choices[0].message.content;
    sessions[sessionId].push({ role: 'assistant', content: reply });

    res.json({ reply, sessionId });
  } catch (err) {
    // Express 4 won't catch a rejected async handler on its own —
    // without this, a failed OpenAI call leaves the request hanging
    console.error('OpenAI call failed:', err);
    res.status(502).json({ error: 'Failed to generate a reply' });
  }
});

app.listen(process.env.PORT, () => {
  console.log(`Server running on port ${process.env.PORT}`);
});

I’m using gpt-4o-mini here rather than gpt-4o. For customer support chat — where most queries are “where’s my order” or “how do I return this” — the quality difference is negligible and the cost difference is real. At the time of writing, gpt-4o-mini is $0.15 per million input tokens vs $2.50 for gpt-4o. A busy support chat sending 500-token conversations all day adds up fast. You can always swap the model string later if you find edge cases where responses need to be sharper.
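To put numbers on “adds up fast,” here’s a back-of-envelope calculation using the gpt-4o-mini prices quoted in this guide ($0.15/1M input, $0.60/1M output). The per-turn token counts and daily volume are invented illustrations, not measurements:

```javascript
// Rough daily model cost. Prices are the gpt-4o-mini rates quoted
// in this guide; the volumes below are made-up examples.
function dailyCostUSD({ turns, inTokens, outTokens, inPerM, outPerM }) {
  return (turns * inTokens / 1e6) * inPerM + (turns * outTokens / 1e6) * outPerM;
}

const estimate = dailyCostUSD({
  turns: 500,      // messages handled per day
  inTokens: 500,   // system prompt + history per turn
  outTokens: 150,  // typical reply length
  inPerM: 0.15,
  outPerM: 0.60,
});
// well under a dollar a day at this volume
```

Even at ten times that volume you’re under a dollar a day on the model, which is why gpt-4o-mini is the default here.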

Now the honest conversation about sessions = {}: this works perfectly on a single Node process and it’s the fastest way to get multi-turn context working. The thing that caught me off guard the first time I deployed something like this to Vercel was that every cold start creates a fresh object, and Vercel can spin up multiple instances simultaneously. You’ll see users mid-conversation suddenly get responses with zero context — the model has no idea they already said they’re angry about a delayed order. This is a real problem and not an edge case. I’m leaving the in-memory array here intentionally because it’s the clearest way to understand what conversation history actually is before you complicate it with Redis or DynamoDB. Step 5 replaces this with persistent storage — but understand the concept first, then swap the backing store.

Before you touch Shopify, run a smoke test. Start the server with node server.js and fire this:

curl -X POST http://localhost:3001/chat \
  -H "Content-Type: application/json" \
  -d '{"sessionId": "test-123", "message": "Hi, I need help with my order"}'

You should get a JSON response with a reply key within 1-3 seconds. If you’re seeing a 401, your API key is wrong or you forgot to load .env — double-check that dotenv.config() is the first line called. If you’re getting a timeout, OpenAI’s API occasionally has latency spikes, but anything consistently over 10 seconds means you might be on a flagged key or hitting rate limits on a free-tier account. Fire a second curl with the same sessionId and a follow-up message — the model should respond as if it remembers the first message. That’s your confirmation that context chaining is working before we wire in any Shopify data.

Step 3: Pulling Real Order Data from Shopify Admin API

GPT hallucinates order data if you let it. I’ve seen demos where the bot confidently says “your order shipped yesterday” when the order was placed five minutes ago and hasn’t even been picked. The fix is straightforward: pull real data from Shopify before you touch the OpenAI API, then inject that data into the system prompt. Never ask GPT to guess what it doesn’t know.

The endpoint you want for order lookup is GET /admin/api/2024-01/orders.json. You can query by email or order number. Here’s what both look like in practice:

# Look up by customer email
GET https://{shop}.myshopify.com/admin/api/2024-01/orders.json?email=customer@example.com&status=any

# Look up by order number (note: this is the display number like #1042, not the internal ID)
GET https://{shop}.myshopify.com/admin/api/2024-01/orders.json?name=%231042&status=any

Pass your access token in the header as X-Shopify-Access-Token: {your_token}. The status=any param is non-obvious but important — without it, the default filter only returns open orders and you’ll miss fulfilled or cancelled ones. Took me an embarrassing amount of time to figure that out when a customer complained the bot kept saying their order “doesn’t exist.”
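Here’s the same lookup in Node using the native fetch that ships with Node 20; findOrdersByEmail and toOrderName are names I’m inventing for the sketch:

```javascript
// Look up a customer's orders by email. status=any is what surfaces
// fulfilled and cancelled orders, not just open ones.
async function findOrdersByEmail(email) {
  const url = new URL(
    `https://${process.env.SHOPIFY_SHOP_DOMAIN}/admin/api/2024-01/orders.json`
  );
  url.searchParams.set('email', email);
  url.searchParams.set('status', 'any');

  const res = await fetch(url, {
    headers: { 'X-Shopify-Access-Token': process.env.SHOPIFY_ACCESS_TOKEN },
  });
  if (!res.ok) throw new Error(`Shopify API error: ${res.status}`);
  const data = await res.json();
  return data.orders; // empty array when the email has no orders
}

// Normalize whatever the customer typed ("1042", "#1042", " #1042 ")
// into the display name Shopify expects for the ?name= filter.
function toOrderName(input) {
  return `#${String(input).trim().replace(/^#/, '')}`;
}
```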

The response object is dense. Here’s the specific fields you want to extract for a support context:

  • fulfillment_status — can be null (unfulfilled), partial, or fulfilled
  • fulfillments[0].tracking_number and fulfillments[0].tracking_url — lives nested inside the fulfillments array
  • line_items[] — map over these to get product names, quantities, and SKUs
  • refunds[] — if this array has entries, a return was already initiated
  • created_at — use this to calculate return eligibility against your return window policy

Once you have that data, build a context block and stuff it into your system prompt. Here’s the pattern I use in Node.js:

const orderContext = `
CUSTOMER ORDER DATA (authoritative, do not guess beyond this):
Order: ${order.name}
Status: ${order.fulfillment_status ?? 'unfulfilled'}
Tracking: ${order.fulfillments?.[0]?.tracking_number ?? 'not yet assigned'}
Tracking URL: ${order.fulfillments?.[0]?.tracking_url ?? 'N/A'}
Items: ${order.line_items.map(i => `${i.name} x${i.quantity}`).join(', ')}
Return eligible: ${isReturnEligible(order.created_at) ? 'Yes' : 'No - outside 30-day window'}
`;

const systemPrompt = `You are a customer support agent for Shopify store [StoreName]. 
Only answer based on the data provided below. If you don't know something, say so.

${orderContext}`;
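The isReturnEligible call above is doing the created_at math. A sketch, assuming a 30-day window measured from the order date — swap in the delivery date if, like the policy block in Step 4, you count from delivery:

```javascript
const RETURN_WINDOW_DAYS = 30;

// created_at comes straight from the Shopify order object;
// `now` is injectable so the function is testable.
function isReturnEligible(createdAt, now = new Date()) {
  const ageMs = now.getTime() - new Date(createdAt).getTime();
  return ageMs <= RETURN_WINDOW_DAYS * 24 * 60 * 60 * 1000;
}
```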

Now the gotcha that will wreck you under any real load: Shopify’s Basic plan caps you at 2 REST API requests per second, and they enforce it with 429 responses. You get a Retry-After header in the 429, which tells you exactly how many seconds to wait — use it. I spent a day wondering why our queue was silently dropping requests before I added proper retry logic. Here’s a minimal exponential backoff wrapper:

const MAX_RETRIES = 4;

async function shopifyFetch(url, options, attempt = 0) {
  const res = await fetch(url, options);

  if (res.status === 429) {
    if (attempt >= MAX_RETRIES) {
      throw new Error('Shopify API: still rate limited after max retries');
    }
    const retryAfter = parseInt(res.headers.get('Retry-After') ?? '1', 10);
    const delay = Math.max(retryAfter * 1000, Math.pow(2, attempt) * 500);
    await new Promise(r => setTimeout(r, delay));
    return shopifyFetch(url, options, attempt + 1);
  }

  if (!res.ok) throw new Error(`Shopify API error: ${res.status}`);
  return res.json();
}

Cap your retries at 4 or 5 attempts and throw after that — you don’t want infinite loops burning through your function invocations. If you’re on Shopify Advanced or Plus, the limit jumps to 4 req/sec and you get access to the GraphQL Admin API which has a proper leaky-bucket model. The REST API on Basic is genuinely painful at scale, so if you’re expecting high support volume, factor an upgrade into your architecture decision early rather than retrofitting later.

Step 4: Writing the System Prompt That Actually Works

The system prompt is where most Shopify integrations fall apart. Developers spend hours on the API plumbing and then throw together a two-line prompt like "You are a helpful customer support agent for a Shopify store." — and then wonder why the bot promises free overnight shipping to angry customers. The prompt is the policy layer. Get it wrong and you’re not just getting bad UX, you’re creating liability.

I structure every support system prompt in four explicit blocks, in this order: role definition → store policies → injected order data → escalation rules. The order matters because the model weights earlier context more heavily when it’s processing a long prompt. You want the constraints established before it sees the customer’s order details. Here’s what a production-ready version looks like:

SYSTEM_PROMPT = """
You are a customer support agent for Brew & Bask, a Shopify store selling specialty coffee equipment.
Your job is to help customers with order status, returns, and product questions.

POLICIES (never deviate from these):
- Returns are accepted within 30 days of delivery with original packaging only.
- Do NOT promise refunds outside this window under any circumstances.
- Shipping estimates are 3-5 business days for standard, 1-2 for expedited.
- Do NOT guarantee specific delivery dates. Say "typically" or "estimated".
- Discount codes cannot be applied retroactively to completed orders.

ORDER DATA (current customer):
Order #{order_id} | Status: {fulfillment_status} | Placed: {created_at}
Items: {line_items_summary}
Tracking: {tracking_number} | Carrier: {shipping_carrier}

ESCALATION RULES:
- If the customer expresses anger, frustration, or uses aggressive language, respond with one empathetic message then say:
  "I'm going to connect you with a human agent right now who can help further."
  Then output the token [ESCALATE] on its own line so the system can route this ticket.
- If the customer asks about a legal dispute, chargeback, or mentions a lawyer, immediately output [ESCALATE].
- If the issue cannot be resolved with the policies above, output [ESCALATE].

Tone: Direct, warm, no corporate filler. No "Certainly!" or "Absolutely!". Keep responses under 120 words.
"""

The [ESCALATE] token approach is something I landed on after trying webhooks and confidence scores — both were more complex than needed. You parse the response string server-side and check for that token before sending it to the customer. Simple, reliable, and it means your escalation logic lives in one place. The policy block about refunds is the exact kind of thing that will bite you without explicit instruction. GPT will try to be helpful and say something like “I can check if an exception is possible” — which is exactly what you don’t want your bot saying at 2am when no one’s monitoring.
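The server-side check is only a few lines. A sketch on the Express side — parseReply is my name for it — run on every completion before anything reaches the customer:

```javascript
const ESCALATE_TOKEN = '[ESCALATE]';

// Inspect the model's reply for the escalation token, strip it from
// the text the customer sees, and return a flag for your router.
function parseReply(raw) {
  const escalate = raw.includes(ESCALATE_TOKEN);
  const text = raw.replaceAll(ESCALATE_TOKEN, '').trim();
  return { text, escalate };
}
```

If escalate comes back true, send the visible text, then hand the session to whatever routes tickets to humans: a Slack ping or a flag on the ticket.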

Use gpt-4o-mini for this, not gpt-4o. I switched after running both side-by-side on a few hundred support tickets and the quality difference for structured, policy-constrained responses was negligible. The price difference is not: as of mid-2025, gpt-4o-mini runs at $0.15/1M input tokens and $0.60/1M output tokens versus $2.50/$10.00 for gpt-4o. At any real support volume that’s a meaningful cost gap for genuinely comparable output. The model call looks like this:

response = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=0.3,
    max_tokens=300,
    messages=[
        {"role": "system", "content": filled_system_prompt},
        *conversation_history,
        {"role": "user", "content": customer_message}
    ]
)

Temperature at 0.3 is non-negotiable for support automation. I’ve seen people use the default 1.0 and the responses vary wildly — same question, different phrasing, different answer. That’s a support disaster. You want the bot to tell every customer the same refund policy in almost the same words. Lower temperature = higher consistency. You can push it to 0.1 if you want even tighter control, but responses start feeling robotic below that. 0.3 is the sweet spot.

Token budget is the gotcha that nobody warns you about until the invoice arrives. Your ceiling should be 4,000 tokens for the combined payload — system prompt plus order data plus conversation history. The thing that caught me off guard was how fast conversation history grows if you’re passing the full thread. A five-message back-and-forth with a detailed order can eat 800-1000 tokens easily. I keep the system prompt under 500 tokens, the injected order data under 300, and then trim conversation history to the last 6 messages maximum. Implement it like this:

MAX_HISTORY_MESSAGES = 6

def build_messages(system_prompt, history, new_message):
    trimmed_history = history[-MAX_HISTORY_MESSAGES:]
    return [
        {"role": "system", "content": system_prompt},
        *trimmed_history,
        {"role": "user", "content": new_message}
    ]

One last thing: don’t inject raw order objects from the Shopify API into your prompt. They’re verbose and will blow your token budget immediately. Pre-process the order data into a compact summary string before it goes into the prompt. Pull only what the support agent actually needs — order ID, status, line item names and quantities, tracking number, and delivery estimate. Everything else is noise that costs you money and dilutes the model’s attention on the actual policy instructions.

Step 5: Persisting Conversation History So Context Doesn’t Break Between Messages

Your conversation array lives in memory — until your function restarts and it’s gone

The first version of my Shopify support bot stored conversation history in a plain JavaScript array scoped to the request handler. It worked fine in local testing. Then I deployed to Vercel and watched it fall apart immediately. Serverless functions spin up a fresh execution context on every invocation. There’s no persistent process holding your array between the customer’s first message and their second. Same problem hits you on Railway if you have two instances running behind a load balancer — instance A handled message one, instance B handles message two and has no idea what was said. Even a single-instance Node server loses everything on restart or crash. In-memory state for multi-turn conversation is a trap that looks fine until it doesn’t.

The quickest fix that actually works in production is Redis with a session ID as the key. I switched to this after about three days of debugging why the bot kept greeting returning customers as if they’d never spoken before. The pattern is simple: generate a sessionId when the chat widget initializes (a UUID stored in the customer’s browser via localStorage), send it with every message, and use it to look up and update the conversation array in Redis. Set a TTL of 7200 seconds (2 hours) — that’s enough for any realistic support conversation, and it means you’re not paying to store abandoned sessions indefinitely.

For hosting, I’ve used both Railway and Upstash for this exact use case. Upstash is the easier choice here — their free tier gives you 10,000 requests per day and 256MB storage, which is plenty for a small Shopify store, and they have a REST API if you’re deep in serverless and want to skip the persistent connection overhead. Railway’s Redis is faster to spin up if you’re already deploying your backend there (it’s literally one click), but the free tier is more limited and the pricing gets less predictable at scale. For most stores I’d start with Upstash. Install the client:

npm install ioredis

Then connect like this — if you’re on Upstash, grab the Redis URL and token from their dashboard and use the TLS URL:

import Redis from 'ioredis';

// Message mirrors the shape OpenAI expects in the messages array
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

const redis = new Redis(process.env.REDIS_URL);

const SESSION_TTL = 7200; // 2 hours in seconds

async function getConversation(sessionId: string) {
  const data = await redis.get(`chat:${sessionId}`);
  if (!data) {
    return [
      {
        role: 'system',
        content: 'You are a customer support agent for [Store Name]. You help customers with order status, returns, and product questions. Be concise and friendly.'
      }
    ];
  }
  return JSON.parse(data);
}

async function saveConversation(sessionId: string, messages: Message[]) {
  await redis.set(`chat:${sessionId}`, JSON.stringify(messages), 'EX', SESSION_TTL);
}

The message array schema follows what OpenAI expects directly — each object has a role field (system, user, or assistant) and a content field. You load the array at the start of each request, append the new user message, send the full array to the API, then append the assistant’s response and save the whole thing back. That’s it. No clever diffing, no summarization yet — just append and persist:

async function handleMessage(sessionId: string, userMessage: string) {
  const messages = await getConversation(sessionId);

  messages.push({ role: 'user', content: userMessage });

  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: messages,
  });

  const assistantMessage = response.choices[0].message.content;
  messages.push({ role: 'assistant', content: assistantMessage });

  await saveConversation(sessionId, messages);

  return assistantMessage;
}

One thing that caught me off guard: the conversation array grows with every turn, and OpenAI’s context window isn’t infinite. GPT-4o mini has a 128K token context limit which sounds enormous, but a long customer service session with detailed order info can creep up faster than you’d expect. Add a guard that trims the array when it exceeds 20 turns — keep the system message at index 0, drop the oldest user/assistant pairs from index 1 onwards. You don’t need a fancy summarization strategy for v1. Just trim. Once you’re handling real traffic and seeing which conversations actually run long, you’ll have real data to decide whether smarter context management is worth building.
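That guard is about ten lines. A sketch, keeping the system message and the most recent turns:

```javascript
const MAX_TURNS = 20; // user + assistant messages kept after the system prompt

// Keep messages[0] (the system prompt) and drop the oldest turns.
// MAX_TURNS is even, so user/assistant pairs stay intact.
function trimHistory(messages, maxTurns = MAX_TURNS) {
  const [system, ...rest] = messages;
  if (rest.length <= maxTurns) return messages;
  return [system, ...rest.slice(rest.length - maxTurns)];
}
```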

Step 6: Connecting the Bot to Shopify Inbox via Webhooks

The HMAC verification step is where I see most people cut corners — and it’s the one that will burn you. Before you write a single line of logic to process incoming messages, you need to validate that the webhook is actually coming from Shopify and not some bad actor who figured out your endpoint URL. Shopify signs every webhook with an HMAC-SHA256 hash using your webhook secret. Skip the check and you’ve handed anyone on the internet a direct line into your ChatGPT calls and your Conversations API writes.

First, register the webhook. Go to Shopify Admin → Settings → Notifications → Webhooks, or do it via API if you’re automating your setup:

POST https://{shop}.myshopify.com/admin/api/2024-01/webhooks.json
Content-Type: application/json
X-Shopify-Access-Token: {your_access_token}

{
  "webhook": {
    "topic": "customer_messages/create",
    "address": "https://yourdomain.com/webhooks/shopify-inbox",
    "format": "json"
  }
}

Note the topic is customer_messages/create, not the older customer_message you’ll find in some outdated tutorials. Shopify updated the topic naming and the old one silently fails to fire in newer API versions — the thing that caught me off guard was that the webhook registration endpoint returns a 201 with no error, so you spend an hour wondering why messages aren’t coming through.
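A cheap way to catch that failure mode: after registering, list your webhooks via GET /admin/api/2024-01/webhooks.json and assert the topic actually took. The helper and function names below are mine, not from any Shopify SDK:

```javascript
// Pure check you can run against the parsed response from
// GET /admin/api/2024-01/webhooks.json.
function hasWebhook(webhooks, topic, address) {
  return webhooks.some(w => w.topic === topic && w.address === address);
}

// Usage against the live API (shop domain and token are placeholders):
async function assertWebhookRegistered(shop, accessToken) {
  const res = await fetch(`https://${shop}.myshopify.com/admin/api/2024-01/webhooks.json`, {
    headers: { 'X-Shopify-Access-Token': accessToken },
  });
  const { webhooks } = await res.json();
  if (!hasWebhook(webhooks, 'customer_messages/create', 'https://yourdomain.com/webhooks/shopify-inbox')) {
    throw new Error('Webhook registration did not take — check topic name and API version');
  }
}
```

Run it once in your setup script and the silent-201 problem becomes a loud error instead of an hour of head-scratching.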

Now for the HMAC check. Every incoming request from Shopify includes an X-Shopify-Hmac-Sha256 header. You compute your own HMAC using your webhook secret and the raw request body, then compare. The critical detail: you must use the raw bytes of the body before any JSON parsing. Here’s what this looks like in Node.js:

const crypto = require('crypto');

function verifyShopifyWebhook(req, secret) {
  const hmacHeader = req.headers['x-shopify-hmac-sha256'];
  if (!hmacHeader) return false; // no header at all, reject outright
  const body = req.body; // raw Buffer from express.raw() — must not be parsed JSON
  const digest = crypto
    .createHmac('sha256', secret)
    .update(body)
    .digest('base64');
  const a = Buffer.from(digest);
  const b = Buffer.from(hmacHeader);
  // timingSafeEqual throws if the buffers differ in length, so check that first
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}

app.post('/webhooks/shopify-inbox', express.raw({ type: 'application/json' }), (req, res) => {
  if (!verifyShopifyWebhook(req, process.env.SHOPIFY_WEBHOOK_SECRET)) {
    return res.status(401).send('Unauthorized');
  }
  // safe to process now
  res.status(200).send('OK'); // respond fast, process async
});

Use crypto.timingSafeEqual, not ===. Timing attacks on string comparison are a real thing even if they feel academic. Also notice I respond with 200 immediately and process the message asynchronously — Shopify will retry the webhook if you don’t acknowledge within 5 seconds, and your ChatGPT call alone can eat 2–3 of those seconds on a bad day.

Once verified, you call ChatGPT with the customer message, then post the reply back using the Shopify Conversations API:

POST https://{shop}.myshopify.com/api/unstable/graphql.json
Content-Type: application/json
X-Shopify-Storefront-Access-Token: {token}

mutation {
  conversationMessageCreate(input: {
    conversationId: "{conversation_gid}",
    body: "{bot_reply_text}"
  }) {
    message {
      id
      body
    }
    userErrors {
      field
      message
    }
  }
}

The conversationId comes from the webhook payload under conversation.admin_graphql_api_id. Pull it from the raw payload immediately when you parse the webhook body — the structure is a bit nested and easy to miss the first time you look at it.
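Because that nesting is easy to fumble, I’d pull the GID through a small guard rather than inline property access. The function name is mine, and the payload shape is just what’s described above:

```javascript
// Guarded extraction of the conversation GID from the webhook payload.
// Throws loudly instead of letting `undefined` flow into the GraphQL call.
function extractConversationId(payload) {
  const id = payload && payload.conversation && payload.conversation.admin_graphql_api_id;
  if (!id) {
    throw new Error('Webhook payload missing conversation.admin_graphql_api_id');
  }
  return id;
}
```

A thrown error here lands in your logs with a clear message instead of surfacing later as a userErrors entry from the mutation.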

Here’s the gotcha nobody puts in the docs: Shopify Inbox has no native bot-typing indicator. From the customer’s side, the conversation just goes quiet while your code calls OpenAI and waits. If the total round-trip — webhook receipt, ChatGPT inference, Conversations API write — takes longer than about 3 seconds, customers will assume the chat is broken. I handled this by sending an immediate acknowledgment message the moment the webhook is verified, before the ChatGPT call even starts:

// Step 1: post acknowledgment immediately
await postToConversationsAPI(conversationId, "One moment, looking into that for you...");

// Step 2: call ChatGPT
const botReply = await getChatGPTResponse(customerMessage);

// Step 3: post actual reply
await postToConversationsAPI(conversationId, botReply);

This adds a second message to the thread, which isn’t perfect UX, but it’s miles better than a silent 4-second gap. The alternative is to optimize your ChatGPT call ruthlessly — use gpt-4o-mini instead of gpt-4o for first-pass responses (latency drops from ~3s to under 1s in most cases), keep your system prompt tight, and don’t stream if you’re posting back via the Conversations API anyway since streaming doesn’t map cleanly to that endpoint.

Deploying to Production: Railway, Environment Variables, and What to Monitor

Why I Moved This Stack to Railway (And What to Watch Once It’s Live)

I switched from Heroku to Railway for this specific setup because of one concrete reason: Redis. On Heroku, you’re adding Redis as a third-party add-on from a different provider, managing separate billing, separate credentials, and a separate dashboard. On Railway, you spin up a Redis instance inside the same project in about 30 seconds, and the internal connection URL is auto-injected as an environment variable. No copy-pasting credentials between dashboards. When your ChatGPT integration uses Redis for session caching — storing conversation history per customer so GPT has context across messages — having both services in the same project graph is genuinely cleaner. The pricing model also stopped surprising me. Railway charges based on actual resource usage rather than a dyno-hours model, so a low-traffic Shopify store with occasional support spikes doesn’t bleed money on idle compute.

Deploying is one command once you’ve got the Railway CLI installed:

# Install Railway CLI
npm install -g @railway/cli

# Login and link your project
railway login
railway link

# Deploy from your local working directory
railway up

Set your environment variables through the Railway dashboard under your service’s Variables tab — not in a committed .env file. You need at minimum these four:

OPENAI_API_KEY=sk-...
REDIS_URL=${{Redis.REDIS_URL}}   # Railway auto-populates this from your Redis service
SHOPIFY_WEBHOOK_SECRET=your_hmac_secret
NODE_ENV=production

The ${{Redis.REDIS_URL}} syntax is Railway’s variable reference system — it pulls the value from your Redis service automatically. Don’t hardcode the Redis connection string. If you ever tear down and recreate the Redis instance, the reference updates itself; a hardcoded URL breaks silently and you’ll spend 40 minutes confused about why sessions aren’t persisting.
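A related habit worth adopting: validate those variables at boot instead of discovering a missing one at request time. A sketch — the helper name and the hard list of variables are mine:

```javascript
// Fail fast at startup if a required variable is absent — a typo'd name in
// the Railway dashboard then surfaces immediately instead of as a 500 later.
const REQUIRED_VARS = ['OPENAI_API_KEY', 'REDIS_URL', 'SHOPIFY_WEBHOOK_SECRET', 'NODE_ENV'];

function assertEnv(env) {
  const missing = REQUIRED_VARS.filter(name => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
}

// Call once at the top of your entrypoint: assertEnv(process.env);
```

The thrown error names every missing variable at once, so you fix the lot in one dashboard visit.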

Logging every OpenAI request and response is non-negotiable. I learned this after a customer complained that the bot told them a product was in stock when it wasn’t. Without logs, I was flying blind. The fix is cheap: generate a UUID per request, log the full prompt, the model response, the token usage, and a timestamp. Here’s the pattern I use:

const { v4: uuidv4 } = require('uuid');

async function callOpenAI(messages, customerId) {
  const requestId = uuidv4();
  
  console.log(JSON.stringify({
    event: 'openai_request',
    requestId,
    customerId,
    timestamp: new Date().toISOString(),
    messages
  }));

  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages,
    max_tokens: 500
  });

  console.log(JSON.stringify({
    event: 'openai_response',
    requestId,
    customerId,
    content: response.choices[0].message.content,
    usage: response.usage,
    finishReason: response.choices[0].finish_reason
  }));

  return response;
}

When a customer emails saying “your bot gave me wrong information at 2pm yesterday,” you search logs by customerId, find the requestId, and pull the exact prompt and response in 90 seconds. Without that UUID correlation, you’re guessing. Railway’s built-in log viewer handles this adequately for moderate traffic; if you start getting serious volume, pipe logs to Logtail or Papertrail through Railway’s integrations panel.
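If you’ve exported logs and are digging by hand, the same correlation takes a few lines once the JSON lines are parsed — a sketch with a hypothetical function name, matching the log shape emitted by callOpenAI above:

```javascript
// Given an array of JSON log lines, return every entry tied to a customer:
// first collect their requestIds, then pull both requests and responses.
function logsForCustomer(logLines, customerId) {
  const entries = logLines.map(line => JSON.parse(line));
  const requestIds = new Set(
    entries.filter(e => e.customerId === customerId).map(e => e.requestId)
  );
  return entries.filter(e => requestIds.has(e.requestId));
}
```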

Cost monitoring is the thing that catches people off guard. gpt-4o-mini is priced at $0.15 per million input tokens and $0.60 per million output tokens as of the current OpenAI pricing page — cheap enough that most support bots cost pennies a day. But a Shopify flash sale can send your support volume up 20x overnight, and “pennies per conversation” multiplied by thousands of simultaneous sessions adds up faster than you expect. Go into your OpenAI dashboard, find Billing → Usage limits, and set a hard monthly spend cap. Not a soft notification — a hard limit. The API will start returning 429 errors when you hit it, which you handle gracefully in your error middleware with a fallback message pointing customers to your email. That’s a better outcome than a surprise $400 bill. Also log response.usage.total_tokens per request so you can track your own cost projection before the month closes.
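To make that per-request tracking concrete, here’s the arithmetic at the mini pricing quoted above (the helper name is mine):

```javascript
// gpt-4o-mini pricing from the article: $0.15 per 1M input tokens,
// $0.60 per 1M output tokens.
const PRICE_PER_MILLION = { input: 0.15, output: 0.60 };

function estimateCostUSD(usage) {
  // usage is the `response.usage` object returned by the OpenAI SDK
  return (
    usage.prompt_tokens * PRICE_PER_MILLION.input +
    usage.completion_tokens * PRICE_PER_MILLION.output
  ) / 1_000_000;
}
```

Log the result next to response.usage on every request and you get a running cost projection without waiting for the OpenAI dashboard to catch up.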

For a broader look at automating your dev and business workflows beyond just this integration, the Ultimate Productivity Guide: Automate Your Workflow in 2026 covers the full picture — from CI/CD to client communication pipelines worth building alongside your support bot.

Gotchas I Hit That the Docs Don’t Warn You About

The fastest way to lose a day on this integration is copying OpenAI code from Stack Overflow answers posted before mid-2023. The v4 SDK is a complete rewrite — not a minor version bump, not deprecation warnings, a full break. The old pattern using new Configuration() and new OpenAIApi() is dead. If your code looks like this, throw it out:

// DEAD. Don't use this.
const configuration = new Configuration({ apiKey: process.env.OPENAI_API_KEY });
const openai = new OpenAIApi(configuration);

The new client is just this:

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Where's my order?" }],
});

That’s it. No configuration object, no wrapper class. The new SDK also handles retries and timeouts natively, which is worth knowing when you’re calling it from a webhook handler with a 10-second response window.

Shopify’s Conversations API Is Not the Admin API

I burned two hours assuming I could use my Admin API token to post replies into Shopify Inbox. You can’t. The Conversations API lives at https://api.shopify.com/2024-01/conversations and requires OAuth scopes specifically including write_conversations — a separate grant you need to add to your app config. Even after you add the scope, the auth header format tripped me up. Admin API calls use X-Shopify-Access-Token, and Conversations API calls through the REST surface use the same header — but the token itself must come from a re-authorization flow that includes the conversations scope. If your existing token predates when you added that scope, it won’t work — you’ll get a vague 401 with no explanation. Re-auth the store, grab a fresh token, and the requests go through immediately.

GPT Apologizing for Lacking Real-Time Data Even When You’ve Given It the Data

This one confused me for longer than I’d like to admit. I was injecting live order data from the Admin API directly into the message and GPT kept responding with something like “I’m sorry, I don’t have access to real-time order information.” The fix is entirely about framing in the system prompt. GPT has a baked-in reflex to disclaim anything that sounds like a live lookup. If you label your injected data as a tool result or API response, it triggers that reflex. Change the framing to sound like authoritative internal data and it stops:

const systemPrompt = `You are a customer support agent for Acme Store.
Here is the verified order data retrieved from our system for this conversation:

Order #${order.id}
Status: ${order.fulfillment_status}
Estimated Delivery: ${order.estimated_delivery}
Tracking: ${order.tracking_number}

Use this data directly to answer the customer. Do not say you lack access to order information.`;

The phrase “verified order data from our system” does the heavy lifting. It signals to the model that this isn’t something it needs to retrieve — it’s already been retrieved and handed to it as fact. I also explicitly tell it not to disclaim, which catches the edge cases where it still hedges.
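One wrinkle with building that prompt from template literals: if the order hasn’t shipped yet, order.tracking_number is null and the prompt literally reads “Tracking: null”, which GPT may echo back to the customer. A small guard — the helper name and fallback text are mine:

```javascript
// Render a prompt field with a human-readable fallback instead of letting
// "null"/"undefined" leak into the system prompt.
function field(value, fallback = 'not yet available') {
  return value === null || value === undefined || value === ''
    ? fallback
    : String(value);
}

// Example: an unfulfilled order with no tracking number assigned yet.
const order = { id: 4821, fulfillment_status: null, tracking_number: null };
const trackingLine = `Tracking: ${field(order.tracking_number)}`;
// trackingLine === 'Tracking: not yet available'
```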

Don’t Even Try Streaming Into Shopify Inbox

Streaming completions look great in demos. Shopify Inbox does not support Server-Sent Events. If you set stream: true in your OpenAI call and try to pipe that into an Inbox reply, you’ll either get a broken message or nothing at all, depending on how your webhook endpoint handles the response. Just skip it entirely for this use case. Set stream: false, wait for the full completion, then POST the whole response text to the Conversations API in one shot. The added latency is usually under two seconds with gpt-4o-mini, which is fast enough that customers don’t notice. Save streaming for your own chat UI where you control the frontend. Here’s the non-streaming call you actually want:

const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: conversationHistory,
  stream: false,
  max_tokens: 300,
});

const replyText = response.choices[0].message.content;

await postToShopifyInbox(conversationId, replyText);

Keeping max_tokens at 300 or below also matters here. Support replies don’t need to be essays, and capping tokens keeps your response time predictable and your API costs from drifting up quietly over thousands of conversations.
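When the cap does bite, the API tells you: finish_reason on the choice comes back as "length" instead of "stop". Worth checking before you POST, so a mid-sentence reply never reaches the customer — a sketch, with a function name of my own:

```javascript
// True when the reply was cut off by max_tokens rather than finishing naturally.
function wasTruncated(response) {
  return response.choices[0].finish_reason === 'length';
}

// If truncated, either retry with a higher cap or append a graceful closer
// before posting to the Conversations API.
```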

When to Hand Off to a Human Agent (And How to Automate That Too)

Escalation triggers belong in the system prompt, not your application logic

The biggest mistake I see in support bot implementations is burying escalation logic in a rat’s nest of if/else conditions across the middleware. Put it where it belongs — in the system prompt itself. Tell GPT explicitly what warrants a handoff, and then tell it how to signal that to your code. Here’s the relevant section I use in production:

You are a customer support agent for [Store Name]. If the customer:
- Mentions a refund dispute involving more than $150
- Uses any of these words: "lawyer", "fraud", "chargeback", "sue", "BBB", "dispute"
- Expresses extreme frustration three or more times in the same conversation
- Requests to speak with a human agent

...then you MUST respond with a JSON object structured as follows BEFORE any natural language reply:

{"escalate": true, "reason": "chargeback_mention", "urgency": "high", "summary": "Customer claims item never arrived and mentioned filing a chargeback"}

If escalation is not needed, include {"escalate": false} in your response.
Always wrap the JSON in an <escalation_signal> XML tag so I can parse it reliably.

The XML tag wrapper is a trick I picked up after a week of regex nightmares. JSON embedded in natural language is a parsing disaster — GPT will occasionally slip a comma in the wrong place or wrap the block in markdown backticks depending on the context. The XML wrapper makes extraction deterministic. Your middleware just grabs everything between <escalation_signal> tags before passing the response to the customer.

Parsing the escalation flag in your middleware before the reply goes out

Your middleware layer needs to intercept GPT’s response, extract the escalation signal, strip it from the visible reply, and route accordingly — all before the customer sees anything. Here’s a Node.js snippet that handles this:

function parseEscalationSignal(rawResponse) {
  const signalMatch = rawResponse.match(/<escalation_signal>([\s\S]*?)<\/escalation_signal>/);
  // Match the success-path shape so destructuring downstream never breaks
  if (!signalMatch) return { signal: { escalate: false }, cleanedReply: rawResponse };

  try {
    const signal = JSON.parse(signalMatch[1]);
    const cleanedReply = rawResponse
      .replace(/<escalation_signal>[\s\S]*?<\/escalation_signal>/, '')
      .trim();
    return { signal, cleanedReply };
  } catch (err) {
    console.error('Failed to parse escalation signal:', err);
    // Fail safe: escalate if you can't parse
    return { signal: { escalate: true, reason: 'parse_error', urgency: 'medium' }, cleanedReply: rawResponse };
  }
}

const { signal, cleanedReply } = parseEscalationSignal(gptResponse);

if (signal.escalate) {
  await routeToHumanAgent(signal, conversationHistory);
}

sendToCustomer(cleanedReply);

Notice the fail-safe on parse error — I escalate rather than silently drop it. A missed fraud mention that slips through to an automated reply is a much worse outcome than an unnecessary ticket in Zendesk.

Routing to Gorgias or Zendesk via API — and why I lean toward Gorgias for Shopify

I switched to Gorgias for a client project because the native Shopify integration is genuinely useful. Gorgias auto-pulls order data, subscription status, and lifetime value directly into the ticket view — your human agent can see the full customer context without leaving the support interface. With Zendesk, you’re wiring that up yourself via custom apps or middleware. Zendesk is more powerful at scale, but the Shopify integration takes real effort to get right and the app marketplace options range from decent to sketchy.

Here’s what the escalation API call looks like against Gorgias:

async function routeToGorgias(signal, conversationHistory, customerEmail) {
  const ticketBody = {
    channel: 'api',
    via: 'api',
    from_agent: false,
    customer: { email: customerEmail },
    messages: [
      {
        channel: 'api',
        via: 'api',
        from_agent: false,
        body_text: `AI Escalation — Reason: ${signal.reason} | Urgency: ${signal.urgency}\n\nSummary: ${signal.summary}\n\n---\nConversation History:\n${conversationHistory.map(m => `${m.role}: ${m.content}`).join('\n')}`,
        sender: { email: customerEmail }
      }
    ],
    tags: ['ai-escalated', signal.reason, signal.urgency]
  };

  const response = await fetch('https://YOUR_DOMAIN.gorgias.com/api/tickets', {
    method: 'POST',
    headers: {
      'Authorization': `Basic ${Buffer.from('EMAIL:API_KEY').toString('base64')}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(ticketBody)
  });

  return response.json();
}

Gorgias’s API tier starts at their Pro plan ($60/month for 2 agents) — the $10/month Starter plan gives you read-only API access, which trips up a lot of people. You won’t find that limitation clearly documented on the pricing page; I found it buried in a support article. Zendesk’s API access is available on the Suite Team plan at $55/agent/month, which adds up fast if you’re staffing a full team.

The max-turns limit is not optional — set it or you’ll regret it

The thing that caught me off guard on my first deployment was long-tail conversations. A frustrated customer who isn’t getting what they want will loop the bot indefinitely. Without a turn limit, you’re burning tokens on a circular conversation that was never going to resolve, and the customer is getting more angry by the minute.

I set a hard limit of 10 turns. Track it on your conversation session object:

const MAX_TURNS = 10;

async function handleMessage(sessionId, userMessage) {
  const session = await getSession(sessionId);

  if (session.turnCount >= MAX_TURNS) {
    await routeToHumanAgent(
      { escalate: true, reason: 'max_turns_exceeded', urgency: 'medium', summary: 'Conversation unresolved after 10 exchanges' },
      session.history,
      session.customerEmail
    );
    return "I'm connecting you with a member of our team who can sort this out properly. You'll hear from us within a few hours.";
  }

  session.turnCount += 1;
  await saveSession(session);
  // ... rest of GPT call
}

Ten turns sounds like a lot but it goes fast with pleasantries, clarifications, and GPT asking follow-up questions. I’ve experimented with lower limits — seven works fine for straightforward order queries, but shipping issues with missing items often legitimately need more back-and-forth. Ten is the number I keep landing on. Adjust based on your actual ticket data once you have a few weeks of logs.

  • Always include the full conversation history in the escalation ticket body — agents hate picking up mid-conversation with no context
  • Tag tickets by escalation reason in Gorgias/Zendesk — “chargeback_mention” tickets need faster SLA than “max_turns_exceeded”
  • Send a confirmation message to the customer when escalating — silence after a bot handoff is the fastest way to get a negative review
  • Don’t let GPT attempt resolution after the escalation signal fires — the cleanedReply should be a neutral handoff message your team controls, not whatever GPT generates
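That last bullet in code form — a fixed, team-controlled handoff message instead of whatever GPT drafted. Constant and function names here are mine:

```javascript
// Neutral handoff copy your team owns; GPT's draft never ships once
// the escalation flag fires.
const HANDOFF_MESSAGE =
  "I'm connecting you with a member of our team who can sort this out properly. " +
  "You'll hear from us shortly.";

function replyForCustomer(signal, cleanedReply) {
  return signal.escalate ? HANDOFF_MESSAGE : cleanedReply;
}
```

Drop this in right after parseEscalationSignal so the swap happens in one place instead of at every send site.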

Disclaimer: This article is for informational purposes only. The views and opinions expressed are those of the author(s) and do not necessarily reflect the official policy or position of Sonic Rocket or its affiliates. Always consult with a certified professional before making any financial or technical decisions based on this content.



Written by Eric Woo

Lead AI Engineer & SaaS Strategist

Eric is a seasoned software architect specializing in LLM orchestration and autonomous agent systems. With over 15 years in Silicon Valley, he now focuses on scaling AI-first applications.
