Developing APIs often means navigating a complex web of internal and external dependencies and murky network traffic. To build effectively, developers need a stable target to test a query or feature against, and when that stability is lacking, development becomes slower and more error-prone.
Enter API mocking. API mocking is an approach to generating a mock service that provides dependable data for a variety of testing purposes. That data can then stand in for actual API calls, allowing for more complete and accurate development.
While mocking is pretty well understood, there are some caveats and special cases that make its use a bit more complicated. AI is certainly one of those cases, introducing substantial complexity and new considerations into the development process. The ability to effectively and accurately mock OpenAI API calls and functions is paramount, and this requires a firm understanding of the technology, how it interacts with mocks, and how it can be leveraged for better results.
Today, we’re going to look at some best practices and processes for mocking the OpenAI API, as well as some ways that Speedscale can help this process along with its powerful Proxymock solution.
Prerequisites
Before diving into mocking OpenAI’s API, it’s helpful to ensure that you have a few pieces in place. This article assumes a basic understanding of how APIs work and interact with applications, as well as some familiarity with the differences between common standards and paradigms such as RESTful design, event-driven architectures, and so forth.
In terms of hard requirements, you will need to ensure that you have access to the OpenAI API in the form of an API key, as well as a local development environment with something like Node.js or Python. This will help you rapidly get Proxymock set up and ready to use.
Understanding API Mocking
API mocking is essentially the process of creating simulated responses from an API instead of making actual calls to a live server. When you create a mock service, you take something that is hard to control, whether it’s the environment or the variables in the application’s own code, and turn it into something much easier to control. In the development world, stability improves the process significantly, and mocking abstracts away much of the chaos of development, allowing you to work with a far more dependable and predictable data source.
This process has some huge benefits, generally speaking, but there are some key benefits that apply specifically to working with a third-party API such as OpenAI. These include:
- Reduced Costs – services like OpenAI typically charge per request or through token-based accounting, and developing against them can be very expensive, as each iteration incurs costs as the system is tested again and again. A mock service abstracts this cost away, incurring it only a handful of times at the start of the mocking process.
- Improved Speed – since the mocked service never actually reaches the external service, there’s no need to wait for real API responses. This can drastically speed up both development and iterative testing, as you can rapidly test changes and edge cases without the round trip an external service incurs. You can also use this to great effect for simulated mass-scale testing, exploring what a change might do in a service handling hundreds of thousands of requests rather than just a handful of queries.
- Enhanced Reliability – since you are testing against consistent responses, you generate consistent output, even when the actual API is down or otherwise constrained. This stability has significant knock-on effects, resulting in more stable and reliable services at scale and letting you develop from a place of security and predictability.
- Facilitated Testing – mocking allows developers to verify how their application behaves under various conditions, opening up a wider range of variables. Some circumstances occur effectively at random and can’t easily be recreated, so being able to trigger them deliberately during testing is significantly helpful and yields more accurate results.
- Sidestepped Rate Limiting – many third-party services, including OpenAI, limit how many requests you can make. By creating a mock that leverages observed traffic, you can sidestep this limitation while still reaping the benefits of the service.
Approaches to Mocking OpenAI’s API
Given the obvious benefits that come with mocking OpenAI’s API, it’s clearly paramount that we not only do it – but do it right. Accordingly, developers must consider their specific approach to mocking.
Roughly speaking, there are a handful of methods that developers should consider:
- Static Mocks – static mocks are hardcoded responses that are stored from past requests, typically as JSON files, allowing you to mimic an API’s output predictably.
- Request Interception – request interception utilizes middleware to capture requests as they occur in testing and route them to a mixture of dynamic and static resources. This is a bit more complicated, but it allows for the use of local models or lower-cost solutions before tying into third-party services.
- Local Mock Servers – utilizing a tool like MockServer or json-server allows you to run a lightweight API simulation, creating a server that is treated as if it were real while serving local responses.
- Proxy-Based Mocking – solutions like Proxymock allow you to capture and replay traffic and requests at scale, using actual data to create a mock that is more closely aligned to actual user traffic and behavior.
Static Mocks
Static mocks are the simplest method by far, but they are also very limited compared to other approaches. In this approach, you create predefined JSON responses that your application uses instead of making real API calls.
Example (JavaScript):
// Canned payload mirroring the shape of an OpenAI chat completion
const mockResponse = {
  id: "chatcmpl-123",
  object: "chat.completion",
  created: 1689214321,
  model: "gpt-4",
  choices: [{ message: { role: "assistant", content: "Hello, how can I assist you today?" } }],
};

// Return the static payload in place of a real API call
function getMockResponse() {
  return mockResponse;
}

console.log(getMockResponse());
In this example, a chat response is returned as if it were generated by the GPT-4 model, but it’s the same message regardless of prompt or request. The client receives this response as if it were freshly generated, allowing for basic testing and iteration against a known default state of the data.
Request Interception
Interception allows you to replace real API requests with mock responses dynamically. This is a bit more involved, and less dynamic than it appears: you are still ultimately routing requests to an internal resource, which carries its own cost. This partially offsets some of the benefits of mocking but provides more flexibility in exchange.
Example (Node.js with Nock):
const nock = require('nock');

// Intercept POSTs to the chat completions endpoint and return a canned reply.
// Any nock-compatible HTTP client calling this URL will now hit the mock.
nock('https://api.openai.com')
  .post('/v1/chat/completions')
  .reply(200, {
    id: "chatcmpl-123",
    object: "chat.completion",
    model: "gpt-4",
    choices: [{ message: { role: "assistant", content: "Hello from mock!" } }],
  });
Example (Python with responses):
import responses
import requests

@responses.activate
def test_openai_mock():
    # Register a canned reply for the chat completions endpoint
    responses.add(
        responses.POST,
        "https://api.openai.com/v1/chat/completions",
        json={"choices": [{"message": {"role": "assistant", "content": "Hello from mock!"}}]},
        status=200,
    )
    # This request never leaves the process; the responses library intercepts it
    response = requests.post("https://api.openai.com/v1/chat/completions")
    print(response.json())

test_openai_mock()
Local Mock Servers
Mock servers act as stand-ins for real APIs, giving you a local resource that substitutes for the external dependency.
Example (json-server setup):
npm install -g json-server
Create a db.json file:
{
  "chat_completions": [
    {
      "id": "chatcmpl-123",
      "choices": [{"message": {"role": "assistant", "content": "Mocked response"}}]
    }
  ]
}
Run the mock server:
json-server --watch db.json --port 3000
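With the server running, json-server exposes each top-level key in db.json as its own REST resource, so you can sanity-check the mock with a quick request. Note that the path below follows json-server’s resource-naming convention rather than OpenAI’s real URL scheme:
Example (Python):
import requests

# json-server serves each db.json collection at /<collection>
resp = requests.get("http://localhost:3000/chat_completions")
print(resp.json())  # [{"id": "chatcmpl-123", "choices": [...]}]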
This approach is more involved. Although it is highly flexible, it does create yet another resource, and thus another point of failure to consider when error-testing your systems.
Proxy-Based Mocking
This approach records real API responses and replays them, ensuring accurate testing scenarios. This is where Speedscale’s Proxymock shines. Proxy-based mocking lets you use real data, capturing it for replay later on. In essence, you are testing against production, but in a safe and isolated way. We’ll discuss this more in its own section shortly, but it provides a significant amount of flexibility alongside substantial gains in the efficiency and safety of data sourcing.
Best Practices for Mocking OpenAI’s API
Before we dive into Proxymock, we should set some best practices for mocking OpenAI’s API. Adopting these best practices will ensure that your data and development output are as accurate and useful as possible.
- Use Realistic Data – this is provided in large part by Proxymock, but using real data is incredibly important. Make sure that your mocks are based on reasonable data that represents the typical use case within your systems.
- Cover Error Cases – OpenAI, and AI APIs in general, can introduce unusual errors arising from nondeterministic output or confused data sets. Because of this, you need to ensure that your mocks effectively handle rate limits, server errors, and malformed responses. It’s worth modeling specific OpenAI API error codes, such as rate limits (HTTP 429) and invalid API keys (HTTP 401), to ensure comprehensive error handling.
- Validate Responses – your application must correctly process different responses (e.g., missing fields, incorrect data) for the mock to be genuinely useful. OpenAI sometimes returns varying response formats depending on the parameters used, and handling this efficiently helps developers create more robust mocks (see the validation sketch after this list).
- Handle Streaming Responses Properly – OpenAI’s API supports streaming responses (stream: true), in which the server sends chunks of data over time. If you plan to use this feature, ensure your mocks simulate streaming using generators in Python or readable streams in Node.js (see the streaming sketch after this list).
- Simulate Token-Based Output Limits – OpenAI’s responses are token-limited (the max_tokens setting), so make sure your mock handles this particular quirk of OpenAI and other LLMs. If your application processes partial responses, ensure that your mock responses properly respect token truncation; the streaming sketch below shows one way to signal it via finish_reason.
- Respect Rate Limits and Errors – OpenAI enforces rate limits fairly strictly (429 Too Many Requests), so be sure to simulate rate limiting and error handling (e.g., mock intermittent 500 Internal Server Error, 400 Bad Request, and 401 Unauthorized errors). This lets you mock proper responses and build for the reality of the AI API itself.
- Account for System and User Messages in Chat Models – when mocking GPT-4 or GPT-3.5-turbo, ensure the request structure reflects the message roles and features common to these models, including:
- System, user, and assistant messages;
- Function calling (tool_calls field); and
- JSON mode (forcing structured responses).
- Simulate Latency for Realistic Testing – OpenAI’s API has variable response times, and these can change significantly depending on the complexity of the request or the depth of data retrieval involved. Accordingly, introduce artificial latency in your mocks to test timeout handling and error routing.
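To make the streaming, truncation, and latency practices concrete, here is a minimal streaming sketch: a plain Python generator that emulates the server-sent-events format OpenAI uses for streamed chat completions. The token list, delay value, and helper name are illustrative assumptions, not part of any official SDK:
Example (Python):
import json
import time

def mock_openai_stream(tokens, delay=0.05, finish_reason="stop"):
    # Emit SSE-formatted chunks shaped like OpenAI's chat.completion.chunk events.
    # Pass finish_reason="length" to simulate max_tokens truncation.
    for token in tokens:
        payload = {
            "id": "chatcmpl-123",
            "object": "chat.completion.chunk",
            "model": "gpt-4",
            "choices": [{"index": 0, "delta": {"content": token}, "finish_reason": None}],
        }
        yield f"data: {json.dumps(payload)}\n\n"
        time.sleep(delay)  # simulated inter-chunk latency

    # The final chunk carries the finish_reason; the stream ends with a [DONE] sentinel
    final = {
        "id": "chatcmpl-123",
        "object": "chat.completion.chunk",
        "model": "gpt-4",
        "choices": [{"index": 0, "delta": {}, "finish_reason": finish_reason}],
    }
    yield f"data: {json.dumps(final)}\n\n"
    yield "data: [DONE]\n\n"

for chunk in mock_openai_stream(["Hello", ",", " world", "!"]):
    print(chunk, end="")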
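For the validation practice, a callback-based mock can check the chat message structure before returning a canned reply. This validation sketch uses the responses library’s add_callback hook; the validation policy and function names are assumptions chosen for illustration, not a canonical implementation:
Example (Python):
import json
import responses
import requests

def validate_chat_request(request):
    # Inspect the request body before deciding what to return
    body = json.loads(request.body)
    roles = [m.get("role") for m in body.get("messages", [])]
    # Hypothetical policy: reject requests that lack a user message
    if "user" not in roles:
        return (400, {}, json.dumps({"error": {"message": "missing user message"}}))
    reply = {
        "id": "chatcmpl-123",
        "object": "chat.completion",
        "model": body.get("model", "gpt-4"),
        "choices": [{"message": {"role": "assistant", "content": "Mocked reply"}}],
    }
    return (200, {"Content-Type": "application/json"}, json.dumps(reply))

@responses.activate
def test_request_shape():
    responses.add_callback(
        responses.POST,
        "https://api.openai.com/v1/chat/completions",
        callback=validate_chat_request,
    )
    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        json={
            "model": "gpt-4",
            "messages": [
                {"role": "system", "content": "You are helpful."},
                {"role": "user", "content": "Hi"},
            ],
        },
    )
    assert resp.status_code == 200

test_request_shape()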
Common Pitfalls and How to Avoid Them
Mocking has some general pitfalls that are much more impactful when mocking OpenAI’s API. These include:
- Relying Too Much on Static Mocks – static mocks can become outdated quickly, especially as these models and their interactions change almost monthly. Use Proxymock and recordings of real data to keep your mocks aligned with actual request traffic.
- Ignoring Error Handling – simulate rate limits and server errors to make your app resilient. With OpenAI and other LLM systems, error output is a significant source of noise that can have huge knock-on effects if it is improperly handled or injected into a prompt. Accordingly, test as if everything is on fire to make sure you can contain the blaze (see the sketch after this list).
- Mocking Incorrectly – ensure your mocks return the expected headers, structures, and response times. Proxymock helps significantly here, aligning your code and tests with the actual implementation.
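As a concrete illustration of the error-handling pitfall, the responses library lets you register multiple replies for the same endpoint, which are consumed in registration order. That makes it simple to simulate a transient 429 followed by a successful retry; the single-retry logic below is deliberately naive and purely for illustration:
Example (Python):
import responses
import requests

@responses.activate
def test_rate_limit_then_success():
    url = "https://api.openai.com/v1/chat/completions"
    # The first call hits a simulated rate limit...
    responses.add(
        responses.POST, url,
        json={"error": {"message": "Rate limit reached"}},
        status=429,
    )
    # ...and the retry succeeds
    responses.add(
        responses.POST, url,
        json={"choices": [{"message": {"role": "assistant", "content": "Recovered"}}]},
        status=200,
    )

    resp = requests.post(url)
    if resp.status_code == 429:
        resp = requests.post(url)  # naive single retry, for illustration only
    assert resp.status_code == 200

test_rate_limit_then_success()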
Introducing Speedscale’s Proxymock
Proxymock is a feature offering from Speedscale that simplifies API mocking by capturing and replaying real API requests and responses. Proxymock takes this live behavior and generates a mock from it, facilitating rapid testing and development at scale.
This generative approach allows for more accurate development and abstracts away the need to manually write mock responses and code, leading to simpler, easier, and more effective testing setups.
Key Features
Proxymock is incredibly powerful, offering some pretty key features that make it stand out as best-in-class:
- Traffic Recording – Proxymock captures your application’s inbound and outbound API and database calls, creating a snapshot of its interactions. This data can then be used for a variety of purposes, both for development and ongoing testing. It supports both HTTP and gRPC, offering a wide range of capabilities.
- Automatic Mock Generation – based on the recorded traffic, Proxymock can generate mock servers that replicate the behavior of the original backend services, allowing your application to operate as if in a live environment. This removes the need to manually create mocks, reducing the difficulty of mocking and significantly reducing the likelihood of human error in the process.
- Protocol Support – Proxymock supports a wide range of protocols, including HTTP, gRPC, and Postgres, making it versatile across applications. In the context of OpenAI’s API, this makes it useful not only for testing OpenAI implementations but also for testing other models and API systems in the future, as well as the internal systems that connect to them.
- Enhanced Development Efficiency – by simulating backend services locally, developers can continue building and testing applications without waiting on external systems, shortening development time. This also significantly cuts resource and token consumption, lowering the ongoing iterative costs of the OpenAI API.
- Improved Testing Accuracy – because this approach tests against real data, your actual OpenAI API traffic and user behavior are reflected in your tests. This reduces the drift between tests and practical deployment, allowing for more effective testing, development, and debugging.
- Cost Efficient – Proxymock is entirely free for local development, and offers cost-efficient scaling through Pro and Enterprise plans for more complicated use cases. This combination of flexibility and cost efficiency is industry-leading.
Using Proxymock to Mock OpenAI APIs
Getting started with Proxymock is quite simple. In essence, you just want to treat your OpenAI API traffic as you would any service traffic, routing it through Proxymock for observation and replay.
Installing Proxymock
Installing Proxymock is super easy – you can simply install it through Homebrew as follows:
brew install speedscale/tap/proxymock
Setting the Stage
With Proxymock installed, your next steps are to grab a key and initialize Proxymock for listening. You can obtain a key from Speedscale’s key service. Next, initialize proxymock, supplying your API key:
proxymock init
Capture Live Traffic
With Proxymock initialized, you’ll need to capture your traffic into a snapshot as you query the OpenAI API. To do this, first start Proxymock:
proxymock run
This should generate an output that looks like this:
...
export http_proxy=http://127.0.0.1:4140
export https_proxy=http://127.0.0.1:4140
...
proxymock is capturing and mocking snapshot <snapshot-id>
...
Now that your snapshot is capturing, you need to make some OpenAI API requests. These requests populate the snapshot identified by the ID noted above, creating an auto-generated mock that can be referenced elsewhere and used for further testing.
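For example, once the proxy settings from the proxymock run output are applied, an ordinary request to OpenAI is routed through the local proxy and recorded into the snapshot. The sketch below passes the proxies explicitly in Python; note that intercepting HTTPS traffic may require trusting the proxy’s certificate, as covered in the proxymock documentation:
Example (Python):
import os
import requests

# Route traffic through proxymock's local proxy (port taken from the `proxymock run` output)
proxies = {
    "http": "http://127.0.0.1:4140",
    "https": "http://127.0.0.1:4140",
}

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]},
    proxies=proxies,  # TLS interception may require trusting proxymock's CA certificate
)
print(resp.json())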
You can learn more about running proxymock in the Getting Started guide.
Conclusion
Mocking OpenAI’s API is essential for cost-effective, reliable, and efficient development, allowing you to build against GPT models without unnecessary overhead and complexity. While traditional mocking methods like static mocks and request interception work well, Proxymock elevates the process by providing dynamic, replayable API simulations that closely align with actual traffic and usage patterns.
Adopting Proxymock ensures that you can easily deploy robust testing, model against consistent behavior, and enjoy seamless integration into CI/CD pipelines. With Proxymock, developers can confidently build and scale applications without worrying about API limits or downtime. Try it out today and enhance your API testing strategy!