
How to Migrate from OpenAI to Cerebrium for Cost-Predictable AI Inference


If you're building an AI application, you probably started with OpenAI's convenient APIs. However, as your application scales, you'll need more control over costs, models, and infrastructure.

Cerebrium is a serverless AI infrastructure platform that lets you run open-source models on dedicated hardware with predictable, time-based pricing instead of token-based billing.

This guide shows you how to build a complete chat application with OpenAI, migrate it to Cerebrium by changing just two lines of code, and add performance and cost tracking so you can compare the two approaches to AI inference with real data. By the end, you'll have a working chat application that demonstrates the practical differences between token-based and compute-based pricing, along with the insights you need to choose the right approach for your use case.
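The two lines in question are the API key and the base URL the OpenAI SDK points at; everything else (the chat.completions calls, the message format) stays the same. Here's a minimal sketch of that change — the Cerebrium endpoint URL below is a hypothetical placeholder, since the real one comes from your own deployment:

```python
import os

# Hypothetical endpoint; the real URL comes from your Cerebrium deployment.
CEREBRIUM_BASE_URL = "https://api.cortex.cerebrium.ai/v4/your-project/your-app/run"

def client_kwargs(provider: str) -> dict:
    """Return the OpenAI-SDK constructor arguments for each provider.

    Migrating means changing only two values: the API key and the
    base_url the SDK talks to. The rest of the application code is
    unchanged.
    """
    if provider == "openai":
        return {"api_key": os.environ.get("OPENAI_API_KEY", "")}
    if provider == "cerebrium":
        return {
            "api_key": os.environ.get("CEREBRIUM_API_KEY", ""),
            "base_url": CEREBRIUM_BASE_URL,
        }
    raise ValueError(f"unknown provider: {provider}")

# The client itself would then be built with either set of arguments:
#   from openai import OpenAI
#   client = OpenAI(**client_kwargs("cerebrium"))
```

Because Cerebrium-hosted models can expose an OpenAI-compatible interface, swapping the constructor arguments is enough — no call sites need to change.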

Prerequisites

To follow along with this guide, you'll need Python 3.10 or higher installed on your system. You'll also need the following (all free):

OpenAI API key.

Cerebrium account (includes free tier access to test GPU instances up to A10 level).

Hugging Face token (free account required).

Llama 3.1 model access on Hugging Face. Visit meta-llama/Meta-Llama-3.1-8B-Instruct and click "Request access" to get approval from Meta (typically takes a few minutes to a few hours).
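With the credentials above in hand, a quick preflight check saves debugging time later. Here's a small sketch — the environment-variable names are assumptions, so match them to however you store your keys:

```python
import os

# Assumed variable names for the credentials listed above;
# adjust to match your own environment.
REQUIRED = ["OPENAI_API_KEY", "CEREBRIUM_API_KEY", "HF_TOKEN"]

def missing_credentials(env=os.environ) -> list:
    """Return the names of any required credentials that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]

if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        raise SystemExit(f"Missing credentials: {', '.join(missing)}")
    print("All credentials present.")
```

Running this before starting the tutorial confirms everything is wired up before you make your first API call.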
