OpenAI has been at the forefront of AI research, and its foundation models like GPT-4 have gained widespread adoption. However, OpenAI and similar closed-source models are not your only option for every use-case.
With the reliability issues that followed DevDay and the current uncertainty surrounding OpenAI's future, developers and enterprises have been left wondering about the future of their own products and AI plans. In response, many are considering open-source models. This raises the question: when should enterprises move away from OpenAI, and how can leaders prepare to make the switch?
The answer is simple: by setting up your data flywheel early, you allow your team to stay nimble and make informed decisions about your LLM needs.
When to move off of OpenAI
While GPT-4 is great for complex reasoning, it isn't the best model for every task. Once you factor in the costs and limitations of using OpenAI in production, the benefits of open-source models are hard to ignore. So when is it appropriate to consider moving away from OpenAI?
- Security-sensitive use cases: If your application handles sensitive data and privacy is a concern, open-source models may be a better fit. They let you keep sensitive data within your company's own cloud environment, safeguarding data privacy and avoiding the risks of transmitting sensitive information to an external provider. Companies like Baseten, Gradient, and MosaicML enable private hosting of open-source LLMs in your own cloud environment.
- Cost optimization: If your company is facing large LLM serving costs (~$5,000/month or more) or requires significant scalability, open-source models offer a cost-efficient alternative, with potential savings of over 90%. Even though OpenAI has steadily reduced the prices of its models, GPT-4 remains a costly option for production-scale usage. Fine-tuned open-source models can deliver comparable or superior performance at dramatically lower inference costs, and they let you switch from token-based pricing to compute-based pricing, making costs more predictable.
- Control and reliability: A production application demands a high level of reliability, and open-source models allow you to meet that requirement on your own terms. OpenAI's API can suffer from outages and is constrained by rate limits that may interrupt your application during surges in usage. Self-hosted open-source models, in contrast, give organizations greater control over their model's availability, latency, and uptime, leading to a more dependable integration at production scale.
- Customization: Closed-source models are great for generalized tasks, but they are not very customizable and can be overkill for many contexts. Open-source models like Llama 2, MosaicML's MPT-7B, or Mistral-7B are designed with fine-tuning and customization in mind. GPT-3.5 Turbo fine-tuning is reportedly restricted to just the first two layers of the network, whereas open-source models offer extensive flexibility, letting organizations adapt the models to their specific needs and optimize performance with advanced techniques such as PPO, DPO, and LoRA (a minimal sketch follows this list).
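To make the customization point concrete, here is a minimal sketch of LoRA fine-tuning with Hugging Face's transformers and peft libraries. The model name, dataset file, and hyperparameters are illustrative placeholders, not recommendations, and a real run needs a GPU with enough memory for the base model.

```python
# Minimal LoRA fine-tuning sketch with Hugging Face transformers + peft.
# All names and hyperparameters here are illustrative placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "mistralai/Mistral-7B-v0.1"  # any open-source causal LM works here
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")

# Attach low-rank adapters to the attention projections; only these small
# adapter weights are trained, which keeps the memory footprint manageable.
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)

# Expects a JSONL file of {"text": ...} examples curated from production logs.
dataset = load_dataset("json", data_files="curated_train.jsonl", split="train")
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./lora-out",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        learning_rate=2e-4,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("./lora-out")  # saves only the adapter weights
```

Because only the low-rank adapter weights are trained, a fine-tune like this typically fits on a single GPU, and the resulting adapter is usually only tens of megabytes. That economy is what makes per-use-case customization practical.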
How to prepare for the switch
Starting your data flywheel as early as possible helps make any model transition smooth.
Before making the switch to open-source models, it’s critical to have the right infrastructure and processes set up to manage your proprietary data and prepare your organization for a smooth transition. With a well-functioning data flywheel in place, you can enable your team to use your own proprietary data to customize, evaluate, and deploy open-source models for your unique use-case.
Here's what you need to start the data flywheel:
- Log production data: Integrate a tool like HoneyHive to add logging capabilities to your existing LLM infrastructure. By logging production LLM data and human feedback, you capture real-world user interactions and responses. This data serves as the foundation for evaluating and comparing other models in the future.
- Define evaluation metrics: Work with your team to define evaluation metrics that align with your organization's goals and objectives. A range of evaluators and test suites can be customized to measure the performance of different models on common use-cases like data extraction, RAG, code generation, and autonomous agents. This step gives you a clear benchmark for evaluating new models in your app and tracking their progress over time.
- Curate fine-tuning and evaluation datasets: Use both human feedback and your evaluation metrics to filter production data and curate a diverse selection of input/output pairs for fine-tuning and model evaluation (a minimal filtering sketch follows this list). You can further improve the quality of your dataset by involving domain experts to label and correct model responses; tools like HoneyHive make collaborating with domain experts and collecting annotations seamless.
- Fine-tune and compare models: Once you have collected enough data, fine-tune multiple candidate models using services like Gradient, MosaicML, or Together.ai, and benchmark them against GPT-4 using the metrics you defined earlier. HoneyHive's evaluators and test suites make it easy to compare different models and track their progress on your real data.
- Iterate and optimize: The data flywheel is a continuous process of iteration and optimization. Keep collecting data, evaluating models, and making improvements based on the insights you gain. This iterative loop lets you refine your models and achieve better performance over time.
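As a concrete illustration of the curation step above, the sketch below filters logged interactions by a human-feedback score and deduplicates prompts before writing a fine-tuning file. The log schema (one JSON object per line with "prompt", "completion", and "rating" fields) is hypothetical; adapt it to whatever your logging tool actually exports.

```python
# Hypothetical curation sketch: turn raw production logs into a
# fine-tuning dataset. Field names and the rating scale are assumptions.
import json

MIN_RATING = 4  # keep only interactions users rated highly (assumed 1-5 scale)

seen_prompts = set()
with open("production_logs.jsonl") as logs, \
     open("curated_train.jsonl", "w") as out:
    for line in logs:
        record = json.loads(line)
        # Filter on human feedback, and skip duplicate prompts to keep
        # the curated set diverse.
        if record.get("rating", 0) < MIN_RATING:
            continue
        if record["prompt"] in seen_prompts:
            continue
        seen_prompts.add(record["prompt"])
        out.write(json.dumps({
            "text": record["prompt"] + record["completion"]
        }) + "\n")
```

The output format here deliberately matches the `curated_train.jsonl` file consumed by the fine-tuning sketch earlier; in practice, domain-expert corrections collected through your annotation workflow would replace or augment the raw completions before training.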
HoneyHive for your data flywheel
Setting up your data flywheel early is both easy and important. Tools like HoneyHive can reduce your organization's reliance on any single model provider and equip your enterprise to make informed decisions about LLMOps.
By leveraging HoneyHive to log your production LLM data, you kickstart the data flywheel early in your organization. This approach ensures that you have a solid foundation of real data to evaluate and compare models, enabling you to make informed decisions about your LLM needs and drive continuous improvement.
If you're interested in learning more about setting up a data flywheel for your organization, join our beta waitlist or book a demo with the team.