Simple pricing, built to scale with your needs.

Pricing should never get in your way. That's why HoneyHive is free forever for individual developers and researchers.

Individual

For developers just getting started

Free

Request access

Up to 1,000 traces per month

Up to 2 users

Up to 1 project

30-day data retention

Playground with version control, collaboration, tools, and 100+ models

Full evaluation and observability suite

Pre-built evaluators for offline and online evaluation

Community support

"It's critical to ensure quality and performance across our LLM agents. With HoneyHive, we've not only improved the capabilities of our agents but also seamlessly deployed them to thousands of users — all while enjoying peace of mind."

Divyansh Garg

Co-Founder & CEO, MultiOn

Frequently asked questions

What can HoneyHive do for my company?

HoneyHive provides testing and observability tools that help teams evaluate their LLM apps, discover issues, and improve reliability and performance through continuous iteration. This helps AI teams build safer, more trustworthy, and more reliable AI products that are optimized for production scale.

Our tools help you iterate faster, evaluate and benchmark performance as you build, monitor performance in production, and curate high-quality datasets for fine-tuning and continuous evaluation, all within a unified, collaborative workspace.

What is a trace?

A trace is the collection of events captured during a single interaction with your LLM application, commonly referred to as a session in HoneyHive. Each event within the trace logs an API call or a function in your LLM app's orchestration (e.g., an LLM request, a retrieval step from a vector database or external tool, or any pre- or post-processing steps).
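For illustration, a single trace might look like the structured record below. The field names are hypothetical, meant only to show the idea of a session containing ordered events, and do not reflect HoneyHive's actual schema.

```python
# Hypothetical shape of a single trace (session). Field names are
# illustrative only, not HoneyHive's actual schema.
trace = {
    "session_id": "sess_abc123",
    "events": [
        {   # a retrieval step from a vector database
            "event_type": "tool",
            "name": "vector_db_retrieval",
            "inputs": {"query": "What is your refund policy?"},
            "outputs": {"chunks_returned": 4},
            "duration_ms": 87,
        },
        {   # the LLM request that consumes the retrieved context
            "event_type": "model",
            "name": "chat_completion",
            "inputs": {"model": "gpt-4o", "prompt_tokens": 512},
            "outputs": {"completion_tokens": 128},
            "duration_ms": 1420,
        },
    ],
}
```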

What is an evaluator?

An evaluator is a function that computes heuristics to measure the performance of your LLM app.

Users can design their own custom evaluators in Python using popular libraries such as Transformers, NumPy, and scikit-learn, or alternatively use an LLM as a judge to grade specific events within a session or the session as a whole.

Evaluators can be used to judge subjective traits like coherence or answer faithfulness, detect if your agent went off track, check JSON schema validity, and more. This allows you to monitor and evaluate your applications with quantitative rigor and understand precisely where your LLM apps fail.

Evaluators can be defined at both the session and event level, and are computed automatically as you log data in HoneyHive via any of our logging methods.
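As a minimal sketch, a custom Python evaluator could check whether a model's output is valid JSON. The function signature and event fields below are assumptions for illustration, not HoneyHive's actual evaluator interface.

```python
import json

def json_validity_evaluator(event):
    """Hypothetical custom evaluator: returns 1.0 if the model's output
    parses as JSON, 0.0 otherwise. The `event` dict and its fields are
    illustrative, not HoneyHive's actual evaluator API."""
    output = event.get("outputs", {}).get("completion", "")
    try:
        json.loads(output)
        return 1.0
    except (json.JSONDecodeError, TypeError):
        return 0.0

# Example: score a single logged event
event = {"outputs": {"completion": '{"answer": "42"}'}}
print(json_validity_evaluator(event))  # 1.0
```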

Can HoneyHive help me fine-tune custom models?

HoneyHive allows you to filter and curate datasets from your production logs. These datasets can be annotated by domain experts within the platform and exported programmatically for fine-tuning open-source models.

Datasets curated within HoneyHive can be exported via our SDK and used with your preferred fine-tuning provider and optimization method (such as DPO or KTO) to fine-tune custom models.
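As a rough sketch, an export-and-convert step might look like the following; the commented-out client call is a hypothetical stand-in, not the actual SDK surface.

```python
import json

# Hypothetical export flow; the client method name below is an
# assumption for illustration, not the actual HoneyHive SDK surface.
# dataset = honeyhive.datasets.export(project="support-bot", name="curated-v1")

# Suppose the export yields a list of annotated rows:
dataset = [
    {"prompt": "Summarize this ticket...", "completion": "The user reports..."},
]

# Convert to the JSONL format most fine-tuning providers accept.
with open("finetune.jsonl", "w") as f:
    for row in dataset:
        f.write(json.dumps(row) + "\n")
```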

Users can also set up active learning automations to periodically export their logs and run fine-tuning and validation jobs with their preferred fine-tuning providers. Contact us to learn more.

Is my data secure? 

All data is secure, encrypted, and private to your tenant. We conduct regular penetration tests, are currently undergoing a SOC 2 audit, and provide flexible hosting options (VPC and on-prem) to meet your security and privacy needs.

Does HoneyHive proxy requests?

By default, we do not proxy your requests via our servers. That said, we do provide an optional proxy for teams looking to manage their prompts via HoneyHive. This proxy can be hosted via HoneyHive or within your private cloud environment.
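For example, if you opt into the proxy, you would typically point your existing client at it by overriding its base URL. The URL below is a placeholder, not a documented HoneyHive endpoint.

```python
from openai import OpenAI

# Standard proxy pattern: route requests through a proxy by overriding
# the client's base URL. The URL here is a placeholder, not a
# documented HoneyHive endpoint.
client = OpenAI(base_url="https://proxy.example.com/v1", api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```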

How do I log my data? 

You can log your production requests or evaluation runs in real time using our logging endpoints and proxy, or asynchronously via our batch ingestion endpoints. We offer native SDKs in Python and TypeScript, and provide additional integrations with popular open-source orchestration frameworks like LangChain and LlamaIndex.
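As a hedged sketch, batch ingestion over HTTP might look like the following; the endpoint path and payload shape are placeholders for illustration, so consult our API reference for the exact contract.

```python
import requests

# Illustrative batch ingestion over HTTP. The endpoint URL and payload
# shape below are placeholders, not the documented HoneyHive API.
events = [
    {"event_type": "model", "name": "chat_completion",
     "inputs": {"prompt": "Hi"}, "outputs": {"completion": "Hello!"}},
]
resp = requests.post(
    "https://api.example.com/v1/events/batch",  # placeholder URL
    headers={"Authorization": "Bearer <API_KEY>"},
    json={"events": events},
)
resp.raise_for_status()
```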

For Enterprise customers, we also offer support in additional languages like Go, Java, and Rust via our API endpoints.

Our distributed tracing architecture generalizes across multiple orchestration frameworks (LlamaIndex, LangChain, AutoGen, etc.), models, and hosting environments (cloud, local, on-prem). This allows you to trace any LLM app, no matter how complex or custom it is.

How long does it take to integrate the SDK?

Integrating the SDK with your application can take anywhere from a few minutes to a couple of hours, depending on the complexity of your application and your orchestration framework.

If you're currently using LangChain or LlamaIndex, you can get started in under 5 minutes with our 1-click LlamaIndex integration and LangChain tracer.
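As an illustrative sketch, wiring a tracer into LangChain amounts to registering a callback handler; the tracer class name and import path below are stand-ins, so check our docs for the exact integration.

```python
# Hedged sketch: attaching a tracer to LangChain as a callback handler.
# `HoneyHiveTracer` and its import path are stand-in names, not the
# real integration; consult the HoneyHive docs.
from langchain_openai import ChatOpenAI
# from honeyhive.langchain import HoneyHiveTracer  # hypothetical import

llm = ChatOpenAI(model="gpt-4o")
# tracer = HoneyHiveTracer(project="my-project")   # hypothetical constructor
# llm.invoke("Hello!", config={"callbacks": [tracer]})
```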

For Team and Enterprise plan users, our team is happy to provide hands-on support and instrumentation advice to get your team set up.

Ship reliable AI products that your users trust