Get started for free

Start building for free, and only pay when you scale your app.

Developer

Free

No credit card required

Get started

10k events per month

90d log retention

Up to 2 users

Full evaluation and observability suite

| Usage Limits | Developer | Enterprise |
| --- | --- | --- |
| Event Ingestion Volume | 10k per month | Custom |
| Number of Online Evaluations | Unlimited | Unlimited |
| Log Retention | 90d | Custom |
| Max Event Size | 5MB | 5MB |
| Max Requests per Minute | 1,000 | Custom |
Observability
Developer
Enterprise
Distributed Tracing
Performance Monitoring
Custom Charts
Dataset Curation
Human Annotation
Data Export
Alerts
Evaluation
Developer
Enterprise
Online Evaluation w/ sampling
Offline Experiments
Evaluation Reports
GitHub Actions Integration
Custom Evaluators
Unlimited
Unlimited
Prompt Studio
Developer
Enterprise
Playground
Prompt Versioning and History
Functions and External Tools
Prompt Deployments
Custom Models in Playground
| Workspace | Developer | Enterprise |
| --- | --- | --- |
| Number of Users | Up to 2 | Unlimited |
| Number of Projects | Unlimited | Unlimited |
Security
Developer
Enterprise
SSO (social)
SAML
Custom SSO
RBAC
Coming soon
Hosting
Cloud Hosted in US
Cloud Hosted in US
VPC Self-Hosting Add-On
AWS, Azure, or GCP
InfoSec Review
DPA and BAA
Support
Developer
Enterprise
Community Support
Email Support
Slack Connect Channel
SLA
CSM and Team Trainings

"It's critical to ensure quality and performance across our AI agents. With HoneyHive, we've not only improved the capabilities of our agents but also seamlessly deployed them to thousands of users — all while enjoying peace of mind."

Divyansh Garg

CEO

"For prompts, specifically, versioning and evaluation was the biggest pain for our cross-functional team in the early days. Manual processes using Gdocs - not ideal. Then I found @honeyhiveai in the @mlopscommunity slack and we’ve never looked back."

Rex Harris

Head of AI/ML

"HoneyHive solved our biggest headache: monitoring RAG pipelines for personalized e-commerce. Before, we struggled to pinpoint issues and understand pipeline behavior. Now we can debug issues instantly, making our product more reliable than ever."

Cristian Pinto

CTO

Frequently asked questions
What is an event?

An event refers to a single trace span, structured log, or metric label combination sent to our API as OTLP or JSON. It captures any relevant data from your system, including all context fields generated by your application's instrumentation.
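As a concrete illustration, a single trace-span event might be assembled and serialized as JSON before being sent to the ingestion API. The field names below are illustrative assumptions, not HoneyHive's documented schema; consult the API reference for the real shape.

```python
import json
import time
import uuid

# Sketch of one "event": a single trace span as a JSON payload.
# Field names are assumptions for illustration only.
event = {
    "event_type": "trace_span",
    "trace_id": uuid.uuid4().hex,        # correlates spans in one trace
    "span_id": uuid.uuid4().hex[:16],
    "name": "llm.chat_completion",
    "start_time_ms": int(time.time() * 1000),
    "metadata": {"model": "gpt-4o", "temperature": 0.2},
}

payload = json.dumps(event)
print(payload)
```

Context fields generated by your instrumentation (user IDs, session IDs, feature flags) would travel in the same payload.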

What is an evaluator?

Automated Evaluators: An automated evaluator is a function (code or LLM) that helps you unit-test any event or combination of events to produce a measurable score (and, for LLM evaluators, an explanation). Common examples include Context Relevance, Answer Faithfulness, ROUGE, and BERTScore. We provide many common evaluators out-of-the-box and allow defining custom evaluators within the platform.

Human Evaluators: We strongly encourage a hybrid evaluation approach, i.e., combining automated techniques with human oversight. This helps you account for evaluation-criteria bias and better align your evaluators with your domain experts' scoring rubric. To enable this, you can define custom scoring rubrics in HoneyHive for domain experts to use when evaluating outputs.

Do you help me fine-tune models?

HoneyHive allows you to filter and curate datasets from your production logs. These datasets can be annotated by domain experts within the platform and exported programmatically for fine-tuning models.

You can export datasets curated within HoneyHive using our SDK and use your preferred 3rd-party provider to fine-tune custom models. Our DSL's flexibility also supports more advanced use-cases like Active Learning. Contact us to learn more.
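As a sketch of the export-then-fine-tune flow: once records leave HoneyHive, a common next step is converting them to the chat-style JSONL format many fine-tuning providers accept. The actual SDK export call is omitted here; the records below are hardcoded stand-ins, and the `input`/`output` field names are assumptions.

```python
import json

# Stand-in for records curated in HoneyHive and exported via the SDK.
# These dicts (and their field names) are illustrative assumptions.
records = [
    {"input": "What is your return policy?",
     "output": "Returns are accepted within 30 days."},
    {"input": "Do you ship internationally?",
     "output": "Yes, we ship to over 40 countries."},
]

def to_jsonl(records):
    """Convert input/output pairs to chat-format JSONL for fine-tuning."""
    lines = []
    for r in records:
        example = {"messages": [
            {"role": "user", "content": r["input"]},
            {"role": "assistant", "content": r["output"]},
        ]}
        lines.append(json.dumps(example))
    return "\n".join(lines)

jsonl = to_jsonl(records)
print(jsonl)
```

Each line of the resulting file is one self-contained training example, which is what most fine-tuning endpoints expect to upload.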

Is my data secure? 

All data is encrypted in transit and at rest, managed by AWS and ClickHouse Cloud. We conduct regular penetration tests, are currently undergoing a SOC 2 audit, and provide flexible hosting options (cloud-hosted or VPC) to meet your security and compliance needs.

Can I log long-context traces?

Yes. We support logging up to 5MB per span, which translates to ~1.7M tokens per span. This is designed to support most current-generation long-context models such as Claude and Gemini.
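The ~1.7M figure is a back-of-the-envelope estimate that assumes roughly 3 bytes per token, a common rule of thumb for English text with modern BPE tokenizers (the exact ratio varies by tokenizer and content):

```python
# Rough check of the ~1.7M-token figure for a 5MB span limit,
# assuming ~3 bytes per token (an approximation, not a guarantee).
max_span_bytes = 5 * 1024 * 1024   # 5MB span limit
bytes_per_token = 3                # rough average for English text
max_tokens = max_span_bytes // bytes_per_token
print(f"{max_tokens:,} tokens")    # ~1.7M tokens
```

Token-dense content (code, non-Latin scripts) will fit fewer tokens per span; plain English prose will land near the estimate.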

How do I manage and version prompts?

By default, we do not proxy your requests via our servers. Instead, we store prompts as YAML configurations, which can be deployed and fetched in your application logic using the GET Configuration API.
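To illustrate the no-proxy design: your application fetches the versioned configuration and renders the prompt template locally, so no LLM request passes through a third-party server. The `config` dict below is a hypothetical stand-in for a parsed YAML response, and its field names and `$variable` template syntax are assumptions, not HoneyHive's documented format.

```python
import string

# Hypothetical result of fetching a prompt configuration via the
# GET Configuration API and parsing the YAML (field names assumed).
config = {
    "name": "support-reply",
    "version": 3,
    "model": "gpt-4o",
    "template": "You are a support agent for $product. Answer: $question",
}

# Render the versioned template client-side, in your own application
# logic -- requests never need to be proxied through another server.
prompt = string.Template(config["template"]).substitute(
    product="HoneyHive",
    question="How do I export my traces?",
)
print(prompt)
```

Because the template lives in a versioned configuration rather than in code, rolling a prompt forward or back is a deployment action, not a code change.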

How do I trace my application? 

You can log traces using our SDKs and API endpoints, or asynchronously via our batch ingestion endpoint. We offer native SDKs in Python and TypeScript with OpenTelemetry support, and provide automatic integrations with popular frameworks like LangChain/LangGraph and LlamaIndex.

If you're using another language, you can send your OpenTelemetry traces to our OTel collector or instrument your application manually using our APIs.

Ship reliable AI products that your users trust