Pricing should never get in your way. This is why HoneyHive is free forever for individual developers and researchers.
For developers just getting started
Free
Up to 1,000 traces per month
Up to 2 users
Up to 1 project
30-day data retention policy
Playground with version control, collaboration, tools, and 100+ models
Full evaluation and observability suite
Pre-built evaluators for offline and online evaluation
Community support
For teams building production LLM apps
Let's chat
Custom data volume
Unlimited users
Unlimited projects
Custom data retention policy
Custom evaluators for offline and online evaluation
SAML and SSO
VPC deployment in AWS, Azure, or GCP
Dedicated CSM, white-glove onboarding, and shared Slack channel for support
HoneyHive provides essential evaluation and observability tools that help teams test their LLM apps, surface issues, and improve reliability and performance through continuous iteration. This helps AI teams build safer, more trustworthy, and more reliable AI products that are optimized for production scale.
Our tools help you iterate faster, evaluate and benchmark performance as you build, monitor performance in production, and curate high-quality datasets for fine-tuning and continuous evaluation, all within a unified, collaborative workspace.
A trace is a collection of events captured during a typical interaction with your LLM application, also commonly referred to as a session in HoneyHive. Each event within the trace is a log of an API call or function in your LLM app's orchestration that can be stored in HoneyHive (e.g., an LLM request, a retrieval step from a vector database or external tool, or any pre/post-processing step).
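As a rough sketch of what this looks like, here's an illustrative session with two events; the field names below are simplified for illustration rather than our exact schema.

```python
# Illustrative shape of a session (trace) and its events; field names are
# simplified placeholders, not HoneyHive's exact schema.
session = {
    "session_id": "session-123",
    "events": [
        {
            "event_type": "tool",
            "event_name": "vector_db_retrieval",   # retrieval step
            "inputs": {"query": "What is our refund policy?"},
            "outputs": {"chunks": ["Refunds are issued within 30 days..."]},
        },
        {
            "event_type": "model",
            "event_name": "chat_completion",        # the LLM request itself
            "inputs": {"prompt": "Answer using the retrieved context."},
            "outputs": {"completion": "Refunds are issued within 30 days."},
        },
    ],
}
```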
An evaluator is a function that helps you compute heuristics to measure the performance of your LLM app.
We allow users to design their own custom evaluators in Python using popular libraries such as Transformers, NumPy, and scikit-learn, or alternatively use an LLM as a judge to grade specific events within a session or the session as a whole.
Evaluators can be used to judge subjective traits like coherence or answer faithfulness, detect if your agent went off track, check JSON schema validity, and more. This allows you to monitor and evaluate your applications with quantitative rigor and understand precisely where your LLM apps fail.
Evaluators can be defined at both the session and event level, and are automatically computed as you log data in HoneyHive via any of our logging methods.
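For example, here's a minimal sketch of an event-level evaluator that checks JSON validity; it assumes the event exposes its model output under a "completion" key, which is a simplification for illustration.

```python
import json

def json_validity_evaluator(event):
    """Event-level evaluator: returns 1.0 if the model output parses as JSON.

    Assumes the event carries its model output under a 'completion' key;
    the real field name depends on how you log your events.
    """
    try:
        json.loads(event["completion"])
        return 1.0
    except (KeyError, TypeError, json.JSONDecodeError):
        return 0.0
```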
HoneyHive allows you to filter and curate datasets from your production logs. These datasets can be annotated by domain experts within the platform and exported programmatically for fine-tuning open-source models.
Users can export datasets curated within HoneyHive via our SDK and use their preferred fine-tuning provider and optimization method (such as DPO or KTO) to fine-tune custom models.
Users can also set up active learning automations to periodically export their logs and run fine-tuning and validation jobs with their preferred fine-tuning providers. Contact us to learn more.
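As an illustration, an exported dataset could be reshaped into the JSONL format most fine-tuning providers accept; the "inputs" and "ground_truth" field names below are placeholders for illustration, not our export schema.

```python
import json

def to_finetuning_jsonl(records, path="train.jsonl"):
    # 'records' stands in for rows exported from a curated HoneyHive dataset;
    # the 'inputs' / 'ground_truth' field names are hypothetical.
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps({
                "prompt": record["inputs"]["prompt"],
                "completion": record["ground_truth"],
            }) + "\n")
```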
All data is secure, encrypted, and private to your tenant. We conduct regular penetration tests, are currently undergoing a SOC 2 audit, and provide flexible hosting solutions (VPC and on-prem) to meet your security and privacy needs.
By default, we do not proxy your requests via our servers. That said, we do provide an optional proxy for teams looking to manage their prompts via HoneyHive. This proxy can be hosted via HoneyHive or within your private cloud environment.
You can log your production requests or any evaluation runs in real time using our logging endpoints and proxy, or asynchronously via our batch ingestion endpoints. We offer native SDKs in Python and TypeScript, and provide additional integrations with popular open-source orchestration frameworks like LangChain and LlamaIndex.
For Enterprise customers, we also offer support in additional languages like Go, Java, and Rust via our API endpoints.
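As a rough sketch of what batch ingestion over plain HTTP looks like (the route languages without a native SDK would take): the endpoint path, payload shape, and auth header below are illustrative placeholders, not our documented API contract, so consult our API reference for the real details.

```python
import requests

def log_events_batch(events, api_key):
    # Hypothetical endpoint path, payload shape, and auth header; check the
    # HoneyHive API reference for the actual contract.
    response = requests.post(
        "https://api.honeyhive.ai/events/batch",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"events": events},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()
```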
Our distributed tracing architecture generalizes across multiple orchestration frameworks (LlamaIndex, LangChain, AutoGen, etc.), models, and hosting environments (cloud, local, on-prem). This allows you to trace any LLM app, no matter how complex or custom your application is.
Integrating the SDK with your application can take anywhere from a few minutes to a couple of hours, depending on the complexity of your application and your orchestration framework.
If you're currently using LangChain or LlamaIndex, you can get started in under 5 minutes with our 1-click LlamaIndex integration and LangChain tracer.
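To show the mechanism, here's a minimal stand-in tracer attached as a LangChain callback; our actual tracer forwards these callbacks to HoneyHive rather than printing them, so treat this as a sketch of the pattern rather than our integration code.

```python
from langchain_core.callbacks import BaseCallbackHandler
from langchain_openai import ChatOpenAI

class PrintTracer(BaseCallbackHandler):
    """Stand-in for a HoneyHive-style tracer: a real tracer would forward
    these callbacks to HoneyHive instead of printing them."""

    def on_llm_start(self, serialized, prompts, **kwargs):
        print("LLM call started:", prompts)

    def on_llm_end(self, response, **kwargs):
        print("LLM call finished:", response.generations[0][0].text)

# Attaching the tracer as a callback is all the instrumentation required.
llm = ChatOpenAI(callbacks=[PrintTracer()])
llm.invoke("Hello, world!")
```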
For Team and Enterprise plan users, our team is happy to provide hands-on support and instrumentation advice to get you set up.