Navigating complex information spaces based on nuanced user preferences remains challenging for traditional vector search systems. When preferences are subtle or difficult to articulate, conventional approaches relying on explicit queries or similarity matching often fail. Together with our partners at Qdrant, we’re showcasing an iterative optimization technique that refines search results by leveraging both positive and negative feedback within a contextual vector space.
Our approach employs an adaptive process that learns from ongoing user interaction. By collecting positive and negative examples through feedback cycles, the system creates refined context pairs that guide vector exploration without requiring a fixed target. This enables dynamic navigation of semantic landscapes while avoiding similarity bubbles that might constrain discovery. We trace each step of this preference learning journey with HoneyHive, allowing us to visualize how user feedback transforms into context pairs and ultimately shapes the recommendation trajectory.
To demonstrate this technique in action, we'll build a quote recommendation system powered by Qdrant's context search that adapts to user feedback in real-time.
Overview: Discovering Quotes That Resonate
This application acts as a personalized guide to help you discover inspirational quotes you'll love. It works iteratively, learning your preferences over a short conversation. For this demonstration, we'll use a small dataset of 1000 inspirational quotes taken from Kaggle's Inspirational Quotes Dataset.
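The post doesn't show how the quotes get into the database, but a minimal ingestion sketch might look like the following. This is our assumed setup, not the repository's verbatim code: the CSV column name ("quote"), the collection name, and the embedding model/size (OpenAI's text-embedding-3-small, 1536 dimensions) are illustrative choices.

import pandas as pd
from openai import OpenAI
from qdrant_client import QdrantClient, models

COLLECTION_NAME = "quotes"  # hypothetical collection name
openai_client = OpenAI()
client = QdrantClient(url="http://localhost:6333")

# Create a collection sized for the chosen embedding model
client.create_collection(
    collection_name=COLLECTION_NAME,
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE),
)

# Embed each quote and upsert it with its text as payload
df = pd.read_csv("quotes.csv")  # local copy of the Kaggle dataset (assumed layout)
points = []
for i, quote in enumerate(df["quote"]):
    vector = openai_client.embeddings.create(
        model="text-embedding-3-small", input=quote
    ).data[0].embedding
    points.append(models.PointStruct(id=i, vector=vector, payload={"quote": quote}))

client.upsert(collection_name=COLLECTION_NAME, points=points)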
An overview of how the application works.
How it Works
At its core, the app uses a clever retrieval technique powered by the Qdrant vector database. Specifically, we leverage Qdrant's context search capability. Instead of searching for quotes similar to a single specific example (like a typical semantic search), context search lets us explore the "quote space" based on pairs of positive and negative examples. We tell Qdrant, "Show me quotes that are like these (positive examples) but unlike those (negative examples)." Crucially, we don't provide a fixed target vector; the search is guided purely by the evolving context of your likes and dislikes.
The Iterative Process
1. Starting Point: The conversation begins with a randomly selected quote from our database. This gives us an initial anchor point.
2. User Feedback: You provide feedback. This can be general ("I like stoic quotes, and I dislike romantic ones") or a direct reaction to the quote shown ("Yes, I like this!" or "No, not for me.").
3. Intelligent Interpretation: An LLM analyzes your feedback. It identifies your preferences and, importantly, generates examples of quotes that align with what you like (positive examples) and what you dislike (negative examples). Even if you state a general preference like "stoic quotes," the LLM can generate a few concise examples of stoic-sounding quotes.
4. Building Context: These generated positive and negative example quotes are converted into vector embeddings. These embeddings form the "context pairs" that we feed into Qdrant.
5. Contextual Retrieval: We query Qdrant using these context pairs. Qdrant's context search uses the positive/negative pairs to divide the vector space into subspaces and randomly samples candidates from the positive subspace. This keeps recommendations varied, so they don't get stuck in a similarity bubble. We only retrieve quotes that actually exist in our database, ensuring relevance and quality; the generated examples guide the search but aren't the final recommendations themselves.
6. Refinement: A new quote, recommended based on your feedback, is presented.
7. Looping: The process repeats. With each round of feedback, the positive and negative context grows, allowing the context search to zero in on quotes that truly match your unique taste. You can stop anytime by saying, in natural language, something like "Ok, I've found my ideal quote, we can stop now."
Essentially, you're teaching the application your preferences through examples, and it uses Qdrant's context search to navigate the world of quotes and find the ones that will inspire you.
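To make that concrete before diving into the implementation, here is roughly what a single context query looks like with the qdrant-client Python SDK. This is a minimal sketch with placeholder vectors; the application's actual retrieval helper appears later in this post.

from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Placeholder vectors; in the app these are embeddings of the
# LLM-generated positive and negative example quotes.
liked_embedding = [0.0] * 1536
disliked_embedding = [0.0] * 1536

# One pair per (liked, disliked) example; note that no target vector is given.
hits = client.query_points(
    collection_name="quotes",
    query=models.ContextQuery(
        context=[models.ContextPair(positive=liked_embedding, negative=disliked_embedding)]
    ),
    limit=10,
).points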
Implementation
In this section, we'll cover the core implementation, focusing on how HoneyHive helps us trace and monitor the application's behavior throughout the user journey. For the full code and setup instructions, check out HoneyHive's cookbooks repository.
Initializing Tracing and Running the Agent
from honeyhive import HoneyHiveTracer, enrich_session

if __name__ == "__main__":
    # --- Run Agent ---
    HoneyHiveTracer.init(
        session_name="motivational_quote_assistant",
    )
    round_count = run_agent()
    print(f"Total rounds: {round_count}")
    enrich_session(
        metadata={"round_count": round_count}
    )
    HoneyHiveTracer.flush()
Here, we initialize HoneyHive's tracer to start monitoring the session. The run_agent function drives the conversational loop until the user decides to stop or interrupts. Thanks to HoneyHive's @trace decorator, all traced functions automatically inherit the session context and are visualized with a clear hierarchical structure in the HoneyHive dashboard. We also use enrich_session to enrich the session metadata with the total number of rounds it took to find the ideal quote, making this key metric easy to analyze later.
Tracing the Main Agent Loop
The main function driving the conversation is decorated with @trace to capture its execution flow:
@trace
def run_agent():
    """Runs the conversational quote recommendation agent."""
    # --- Initialization ---
    initialization_result = _initialize_agent()
    # ... rest of the function ...
Since HoneyHive is initialized beforehand, the decorator seamlessly integrates with the session context, enabling detailed tracing of the agent’s lifecycle without cluttering the code.
Starting the Conversation: Introducing the Agent and First Quote
def _initialize_agent() -> Optional[Tuple[models.PointStruct, Set[int]]]:
    """Initializes the agent state, prints welcome messages, and gets the first quote."""
    print("\n=== Motivational Quote Assistant ===")
    print("I'm here to help you discover motivational quotes that resonate with you!")
    print("You can tell me your preferences (e.g., 'I like stoic quotes, not romantic ones')")
    print("or give feedback on quotes I show you.")
    print("Type 'stop' when you've found quotes you like.")
    print("\nLet's start with a random quote to get your initial feedback...")
    # ...
The agent begins by greeting the user and setting expectations, then presents a random quote to kick off the feedback loop.
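The post doesn't show how that first random quote is fetched. One convenient option is Qdrant's built-in random sampling; the following is a sketch under assumptions (Qdrant 1.11+ and the collection layout from the ingestion sketch), not necessarily how the repository does it:

from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

def get_random_quote() -> models.ScoredPoint:
    """Samples one random point from the collection to use as the initial anchor."""
    response = client.query_points(
        collection_name="quotes",
        query=models.SampleQuery(sample=models.Sample.RANDOM),
        limit=1,
        with_payload=True,
        with_vector=True,
    )
    return response.points[0]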
Interpreting User Input with an LLM
After the initial quote is shown, the application waits for the user's input. The crucial next step is translating this natural-language feedback into actionable positive and negative examples. These examples are essential for refining the retrieval process with increasingly relevant context. To achieve this, we employ an LLM to interpret the user's input.
@trace
def interpret_user_input_with_llm(
    user_input: str,
    current_quote: Optional[str] = None,
    num_examples: int = 3
) -> Dict[str, Optional[List[str]] | bool]:
    """
    Interprets user input, generates example quotes, and detects stop intent.

    Args:
        user_input: The user's input text
        current_quote: The current quote being discussed (if any)
        num_examples: Number of examples to generate for each category

    Returns:
        A dictionary containing:
        - positive_examples: List of example quotes matching preferences or None
        - negative_examples: List of example quotes to avoid or None
        - stop_requested: Boolean indicating if the user wants to stop

    Examples:
        "I like stoic quotes" -> {
            "positive_examples": [N stoic quotes],
            "negative_examples": None,
            "stop_requested": False
        }
        "Yes" (with current_quote) -> {
            "positive_examples": [current_quote + (N-1) similar quotes],
            "negative_examples": None,
            "stop_requested": False
        }
        "I like stoic quotes but not romantic ones" -> {
            "positive_examples": [N stoic quotes],
            "negative_examples": [N romantic quotes],
            "stop_requested": False
        }
        "ksajsoijehiuhs" -> {
            "positive_examples": None,
            "negative_examples": None,
            "stop_requested": False
        }
        "Ok, that's enough, let's stop" -> {
            "positive_examples": None,
            "negative_examples": None,
            "stop_requested": True
        }
    """
Updating Context Embeddings
After the LLM interprets user feedback and generates example quotes, we update our context example lists. Qdrant's context search operates with either vector embeddings or quote IDs from our database. Though our feedback examples may either come from the database or be generated, we standardize by always passing embeddings directly to the search API. These lists of embeddings maintain the cumulative positive and negative context from the entire conversation.
def _update_embeddings_from_interpretation(
    interpretation: Dict,
    positive_embeddings: List[List[float]],
    negative_embeddings: List[List[float]]
):
    """Updates positive and negative embedding lists based on LLM interpretation."""
    if interpretation.get("positive_examples"):
        print("\nProcessing positive feedback...")
        for example in interpretation["positive_examples"]:
            try:
                embedding = get_embedding(example)
                positive_embeddings.append(embedding)
            except Exception as e:
                logger.error(f"Error getting embedding for positive example: {e}")
                continue
    if interpretation.get("negative_examples"):
        print("\nProcessing negative feedback...")
        for example in interpretation["negative_examples"]:
            try:
                embedding = get_embedding(example)
                negative_embeddings.append(embedding)
            except Exception as e:
                logger.error(f"Error getting embedding for negative example: {e}")
                continue
Note that while this function itself isn’t explicitly traced, calls to OpenAI’s embedding API are automatically captured by HoneyHive, ensuring no loss of visibility into this crucial step.
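The get_embedding helper referenced above isn't listed in the post. A minimal version might look like this, assuming OpenAI's text-embedding-3-small (the same model must be used at both ingestion and query time):

from openai import OpenAI

openai_client = OpenAI()

def get_embedding(text: str) -> list[float]:
    """Embeds one example quote; HoneyHive auto-traces this OpenAI call."""
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",  # assumed model choice
        input=text,
    )
    return response.data[0].embedding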
Querying Qdrant with Context Pairs
With the updated context embeddings, we are ready to query our Qdrant vector database. We use the accumulated positive and negative embeddings as context pairs to perform a Qdrant context search. This search identifies quotes in our database that align with the positive examples while diverging from the negative ones.
@trace
def query_by_context(
    context_pairs: List[models.ContextPair],
    limit: int = 10,
) -> List[models.ScoredPoint]:
    """
    Performs a Qdrant context query using context pairs.

    Args:
        context_pairs: A list of positive/negative context pairs (models.ContextPair).
        limit: The maximum number of results to return.

    Returns:
        A list of ScoredPoint objects from the query results.
    """
    try:
        context_queries = [
            models.QueryRequest(
                query=models.ContextQuery(context=context_pairs),
                limit=limit,
                with_payload=True,
                with_vector=True,  # Include vectors in response
            ),
        ]
        logger.info(f"Performing context query in collection '{COLLECTION_NAME}' with {len(context_pairs)} pairs...")
        results = client.query_batch_points(
            collection_name=COLLECTION_NAME,
            requests=context_queries
        )
        logger.info("Query finished.")
        # Return the points from the first (and only) query
        enrich_span(metadata={"number_pairs": len(context_pairs)})
        return results[0].points if results else []
    except Exception as e:
        logger.error(f"Error in query_by_context: {e}")
        raise
The retrieval step is critical for the application's success, and monitoring its performance is essential. When recommendations aren't hitting the mark, we need sufficient data to diagnose the issue. Therefore, we enrich the query_by_context trace span using enrich_span. Specifically, we add number_pairs (the count of context pairs used in the Qdrant query) as metadata. Correlating this metadata with other trace information (like the quality of the LLM interpretation or the latency of the Qdrant query) can help pinpoint the root cause of any problematic interactions.
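One detail the snippets above leave open is how the two embedding lists become ContextPair objects. A straightforward scheme, and our assumption rather than the repository's verbatim code, is to pair them positionally and reuse the last element of the shorter list:

from typing import List
from qdrant_client import models

def build_context_pairs(
    positive_embeddings: List[List[float]],
    negative_embeddings: List[List[float]],
) -> List[models.ContextPair]:
    """Zips positive/negative embeddings into context pairs, padding the
    shorter list by repeating its last element."""
    if not positive_embeddings or not negative_embeddings:
        return []  # each ContextPair needs both a positive and a negative side
    n = max(len(positive_embeddings), len(negative_embeddings))
    return [
        models.ContextPair(
            positive=positive_embeddings[min(i, len(positive_embeddings) - 1)],
            negative=negative_embeddings[min(i, len(negative_embeddings) - 1)],
        )
        for i in range(n)
    ]

With this in place, the retrieval call is simply query_by_context(build_context_pairs(positive_embeddings, negative_embeddings)).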
Iterative Refinement and Debugging
At this point, the application presents the newly recommended quote to the user. The cycle then repeats: the user provides feedback, the LLM interprets it, the context embeddings are updated, and Qdrant retrieves the next recommendation. With each round, the system gathers more nuanced context about the user's preferences, ideally converging on quotes that truly resonate.
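Putting the pieces together, the body of run_agent reduces to a loop along these lines. This is a simplified sketch of the flow just described (reusing the hypothetical get_random_quote and build_context_pairs helpers from the earlier sketches), not the repository's verbatim code:

def run_loop_sketch() -> int:
    """Simplified conversational loop: feedback -> interpretation -> retrieval."""
    positive_embeddings, negative_embeddings = [], []
    current_quote = get_random_quote().payload["quote"]  # payload key is assumed
    rounds = 0
    while True:
        print(f"\nQuote: {current_quote}")
        interpretation = interpret_user_input_with_llm(
            input("Your feedback: "), current_quote
        )
        if interpretation["stop_requested"]:
            break
        _update_embeddings_from_interpretation(
            interpretation, positive_embeddings, negative_embeddings
        )
        pairs = build_context_pairs(positive_embeddings, negative_embeddings)
        if pairs:
            results = query_by_context(pairs)
            if results:
                current_quote = results[0].payload["quote"]
        rounds += 1
    return rounds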
Of course, the process might not always be perfect. But because we're tracing every significant step with HoneyHive, we accumulate a wealth of data. This detailed tracing provides the necessary visibility to debug issues, understand user journeys, and ultimately improve the application's performance and effectiveness.
Monitoring and Analysis with HoneyHive
After running the application and interacting with it, we can visualize the entire session flow within HoneyHive. Each session trace provides a hierarchical view of the operations performed, including our custom traced functions (run_agent, interpret_user_input_with_llm, query_by_context) and automatically traced events like LLM calls (for generating examples) and embedding generations.
Visualizing the trace of a single conversational session in HoneyHive.
This detailed tracing enables powerful analysis and debugging workflows:
Analyzing Interaction Length: How many rounds does it typically take for a user to find a quote they like? Are users genuinely finding good recommendations, or are they abandoning the session after numerous unsuccessful attempts? By logging the round_count metadata, we can filter sessions based on the number of rounds (filter -> metadata -> round_count). Examining sessions with very high or very low round counts can reveal whether users are engaged and successful or frustrated and giving up.
Root Cause Analysis: When a user receives unsatisfying recommendations, the traces help identify the source of the problem. Was it an LLM misinterpretation of the user's feedback? Did the Qdrant retrieval step fail to return relevant results despite accurate context? Or perhaps the user interacted with the application in an unexpected way? Tracing each component allows us to pinpoint bottlenecks and areas for improvement.
Identifying and Annotating Issues: No system is perfect, and our quote assistant will inevitably encounter edge cases or make mistakes. For instance, the LLM might misinterpret feedback, leading to incorrect positive or negative examples (as shown below, where the LLM incorrectly flagged family/friendship quotes as negative despite the user's preference). HoneyHive allows us to annotate these problematic events directly within the trace, adding comments and ratings. This human feedback is invaluable for targeted improvements and can be easily surfaced later by filtering for annotated traces.
Adding human feedback to an incorrect LLM interpretation.
Conclusion
Building applications that understand nuanced user preferences requires sophisticated tools. By combining Qdrant's context search for powerful, example-driven retrieval and HoneyHive for comprehensive tracing and observability, we created an interactive quote discovery assistant that learns and adapts. The ability to trace the entire workflow—from user input interpretation to vector database querying—provides invaluable insights for debugging, analysis, and continuous improvement. If you want to build your own adaptive search systems, check out HoneyHive's cookbooks repository for complete code and setup instructions.