Hybrid search is an advanced technique that combines the strengths of semantic search and keyword-based search to provide more accurate and comprehensive search results. This approach is particularly effective when dealing with diverse datasets or complex query requirements.

Please keep in mind hybrid search is slower than non-hybrid search. Unless required by your use case, we recommend not enabling hybrid search. For more details on the various search techniques, please reference this article.

Implementing Hybrid Search in Carbon

To enable hybrid search for a customer across a set of documents, follow these steps:

1. Use the /modify_user_configuration endpoint

Send the following payload to enable sparse vectors for the customer:

{
  "configuration_key_name": "sparse_vectors",
  "value": {
    "enabled": true
  }
}

2. Generate Sparse Vectors for Content

For Files via API: Set the query parameter generate_sparse_vectors to true in the API request body.

For Carbon Connect: Set generateSparseVectors to TRUE for each enabledIntegrations.

3. Enable hybrid search on /embeddings endpoint

To enable hybrid search, you must set the hybrid_search parameter in the /embeddings request body to TRUE.

When hybrid search is enabled, a combination of keyword search and semantic search are used to rank and select candidate embeddings during information retrieval. By default, these search methods are weighted equally during the ranking process. To adjust the weight (or “importance”) of each search method, you can use the hybrid_search_tuning_parameters property. The description for the different tuning parameters are:

  • weight_a: weight to assign to semantic search
  • weight_b: weight to assign to keyword search

You must ensure that sum(weight_a, weight_b,..., weight_n) for all n weights is equal to 1. The equality has an error tolerance of 0.001 to account for possible floating point issues.