Overview

Carbon supports multiple embedding models for various use cases and modalities.

Supported Models

Carbon can help you switch between different embedding models. Contact us for assistance.

Text Embeddings

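| Model | Developer | Compression Factor | Embedding Size | Average MTEB Score | Carbon Slug |
|---|---|---|---|---|---|
| ada v2 | OpenAI | - | 1536 | 61.0 | OPENAI |
| text-embedding-3-small | OpenAI | - | 512 | 61.6 | OPENAI_ADA_SMALL_512 |
| text-embedding-3-small | OpenAI | - | 1536 | 62.3 | OPENAI_ADA_SMALL_1536 |
| text-embedding-3-large | OpenAI | - | 256 | 62.0 | OPENAI_ADA_LARGE_256 |
| text-embedding-3-large | OpenAI | - | 1024 | 64.1 | OPENAI_ADA_LARGE_1024 |
| text-embedding-3-large | OpenAI | - | 3072 | 64.6 | OPENAI_ADA_LARGE_3072 |
| Cohere Embed v3 Multilingual | Cohere | - | 1024 | 64.0 | COHERE_MULTILINGUAL_V3 |
| Cohere Embed v3 Multilingual | Cohere | int8 | 1024 | - | Launching soon |
| Cohere Embed v3 Multilingual | Cohere | binary | 1024 | - | Launching soon |
| Solar Embeddings | Upstage | - | 4096 | - | SOLAR_1_MINI |
| jina-embeddings-v2 | Jina | - | 768 | 60.4 | Launching soon |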

Reranking Models

| Model | Developer | Carbon Slug |
|---|---|---|
| jina-reranker-v2-base-multilingual | Jina AI | JINA_MULTILINGUAL_BASE_V2 |
| Cohere Rerank 3 Multilingual | Cohere | COHERE_RERANK_MULTILINGUAL_V3 |
| Pongo Reranking | Pongo | PONGO_RERANKER |

Image Embeddings

Do not currently set VERTEX_MULTIMODAL as the embedding_model. Carbon applies this model automatically when it processes an image file.

| Model | Developer | Embedding Size | Carbon Slug |
|---|---|---|---|
| Embeddings for Multimodal | Google | 1408 | VERTEX_MULTIMODAL |

Video Embeddings

Do not currently set VERTEX_MULTIMODAL as the embedding_model. Carbon applies this model automatically when it processes a video file.

| Model | Developer | Embedding Size | Carbon Slug |
|---|---|---|---|
| Embeddings for Multimodal | Google | 1408 | VERTEX_MULTIMODAL |

Usage

To choose the embedding model, set the embedding_model parameter in the POST body of /embeddings and the other API endpoints that accept it. If no model is specified, Carbon uses OPENAI.
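As a rough illustration of the parameter described above, the following sketch builds a POST body for /embeddings in Python. Only embedding_model and the slugs come from this page; the helper name build_search_body and the query and k fields are assumptions made for the example, so check the API reference for the exact request schema before relying on them.

```python
import json
from typing import Optional

# Text embedding slugs listed on this page ("Launching soon" rows are omitted).
TEXT_EMBEDDING_SLUGS = {
    "OPENAI",
    "OPENAI_ADA_SMALL_512",
    "OPENAI_ADA_SMALL_1536",
    "OPENAI_ADA_LARGE_256",
    "OPENAI_ADA_LARGE_1024",
    "OPENAI_ADA_LARGE_3072",
    "COHERE_MULTILINGUAL_V3",
    "SOLAR_1_MINI",
}


def build_search_body(query: str, k: int = 5,
                      embedding_model: Optional[str] = None) -> dict:
    """Hypothetical helper: build a JSON-serializable POST body.

    `query` and `k` are assumed field names; only `embedding_model`
    is taken from this page.
    """
    body = {"query": query, "k": k}
    if embedding_model is not None:
        if embedding_model not in TEXT_EMBEDDING_SLUGS:
            raise ValueError(f"unknown text embedding slug: {embedding_model}")
        body["embedding_model"] = embedding_model
    # When embedding_model is omitted, the server-side default (OPENAI) applies.
    return body


print(json.dumps(build_search_body("refund policy",
                                   embedding_model="COHERE_MULTILINGUAL_V3")))
```

Omitting the key, rather than sending an explicit null, keeps the request on the documented default path. VERTEX_MULTIMODAL is deliberately absent from the set because it should not be passed as an embedding_model.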

A vector search considers only files whose embeddings were generated with the specified model.

For example, suppose files A and B have embeddings generated with OPENAI, and files C and D with COHERE_MULTILINGUAL_V3. A query that does not set embedding_model falls back to OPENAI and therefore considers only files A and B.

If COHERE_MULTILINGUAL_V3 is instead set explicitly as the embedding_model in the /embeddings request, the search considers only files C and D.
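The A/B/C/D example can be mimicked locally with a small simulation. This is only a sketch of the filtering rule as described here, not Carbon's implementation; IndexedFile and searchable_files are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_MODEL = "OPENAI"  # default stated on this page


@dataclass(frozen=True)
class IndexedFile:
    """Hypothetical record: a file name plus the model that embedded it."""
    name: str
    embedding_model: str


def searchable_files(files: List[IndexedFile],
                     embedding_model: Optional[str] = None) -> List[str]:
    """Return the names of files a query would consider for the given model."""
    model = embedding_model or DEFAULT_MODEL
    return [f.name for f in files if f.embedding_model == model]


FILES = [
    IndexedFile("A", "OPENAI"),
    IndexedFile("B", "OPENAI"),
    IndexedFile("C", "COHERE_MULTILINGUAL_V3"),
    IndexedFile("D", "COHERE_MULTILINGUAL_V3"),
]

print(searchable_files(FILES))                            # ['A', 'B']
print(searchable_files(FILES, "COHERE_MULTILINGUAL_V3"))  # ['C', 'D']
```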

It’s important that all files intended for a query have embeddings generated using the same model.
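One way to catch a mixed set before querying is a client-side check over your own bookkeeping of which model embedded each file. The sketch below assumes you keep such a mapping yourself; group_by_model and single_model are hypothetical helpers, not part of Carbon's API.

```python
from collections import defaultdict
from typing import Dict, List, Mapping


def group_by_model(file_models: Mapping[str, str]) -> Dict[str, List[str]]:
    """Group file names by the embedding model slug recorded for each."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, model in file_models.items():
        groups[model].append(name)
    return dict(groups)


def single_model(file_models: Mapping[str, str]) -> str:
    """Return the one model shared by all files, or raise if there is not exactly one."""
    groups = group_by_model(file_models)
    if len(groups) != 1:
        raise ValueError(
            f"expected exactly one embedding model, found: {sorted(groups)}"
        )
    return next(iter(groups))


# All files share one model, so this slug can be passed as embedding_model.
print(single_model({"A": "OPENAI", "B": "OPENAI"}))
```

If the check fails, the grouping shows which files would need to be re-embedded with the model you intend to query.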