Introduction
Data Connectors for LLMs.
What is Carbon?
Carbon offers a comprehensive framework specifically designed to streamline the process of connecting external data sources to Large Language Models (LLMs).
Carbon significantly simplifies the process of Retrieval Augmented Generation (RAG), allowing you to devote more time to using your data and less time to ingesting it.
How Carbon Works
You can use our REST API or SDKs to sync user data sources and retrieve the data to use with LLMs. Carbon has native integrations with 20+ data connectors and supports more than 20+ file formats, encompassing text, audio, and visual data.
Depending on your use case and in-house infrastructure, you can retrieve user data from Carbon in several formats:
- Original file (PDF, CSV, etc.)
- Parsed plain text
- Chunks and embeddings to store in your vector store
- Semantic, keyword and hybrid search on a managed or self-hosted vector database
Products
π Connect
Carbon Connect provides a seamless client-side UI for users to authenticate and upload content from various data sources, including Notion, Google Drive, Dropbox, OneDrive, websites, and file uploads. Once authenticated, Carbon processes the source data and automates data synchronization to keep your application up-to-date with the connected sources.
You can integrate the Connect module into your application using a React component, SDK libraries, or a magic link (coming soon).
ποΈ Store
Carbon offers flexible storage options. You can choose between Carbonβs managed vector database, hosted on Qdrant Cloud, or use your own custom vector store. Embeddings and chunks in the connected vector store automatically updates as users modify connected sources.
π Universal API
Carbonβs Universal API allows you to access and manage data from any source, including documents, chunks, vectors, and other metadata. Our flexible API suite powers all functionality, including the Carbon Connect UI which makes calls to the backend.
For detailed information on available endpoints, request/response formats, and code samples, refer to our API Reference.
Concepts
Files
Files (along with folders) can be synced with Carbon locally or through third-party connectors. The filesβ content is (optionally) processed to extract semantic information, and we also provide the original file, plain text, chunks, and embeddings.
Carbon supports various file types across text, image, video, and audio, and the complete list of supported file types is available here. Files can include source-specific and custom metadata, which can later be used for retrieval filtering. You can update files, which will replace the previous version in future retrieval.
Tags
Tags are custom metadata that can be added to files. They can be used to filter results across various endpoints, from retrieving files to deleting them. Tags are helpful for many purposes, such as managing permissions and organizing files. They can be modified without having to re-process the file. Learn more about filtering by tags here.
Connectors
To sync user data from external sources to Carbon, you can utilize connectors. For example, you can connect a Google Drive folder, and all its content will be automatically sent to Carbon and kept updated. Learn more about our data connectors here.
Retrieval
You can choose whether Carbon should generate and store chunks and embeddings for a specific file. If you opt for this, Carbon will manage your chunks and embeddings, enabling you to perform semantic, keyword, and hybrid searches directly on your file content by entering a natural language search query. Learn more about hybrid search here.
Setup
π Getting a Carbon API Key
Your first million characters on Carbon is free.
Sign up for a free API here, or book a 15-minute onboarding to get an API key here.
π Helpful Links
To get started with Carbon, follow our guides: