Cold Storage
Carbon offers cold storage for file embeddings, optimizing storage costs and performance.
Overview
Carbon now supports moving file embeddings between hot and cold storage. This feature allows you to optimize storage costs and improve performance by keeping embeddings for frequently accessed files in hot storage (vector storage) while moving less frequently used files to cold storage (object storage).
Enabling Cold Storage
By default, the cold storage feature is not enabled. To enable cold storage, you must set a flag at file upload time. Currently, cold storage is only available for local file uploads via the following endpoints:
/uploadfile
/upload_text
/upload_file_from_url
Once enabled, files will automatically be moved to cold storage after a set period of inactivity.
Moving Files between Storage Tiers
Hot to Cold Storage
Once enabled, files will be automatically moved from hot to cold storage after a specified period of inactivity. This period is determined by the hot_storage_time_to_live
parameter, which represents the number of seconds a file must be inactive before it’s moved to cold storage. There is no manual way to move files to cold storage.
You can make an API request to the /modify_cold_storage_parameters
endpoint to update existing files to use cold storage.
Cold to Hot Storage
To move files from cold to hot storage, make an API request to /move_to_hot_storage
. The request will take filters similar to /user_files_v2
, and all files matching the provided filters will be moved to hot storage.
To avoid a single request hogging resources, there is a limit of 200 files that can be moved in one request. If the number of files matching the filters exceeds 200, the files will be processed in batches of 200 over a longer period of time.
/embeddings
Endpoint Behavior
If a request is made to /embeddings
that involves files in cold storage:
- An error will be returned that includes all
file_ids
for the affected files. This allows the client to know which files need to be moved to hot storage before the request can be processed. - If
exclude_cold_storage_embeddings
is set totrue
, any files in cold storage will be ignored, and no error will be thrown for requests involving files in cold storage. The search will naturally exclude those files.
In the future, we may enable a way to allow /embeddings
to work with files that are in both cold and hot storage.
File Object Information
Activity is defined as when a file was last used, which currently includes file resyncs, queries involving that file, and updates to that file’s tags.
The following fields under the file object (under user_files_v2
) are related to cold storage:
last_use
: A timestamp indicating when a file was last used (i.e., when it last had activity).supports_cold_storage
: A flag indicating whether or not a file can be moved to cold storage.hot_storage_time_to_live
: An integer representing the number of seconds a file must be inactive before it’s moved to cold storage.embedding_storage_status
: The storage status of the embeddings for a file, indicating whether they are in cold or hot storage.
Cold Storage Webhooks
MOVED_TO_COLD_STORAGE
: This event is fired when a file is moved to cold storage.MOVED_TO_HOT_STORAGE
: This event is fired when a file is moved to hot storage.