All text, audio, video and image files are supported file formats via Spaces Object Storage.

Compatibility with S3

The Spaces API is interoperable with the AWS S3, so Digital Ocean Spaces makes use of the existing S3 endpoints.

This means that the source of Digital Ocean files is S3. To differentiate between data sources and files from Spaces Object Storage, additional metadata has been added:

Data Source Metadata

data_source_metadata: Indicates the type of data source. Possible values include:

  • S3: Represents an Amazon S3 data source.
  • DigitalOcean Space: Represents a DigitalOcean Spaces data source.

File Metadata

file_metadata: Specifies the type of file. Possible values include:

  • S3 File: Represents a file stored in Amazon S3.
  • DigitalOcean Space File: Represents a file stored in DigitalOcean Spaces.
  • S3 Bucket: Represents a file representation for a S3 Bucket.
  • DigitalOcean Space Bucket: Represents a file representation for a DigitalOcean Space Bucket.

Authorization Type

Carbon uses Access Key based authentication to connect to Spaces Object Storage.

Functionality

Carbon allows users to upload text, image, video. and audio file formats directly from Spaces Object Storage.

Synchronization

Step 1: Generate Access Key and Secret

For Digital Ocean Spaces, generate the Access Key and Secret under the Space Keys tab of the Applications and API page here.

Step 2: Provide Credentials to Carbon

In order to access Spaces from different regions, you need to create separate Access Keys for each region.

POST the generated credentials to the /integrations/s3 endpoint in this format:

{
    "access_key": "<<your-access-key>>",
    "access_key_secret": "<<access-key-secret>>",
    "endpoint_url": "<<region>>.digitaloceanspaces.com"
}

Step 3: List Files and Buckets

Optionally load all available Spaces Object Storage buckets and objects by using POST /integrations/items/sync and list them via POST /integrations/items/list.

Step 4: Sync Files or Buckets

Use the /integrations/s3/files endpoint to sync specific file objects or buckets.

For syncing specific files (multiple files allowed):

{
    "ids": [{"bucket": "bucket-name", "id": "object-key"}],
    "tags": "...",
    "chunk_size": "...",
    "chunk_overlap": "...",
    "skip_embedding_generation": "..."
}

For syncing all files from buckets (multiple buckets allowed):

{
    "ids": [{"bucket": "bucket-name"}],
    "tags": "...",
    "chunk_size": "...",
    "chunk_overlap": "...",
    "skip_embedding_generation": "..."
}

You can also use the resync_file API endpoint to programmatically resync specific Spaces Object Storage files. To delete Spaces Object Storage files from Carbon, you can use the delete_file endpoint directly.

To sync Spaces Object Storage files on a 24-hour schedule (more frequent schedules available upon request), you can use the /update_users endpoint. This endpoint allows organizations to customize syncing settings according to their requirements, with the option to enable syncing for all data sources using the string ‘ALL’. It’s important to note that each request supports up to 100 customer IDs.

Here’s an example illustrating how to automatically enable syncing for updated Spaces Object Storage content for specified users:

{
    "customer_ids": ["team@carbon.ai", "sam@openai.com"],
    "auto_sync_enabled_sources": ["S3"]
}