Functionality

Our RSS connector parses content from web-hosted RSS and Atom feeds (all versions), treating each <item> (or <entry>) as a separate file.

The <content> is parsed and made retrievable in plain text, chunks, and embeddings via Carbon’s API.

If <content> isn’t available, we first fallback to <summary> and then to <description>. If none of these values exist, then the request will fail.

The <title> of the <item> is assigned as the file’s name, and the <link> is used as the external URL. Uniqueness is determined by the <link>, with a fallback to a slugified version of the <title> if no <link> is provided for the <item>.

Synchronization

Syncs are triggered when end-users re-submits an URL via the /integrations/rss_feed endpoint or Carbon Connect. You can also use the resync_file API endpoint to programmatically resync the file for a specific RSS <item>.

To delete the file corresponding to a RSS <item> from Carbon, you can use the delete_files endpoint directly.

A <pubDate> value is necessary for re-syncs to update existing files. Without this value, re-syncs will only add new items that are introduced to the feed.

To sync RSS feeds on a 24-hour schedule (more frequent schedules available upon request), you can use the /update_users endpoint. This endpoint allows organizations to customize syncing settings according to their requirements, with the option to enable syncing for all data sources using the string ‘ALL’. It’s important to note that each request supports up to 100 customer IDs.

Here’s an example illustrating how to automatically enable syncing for updated RSS content for specified users:

{
    "customer_ids": ["team@carbon.ai", "sam@openai.com"],
    "auto_sync_enabled_sources": ["RSS"]
}