Custom Metadata

A tag is a key-value pair that can be added to a file to provide custom metadata. These pairs can be used for searches or document retrieval to narrow down search results. A file can have any number of tags.

The following keys are reserved and cannot be used:

  • db_embedding_id
  • organization_id
  • user_id
  • organization_user_file_id
  • chunk_number

Carbon currently supports two data types for tag values: string and a list of string. Keys can only be strings. If values other than string and a list of string are used, they are automatically converted to strings (e.g. 4 will become “4”).

Tags can be added directly via the API (through file upload or OAuth URL) or through Carbon Connect.

Filtering

When filtering based on tags, customers can construct complex filters using “AND”, “OR”, and negation logic.

Take the below input as an example:

{
    "OR": [
        {
            "key": "subject",
            "value": "holy-bible",
            "negate": false
        },
        {
            "key": "person-of-interest",
            "value": "jesus christ",
            "negate": false
        },
        {
            "key": "genre",
            "value": "religion",
            "negate": true
        }
        {
            "AND": [
                {
                    "key": "subject",
                    "value": "tao-te-ching",
                    "negate": false
                },
                {
                    "key": "author",
                    "value": "lao-tzu",
                    "negate": false
                }
            ]
        }
    ]
}

In this case, files will be filtered such that:

  1. “subject” = “holy-bible” OR
  2. “person-of-interest” = “jesus christ” OR
  3. “genre” != “religion” OR
  4. “subject” = “tao-te-ching” AND “author” = “lao-tzu”

Note that the top level of the query must be either an “OR” or “AND” array. Currently, nesting is limited to 3. For tag blocks (those with “key”, “value”, and “negate” keys), the following typing rules apply:

  1. “key” isn’t optional and must be a string
  2. “value” isn’t optional and can be any or list[any]
  3. “negate” is optional and must be true or false. If present and true, then the filter block is negated in the resulting query. It is false by default.