Quickstart

Installation

$ pip install cohere

Instantiate and use the client

There are two clients in the SDK with a common interface:

  • Client is based on the Python requests package.

  • AsyncClient uses Python’s asyncio interface and the aiohttp package.

It is recommended to use AsyncClient for performance-critical applications with many concurrent calls.

from cohere import Client
co = Client()
co.generate("Hello, my name is", max_tokens=10)

from cohere import AsyncClient
co = AsyncClient()
await co.generate("Hello, my name is", max_tokens=10)
await co.close()  # the AsyncClient should be closed when done

from cohere import AsyncClient
async with AsyncClient() as co:  # using 'async with' runs check_api_key and closes any sessions automatically
    await co.generate("Hello, my name is", max_tokens=10)
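
AsyncClient lets many requests run concurrently. A minimal sketch using asyncio.gather (the prompts and max_tokens value are illustrative):

import asyncio
from cohere import AsyncClient

async def main():
    prompts = ["Hello, my name is", "Once upon a time"]
    async with AsyncClient() as co:
        # schedule all generate calls at once instead of awaiting them one by one
        return await asyncio.gather(*(co.generate(p, max_tokens=10) for p in prompts))

results = asyncio.run(main())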

API

Client

class cohere.client.Client(api_key: str | None = None, num_workers: int = 64, request_dict: dict = {}, check_api_key: bool = True, client_name: str | None = None, max_retries: int = 3, timeout: int = 120, api_url: str | None = None)

Cohere Client

Parameters:
  • api_key (str) – Your API key.

  • num_workers (int) – Maximal number of threads for parallelized calls.

  • request_dict (dict) – Additional parameters for calls made with the requests library. Currently ignored in AsyncClient.

  • check_api_key (bool) – Whether to check the API key for validity on initialization.

  • client_name (str) – A string to identify your application for internal analytics purposes.

  • max_retries (int) – Maximal number of retries for requests.

  • timeout (int) – Request timeout in seconds.

  • api_url (str) – Override the default API URL (cohere.COHERE_API_URL).

check_api_key() Dict[str, bool]

Checks the API key. This happens automatically during Client initialization, but not in AsyncClient. check_api_key raises an exception when the key is invalid; the return value for valid keys is kept for backwards compatibility.
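
Examples

Validate a key up front with AsyncClient (a minimal sketch; it assumes cohere.CohereError, the SDK’s base exception):
>>> co = cohere.AsyncClient()
>>> try:
>>>     await co.check_api_key()
>>> except cohere.CohereError:
>>>     print("invalid API key")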

loglikelihood(prompt: str | None = None, completion: str | None = None, model: str | None = None) LogLikelihoods

Calculates the token log-likelihood for a provided prompt and completion. Using this endpoint instead of co.generate with max_tokens=0 will guarantee that any required tokens such as <EOP_TOKEN> are correctly inserted, and makes it easier to retrieve only the completion log-likelihood.

Parameters:
  • prompt (str) – The prompt

  • completion (str) – (Optional) The completion

  • model (str) – (Optional) The model to use for calculating the log-likelihoods
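
Examples

Score a completion (a minimal sketch; the completion_tokens attribute and its per-token log_likelihood field are assumed from the LogLikelihoods response type):
>>> ll = co.loglikelihood(prompt="The capital of France is", completion=" Paris")
>>> # sum the per-token log-likelihoods of the completion (attribute names assumed)
>>> print(sum(t.log_likelihood for t in ll.completion_tokens))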

batch_generate(prompts: List[str], return_exceptions=False, **kwargs) List[Generations | Exception]

A batched version of generate with multiple prompts.

Parameters:
  • prompts – list of prompts

  • return_exceptions (bool) – Return exceptions as list items rather than raise them. Ensures your entire batch is not lost on one of the items failing.

  • kwargs – other arguments to generate
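
Examples

With return_exceptions=True, a failed prompt yields an Exception in its slot instead of aborting the whole batch (a minimal sketch):
>>> prompts = ["Hello, my name is", "Once upon a time"]
>>> results = co.batch_generate(prompts, return_exceptions=True, max_tokens=10)
>>> for prompt, res in zip(prompts, results):
>>>     if isinstance(res, Exception):
>>>         print(f"failed on {prompt!r}: {res}")
>>>     else:
>>>         print(res[0].text)  # first generation for this prompt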

generate(prompt: str | None = None, prompt_vars: object = {}, model: str | None = None, preset: str | None = None, num_generations: int | None = None, max_tokens: int | None = None, temperature: float | None = None, k: int | None = None, p: float | None = None, frequency_penalty: float | None = None, presence_penalty: float | None = None, end_sequences: List[str] | None = None, stop_sequences: List[str] | None = None, return_likelihoods: str | None = None, truncate: str | None = None, stream: bool = False) Generations | StreamingGenerations

Generate endpoint. See https://docs.cohere.ai/reference/generate for advanced arguments

Parameters:
  • prompt (str) – Represents the prompt or text to be completed. Trailing whitespaces will be trimmed.

  • model (str) – (Optional) The model ID to use for generating the next reply.

  • return_likelihoods (str) – (Optional) One of GENERATION|ALL|NONE to specify how and if the token (log) likelihoods are returned with the response.

  • preset (str) – (Optional) The ID of a custom playground preset.

  • num_generations (int) – (Optional) The number of generations that will be returned, defaults to 1.

  • max_tokens (int) – (Optional) The number of tokens to predict per generation, defaults to 20.

  • temperature (float) – (Optional) The degree of randomness in generations from 0.0 to 5.0, lower is less random.

  • truncate (str) – (Optional) One of NONE|START|END, defaults to END. How the API handles text longer than the maximum token length.

  • stream (bool) – Return streaming tokens.

Returns:

if stream=False: a Generations object

if stream=True: a StreamingGenerations object including:

  • id (str): The id of the whole generation call

  • generations (Generations): same as the response when stream=False

  • finish_reason (str), one of:
    COMPLETE: the stream successfully completed
    ERROR: an error occurred during streaming
    ERROR_TOXIC: the stream was halted due to toxic output
    ERROR_LIMIT: the context is too big to generate
    USER_CANCEL: the user closed the stream / cancelled the request
    MAX_TOKENS: the max tokens limit was reached

  • texts (List[str]): list of segments of text streamed back from the API

Examples

A simple generate message:
>>> res = co.generate(prompt="Hey! How are you doing today?")
>>> print(res.text)
Streaming generate:
>>> res = co.generate(
>>>     prompt="Hey! How are you doing today?",
>>>     stream=True)
>>> for token in res:
>>>     print(token)
chat(message: str | None = None, conversation_id: str | None = '', model: str | None = None, return_chat_history: bool | None = False, return_prompt: bool | None = False, return_preamble: bool | None = False, chat_history: List[Dict[str, str]] | None = None, preamble_override: str | None = None, user_name: str | None = None, temperature: float | None = 0.8, max_tokens: int | None = None, stream: bool | None = False, p: float | None = None, k: float | None = None, search_queries_only: bool | None = None, documents: List[Dict[str, Any]] | None = None, citation_quality: str | None = None, prompt_truncation: str | None = None, connectors: List[Dict[str, Any]] | None = None) Chat | StreamingChat

Returns a Chat object with the query reply.

Parameters:
  • message (str) – The message to send to the chatbot.

  • stream (bool) – Return streaming tokens.

  • conversation_id (str) – (Optional) To store a conversation, create a conversation id and use it for every related request.

  • preamble_override (str) – (Optional) A string to override the preamble.

  • chat_history (List[Dict[str, str]]) – (Optional) A list of entries used to construct the conversation. If provided, these messages will be used to build the prompt and the conversation_id will be ignored so no data will be stored to maintain state.

  • model (str) – (Optional) The model to use for generating the response.

  • temperature (float) – (Optional) The temperature to use for the response. The higher the temperature, the more random the response.

  • p (float) – (Optional) The nucleus sampling probability.

  • k (float) – (Optional) The top-k sampling parameter: only the k most likely tokens are considered.

  • max_tokens (int) – (Optional) The max tokens generated for the next reply.

  • return_chat_history (bool) – (Optional) Whether to return the chat history.

  • return_prompt (bool) – (Optional) Whether to return the prompt.

  • return_preamble (bool) – (Optional) Whether to return the preamble.

  • user_name (str) – (Optional) A string to override the username.

  • search_queries_only (bool) – (Optional) When true, the response will only contain a list of generated search queries, but no search will take place, and no reply from the model to the user’s message will be generated.

  • documents (List[Dict[str, Any]]) –

    (Optional) Documents to use to generate a grounded response with citations. Example:

    documents=[
        {
            "id": "national_geographic_everest",
            "title": "Height of Mount Everest",
            "snippet": "The height of Mount Everest is 29,035 feet",
            "url": "https://education.nationalgeographic.org/resource/mount-everest/",
        },
        {
            "id": "national_geographic_mariana",
            "title": "Depth of the Mariana Trench",
            "snippet": "The depth of the Mariana Trench is 36,070 feet",
            "url": "https://www.nationalgeographic.org/activity/mariana-trench-deepest-place-earth",
        },
    ]

  • connectors (List[Dict[str, Any]]) – (Optional) When specified, the model’s reply will be enriched with information found by querying each of the connectors (RAG). Example: connectors=[{"id": "web-search"}]

  • citation_quality (str) – (Optional) Dictates the approach taken to generating citations by allowing the user to specify whether they want “accurate” results or “fast” results. Defaults to “accurate”.

  • prompt_truncation (str) – (Optional) Dictates how the prompt will be constructed. With prompt_truncation set to “AUTO”, some elements from chat_history and documents will be dropped in an attempt to construct a prompt that fits within the model’s context length limit. With prompt_truncation set to “OFF”, no elements will be dropped. If the sum of the inputs exceeds the model’s context length limit, a TooManyTokens error will be returned.

Returns:

a Chat object if stream=False, or a StreamingChat object if stream=True

Examples

A simple chat message:
>>> res = co.chat(message="Hey! How are you doing today?")
>>> print(res.text)
Continuing a session using a specific model:
>>> res = co.chat(
>>>     message="Hey! How are you doing today?",
>>>     conversation_id="1234",
>>>     model="command",
>>>     return_chat_history=True)
>>> print(res.text)
>>> print(res.chat_history)
Streaming chat:
>>> res = co.chat(
>>>     message="Hey! How are you doing today?",
>>>     stream=True)
>>> for token in res:
>>>     print(token)
Stateless chat with chat history:
>>> res = co.chat(
>>>     message="Tell me a joke!",
>>>     chat_history=[
>>>         {'role': 'User', 'message': 'Hey! How are you doing today?'},
>>>         {'role': 'Chatbot', 'message': 'I am doing great! How can I help you?'},
>>>     ],
>>>     return_prompt=True)
>>> print(res.text)
>>> print(res.prompt)
Chat message with documents to use to generate the response:
>>> res = co.chat(
>>>     "How deep is the Mariana Trench?",
>>>     documents=[
>>>         {
>>>            "id": "national_geographic_everest",
>>>            "title": "Height of Mount Everest",
>>>            "snippet": "The height of Mount Everest is 29,035 feet",
>>>            "url": "https://education.nationalgeographic.org/resource/mount-everest/",
>>>         },
>>>         {
>>>             "id": "national_geographic_mariana",
>>>             "title": "Depth of the Mariana Trench",
>>>             "snippet": "The depth of the Mariana Trench is 36,070 feet",
>>>             "url": "https://www.nationalgeographic.org/activity/mariana-trench-deepest-place-earth",
>>>         },
>>>       ])
>>> print(res.text)
>>> print(res.citations)
>>> print(res.documents)
Chat message with connector to query and use the results to generate the response:
>>> res = co.chat(
>>>     "What is the height of Mount Everest?",
>>>      connectors=[{"id": "web-search"}])
>>> print(res.text)
>>> print(res.citations)
>>> print(res.documents)
Generate search queries for fetching documents to use in chat:
>>> res = co.chat(
>>>     "What is the height of Mount Everest?",
>>>      search_queries_only=True)
>>> if res.is_search_required:
>>>      print(res.search_queries)
embed(texts: List[str], model: str | None = None, truncate: str | None = None, input_type: str | None = None, embedding_types: List[str] | None = None) Embeddings

Returns an Embeddings object for the provided texts. Visit https://cohere.ai/embed to learn about embeddings.

Parameters:
  • texts (List[str]) – A list of strings to embed.

  • model (str) – (Optional) The model ID to use for embedding the text.

  • truncate (str) – (Optional) One of NONE|START|END, defaults to END. How the API handles text longer than the maximum token length.

  • input_type (str) – (Optional) One of “classification”, “clustering”, “search_document”, “search_query”. The type of input text provided to embed.

  • embedding_types (List[str]) – (Optional) Specifies the types of embeddings you want to get back. Not required and default is None, which returns the float embeddings in the response’s embeddings field. Can be one or more of the following types: “float”, “int8”, “uint8”, “binary”, “ubinary”.
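
Examples

Embed a few texts (a minimal sketch; the model name is illustrative, and embeddings holds one float vector per input text):
>>> res = co.embed(
>>>     texts=["hello", "goodbye"],
>>>     model="embed-english-v3.0",
>>>     input_type="search_document")
>>> print(len(res.embeddings))  # one vector per input text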

codebook(model: str | None = None, compression_codebook: str | None = 'default') Codebook

Returns a codebook object for the provided model. Visit https://cohere.ai/embed to learn about compressed embeddings and codebooks.

Parameters:
  • model (str) – (Optional) The model ID to use for embedding the text.

  • compression_codebook (str) – (Optional) The compression codebook to use for compressed embeddings. Defaults to “default”.

classify(inputs: List[str] = [], model: str | None = None, preset: str | None = None, examples: List[Example] = [], truncate: str | None = None) Classifications

Returns a Classifications object of the inputs provided, see https://docs.cohere.ai/reference/classify for advanced usage.

Parameters:
  • inputs (List[str]) – A list of texts to classify.

  • model (str) – (Optional) The model ID to use for classifying the inputs.

  • examples (List[ClassifyExample]) – A list of ClassifyExample objects containing a text and its associated label.

  • truncate (str) – (Optional) One of NONE|START|END, defaults to END. How the API handles text longer than the maximum token length.
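
Examples

A few-shot sentiment classifier (a minimal sketch; the Example import path and the response attributes follow SDK v4’s classify types):
>>> from cohere.responses.classify import Example
>>> res = co.classify(
>>>     inputs=["this movie was great", "terrible, do not watch"],
>>>     examples=[
>>>         Example("I loved it", "positive"),
>>>         Example("best film of the year", "positive"),
>>>         Example("it was awful", "negative"),
>>>         Example("a waste of time", "negative"),
>>>     ])
>>> for c in res:
>>>     print(c.input, c.prediction, c.confidence)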

summarize(text: str, model: str | None = None, length: str | None = None, format: str | None = None, temperature: float | None = None, additional_command: str | None = None, extractiveness: str | None = None) SummarizeResponse

Returns a generated summary of the specified length for the provided text.

Parameters:
  • text (str) – Text to summarize.

  • model (str) – (Optional) ID of the model.

  • length (str) – (Optional) One of {“short”, “medium”, “long”}, defaults to “medium”. Controls the length of the summary.

  • format (str) – (Optional) One of {“paragraph”, “bullets”}, defaults to “paragraph”. Controls the format of the summary.

  • extractiveness (str) – (Optional) One of {“low”, “medium”, “high”}. Controls how close the summary stays to the original text: higher extractiveness reuses sentences from the input, lower extractiveness paraphrases more.

  • temperature (float) – Ranges from 0 to 5. Controls the randomness of the output. Lower values tend to generate more “predictable” output, while higher values tend to generate more “creative” output. The sweet spot is typically between 0 and 1.

  • additional_command (str) – (Optional) Modifier for the underlying prompt, must complete the sentence “Generate a summary _”.

Examples

Summarize a text:
>>> res = co.summarize(text="Stock market report for today...")
>>> print(res.summary)
Summarize a text with a specific model and prompt:
>>> res = co.summarize(
>>>     text="Stock market report for today...",
>>>     model="summarize-xlarge",
>>>     length="long",
>>>     format="bullets",
>>>     temperature=0.3,
>>>     additional_command="focusing on the highest performing stocks")
>>> print(res.summary)
batch_tokenize(texts: List[str], return_exceptions=False, **kwargs) List[Tokens | Exception]

A batched version of tokenize.

Parameters:
  • texts – list of texts

  • return_exceptions (bool) – Return exceptions as list items rather than raise them. Ensures your entire batch is not lost on one of the items failing.

  • kwargs – other arguments to tokenize

tokenize(text: str, model: str | None = None) Tokens

Returns a Tokens object of the provided text, see https://docs.cohere.ai/reference/tokenize for advanced usage.

Parameters:
  • text (str) – The text to tokenize.

  • model (str) – An optional model name that will ensure that the tokenization uses the tokenizer used by that model, which can be critical for counting tokens properly.

batch_detokenize(list_of_tokens: List[List[int]], return_exceptions=False, **kwargs) List[Detokenization | Exception]

A batched version of detokenize.

Parameters:
  • list_of_tokens – list of list of tokens

  • return_exceptions (bool) – Return exceptions as list items rather than raise them. Ensures your entire batch is not lost on one of the items failing.

  • kwargs – other arguments to detokenize

detokenize(tokens: List[int], model: str | None = None) Detokenization

Returns a Detokenization object of the provided tokens, see https://docs.cohere.ai/reference/detokenize for advanced usage.

Parameters:
  • tokens (List[int]) – A list of tokens to convert to strings

  • model (str) – An optional model name. This will ensure that the detokenization is done by the tokenizer used by that model.
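
Examples

A tokenize/detokenize round trip (a minimal sketch; the model name is illustrative and the token_strings attribute is assumed from the Tokens response type):
>>> tok = co.tokenize(text="Hello world", model="command")
>>> print(tok.tokens)         # token ids
>>> print(tok.token_strings)  # string pieces (attribute name assumed)
>>> detok = co.detokenize(tokens=tok.tokens, model="command")
>>> print(detok.text)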

detect_language(texts: List[str]) DetectLanguageResponse

This API is deprecated.

generate_feedback(request_id: str, good_response: bool, model=None, desired_response: str | None = None, flagged_response: bool | None = None, flagged_reason: str | None = None, prompt: str | None = None, annotator_id: str | None = None) GenerateFeedbackResponse

Give feedback on a response from the Cohere Generate API to improve the model.

Parameters:
  • request_id (str) – The request_id of the generation request to give feedback on.

  • good_response (bool) – Whether the response was good or not.

  • model (str) – (Optional) ID of the model.

  • desired_response (str) – (Optional) The desired response.

  • flagged_response (bool) – (Optional) Whether the response was flagged or not.

  • flagged_reason (str) – (Optional) The reason the response was flagged.

  • prompt (str) – (Optional) The prompt used to generate the response.

  • annotator_id (str) – (Optional) The ID of the annotator.

Examples

A user accepts a model’s suggestion in an assisted writing setting:
>>> generations = co.generate(f"Write me a polite email responding to the one below: {email}. Response:")
>>> if user_accepted_suggestion:
>>>     co.generate_feedback(request_id=generations[0].id, good_response=True)
The user edits the model’s suggestion:
>>> generations = co.generate(f"Write me a polite email responding to the one below: {email}. Response:")
>>> if user_edits_suggestion:
>>>     co.generate_feedback(request_id=generations[0].id, good_response=False, desired_response=user_edited_suggestion)
generate_preference_feedback(ratings: List[PreferenceRating], model=None, prompt: str | None = None, annotator_id: str | None = None) GeneratePreferenceFeedbackResponse

Give preference feedback on a response from the Cohere Generate API to improve the model.

Parameters:
  • ratings (List[PreferenceRating]) – A list of PreferenceRating objects.

  • model (str) – (Optional) ID of the model.

  • prompt (str) – (Optional) The prompt used to generate the response.

  • annotator_id (str) – (Optional) The ID of the annotator.

Examples

A user accepts a model’s suggestion in an assisted writing setting, and prefers it to a second suggestion:
>>> generations = co.generate(f"Write me a polite email responding to the one below: {email}. Response:", num_generations=2)
>>> # prompt the user for which generation they prefer
>>> ratings = []
>>> if user_accepted_idx == 0:
>>>     ratings.append(PreferenceRating(request_id=0, rating=1))
>>>     ratings.append(PreferenceRating(request_id=1, rating=0))
>>> else:
>>>     ratings.append(PreferenceRating(request_id=0, rating=0))
>>>     ratings.append(PreferenceRating(request_id=1, rating=1))
>>> co.generate_preference_feedback(ratings=ratings)

rerank(query: str, documents: List[str] | List[Dict[str, Any]], model: str, top_n: int | None = None, max_chunks_per_doc: int | None = None) Reranking

Returns a list of documents ordered by their relevance to the provided query

Parameters:
  • query (str) – The search query

  • documents (list[str], list[dict]) – The documents to rerank

  • model (str) – The model to use for re-ranking

  • top_n (int) – (optional) The number of results to return, defaults to returning all results

  • max_chunks_per_doc (int) – (optional) The maximum number of chunks derived from a document
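
Examples

Rerank a few passages (a minimal sketch; the model name is illustrative, and iterating the Reranking result with its relevance_score/document attributes follows the SDK’s response types):
>>> docs = [
>>>     "Carson City is the capital city of Nevada.",
>>>     "Washington, D.C. is the capital of the United States.",
>>>     "Capitalization is important in writing.",
>>> ]
>>> results = co.rerank(
>>>     query="What is the capital of the United States?",
>>>     documents=docs,
>>>     model="rerank-english-v2.0",
>>>     top_n=2)
>>> for r in results:
>>>     print(r.relevance_score, r.document["text"])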

create_dataset(name: str, data: BinaryIO, dataset_type: str, eval_data: BinaryIO | None = None, keep_fields: str | List[str] | None = None, optional_fields: str | List[str] | None = None, parse_info: ParseInfo | None = None) Dataset

Returns a Dataset given input data

Parameters:
  • name (str) – The name of your dataset

  • data (BinaryIO) – The data to be uploaded and validated

  • dataset_type (str) – The type of dataset you want to upload

  • eval_data (BinaryIO) – (optional) Evaluation data to upload, if the dataset type supports it

  • keep_fields (Union[str, List[str]]) – (optional) A list of fields you want to keep in the dataset that are required

  • optional_fields (Union[str, List[str]]) – (optional) A list of fields you want to keep in the dataset that are optional

  • parse_info (ParseInfo) – (optional) Information on how to parse the raw data

Returns:

Dataset object.

Return type:

Dataset
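
Examples

Upload a dataset and wait for validation (a minimal sketch; the file path and dataset_type value are illustrative):
>>> ds = co.create_dataset(
>>>     name="my-embed-input",
>>>     data=open("texts.csv", "rb"),   # illustrative path
>>>     dataset_type="embed-input")     # illustrative dataset type
>>> ds = co.wait_for_dataset(ds.id, timeout=600)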

get_dataset(id: str) Dataset

Returns a Dataset given a dataset id

Parameters:

id (str) – The id of your dataset

Returns:

Dataset object.

Return type:

Dataset

list_datasets(dataset_type: str | None = None, limit: int | None = None, offset: int | None = None) List[Dataset]

Returns a list of your Datasets

Parameters:
  • dataset_type (str) – (optional) The dataset_type to filter on

  • limit (int) – (optional) The max number of datasets to return

  • offset (int) – (optional) The number of datasets to offset by

Returns:

List of Dataset objects.

Return type:

List[Dataset]

delete_dataset(id: str) None

Deletes your dataset

Parameters:

id (str) – The id of the dataset to delete

get_dataset_usage() DatasetUsage

Gets your total storage used in datasets

Returns:

Object containing current dataset usage

Return type:

DatasetUsage

wait_for_dataset(dataset_id: str, timeout: float | None = None, interval: float = 10) Dataset

Wait for Dataset validation result.

Parameters:
  • dataset_id (str) – Dataset id.

  • timeout (Optional[float], optional) – Wait timeout in seconds; if None, there is no limit to the wait time. Defaults to None.

  • interval (float, optional) – Wait poll interval in seconds. Defaults to 10.

Raises:

TimeoutError – wait timed out

Returns:

Dataset object.

Return type:

Dataset

create_cluster_job(input_dataset_id: str | None = None, embeddings_url: str | None = None, min_cluster_size: int | None = None, n_neighbors: int | None = None, is_deterministic: bool | None = None, generate_descriptions: bool | None = None) ClusterJobResult

Create clustering job.

Parameters:
  • input_dataset_id (str) – Id of the dataset to cluster.

  • embeddings_url (str) – File with embeddings to cluster.

  • min_cluster_size (Optional[int], optional) – Minimum number of elements in a cluster. Defaults to 10.

  • n_neighbors (Optional[int], optional) – Number of nearest neighbors used by UMAP to establish the local structure of the data. Defaults to 15. For more information, please refer to https://umap-learn.readthedocs.io/en/latest/parameters.html#n-neighbors

  • is_deterministic (Optional[bool], optional) – Determines whether the output of the cluster job is deterministic. Defaults to True.

  • generate_descriptions (Optional[bool], optional) – Determines whether to generate cluster descriptions. Defaults to False.

Returns:

Created clustering job

Return type:

ClusterJobResult
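
Examples

Create a job from a dataset and wait on it (a minimal sketch; the job_id attribute on the returned job is assumed):
>>> job = co.create_cluster_job(input_dataset_id=ds.id, min_cluster_size=10)
>>> result = co.wait_for_cluster_job(job.job_id, timeout=3600)  # job_id attribute assumed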

get_cluster_job(job_id: str) ClusterJobResult

Get clustering job results.

Parameters:

job_id (str) – Clustering job id.

Raises:

ValueError – “job_id” is empty

Returns:

Clustering job result.

Return type:

ClusterJobResult

list_cluster_jobs() List[ClusterJobResult]

List clustering jobs.

Returns:

Clustering jobs created.

Return type:

List[ClusterJobResult]

wait_for_cluster_job(job_id: str, timeout: float | None = None, interval: float = 10) ClusterJobResult

Wait for clustering job result.

Parameters:
  • job_id (str) – Clustering job id.

  • timeout (Optional[float], optional) – Wait timeout in seconds; if None, there is no limit to the wait time. Defaults to None.

  • interval (float, optional) – Wait poll interval in seconds. Defaults to 10.

Raises:

TimeoutError – wait timed out

Returns:

Clustering job result.

Return type:

ClusterJobResult

create_embed_job(dataset_id: str, model: str, input_type: str, name: str | None = None, truncate: str | None = None, embedding_types: List[str] | None = None) EmbedJob

Create embed job.

Parameters:
  • dataset_id (str) – ID of the dataset to embed.

  • model (str) – ID of the model to use for embedding the text.

  • input_type (str) – One of “classification”, “clustering”, “search_document”, “search_query”. The type of input text provided to embed.

  • name (Optional[str], optional) – The name of the embed job. Defaults to None.

  • truncate (Optional[str], optional) – How the API handles text longer than the maximum token length. Defaults to None.

  • embedding_types (List[str]) – (Optional) Specifies the types of embeddings you want to get back. Not required and default is None, which returns the float embeddings in the response’s embeddings field. Can be one or more of the following types: “float”, “int8”, “uint8”, “binary”, “ubinary”.

Returns:

The created embed job

Return type:

EmbedJob
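
Examples

Chain a validated dataset into an embed job (a minimal sketch; the model name is illustrative and the job_id attribute on EmbedJob is assumed):
>>> job = co.create_embed_job(
>>>     dataset_id=ds.id,
>>>     model="embed-english-v3.0",      # illustrative model
>>>     input_type="search_document")
>>> job = co.wait_for_embed_job(job.job_id)  # job_id attribute assumed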

list_embed_jobs() List[EmbedJob]

List embed jobs.

Returns:

Embed jobs.

Return type:

List[EmbedJob]

get_embed_job(job_id: str) EmbedJob

Get embed job.

Parameters:

job_id (str) – Embed job id.

Raises:

ValueError – “job_id” is empty

Returns:

Embed job.

Return type:

EmbedJob

cancel_embed_job(job_id: str) None

Cancel embed job.

Parameters:

job_id (str) – Embed job id.

Raises:

ValueError – “job_id” is empty

wait_for_embed_job(job_id: str, timeout: float | None = None, interval: float = 10) EmbedJob

Wait for embed job completion.

Parameters:
  • job_id (str) – Embed job id.

  • timeout (Optional[float], optional) – Wait timeout in seconds; if None, there is no limit to the wait time. Defaults to None.

  • interval (float, optional) – Wait poll interval in seconds. Defaults to 10.

Raises:

TimeoutError – wait timed out

Returns:

Embed job.

Return type:

EmbedJob

create_custom_model(name: str, model_type: Literal['GENERATIVE', 'CLASSIFY', 'RERANK', 'CHAT'], dataset: Dataset | str, base_model: str | None = None, hyperparameters: HyperParametersInput | None = None) CustomModel

Create a new custom model

Parameters:
  • name (str) – name of your custom model, has to be unique across your organization

  • model_type (GENERATIVE, CLASSIFY, RERANK, CHAT) – type of custom model

  • dataset (Dataset, str) – A dataset or dataset id for your training.

  • base_model (str) –

    base model to use for your custom model. For generative and classify models, base_model has to be None (no option available for now). For rerank models, you can choose between english and multilingual. Defaults to english if not specified.

    The English model is better for English, while the multilingual model should be picked if a non-negligible part of queries/documents will be in other languages.

  • hyperparameters (HyperParametersInput) – adjust hyperparameters for your custom model. Only for generative custom models.

Returns:

the custom model that was created

Return type:

CustomModel

Examples

prompt completion custom model with dataset
>>> co = cohere.Client("YOUR_API_KEY")
>>> ds = co.create_dataset(name="prompt-completion-dataset", data=open("/path/to/your/file.csv", "rb"), dataset_type="prompt-completion-finetune-input")
>>> ds.await_validation()
>>> co.create_custom_model("prompt-completion-ft", model_type="GENERATIVE", dataset=ds.id)
classification custom model with train and evaluation data
>>> co = cohere.Client("YOUR_API_KEY")
>>> ds = co.create_dataset(name="classify-dataset", data=open("train_file.csv", "rb"), eval_data=open("eval_file", "rb"), dataset_type="single-label-classification-finetune-input")
>>> ds.await_validation()
>>> co.create_custom_model("classify-ft", model_type="CLASSIFY", dataset=ds.id)
wait_for_custom_model(custom_model_id: str, timeout: float | None = None, interval: float = 60) CustomModel

Wait for custom model training completion.

Parameters:
  • custom_model_id (str) – Custom model id.

  • timeout (Optional[float], optional) – Wait timeout in seconds; if None, there is no limit to the wait time. Defaults to None.

  • interval (float, optional) – Wait poll interval in seconds. Defaults to 60.

Raises:

TimeoutError – wait timed out

Returns:

Custom model.

Return type:

CustomModel

get_custom_model(custom_model_id: str) CustomModel

Get a custom model by id.

Parameters:

custom_model_id (str) – custom model id

Returns:

the custom model

Return type:

CustomModel

get_custom_model_by_name(name: str) CustomModel

Get a custom model by name.

Parameters:

name (str) – custom model name

Returns:

the custom model

Return type:

CustomModel

get_custom_model_metrics(custom_model_id: str) List[ModelMetric]

Get a custom model’s training metrics by id

Parameters:

custom_model_id (str) – custom model id

Returns:

a list of model metrics

Return type:

List[ModelMetric]

list_custom_models(statuses: List[Literal['UNKNOWN', 'CREATED', 'TRAINING', 'DEPLOYING', 'READY', 'FAILED', 'DELETED', 'TEMPORARILY_OFFLINE', 'PAUSED', 'QUEUED']] | None = None, before: datetime | None = None, after: datetime | None = None, order_by: Literal['asc', 'desc'] | None = None) List[CustomModel]

List custom models of your organization. Limit is 50.

Parameters:
  • statuses (CUSTOM_MODEL_STATUS, optional) – search for custom models which are in one of these states

  • before (datetime, optional) – search for custom models that were created before this timestamp

  • after (datetime, optional) – search for custom models that were created after this timestamp

  • order_by (Literal["asc", "desc"], optional) – sort custom models by created at, either asc or desc

Returns:

a list of custom models.

Return type:

List[CustomModel]
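
Examples

List the most recently created models that are ready to use (a minimal sketch; the name attribute on CustomModel is assumed):
>>> models = co.list_custom_models(statuses=["READY"], order_by="desc")
>>> for m in models:
>>>     print(m.name)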

create_connector(name: str, url: str, active: bool = True, continue_on_failure: bool = False, excludes: List[str] | None = None, oauth: dict | None = None, service_auth: dict | None = None) Connector

Creates a Connector with the provided information

Parameters:
  • name (str) – The name of your connector

  • url (str) – The URL of the connector that will be used to search for documents

  • active (bool) – (optional) Whether the connector is active or not

  • continue_on_failure (bool) – (optional) Whether a chat request should continue or not if the request to this connector fails

  • excludes (List[str]) – (optional) A list of fields to exclude from the prompt (fields remain in the document)

  • oauth (dict) – (optional) The OAuth 2.0 configuration for the connector.

  • service_auth (dict) – (optional) The service-to-service authentication configuration for the connector

Returns:

Connector object.

Return type:

Connector
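
Examples

Register and then update a connector (a minimal sketch; the URL is a placeholder for your own search endpoint, and the id attribute on Connector is assumed):
>>> connector = co.create_connector(
>>>     name="internal-docs",
>>>     url="https://connector.example.com/search")  # placeholder endpoint
>>> co.update_connector(connector.id, continue_on_failure=True)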

update_connector(id: str, name: str | None = None, url: str | None = None, active: bool | None = None, continue_on_failure: bool | None = None, excludes: List[str] | None = None, oauth: dict | None = None, service_auth: dict | None = None) Connector

Updates a Connector with the provided id

Parameters:
  • id (str) – The ID of the connector you wish to update.

  • name (str) – (optional) The name of your connector

  • url (str) – (optional) The URL of the connector that will be used to search for documents

  • active (bool) – (optional) Whether the connector is active or not

  • continue_on_failure (bool) – (optional) Whether a chat request should continue or not if the request to this connector fails

  • excludes (List[str]) – (optional) A list of fields to exclude from the prompt (fields remain in the document)

  • oauth (dict) – (optional) The OAuth 2.0 configuration for the connector.

  • service_auth (dict) – (optional) The service-to-service authentication configuration for the connector

Returns:

Connector object.

Return type:

Connector

get_connector(id: str) Connector

Returns a Connector given an id

Parameters:

id (str) – The id of your connector

Returns:

Connector object.

Return type:

Connector

list_connectors(limit: int | None = None, offset: int | None = None) List[Connector]

Returns a list of your Connectors

Parameters:
  • limit (int) – (optional) The max number of connectors to return

  • offset (int) – (optional) The number of connectors to offset by

Returns:

List of Connector objects.

Return type:

List[Connector]

delete_connector(id: str) None

Deletes a Connector given an id

Parameters:

id (str) – The id of your connector

oauth_authorize_connector(id: str, after_token_redirect: str | None = None) str

Returns a URL which when navigated to will start the OAuth 2.0 flow.

Parameters:
  • id (str) – The id of your connector

  • after_token_redirect (str) – (optional) The URL to redirect to once the OAuth token has been granted

Returns:

A URL that starts the OAuth 2.0 flow.

Return type:

str

AsyncClient

class cohere.client_async.AsyncClient(api_key: str | None = None, num_workers: int = 16, request_dict: dict = {}, check_api_key: bool = True, client_name: str | None = None, max_retries: int = 3, timeout=120, api_url: str | None = None)

This client provides an asyncio/aiohttp interface. Using this client is recommended when you are making highly parallel requests, or when calling the Cohere API from a server such as FastAPI.

async check_api_key() Dict[str, bool]

check_api_key raises an exception when the key is invalid, but the return value for valid keys is kept for backwards compatibility.

async loglikelihood(prompt: str | None = None, completion: str | None = None, model: str | None = None) LogLikelihoods

Calculates the token log-likelihood for a provided prompt and completion. Using this endpoint instead of co.generate with max_tokens=0 will guarantee that any required tokens such as <EOP_TOKEN> are correctly inserted, and makes it easier to retrieve only the completion log-likelihood.

Parameters:
  • prompt (str) – The prompt

  • completion (str) – (Optional) The completion

  • model (str) – (Optional) The model to use for calculating the log-likelihoods

async batch_generate(prompts: List[str], return_exceptions=False, **kwargs) List[Generations | Exception]

A batched version of generate with multiple prompts.

Parameters:
  • prompts – list of prompts

  • return_exceptions (bool) – Return exceptions as list items rather than raise them. Ensures your entire batch is not lost on one of the items failing.

  • kwargs – other arguments to generate

async generate(prompt: str | None = None, prompt_vars: object = {}, model: str | None = None, preset: str | None = None, num_generations: int | None = None, max_tokens: int | None = None, temperature: float | None = None, k: int | None = None, p: float | None = None, frequency_penalty: float | None = None, presence_penalty: float | None = None, end_sequences: List[str] | None = None, stop_sequences: List[str] | None = None, return_likelihoods: str | None = None, truncate: str | None = None, stream: bool = False) Generations | StreamingGenerations

Generate endpoint. See https://docs.cohere.ai/reference/generate for advanced arguments

Parameters:
  • prompt (str) – Represents the prompt or text to be completed. Trailing whitespaces will be trimmed.

  • model (str) – (Optional) The model ID to use for generating the next reply.

  • return_likelihoods (str) – (Optional) One of GENERATION|ALL|NONE to specify how and if the token (log) likelihoods are returned with the response.

  • preset (str) – (Optional) The ID of a custom playground preset.

  • num_generations (int) – (Optional) The number of generations that will be returned, defaults to 1.

  • max_tokens (int) – (Optional) The number of tokens to predict per generation, defaults to 20.

  • temperature (float) – (Optional) The degree of randomness in generations from 0.0 to 5.0, lower is less random.

  • truncate (str) – (Optional) One of NONE|START|END, defaults to END. How the API handles text longer than the maximum token length.

  • stream (bool) – Return streaming tokens.

Returns:

if stream=False: a Generations object

if stream=True: a StreamingGenerations object including:

  • id (str): The id of the whole generation call

  • generations (Generations): same as the response when stream=False

  • finish_reason (str), one of:
    COMPLETE: the stream successfully completed
    ERROR: an error occurred during streaming
    ERROR_TOXIC: the stream was halted due to toxic output
    ERROR_LIMIT: the context is too big to generate
    USER_CANCEL: the user closed the stream / cancelled the request
    MAX_TOKENS: the max tokens limit was reached

  • texts (List[str]): list of segments of text streamed back from the API

Examples

A simple generate message:
>>> res = co.generate(prompt="Hey! How are you doing today?")
>>> print(res.text)
Streaming generate:
>>> res = co.generate(
>>>     prompt="Hey! How are you doing today?",
>>>     stream=True)
>>> for token in res:
>>>     print(token)
async chat(message: str | None = None, conversation_id: str | None = '', model: str | None = None, return_chat_history: bool | None = False, return_prompt: bool | None = False, return_preamble: bool | None = False, chat_history: List[Dict[str, str]] | None = None, preamble_override: str | None = None, user_name: str | None = None, temperature: float | None = 0.8, max_tokens: int | None = None, stream: bool | None = False, p: float | None = None, k: float | None = None, search_queries_only: bool | None = None, documents: List[Dict[str, Any]] | None = None, citation_quality: str | None = None, prompt_truncation: str | None = None, connectors: List[Dict[str, Any]] | None = None) AsyncChat | StreamingChat

Returns a Chat object with the query reply.

Parameters:
  • message (str) – The message to send to the chatbot.

  • stream (bool) – Return streaming tokens.

  • conversation_id (str) – (Optional) To store a conversation, create a conversation id and use it for every related request.

  • preamble_override (str) – (Optional) A string to override the preamble.

  • chat_history (List[Dict[str, str]]) – (Optional) A list of entries used to construct the conversation. If provided, these messages will be used to build the prompt and the conversation_id will be ignored so no data will be stored to maintain state.

  • model (str) – (Optional) The model to use for generating the response.

  • temperature (float) – (Optional) The temperature to use for the response. The higher the temperature, the more random the response.

  • p (float) – (Optional) The nucleus sampling probability.

  • k (float) – (Optional) The top-k sampling parameter: only the k most likely tokens are considered.

  • max_tokens (int) – (Optional) The max tokens generated for the next reply.

  • return_chat_history (bool) – (Optional) Whether to return the chat history.

  • return_prompt (bool) – (Optional) Whether to return the prompt.

  • return_preamble (bool) – (Optional) Whether to return the preamble.

  • user_name (str) – (Optional) A string to override the username.

  • search_queries_only (bool) – (Optional) When true, the response will only contain a list of generated search queries, but no search will take place, and no reply from the model to the user’s message will be generated.

  • documents (List[Dict[str, Any]]) –

    (Optional) Documents to use to generate a grounded response with citations. Example:

    documents=[
        {
            "id": "national_geographic_everest",
            "title": "Height of Mount Everest",
            "snippet": "The height of Mount Everest is 29,035 feet",
            "url": "https://education.nationalgeographic.org/resource/mount-everest/",
        },
        {
            "id": "national_geographic_mariana",
            "title": "Depth of the Mariana Trench",
            "snippet": "The depth of the Mariana Trench is 36,070 feet",
            "url": "https://www.nationalgeographic.org/activity/mariana-trench-deepest-place-earth",
        },
    ]

  • connectors (List[Dict[str, Any]]) – (Optional) When specified, the model’s reply will be enriched with information found by querying each of the connectors (RAG). Example: connectors=[{"id": "web-search"}]

  • citation_quality (str) – (Optional) Dictates the approach taken to generating citations by allowing the user to specify whether they want “accurate” results or “fast” results. Defaults to “accurate”.

  • prompt_truncation (str) – (Optional) Dictates how the prompt will be constructed. With prompt_truncation set to “AUTO”, some elements from chat_history and documents will be dropped in an attempt to construct a prompt that fits within the model’s context length limit. With prompt_truncation set to “OFF”, no elements will be dropped. If the sum of the inputs exceeds the model’s context length limit, a TooManyTokens error will be returned.

Returns:

a Chat object if stream=False, or a StreamingChat object if stream=True

Examples

A simple chat message:
>>> res = co.chat(message="Hey! How are you doing today?")
>>> print(res.text)
Continuing a session using a specific model:
>>> res = co.chat(
>>>     message="Hey! How are you doing today?",
>>>     conversation_id="1234",
>>>     model="command",
>>>     return_chat_history=True)
>>> print(res.text)
>>> print(res.chat_history)
Streaming chat:
>>> res = co.chat(
>>>     message="Hey! How are you doing today?",
>>>     stream=True)
>>> for token in res:
>>>     print(token)
Stateless chat with chat history:
>>> res = co.chat(
>>>     message="Tell me a joke!",
>>>     chat_history=[
>>>         {'role': 'User', 'message': 'Hey! How are you doing today?'},
>>>         {'role': 'Chatbot', 'message': 'I am doing great! How can I help you?'},
>>>     ],
>>>     return_prompt=True)
>>> print(res.text)
>>> print(res.prompt)
Chat message with documents to use to generate the response:
>>> res = co.chat(
>>>     "How deep is the Mariana Trench?",
>>>     documents=[
>>>         {
>>>            "id": "national_geographic_everest",
>>>            "title": "Height of Mount Everest",
>>>            "snippet": "The height of Mount Everest is 29,035 feet",
>>>            "url": "https://education.nationalgeographic.org/resource/mount-everest/",
>>>         },
>>>         {
>>>             "id": "national_geographic_mariana",
>>>             "title": "Depth of the Mariana Trench",
>>>             "snippet": "The depth of the Mariana Trench is 36,070 feet",
>>>             "url": "https://www.nationalgeographic.org/activity/mariana-trench-deepest-place-earth",
>>>         },
>>>       ])
>>> print(res.text)
>>> print(res.citations)
>>> print(res.documents)
Chat message with connector to query and use the results to generate the response:
>>> res = co.chat(
>>>     "What is the height of Mount Everest?",
>>>      connectors=[{"id": "web-search"}])
>>> print(res.text)
>>> print(res.citations)
>>> print(res.documents)
Generate search queries for fetching documents to use in chat:
>>> res = co.chat(
>>>     "What is the height of Mount Everest?",
>>>      search_queries_only=True)
>>> if res.is_search_required:
>>>      print(res.search_queries)
async embed(texts: List[str], model: str | None = None, truncate: str | None = None, input_type: str | None = None, embedding_types: List[str] | None = None) Embeddings

Returns an Embeddings object for the provided texts. Visit https://cohere.ai/embed to learn about embeddings.

Parameters:
  • texts (List[str]) – A list of strings to embed.

  • model (str) – (Optional) The model ID to use for embedding the text.

  • truncate (str) – (Optional) One of NONE|START|END, defaults to END. How the API handles text longer than the maximum token length.

  • input_type (str) – (Optional) One of “classification”, “clustering”, “search_document”, “search_query”. The type of input text provided to embed.

  • embedding_types (List[str]) – (Optional) Specifies the types of embeddings you want to get back. Not required and default is None, which returns the float embeddings in the response’s embeddings field. Can be one or more of the following types: “float”, “int8”, “uint8”, “binary”, “ubinary”.

async codebook(model: str | None = None, compression_codebook: str | None = 'default') Codebook

Returns a codebook object for the provided model. Visit https://cohere.ai/embed to learn about compressed embeddings and codebooks.

Parameters:
  • model (str) – (Optional) The model ID to use for embedding the text.

  • compression_codebook (str) – (Optional) The compression codebook to use for compressed embeddings. Defaults to “default”.

async classify(inputs: List[str] = [], model: str | None = None, preset: str | None = None, examples: List[Example] = [], truncate: str | None = None) Classifications

Returns a Classifications object of the inputs provided, see https://docs.cohere.ai/reference/classify for advanced usage.

Parameters:
  • inputs (List[str]) – A list of texts to classify.

  • model (str) – (Optional) The model ID to use for classifying the inputs.

  • examples (List[ClassifyExample]) – A list of ClassifyExample objects containing a text and its associated label.

  • truncate (str) – (Optional) One of NONE|START|END, defaults to END. How the API handles text longer than the maximum token length.

async summarize(text: str, model: str | None = None, length: str | None = None, format: str | None = None, temperature: float | None = None, additional_command: str | None = None, extractiveness: str | None = None) SummarizeResponse

Returns a generated summary of the specified length for the provided text.

Parameters:
  • text (str) – Text to summarize.

  • model (str) – (Optional) ID of the model.

  • length (str) – (Optional) One of {“short”, “medium”, “long”}, defaults to “medium”. Controls the length of the summary.

  • format (str) – (Optional) One of {“paragraph”, “bullets”}, defaults to “paragraph”. Controls the format of the summary.

  • extractiveness (str) – (Optional) One of {“low”, “medium”, “high”}. Controls how close the summary stays to the original text: higher extractiveness reuses sentences from the input, lower extractiveness paraphrases more.

  • temperature (float) – Ranges from 0 to 5. Controls the randomness of the output. Lower values tend to generate more “predictable” output, while higher values tend to generate more “creative” output. The sweet spot is typically between 0 and 1.

  • additional_command (str) – (Optional) Modifier for the underlying prompt, must complete the sentence “Generate a summary _”.

Examples

Summarize a text:
>>> res = co.summarize(text="Stock market report for today...")
>>> print(res.summary)
Summarize a text with a specific model and prompt:
>>> res = co.summarize(
>>>     text="Stock market report for today...",
>>>     model="summarize-xlarge",
>>>     length="long",
>>>     format="bullets",
>>>     temperature=0.3,
>>>     additional_command="focusing on the highest performing stocks")
>>> print(res.summary)
async batch_tokenize(texts: List[str], return_exceptions=False, **kwargs) List[Tokens | Exception]

A batched version of tokenize.

Parameters:
  • texts – list of texts

  • return_exceptions (bool) – Return exceptions as list items rather than raise them. Ensures your entire batch is not lost on one of the items failing.

  • kwargs – other arguments to tokenize

async tokenize(text: str, model: str | None = None) Tokens

Returns a Tokens object of the provided text, see https://docs.cohere.ai/reference/tokenize for advanced usage.

Parameters:
  • text (str) – The text to tokenize.

  • model (str) – An optional model name that will ensure that the tokenization uses the tokenizer used by that model, which can be critical for counting tokens properly.

async batch_detokenize(list_of_tokens: List[List[int]], return_exceptions=False, **kwargs) List[Detokenization | Exception]

A batched version of detokenize.

Parameters:
  • list_of_tokens – list of list of tokens

  • return_exceptions (bool) – Return exceptions as list items rather than raise them. Ensures your entire batch is not lost on one of the items failing.

  • kwargs – other arguments to detokenize

async detokenize(tokens: List[int], model: str | None = None) Detokenization

Returns a Detokenization object of the provided tokens, see https://docs.cohere.ai/reference/detokenize for advanced usage.

Parameters:
  • tokens (List[int]) – A list of tokens to convert to strings

  • model (str) – An optional model name. This will ensure that the detokenization is done by the tokenizer used by that model.

async detect_language(texts: List[str]) DetectLanguageResponse

This API is deprecated.

async generate_feedback(request_id: str, good_response: bool, model=None, desired_response: str | None = None, flagged_response: bool | None = None, flagged_reason: str | None = None, prompt: str | None = None, annotator_id: str | None = None) GenerateFeedbackResponse

Give feedback on a response from the Cohere Generate API to improve the model.

Parameters:
  • request_id (str) – The request_id of the generation request to give feedback on.

  • good_response (bool) – Whether the response was good or not.

  • model (str) – (Optional) ID of the model.

  • desired_response (str) – (Optional) The desired response.

  • flagged_response (bool) – (Optional) Whether the response was flagged or not.

  • flagged_reason (str) – (Optional) The reason the response was flagged.

  • prompt (str) – (Optional) The prompt used to generate the response.

  • annotator_id (str) – (Optional) The ID of the annotator.

Examples

A user accepts a model’s suggestion in an assisted writing setting:
>>> generations = co.generate(f"Write me a polite email responding to the one below: {email}. Response:")
>>> if user_accepted_suggestion:
>>>     co.generate_feedback(request_id=generations[0].id, good_response=True)
The user edits the model’s suggestion:
>>> generations = co.generate(f"Write me a polite email responding to the one below: {email}. Response:")
>>> if user_edits_suggestion:
>>>     co.generate_feedback(request_id=generations[0].id, good_response=False, desired_response=user_edited_suggestion)
async generate_preference_feedback(ratings: List[PreferenceRating], model=None, prompt: str | None = None, annotator_id: str | None = None) GeneratePreferenceFeedbackResponse

Give preference feedback on a response from the Cohere Generate API to improve the model.

Parameters:
  • ratings (List[PreferenceRating]) – A list of PreferenceRating objects.

  • model (str) – (Optional) ID of the model.

  • prompt (str) – (Optional) The prompt used to generate the response.

  • annotator_id (str) – (Optional) The ID of the annotator.

Examples

A user accepts a model’s suggestion in an assisted writing setting, and prefers it to a second suggestion:
>>> generations = co.generate(f"Write me a polite email responding to the one below: {email}. Response:", num_generations=2)
>>> # prompt the user for which generation they prefer
>>> ratings = []
>>> if user_accepted_idx == 0:
>>>     ratings.append(PreferenceRating(request_id=0, rating=1))
>>>     ratings.append(PreferenceRating(request_id=1, rating=0))
>>> else:
>>>     ratings.append(PreferenceRating(request_id=0, rating=0))
>>>     ratings.append(PreferenceRating(request_id=1, rating=1))
>>> co.generate_preference_feedback(ratings=ratings)

async rerank(query: str, documents: List[str] | List[Dict[str, Any]], model: str, top_n: int | None = None, max_chunks_per_doc: int | None = None) Reranking

Returns a list of documents ordered by their relevance to the provided query

Parameters:
  • query (str) – The search query

  • documents (list[str], list[dict]) – The documents to rerank

  • model (str) – The model to use for re-ranking

  • top_n (int) – (optional) The number of results to return, defaults to returning all results

  • max_chunks_per_doc (int) – (optional) The maximum number of chunks derived from a document

async create_dataset(name: str, data: BinaryIO, dataset_type: str, eval_data: BinaryIO | None = None, keep_fields: str | List[str] | None = None, optional_fields: str | List[str] | None = None, parse_info: ParseInfo | None = None) AsyncDataset

Returns a Dataset given input data

Parameters:
  • name (str) – The name of your dataset

  • data (BinaryIO) – The data to be uploaded and validated

  • dataset_type (str) – The type of dataset you want to upload

  • eval_data (BinaryIO) – (optional) Evaluation data to upload, if the dataset type supports it

  • keep_fields (Union[str, List[str]]) – (optional) A list of fields you want to keep in the dataset that are required

  • optional_fields (Union[str, List[str]]) – (optional) A list of fields you want to keep in the dataset that are optional

  • parse_info (ParseInfo) – (optional) Information on how to parse the raw data

Returns:

Dataset object.

Return type:

AsyncDataset

async get_dataset(id: str) AsyncDataset

Returns a Dataset given a dataset id

Parameters:

id (str) – The id of your dataset

Returns:

Dataset object.

Return type:

AsyncDataset

async list_datasets(dataset_type: str | None = None, limit: int | None = None, offset: int | None = None) List[AsyncDataset]

Returns a list of your Datasets

Parameters:
  • dataset_type (str) – (optional) The dataset_type to filter on

  • limit (int) – (optional) The max number of datasets to return

  • offset (int) – (optional) The number of datasets to offset by

Returns:

List of Dataset objects.

Return type:

List[AsyncDataset]

async delete_dataset(id: str) None

Deletes your dataset

Parameters:

id (str) – The id of the dataset to delete

async get_dataset_usage() DatasetUsage

Gets your total storage used in datasets

Returns:

Object containing current dataset usage

Return type:

DatasetUsage

async wait_for_dataset(dataset_id: str, timeout: float | None = None, interval: float = 10) AsyncDataset

Wait for Dataset validation result.

Parameters:
  • dataset_id (str) – Dataset id.

  • timeout (Optional[float], optional) – Wait timeout in seconds; if None, there is no limit to the wait time. Defaults to None.

  • interval (float, optional) – Wait poll interval in seconds. Defaults to 10.

Raises:

TimeoutError – wait timed out

Returns:

Dataset object.

Return type:

AsyncDataset

async create_cluster_job(input_dataset_id: str | None = None, embeddings_url: str | None = None, min_cluster_size: int | None = None, n_neighbors: int | None = None, is_deterministic: bool | None = None, generate_descriptions: bool | None = None) AsyncClusterJobResult

Create clustering job.

Parameters:
  • input_dataset_id (str) – Id of the dataset to cluster.

  • embeddings_url (str) – File with embeddings to cluster.

  • min_cluster_size (Optional[int], optional) – Minimum number of elements in a cluster. Defaults to 10.

  • n_neighbors (Optional[int], optional) – Number of nearest neighbors used by UMAP to establish the local structure of the data. Defaults to 15. For more information, please refer to https://umap-learn.readthedocs.io/en/latest/parameters.html#n-neighbors

  • is_deterministic (Optional[bool], optional) – Determines whether the output of the cluster job is deterministic. Defaults to True.

  • generate_descriptions (Optional[bool], optional) – Determines whether to generate cluster descriptions. Defaults to False.

Returns:

Created clustering job

Return type:

AsyncClusterJobResult

async get_cluster_job(job_id: str) ClusterJobResult

Get clustering job results.

Parameters:

job_id (str) – Clustering job id.

Raises:

ValueError – “job_id” is empty

Returns:

Clustering job result.

Return type:

ClusterJobResult

async list_cluster_jobs() List[ClusterJobResult]

List clustering jobs.

Returns:

Clustering jobs created.

Return type:

List[ClusterJobResult]

async wait_for_cluster_job(job_id: str, timeout: float | None = None, interval: float = 10) ClusterJobResult

Wait for clustering job result.

Parameters:
  • job_id (str) – Clustering job id.

  • timeout (Optional[float], optional) – Wait timeout in seconds; if None, there is no limit to the wait time. Defaults to None.

  • interval (float, optional) – Wait poll interval in seconds. Defaults to 10.

Raises:

TimeoutError – wait timed out

Returns:

Clustering job result.

Return type:

ClusterJobResult

async create_embed_job(dataset_id: str, model: str, input_type: str, name: str | None = None, truncate: str | None = None, embedding_types: List[str] | None = None) AsyncEmbedJob

Create embed job.

Parameters:
  • dataset_id (str) – ID of the dataset with the text to embed.

  • model (str) – The model ID to use for embedding the text.

  • input_type (str) – One of “classification”, “clustering”, “search_document”, “search_query”. The type of input text provided to embed.

  • truncate (Optional[str], optional) – How the API handles text longer than the maximum token length. Defaults to None.

  • name (Optional[str], optional) – The name of the embed job. Defaults to None.

  • embedding_types (List[str]) – (Optional) Specifies the types of embeddings you want to get back. Not required and default is None, which returns the float embeddings in the response’s embeddings field. Can be one or more of the following types: “float”, “int8”, “uint8”, “binary”, “ubinary”.

Returns:

The created embed job

Return type:

AsyncEmbedJob

async list_embed_jobs() List[AsyncEmbedJob]

List embed jobs.

Returns:

embed jobs.

Return type:

List[AsyncEmbedJob]

async get_embed_job(job_id: str) AsyncEmbedJob

Get embed job.

Parameters:

job_id (str) – embed job id.

Raises:

ValueError – “job_id” is empty

Returns:

embed job.

Return type:

AsyncEmbedJob

async cancel_embed_job(job_id: str) None

Cancel embed job.

Parameters:

job_id (str) – embed job id.

Raises:

ValueError – “job_id” is empty

async wait_for_embed_job(job_id: str, timeout: float | None = None, interval: float = 10) AsyncEmbedJob

Wait for embed job completion.

Parameters:
  • job_id (str) – embed job id.

  • timeout (Optional[float], optional) – Wait timeout in seconds, if None - there is no limit to the wait time. Defaults to None.

  • interval (float, optional) – Wait poll interval in seconds. Defaults to 10.

Raises:

TimeoutError – wait timed out

Returns:

embed job.

Return type:

AsyncEmbedJob
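
Example: a hedged sketch of creating an embed job and waiting for it to finish (the dataset id and model name are placeholders, and the job_id attribute on the returned job is an assumption):
>>> co = cohere.AsyncClient("YOUR_API_KEY")
>>> job = await co.create_embed_job(dataset_id="my-dataset-id", model="embed-english-v3.0", input_type="search_document")
>>> job = await co.wait_for_embed_job(job.job_id, timeout=600)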

async create_custom_model(name: str, model_type: Literal['GENERATIVE', 'CLASSIFY', 'RERANK', 'CHAT'], dataset: Dataset | str, base_model: str | None = None, hyperparameters: HyperParametersInput | None = None) AsyncCustomModel

Create a new custom model

Parameters:
  • name (str) – name of your custom model, has to be unique across your organization

  • model_type (GENERATIVE, CLASSIFY, RERANK, CHAT) – type of custom model

  • dataset (Dataset, str) – A dataset or dataset id for your training.

  • base_model (str) –

    base model to use for your custom model. For generative and classify models, base_model has to be None (no option available for now). For rerank models, you can choose between english and multilingual; defaults to english if not specified.

    The English model is better for English, while the multilingual model should be picked if a non-negligible part of queries/documents will be in other languages.

  • hyperparameters (HyperParametersInput) – adjust hyperparameters for your custom model. Only for generative custom models.

Returns:

the custom model that was created

Return type:

AsyncCustomModel

Examples

prompt completion custom model with dataset
>>> co = cohere.Client("YOUR_API_KEY")
>>> ds = co.create_dataset(name="prompt-completion-dataset", data=open("/path/to/your/file.csv", "rb"), dataset_type="prompt-completion-finetune-input")
>>> ds.await_validation()
>>> co.create_custom_model("prompt-completion-ft", model_type="GENERATIVE", dataset=ds.id)

classification custom model with train and evaluation data
>>> co = cohere.Client("YOUR_API_KEY")
>>> ds = co.create_dataset(name="classify-dataset", data=open("train_file.csv", "rb"), eval_data=open("eval_file.csv", "rb"), dataset_type="single-label-classification-finetune-input")
>>> ds.await_validation()
>>> co.create_custom_model("classify-ft", model_type="CLASSIFY", dataset=ds.id)

async wait_for_custom_model(custom_model_id: str, timeout: float | None = None, interval: float = 60) AsyncCustomModel

Wait for custom model training completion.

Parameters:
  • custom_model_id (str) – Custom model id.

  • timeout (Optional[float], optional) – Wait timeout in seconds, if None - there is no limit to the wait time. Defaults to None.

  • interval (float, optional) – Wait poll interval in seconds. Defaults to 60.

Raises:

TimeoutError – wait timed out

Returns:

Custom model.

Return type:

AsyncCustomModel

async get_custom_model(custom_model_id: str) AsyncCustomModel

Get a custom model by id.

Parameters:

custom_model_id (str) – custom model id

Returns:

the custom model

Return type:

AsyncCustomModel

async get_custom_model_by_name(name: str) AsyncCustomModel

Get a custom model by name.

Parameters:

name (str) – custom model name

Returns:

the custom model

Return type:

AsyncCustomModel

async get_custom_model_metrics(custom_model_id: str) List[ModelMetric]

Get model metrics by id

Parameters:

custom_model_id (str) – custom model id

Returns:

a list of model metrics

Return type:

List[ModelMetric]

async list_custom_models(statuses: List[Literal['UNKNOWN', 'CREATED', 'TRAINING', 'DEPLOYING', 'READY', 'FAILED', 'DELETED', 'TEMPORARILY_OFFLINE', 'PAUSED', 'QUEUED']] | None = None, before: datetime | None = None, after: datetime | None = None, order_by: Literal['asc', 'desc'] | None = None) List[AsyncCustomModel]

List custom models of your organization.

Parameters:
  • statuses (CUSTOM_MODEL_STATUS, optional) – search for custom models which are in one of these states

  • before (datetime, optional) – search for custom models that were created before this timestamp

  • after (datetime, optional) – search for custom models that were created after this timestamp

  • order_by (Literal["asc", "desc"], optional) – sort custom models by created at, either asc or desc

Returns:

a list of custom models.

Return type:

List[AsyncCustomModel]
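
Example: a minimal sketch of filtering and inspecting custom models (the status filter is illustrative):
>>> co = cohere.AsyncClient("YOUR_API_KEY")
>>> models = await co.list_custom_models(statuses=["READY"], order_by="desc")
>>> for model in models:
...     print(model.id, model.name, model.status)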

async create_connector(name: str, url: str, active: bool = True, continue_on_failure: bool = False, excludes: List[str] | None = None, oauth: dict | None = None, service_auth: dict | None = None) Connector

Creates a Connector with the provided information

Parameters:
  • name (str) – The name of your connector

  • url (str) – The URL of the connector that will be used to search for documents

  • active (bool) – (optional) Whether the connector is active or not

  • continue_on_failure (bool) – (optional) Whether a chat request should continue or not if the request to this connector fails

  • excludes (List[str]) – (optional) A list of fields to exclude from the prompt (fields remain in the document)

  • oauth (dict) – (optional) The OAuth 2.0 configuration for the connector.

  • service_auth (dict) – (optional) The service-to-service authentication configuration for the connector

Returns:

Connector object.

Return type:

Connector
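
Example: a hedged sketch of registering a connector (the URL is a placeholder, and the shape of the service_auth dict is an assumption):
>>> co = cohere.AsyncClient("YOUR_API_KEY")
>>> connector = await co.create_connector(
...     name="my-connector",
...     url="https://connector.example.com/search",
...     service_auth={"type": "bearer", "token": "..."})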

async update_connector(id: str, name: str | None = None, url: str | None = None, active: bool | None = None, continue_on_failure: bool | None = None, excludes: List[str] | None = None, oauth: dict | None = None, service_auth: dict | None = None) Connector

Updates a Connector with the provided id

Parameters:
  • id (str) – The ID of the connector you wish to update.

  • name (str) – (optional) The name of your connector

  • url (str) – (optional) The URL of the connector that will be used to search for documents

  • active (bool) – (optional) Whether the connector is active or not

  • continue_on_failure (bool) – (optional) Whether a chat request should continue or not if the request to this connector fails

  • excludes (List[str]) – (optional) A list of fields to exclude from the prompt (fields remain in the document)

  • oauth (dict) – (optional) The OAuth 2.0 configuration for the connector.

  • service_auth (dict) – (optional) The service-to-service authentication configuration for the connector

Returns:

Connector object.

Return type:

Connector

async get_connector(id: str) Connector

Returns a Connector given an id

Parameters:

id (str) – The id of your connector

Returns:

Connector object.

Return type:

Connector

async list_connectors(limit: int | None = None, offset: int | None = None) List[Connector]

Returns a list of your Connectors

Parameters:
  • limit (int) – (optional) The max number of connectors to return

  • offset (int) – (optional) The number of connectors to offset by

Returns:

List of Connector objects.

Return type:

List[Connector]

async delete_connector(id: str) None

Deletes a Connector given an id

Parameters:

id (str) – The id of your connector

async oauth_authorize_connector(id: str, after_token_redirect: str | None = None) str

Returns a URL which, when navigated to, will start the OAuth 2.0 flow.

Parameters:
  • id (str) – The id of your connector

  • after_token_redirect (str) – (optional) The URL to redirect to after the OAuth 2.0 flow completes

Returns:

A URL that starts the OAuth 2.0 flow.

Return type:

str
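
Example: a hedged sketch tying the connector management calls together (ids and URLs are placeholders):
>>> co = cohere.AsyncClient("YOUR_API_KEY")
>>> connector = await co.update_connector("my-connector-id", active=False)
>>> url = await co.oauth_authorize_connector("my-connector-id", after_token_redirect="https://example.com/done")
>>> print(url)  # direct the user here to complete the OAuth 2.0 flow
>>> await co.delete_connector("my-connector-id")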

API response objects

class cohere.responses.generation.TokenLikelihood(token, likelihood)
likelihood: float

Alias for field number 1

token: str

Alias for field number 0

class cohere.responses.generation.Generation(text: str, *_, **__)
class cohere.responses.generation.Generations(generations, return_likelihoods: str, meta: Dict[str, Any] | None = None)
property prompt: str

Returns the prompt used as input

class cohere.responses.generation.StreamingText(index, text, is_finished)
index: int | None

Alias for field number 0

is_finished: bool

Alias for field number 2

text: str

Alias for field number 1

class cohere.responses.classify.LabelPrediction(confidence)
confidence: float

Alias for field number 0

class cohere.responses.classify.Example(text, label)
label: str

Alias for field number 1

text: str

Alias for field number 0

class cohere.responses.feedback.GenerateFeedbackResponse(id)
id: str

Alias for field number 0

class cohere.responses.feedback.GeneratePreferenceFeedbackResponse(id)
id: str

Alias for field number 0

class cohere.responses.feedback.PreferenceRating(request_id: str, rating: float, generation: str)
cohere.responses.rerank.RerankDocument

alias of Document

class cohere.responses.summarize.SummarizeResponse(id: str, summary: str, meta: Dict[str, Any] | None)

Returned by co.summarize, which generates a summary of the specified length for the provided text.

Example:
>>> res = co.summarize(text="Stock market report for today...")
>>> print(res.summary)

Example:
>>> res = co.summarize(
...     text="Stock market report for today...",
...     model="summarize-xlarge",
...     length="long",
...     format="bullets",
...     temperature=0.3,
...     additional_command="focusing on the highest performing stocks")
>>> print(res.summary)

id: str

Alias for field number 0

meta: Dict[str, Any] | None

Alias for field number 2

summary: str

Alias for field number 1

class cohere.responses.chat.StreamEvent(value)

An enumeration.

class cohere.responses.cluster.BaseClusterJobResult(job_id: str, status: str, output_clusters_url: str | None, output_outliers_url: str | None, clusters: List[Cluster] | None, error: str | None, is_final_state: bool, meta: Dict[str, Any] | None = None, wait_fn=None)
class cohere.responses.cluster.ClusterJobResult(job_id: str, status: str, output_clusters_url: str | None, output_outliers_url: str | None, clusters: List[Cluster] | None, error: str | None, is_final_state: bool, meta: Dict[str, Any] | None = None, wait_fn=None)
wait(timeout: float | None = None, interval: float = 10) ClusterJobResult

Wait for cluster job completion and update attributes once finished.

Parameters:
  • timeout (Optional[float], optional) – Wait timeout in seconds, if None - there is no limit to the wait time. Defaults to None.

  • interval (float, optional) – Wait poll interval in seconds. Defaults to 10.

Raises:

TimeoutError – wait timed out

class cohere.responses.cluster.AsyncClusterJobResult(job_id: str, status: str, output_clusters_url: str | None, output_outliers_url: str | None, clusters: List[Cluster] | None, error: str | None, is_final_state: bool, meta: Dict[str, Any] | None = None, wait_fn=None)
async wait(timeout: float | None = None, interval: float = 10) ClusterJobResult

Wait for cluster job completion and update attributes once finished.

Parameters:
  • timeout (Optional[float], optional) – Wait timeout in seconds, if None - there is no limit to the wait time. Defaults to None.

  • interval (float, optional) – Wait poll interval in seconds. Defaults to 10.

Raises:

TimeoutError – wait timed out
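
Example: a minimal sketch of awaiting the job handle directly instead of calling wait_for_cluster_job (the dataset id is a placeholder):
>>> co = cohere.AsyncClient("YOUR_API_KEY")
>>> job = await co.create_cluster_job(input_dataset_id="my-embeddings-dataset-id")
>>> result = await job.wait(timeout=600, interval=10)
>>> print(result.is_final_state, result.status)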

class cohere.responses.custom_model.HyperParameters(early_stopping_patience: int, early_stopping_threshold: float, train_batch_size: int, train_steps: int, train_epochs: int, learning_rate: float)
class cohere.responses.custom_model.HyperParametersInput

early_stopping_patience: int (default=6, min=0, max=10)
early_stopping_threshold: float (default=0.01, min=0, max=0.1)
train_batch_size: int (default=16, min=2, max=16)
train_epochs: int (default=1, min=1, max=10)
learning_rate: float (default=0.01, min=0.000005, max=0.1)
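
Example: a hedged sketch of overriding hyperparameters when training a generative custom model (values are illustrative and keyword construction of HyperParametersInput is assumed):
>>> from cohere.responses.custom_model import HyperParametersInput
>>> co = cohere.Client("YOUR_API_KEY")
>>> hp = HyperParametersInput(early_stopping_patience=4, early_stopping_threshold=0.01,
...                           train_batch_size=16, train_epochs=2, learning_rate=0.01)
>>> co.create_custom_model("generative-ft", model_type="GENERATIVE", dataset="my-dataset-id", hyperparameters=hp)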

class cohere.responses.custom_model.FinetuneBilling(train_epochs: int, num_training_tokens: int, unit_price: float, total_cost: float)
class cohere.responses.custom_model.BaseCustomModel(wait_fn, id: str, name: str, status: Literal['UNKNOWN', 'CREATED', 'TRAINING', 'DEPLOYING', 'READY', 'FAILED', 'DELETED', 'TEMPORARILY_OFFLINE', 'PAUSED', 'QUEUED'], model_type: Literal['GENERATIVE', 'CLASSIFY', 'RERANK', 'CHAT'], created_at: datetime, completed_at: datetime | None, base_model: str | None = None, model_id: str | None = None, hyperparameters: HyperParameters | None = None, dataset_id: str | None = None, billing: FinetuneBilling | None = None)
class cohere.responses.custom_model.CustomModel(wait_fn, id: str, name: str, status: Literal['UNKNOWN', 'CREATED', 'TRAINING', 'DEPLOYING', 'READY', 'FAILED', 'DELETED', 'TEMPORARILY_OFFLINE', 'PAUSED', 'QUEUED'], model_type: Literal['GENERATIVE', 'CLASSIFY', 'RERANK', 'CHAT'], created_at: datetime, completed_at: datetime | None, base_model: str | None = None, model_id: str | None = None, hyperparameters: HyperParameters | None = None, dataset_id: str | None = None, billing: FinetuneBilling | None = None)
wait(timeout: float | None = None, interval: float = 60) CustomModel

Wait for custom model job completion.

Parameters:
  • timeout (Optional[float], optional) – Wait timeout in seconds, if None - there is no limit to the wait time. Defaults to None.

  • interval (float, optional) – Wait poll interval in seconds. Defaults to 60.

Raises:

TimeoutError – wait timed out

Returns:

custom model.

Return type:

CustomModel

class cohere.responses.custom_model.AsyncCustomModel(wait_fn, id: str, name: str, status: Literal['UNKNOWN', 'CREATED', 'TRAINING', 'DEPLOYING', 'READY', 'FAILED', 'DELETED', 'TEMPORARILY_OFFLINE', 'PAUSED', 'QUEUED'], model_type: Literal['GENERATIVE', 'CLASSIFY', 'RERANK', 'CHAT'], created_at: datetime, completed_at: datetime | None, base_model: str | None = None, model_id: str | None = None, hyperparameters: HyperParameters | None = None, dataset_id: str | None = None, billing: FinetuneBilling | None = None)
async wait(timeout: float | None = None, interval: float = 60) CustomModel

Wait for custom model job completion.

Parameters:
  • timeout (Optional[float], optional) – Wait timeout in seconds, if None - there is no limit to the wait time. Defaults to None.

  • interval (float, optional) – Wait poll interval in seconds. Defaults to 60.

Raises:

TimeoutError – wait timed out

Returns:

custom model.

Return type:

CustomModel
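
Example: a minimal sketch of training and then awaiting the returned handle (names and the timeout are placeholders):
>>> co = cohere.AsyncClient("YOUR_API_KEY")
>>> model = await co.create_custom_model("my-ft", model_type="GENERATIVE", dataset="my-dataset-id")
>>> model = await model.wait(timeout=3600)
>>> print(model.status)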

class cohere.responses.custom_model.ModelMetric(created_at: datetime.datetime, step_num: int, loss: float | None = None, accuracy: float | None = None, f1: float | None = None, precision: float | None = None, recall: float | None = None)

CustomModelDataset

class cohere.custom_model_dataset.CustomModelDataset
class cohere.custom_model_dataset.CsvDataset(train_file: str, delimiter: str, eval_file: str | None = None, has_header: bool | None = False)

A dataset consisting of local csv files. Each row should contain two items, e.g. for prompt completion:

this is the prompt,and this is the completion
another prompt,another completion

class cohere.custom_model_dataset.JsonlDataset(train_file: str, eval_file: str | None = None)

A dataset consisting of local jsonl files.

Examples

prompt completion: {"prompt": "this is the prompt", "completion": "this is the completion"}

class cohere.custom_model_dataset.TextDataset(train_file: str, separator: str | None, eval_file: Path | None = None)

A dataset consisting of local text files. Can be used for generative custom models.

class cohere.custom_model_dataset.InMemoryDataset(training_data: Iterable[Tuple[str, str]], eval_data: Iterable[Tuple[str, str]] | None = None)

A dataset existing in memory. You may pass a generator to avoid loading your whole dataset into memory at once.

Examples

>>> InMemoryDataset([("this is a prompt", "this is a completion")])
>>> InMemoryDataset([("example1", "label1"), ("example2", "label1"), ("example3", "label2")])

Exceptions

exception cohere.error.CohereError(message=None)

Base Exception class, returned when nothing more specific applies

exception cohere.error.CohereAPIError(message: str | None = None, http_status: int | None = None, headers: Dict | None = None)

Returned when the API responds with an error message

exception cohere.error.CohereConnectionError(message=None)

Returned when the SDK cannot reach the API server for any reason
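
Example: a hedged sketch of handling these exceptions around a call (the http_status and message attributes are assumed from the constructor signature above):
>>> co = cohere.Client("YOUR_API_KEY")
>>> try:
...     co.generate("Hello, my name is", max_tokens=10)
... except cohere.error.CohereAPIError as e:
...     print("API error:", e.http_status, e.message)
... except cohere.error.CohereConnectionError:
...     print("could not reach the API server")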