The Assistants API allows you to build AI assistants within your own applications. An Assistant has instructions and can leverage models, tools, and knowledge to respond to user queries. The Assistants API currently supports three tools: Code Interpreter, Retrieval, and Function calling. In the future, we plan to release more OpenAI-built tools and allow you to provide your own tools on our platform.
At a high level, a typical integration of the Assistants API has the following flow:
Create an Assistant in the API by defining its custom instructions and picking a model. If helpful, enable tools like Code Interpreter, Retrieval, and Function calling.
Create a Thread when a user starts a conversation.
Add Messages to the Thread as the user asks questions.
Run the Assistant on the Thread to trigger responses. This automatically calls the relevant tools.
In this article, we will use the Assistants API to chat with a PDF file and get answers to our questions from its contents.
Table of Contents:
How Does the Assistants API Work?
Installing OpenAI
Upload Files to OpenAI
Create OpenAI Assistant
Create a Thread
Add a Message to a Thread
Run The Assistant
Display The Assistant Response
Limitations
1. How Does the Assistants API Work?
An Assistant is a purpose-built AI that uses OpenAI’s models and calls tools. A conversation session between an Assistant and a user is called a Thread. Threads store Messages and automatically handle truncation to fit content into a model’s context.
A message created by an Assistant or a user can include text, images, and other files. Messages are stored as a list on the Thread.
The Assistant uses its configuration and the Thread’s Messages to perform tasks by calling models and tools. As part of a Run, the Assistant appends Messages to the Thread.
Run Steps are a detailed list of the steps the Assistant took as part of a Run. An Assistant can call tools or create Messages during its Run. Examining Run Steps allows you to introspect how the Assistant arrives at its final results.
The new assistant API has the following features:
Assistants can call OpenAI’s models with specific instructions to tune their personality and capabilities.
Assistants can access multiple tools in parallel. These can be both OpenAI-hosted tools — like Code interpreter and Knowledge retrieval — or tools you build / host (via Function calling).
Assistants can access persistent Threads. Threads simplify AI application development by storing message history and truncating it when the conversation gets too long for the model’s context length. You create a Thread once, and simply append Messages to it as your users reply.
Assistants can access Files in several formats — either as part of their creation or as part of Threads between Assistants and users. When using tools, Assistants can also create files (e.g., images, spreadsheets, etc) and cite files they reference in the Messages they create.
The tools that are currently available to be used are:
Code Interpreter: This tool allows an assistant to write and run Python code inside a sandboxed environment. It can also generate graphs and charts, and process a variety of files.
Retrieval: The Retrieval tool augments the assistant with knowledge from files that you provide.
Function Calling: Using function call, you can describe custom functions or external APIs to the assistant. The assistant can then call those functions by outputting a relevant JSON object with appropriate arguments.
2. Installing OpenAI
First, install or upgrade the OpenAI package using the following command:
!pip install --upgrade openai
After installing the OpenAI package, we will load the API key and create an OpenAI client.
from openai import OpenAI
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
client = OpenAI(api_key=user_secrets.get_secret("openai_api"))
3. Upload Files to OpenAI
Let's start by uploading the PDF file to the client using the files.create method.
file = client.files.create(
    file=open("file_path", 'rb'),
    purpose="assistants"
)
We can see the file ID and metadata using print as shown below:
print(file)
4. Create OpenAI Assistant
An Assistant represents an entity that can be configured to respond to users’ Messages using several parameters:
Instructions: how the Assistant and model should behave or respond
Model: you can specify any GPT-3.5 or GPT-4 model, including fine-tuned models. The Retrieval tool requires the gpt-3.5-turbo-1106 or gpt-4-1106-preview model.
Tools: the API supports Code Interpreter and Retrieval, tools built and hosted by OpenAI.
Functions: the API allows you to define custom function signatures, with similar behavior as our function calling feature.
Here we will call the assistant PDF Helper, since we need it to answer questions from the given PDF. We will give it instructions describing the task, define the tool as retrieval, choose the gpt-4-1106-preview model, and finally pass it the file ID.
assistant = client.beta.assistants.create(
    name="PDF Helper",
    instructions="You are my assistant who can answer questions from the given pdf",
    tools=[{"type": "retrieval"}],
    model="gpt-4-1106-preview",
    file_ids=[file.id]
)
The OpenAI team recommends using OpenAI’s latest models with the Assistants API for the best results and maximum compatibility with tools.
5. Create a Thread
A Thread represents a conversation. We recommend creating one Thread per user as soon as the user initiates the conversation. Pass any user-specific context and files in this thread by creating Messages.
thread = client.beta.threads.create()
print(thread)
Threads don’t have a size limit. You can add as many Messages as you want to a Thread. The Assistant will ensure that requests to the model fit within the maximum context window, using relevant optimization techniques such as truncation, which OpenAI has tested extensively with ChatGPT.
When you use the Assistants API, you delegate control over how many input tokens are passed to the model for any given Run. This means you have less control over the cost of running your Assistant in some cases, but you do not have to deal with the complexity of managing the context window yourself.
6. Add a Message to a Thread
A Message contains text and, optionally, any files that you allow the user to upload. Messages need to be added to a specific Thread. Adding images via Message objects, as in Chat Completions with GPT-4 with Vision, is not supported today, but OpenAI plans to add support for it in the coming months. You can still upload images and have them processed via Retrieval.
Here we will set the role to user and the content of the Message to the question we would like the assistant to answer from the PDF file.
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Your Question?"
)
7. Run The Assistant
For the Assistant to respond to the user message, you need to create a Run. This makes the Assistant read the Thread and decide whether to call tools (if they are enabled) or simply use the model to best answer the query. As the Run progresses, the Assistant appends Messages to the Thread with role="assistant".
The Assistant will also automatically decide what previous Messages to include in the context window for the model. This has both an impact on pricing as well as model performance.
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
)
8. Display The Assistant Response
A newly created Run starts in the queued state, so we use a while loop that keeps polling until the status has moved to completed. Once the Run completes, we can list the Messages added to the Thread by the Assistant.
import time

while True:
    # Retrieve the current run status
    run_status = client.beta.threads.runs.retrieve(
        thread_id=thread.id, run_id=run.id
    )
    if run_status.status == 'completed':
        messages = client.beta.threads.messages.list(thread_id=thread.id)
        break
    else:
        # Wait before polling again
        time.sleep(2)
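The polling loop above can also be factored into a small reusable helper. This is a sketch; the status fetcher is passed in as a callable so the same logic works with the real `client.beta.threads.runs.retrieve` call (shown commented out) or any stub.

```python
import time

def wait_for_run(fetch_status, poll_interval=2.0, timeout=120.0):
    """Poll until the run reaches a terminal status and return that status.

    `fetch_status` is a zero-argument callable returning the current run
    status string (e.g. 'queued', 'in_progress', 'completed').
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in ("completed", "failed", "cancelled", "expired"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError("run did not finish within the timeout")

# Live usage (assuming an existing `client`, `thread`, and `run`):
# status = wait_for_run(
#     lambda: client.beta.threads.runs.retrieve(
#         thread_id=thread.id, run_id=run.id
#     ).status
# )
```

Checking for the failed, cancelled, and expired states as well prevents the loop from spinning forever when a Run ends unsuccessfully.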
And finally, we can display the message and the response by the assistant.
for message in reversed(messages.data):
    print(message.role + ": " + message.content[0].text.value)
9. Limitations
During this beta, there are several known limitations that the OpenAI team is looking to address in the coming weeks and months. They will publish a changelog when support for additional functionality is added:
Support for streaming output (including Messages and Run Steps).
Support for notifications to share object status updates without the need for polling.
Support for DALL·E as a tool.
Support for user message creation with images.
Are you looking to start a career in data science and AI and do not know how? I offer data science mentoring sessions and long-term career mentoring:
Mentoring sessions: https://lnkd.in/dXeg3KPW
Long-term mentoring: https://lnkd.in/dtdUYBrM