
PDF Query Configuration

The pdf_query tool allows you to query information from PDF documents that have been indexed using LlamaIndex. Below, you will find the configuration and the code needed for this tool to work correctly.


  1. Tool Details
  • Name: pdf_query
  • Description: Queries information from PDF documents indexed with LlamaIndex.
  • Signature: query:str -> str
  • Configuration:
    json
    {
      "url":"url of pdf",
      "name":"reference name",
      "description":"description of pdf"
    }
  2. Add the Tool to the Agent
  • Go to your agent’s configuration section or the custom tool where you want to integrate this function.

  • Copy and paste the configuration structure shown above into the appropriate place.

  • Make sure to replace the values with your own (i.e., the URL of the PDF you want to index, the name you want to assign to it, and the appropriate description).
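Before saving the configuration, it can help to check the three values. The sketch below is only an illustration: `validate_pdf_config` is a hypothetical helper, not part of the tool or the platform, and the example URL, name, and description are placeholders. It rejects names containing path separators because the tool code builds `./storage/{name}` and `{name}.pdf` from the `name` value; restricting the URL to http(s) is an assumption of this sketch.

```python
from urllib.parse import urlparse

REQUIRED_KEYS = ("url", "name", "description")


def validate_pdf_config(config: dict) -> dict:
    """Hypothetical helper: sanity-check a pdf_query configuration."""
    # All three keys must be present and hold non-empty strings.
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"'{key}' must be a non-empty string")

    # Assumption of this sketch: only http(s) links are accepted.
    parsed = urlparse(config["url"])
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("'url' must be an http(s) URL")

    # The tool code uses the name in file paths, so keep it a plain name.
    name = config["name"]
    if "/" in name or "\\" in name or name in (".", ".."):
        raise ValueError("'name' must not contain path separators")

    return config


# Placeholder values for illustration only.
example = {
    "url": "https://example.com/manual.pdf",
    "name": "product_manual",
    "description": "Product manual used to answer support questions",
}
validate_pdf_config(example)
```

A short, unique `name` per PDF also keeps each document's index in its own storage directory.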


  3. Tool Code

Below is the code that downloads the PDF (if it is not already stored locally) and creates or loads the index used to answer queries:

python
import os
from urllib.request import urlretrieve
from llama_index.core import VectorStoreIndex, StorageContext, load_index_from_storage
from llama_index.core import SimpleDirectoryReader

# Retrieve configuration values
url = config['url']
name = config['name']
storage_path = f'./storage/{name}'
filename = f'{name}.pdf'

# Check if the storage directory exists
if os.path.exists(storage_path):
    # Load existing index
    storage_context = StorageContext.from_defaults(persist_dir=storage_path)
    index = load_index_from_storage(storage_context)
else:
    # Download the PDF if it does not exist
    if not os.path.exists(filename):
        urlretrieve(url, f'./{filename}')

    # Create the storage directory
    os.makedirs(storage_path, exist_ok=True)

    # Load the document
    docs = SimpleDirectoryReader(input_files=[f'./{filename}']).load_data()

    # Create the vector index
    index = VectorStoreIndex.from_documents(docs)

    # Persist the index
    index.storage_context.persist(persist_dir=storage_path)

# Create the query engine and run the query
engine = index.as_query_engine(similarity_top_k=3)
response = engine.query(query)

return str(response)

Example of what the configuration should look like:

[Image: example of the tool configuration]

Do not change the configuration structure or the general flow of the code unless you understand the internal logic of the tool.

  4. How the Code Works
  • Download or use the PDF: When no saved index exists yet, checks whether the file is already stored locally; if not, it downloads it from the configured URL.
  • Create or load the index:
    • If the storage directory (storage_path) exists, it retrieves the saved index.
    • Otherwise, it creates a new vector index from the PDF.
  • Query the index: Creates a query engine (as_query_engine) and executes the search.
  • Return the response: The statement return str(response) converts the query result to a string and sends it back to the agent.
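The branching described above can be traced without LlamaIndex installed. The following is a minimal sketch, not the tool itself: `plan_steps` is a hypothetical function that mirrors the path names used in the tool code (`./storage/{name}` and `{name}.pdf`) and only reports which steps would run. The `base_dir` parameter is an addition for illustration so the example can run inside a temporary directory.

```python
import os
import tempfile
from typing import List


def plan_steps(name: str, base_dir: str = ".") -> List[str]:
    """Hypothetical helper: list the steps the tool would take for `name`."""
    storage_path = os.path.join(base_dir, "storage", name)
    pdf_path = os.path.join(base_dir, f"{name}.pdf")

    # A saved index short-circuits everything else.
    if os.path.exists(storage_path):
        return ["load index"]

    steps = []
    # The download only happens when the PDF is not on disk yet.
    if not os.path.exists(pdf_path):
        steps.append("download PDF")
    steps += ["read PDF", "build index", "persist index"]
    return steps


with tempfile.TemporaryDirectory() as workdir:
    # First run: nothing is cached, so every step is needed.
    print(plan_steps("manual", workdir))

    # Simulate a persisted index: later runs only load it.
    os.makedirs(os.path.join(workdir, "storage", "manual"))
    print(plan_steps("manual", workdir))
```

Note that the existence of the storage directory alone decides which branch runs; the PDF and the URL are not consulted again once an index has been saved.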

  5. Testing and Validation
  • Send a query: Once the tool is configured, you can ask something about the PDF’s content to verify its functionality.
  • Observe the response: The agent will return information based on the indexed content.
  • Ensure it is active: Verify that the tool appears as "Active" in your platform or agent.
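If test answers look stale, for example after pointing the same name at a different URL, keep in mind that the tool code reuses the saved index whenever the storage directory exists and does not download the PDF again. The code also creates that directory before the index is built, so a run that fails partway may leave behind a directory that later runs try to load from. One way to force a rebuild is to delete the cached files. The sketch below assumes you have file-system access to the environment where the tool runs; `reset_pdf_index` and its `base_dir` parameter are hypothetical and not part of the tool.

```python
import os
import shutil
from typing import List


def reset_pdf_index(name: str, base_dir: str = ".") -> List[str]:
    """Hypothetical helper: delete the cached index and PDF for `name`.

    Returns the paths that were actually removed.
    """
    removed = []
    storage_path = os.path.join(base_dir, "storage", name)
    pdf_path = os.path.join(base_dir, f"{name}.pdf")

    # Remove the persisted index so the next query rebuilds it.
    if os.path.isdir(storage_path):
        shutil.rmtree(storage_path)
        removed.append(storage_path)

    # Remove the local PDF so the next query downloads it again.
    if os.path.isfile(pdf_path):
        os.remove(pdf_path)
        removed.append(pdf_path)

    return removed
```

After a reset, the next query takes the download-and-build path again, so expect it to be slower than a query against a saved index.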

With this, the pdf_query tool is configured and ready to process queries about your PDF document.

Neuraan Licensed