PDF Query Configuration
The pdf_query tool allows you to query information from PDF documents that have been indexed using LlamaIndex. Below, you will find the configuration and the code needed for this tool to work correctly.
- Tool Details
- Name: pdf_query
- Description: Queries information from PDF documents indexed with LlamaIndex.
- Signature: query: str -> str
- Configuration (JSON):
{ "url": "url of pdf", "name": "reference name", "description": "description of pdf" }
- Add the Tool to the Agent
Go to your agent’s configuration section or the custom tool where you want to integrate this function.
Copy and paste the configuration structure shown above into the appropriate place.
Make sure to replace the values with your own (i.e., the URL of the PDF you want to index, the name you want to assign to it, and the appropriate description).
- Tool Code
Below is the code that downloads the PDF (if it is not already stored locally) and creates or loads the index used to answer queries:
python
# Tool body: `config` (the JSON configuration) and `query` (the user's question)
# are expected to be provided by the platform when the tool runs.
import os
from urllib.request import urlretrieve

from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

# Retrieve configuration values
url = config['url']
name = config['name']
storage_path = f'./storage/{name}'
filename = f'{name}.pdf'

# Check if the storage directory exists
if os.path.exists(storage_path):
    # Load existing index
    storage_context = StorageContext.from_defaults(persist_dir=storage_path)
    index = load_index_from_storage(storage_context)
else:
    # Download the PDF if it does not exist
    if not os.path.exists(filename):
        urlretrieve(url, filename)
    # Load the document
    docs = SimpleDirectoryReader(input_files=[filename]).load_data()
    # Create the vector index
    index = VectorStoreIndex.from_documents(docs)
    # Create the storage directory only after indexing succeeds, so a failed
    # run does not leave an empty directory that later looks like a saved index
    os.makedirs(storage_path, exist_ok=True)
    # Persist the index
    index.storage_context.persist(persist_dir=storage_path)

# Create the query engine and run the query
engine = index.as_query_engine(similarity_top_k=3)
response = engine.query(query)
return str(response)
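The similarity_top_k=3 argument asks the query engine to retrieve the three indexed chunks most similar to the query before composing an answer. The following is only a rough illustration of that idea, not LlamaIndex's implementation: it assumes cosine similarity and uses toy two-dimensional vectors, and the helper names cosine_similarity and top_k_chunks are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunks, k=3):
    """Return the k chunk texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, chunk[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy (text, vector) pairs; real embeddings have hundreds of dimensions.
chunks = [
    ("warranty terms", [1.0, 0.0]),
    ("shipping policy", [0.0, 1.0]),
    ("warranty duration", [0.9, 0.1]),
    ("company history", [-1.0, 0.0]),
]

print(top_k_chunks([1.0, 0.0], chunks, k=2))
```

A larger value gives the answer more context to draw on, at the cost of passing more text to the model on each query.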
Example of what the configuration should look like:
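As an illustration, a filled-in configuration might look like the one embedded in the sketch below. The URL, name, and description are placeholder values, and validate_config is a hypothetical helper (not part of the tool) that only checks that the three expected keys are present and non-empty.

```python
import json

REQUIRED_KEYS = ("url", "name", "description")


def validate_config(raw):
    """Parse a JSON configuration string and check the three expected keys."""
    config = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"missing or empty configuration keys: {missing}")
    return config


# Placeholder values for illustration only.
example = """
{
  "url": "https://example.com/manual.pdf",
  "name": "product_manual",
  "description": "User manual for the product, including setup and warranty"
}
"""

config = validate_config(example)
print(config["name"])
```

Because the tool code builds the file name and storage path from name, a short value without spaces or slashes is the safer choice.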
Do not change the configuration structure or the general flow of the code unless you understand the internal logic of the tool.
- How the Code Works
- Download or use the PDF: If no saved index exists, checks whether the PDF is already stored locally; if not, it downloads it from the configured URL.
- Create or load the index:
  - If the storage directory (storage_path) exists, it loads the saved index.
  - Otherwise, it creates a new vector index from the PDF and persists it to storage_path.
- Query the index: Creates a query engine (as_query_engine) and executes the search.
- Return the response: The statement return str(response) sends the resulting answer from the query back to the agent.
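The create-or-load flow described above can be sketched without LlamaIndex. In this minimal, hypothetical example a small JSON file stands in for the persisted vector index, and load_or_build is an illustrative helper rather than part of the tool; the point is only that the expensive build step runs on the first call and later calls reuse the saved result.

```python
import json
import os
import tempfile


def load_or_build(storage_path, build):
    """Return (data, source): load saved data if present, else build and persist it."""
    index_file = os.path.join(storage_path, "index.json")
    if os.path.exists(index_file):
        with open(index_file, encoding="utf-8") as f:
            return json.load(f), "loaded"
    data = build()  # the expensive step, run only when nothing is saved yet
    os.makedirs(storage_path, exist_ok=True)
    with open(index_file, "w", encoding="utf-8") as f:
        json.dump(data, f)
    return data, "built"


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "storage", "demo")
    first = load_or_build(path, lambda: {"pages": 3})
    second = load_or_build(path, lambda: {"pages": 3})
    print(first[1], second[1])
```

One consequence of this pattern is that changing the PDF behind the same name has no effect until the saved storage directory is removed, since the saved index is always preferred.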
- Testing and Validation
- Send a query: Once the tool is configured, you can ask something about the PDF’s content to verify its functionality.
- Observe the response: The agent will return information based on the indexed content.
- Ensure it is active: Verify that the tool appears as "Active" in your platform or agent.
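If your platform lets you call the tool from code, the first two checks can be automated with a small smoke test. The sketch below is an assumption about how such a check might look: smoke_test is a hypothetical helper, and fake_pdf_query is a stand-in with the same query: str -> str signature so that the example runs without a real PDF or index.

```python
def smoke_test(tool, question):
    """Call the tool once and check that it returns a non-empty string."""
    answer = tool(question)
    if not isinstance(answer, str):
        raise TypeError(f"expected str, got {type(answer).__name__}")
    if not answer.strip():
        raise ValueError("tool returned an empty answer")
    return answer


def fake_pdf_query(query):
    """Hypothetical stand-in for the real tool; replace with your own callable."""
    return f"Stub answer for: {query}"


print(smoke_test(fake_pdf_query, "What does the document say about warranty?"))
```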
With this, the pdf_query tool is configured and ready to process queries about your PDF document.