Metadesign Solutions

How to Build a Custom API for Document Search in Google Drive Using Google AI

  • Amit Gupta
  • 7 minutes read

Searching through a large number of documents in Google Drive to find specific content quickly becomes overwhelming. Google Workspace, combined with Google Apps Script, lets you build a custom API that automates this search. By integrating Google Gemini, Google’s AI model, the API can compare input text with your Google Docs and return the most relevant matches.

In this blog, we’ll guide you through building such an API, using Google Drive as a content repository and leveraging Google AI capabilities for content similarity search.

Why Use Google Drive and Google Workspace?

Google Drive is an efficient cloud platform for securely storing and organizing documents. With Google Workspace, your organization can access additional features like collaboration, automation, and integration with other Google services like Docs, Sheets, and Gmail.

Using Google Drive as a content repository allows businesses to:

  • Centralize and organize vast amounts of content
  • Securely manage access and permissions
  • Automate processes by integrating with Google APIs and Apps Script

Google Apps Script: Building the API

Google Apps Script is a cloud-based JavaScript platform that allows you to automate tasks across Google services. In this project, we’ll use Apps Script to search through Google Drive documents, extract their content, and use Google Gemini AI to find matches based on content similarity.

Here’s a high-level approach to building your API:

Step 1: Setting Up the API with Google Apps Script

We start by creating a custom API that accepts a paragraph of text as input and searches for similar content in Google Docs stored in a specific Google Drive folder.

JavaScript code:

function doPost(e) {
  // The request body is expected to be JSON: { "text": "...", "folderId": "..." }
  const params = JSON.parse(e.postData.contents);
  const inputText = params.text;
  const folderId = params.folderId;

  // Reject requests that are missing either field
  if (!inputText || !folderId) {
    return ContentService.createTextOutput('Invalid input').setMimeType(ContentService.MimeType.TEXT);
  }

  // Search the folder for documents similar to the input text
  const matches = findSimilarDocuments(inputText, folderId);

  const response = {
    matches: matches
  };

  return ContentService.createTextOutput(JSON.stringify(response))
    .setMimeType(ContentService.MimeType.JSON);
}

This function accepts an input paragraph and folder ID from the user, processes the request, and returns matching documents.
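
The handler above assumes the request body is always valid JSON. As a minimal sketch, the validation could be moved into a small pure function that never throws, so that doPost can answer malformed requests with a clear error. The helper name parseSearchRequest and the error messages are illustrative and not part of the original code.

```javascript
// Hypothetical helper (not in the original post): validates the raw POST body.
// Returns { ok: true, text, folderId } on success or { ok: false, error } on failure.
function parseSearchRequest(rawBody) {
  let params;
  try {
    params = JSON.parse(rawBody);
  } catch (err) {
    return { ok: false, error: 'Request body is not valid JSON' };
  }
  if (params === null || typeof params !== 'object') {
    return { ok: false, error: 'Request body must be a JSON object' };
  }
  // Accept only non-empty strings for both fields
  const text = typeof params.text === 'string' ? params.text.trim() : '';
  const folderId = typeof params.folderId === 'string' ? params.folderId.trim() : '';
  if (!text || !folderId) {
    return { ok: false, error: 'Both "text" and "folderId" are required' };
  }
  return { ok: true, text: text, folderId: folderId };
}
```

Inside doPost, this could be called as parseSearchRequest(e.postData.contents), returning a JSON body such as { "error": ... } whenever ok is false.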

Step 2: Fetching Documents from Google Drive

The next step is to retrieve the documents stored in Google Drive. We’ll recursively walk the specified folder and its subfolders and collect every Google Doc we find; the text of each document is extracted later, in Step 5.

JavaScript code:

function getDocumentsInFolder(folder) {
  const docs = [];

  // Collect the Google Docs directly inside this folder
  const files = folder.getFilesByType(MimeType.GOOGLE_DOCS);
  while (files.hasNext()) {
    const file = files.next();
    docs.push(file);
  }

  // Recurse into each subfolder and append its documents
  const subfolders = folder.getFolders();
  while (subfolders.hasNext()) {
    const subfolder = subfolders.next();
    docs.push(...getDocumentsInFolder(subfolder));
  }

  return docs;
}

This function retrieves all the Google Docs in a folder, including its subfolders, and stores them in an array.
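
For very deep or very large folder trees, an iterative walk with an explicit stack may be preferable to recursion. The sketch below is one possible variant, not the original author's code: the name collectFilesIteratively is hypothetical, and it only assumes that the folder object exposes getFilesByType and getFolders returning hasNext()/next() iterators, as DriveApp folders do. The order of the results can differ from the recursive version.

```javascript
// Sketch: iterative folder walk using an explicit stack instead of recursion.
// rootFolder is expected to behave like a DriveApp Folder.
function collectFilesIteratively(rootFolder, mimeType) {
  const files = [];
  const pending = [rootFolder]; // folders still to be visited

  while (pending.length > 0) {
    const folder = pending.pop();

    // Collect matching files in the current folder
    const fileIterator = folder.getFilesByType(mimeType);
    while (fileIterator.hasNext()) {
      files.push(fileIterator.next());
    }

    // Queue subfolders for a later pass
    const folderIterator = folder.getFolders();
    while (folderIterator.hasNext()) {
      pending.push(folderIterator.next());
    }
  }

  return files;
}
```

In Apps Script this could be called as collectFilesIteratively(DriveApp.getFolderById(folderId), MimeType.GOOGLE_DOCS).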

Step 3: Comparing Document Content with Google Gemini AI

Now that we’ve fetched the documents, the next step is to compare the document content with the input text using Google Gemini AI. Here’s how you can call Google Gemini to perform content similarity checks.

JavaScript code:

function fetchContentWithAI(prompt) {
  const apiKey = getRandomAPIKey();  // Get a random API key (defined in Step 4)

  const url = `https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=${apiKey}`;
  const payload = { contents: [{ parts: [{ text: prompt }] }] };

  // UrlFetchApp throws on error responses, which the retry logic in Step 4 relies on
  const response = UrlFetchApp.fetch(url, {
    method: 'post',
    payload: JSON.stringify(payload),
    contentType: 'application/json',
  });

  const obj = JSON.parse(response.getContentText());

  // Return the text of the first candidate, or '0' when the model returned nothing
  if (obj.candidates && obj.candidates.length > 0) {
    return obj.candidates[0].content.parts[0].text;
  } else {
    return '0'; // No match
  }
}

This function sends a prompt to the Google Gemini API and returns the text of the first candidate in the response. The prompt itself, which combines the input paragraph with the document content, is built in Step 5.
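
Because the model replies with free text, the caller cannot rely on the reply being a bare number. The following is a minimal sketch of a tolerant parser; the helper name parseSimilarityScore and its rules (take the first unsigned number, treat a trailing percent sign as a percentage, reject values outside 0 to 1) are assumptions for illustration, not part of the original code.

```javascript
// Hypothetical helper: extracts a similarity score in [0, 1] from a model reply.
// Returns null when no usable number is found.
function parseSimilarityScore(reply) {
  if (typeof reply !== 'string') return null;

  // First unsigned number in the text, e.g. "0.82", ".5" or "82"
  const match = reply.match(/\d*\.\d+|\d+/);
  if (!match) return null;

  let score = parseFloat(match[0]);

  // Treat a directly following percent sign as a percentage ("82%" becomes 0.82)
  if (reply.charAt(match.index + match[0].length) === '%') {
    score = score / 100;
  }

  return score >= 0 && score <= 1 ? score : null;
}
```

This is only a heuristic: a reply such as "on a scale of 0 to 1, the score is 0.8" would yield 0, so it works best when the prompt asks the model to answer with the number alone.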

Step 4: Making the API Fail-Safe with a Retry Mechanism

Calls to external APIs can fail because of rate limits or temporary network issues, so a retry mechanism makes the API more robust. In our case, we also keep a pool of API keys and pick one for each request, which reduces the likelihood of hitting the rate limit on any single key.

JavaScript code:

function getRandomAPIKey() {
  // Placeholder values: substitute your own keys and keep real keys out of source code
  const apiKeys = [
    'YOUR_API_KEY_1',
    'YOUR_API_KEY_2',
    'YOUR_API_KEY_3',
    'YOUR_API_KEY_4'
  ];

  const randomIndex = Math.floor(Math.random() * apiKeys.length);
  return apiKeys[randomIndex];
}

function fetchContentWithRetry(prompt, maxRetries = 5) {
  let retryCount = 0;
  let success = false;
  let result = 'N/A';

  while (retryCount < maxRetries && !success) {
    try {
      result = fetchContentWithAI(prompt);  // Call the AI API
      success = true; // Exit the loop on success
    } catch (error) {
      if (error.toString().includes('429')) {  // Handle rate limiting
        retryCount++;
        Utilities.sleep(Math.pow(2, retryCount) * 1000);  // Exponential backoff: 2s, 4s, 8s, ...
      } else {
        throw new Error(`API failed: ${error}`);
      }
    }
  }

  if (!success) {
    throw new Error(`Failed after ${maxRetries} attempts.`);
  }

  return result;
}

This code handles API rate limits so that a temporary failure does not break your custom API. The fetchContentWithRetry function retries only when the error indicates HTTP 429 (rate limiting), waiting longer after each failed attempt, and rethrows any other error immediately. Each attempt calls getRandomAPIKey, which picks a key at random from the pool to spread the load and avoid hitting the rate limit on a single key.
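
With exponential backoff, the wait doubles after each failed attempt, usually up to a cap. A minimal sketch of the delay calculation follows; the function names and the default values for baseMs and maxMs are illustrative assumptions rather than values from the original code.

```javascript
// Sketch: delay before retry number `attempt` (1-based), doubling each time.
// baseMs and maxMs are illustrative defaults.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 32000) {
  const delay = baseMs * Math.pow(2, attempt - 1);
  return Math.min(delay, maxMs); // never wait longer than the cap
}

// Optional jitter: picks a delay from half the computed value up to (but not
// including) the full value, so concurrent callers do not retry in lockstep.
// `random` is injectable to make the function easy to test.
function backoffDelayWithJitterMs(attempt, random = Math.random) {
  const delay = backoffDelayMs(attempt);
  return Math.floor(delay / 2 + random() * (delay / 2));
}
```

In Apps Script, the result could be passed directly to Utilities.sleep, for example Utilities.sleep(backoffDelayMs(retryCount)).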

Step 5: Return Matching Documents

Once the API has processed the request, we return the documents that have a high similarity score.

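JavaScript code:

function findSimilarDocuments(inputText, folderId) {
  const folder = DriveApp.getFolderById(folderId);
  const docs = getDocumentsInFolder(folder);

  const matches = [];

  docs.forEach(doc => {
    const docText = getDocumentText(doc.getId());
    // Ask for a bare number so that the reply can be parsed with parseFloat
    const prompt = `Compare this text: "${inputText}" with the following document content: "${docText}". Reply with only a similarity score between 0 and 1.`;

    // parseFloat yields NaN for non-numeric replies, which fails the threshold check below
    const similarityScore = parseFloat(fetchContentWithRetry(prompt));

    if (similarityScore > 0.75) {  // Assuming a 75% similarity threshold
      matches.push({
        name: doc.getName(),
        link: doc.getUrl(),
        similarityScore: similarityScore
      });
    }
  });

  return matches;
}

// getDocumentText is not shown in the original listing; this is an assumed minimal version
function getDocumentText(docId) {
  return DocumentApp.openById(docId).getBody().getText();
}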

This function compares the input text with each document and returns only the documents that exceed the similarity threshold.
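
Callers usually want the best matches first. As a hedged sketch, a small post-processing step could filter, sort, and cap the results before they are returned; the name rankMatches and the default limit of 10 are assumptions for illustration. Scores are converted with Number so that both numeric and string scores are handled.

```javascript
// Hypothetical post-processing step: keep matches above `threshold`,
// sort them best-first, and return at most `limit` results.
function rankMatches(matches, threshold = 0.75, limit = 10) {
  return matches
    // Copy each match with a numeric score (non-numeric scores become NaN)
    .map(m => Object.assign({}, m, { similarityScore: Number(m.similarityScore) }))
    // Drop NaN scores and anything at or below the threshold
    .filter(m => Number.isFinite(m.similarityScore) && m.similarityScore > threshold)
    // Highest score first
    .sort((a, b) => b.similarityScore - a.similarityScore)
    .slice(0, limit);
}
```

The array returned by findSimilarDocuments could be passed through rankMatches before it is serialized in doPost.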

Conclusion

Building a custom API for document search using Google Workspace, Google Apps Script, and Google AI opens up a wide range of possibilities for businesses and developers alike. By integrating Google Drive as a content repository and leveraging AI for content similarity, you can create intelligent document search solutions that fit your specific needs.

By implementing fail-safes, such as retry mechanisms and API key rotation, you can ensure your API is reliable even under high traffic or temporary network issues.

If you’re looking to build a similar solution for your organization or want the full code, reach out to us at sales@metadesignsolutions.com.

