The Department of Homeland Security (DHS) has a history of leading the way in integrating Artificial Intelligence (AI) and Machine Learning (ML) into its operations. On January 7, 2025, DHS released its playbook for Generative AI deployment. Given that references to open-source tools are sprinkled throughout the document, I wanted to describe three areas of Generative AI deployment where open-source tools would help DHS achieve its critical mission. The tools recommended below perform very specific tasks in a straightforward manner and can immediately improve Generative AI deployments for almost any federal agency. Having watched our talented teammates build various internal proofs of concept in the Generative AI space, I’m happy to offer a few insights from those experiments.
Text Sanitization: It is no accident that government agency mission statements do not include embarrassing, incorrect, sloppy, or profane writing. Humans naturally know to avoid such language. How do we restrict such language in the context of LLMs? Two open-source technologies come to mind: Detoxify and Guardrails AI. Detoxify is a Python package that classifies toxic comments in several languages and even categorizes the type of toxicity (e.g., obscenities, threats, insults, hate speech). Toxicity, however, is only one type of unwelcome output an agency wants to detect in its Generative AI deployment. Guardrails AI takes protection a step further by integrating safety procedures directly into the deployment architecture, with an Input Guard that screens prompts before they reach the LLM and an Output Guard that reviews content before it is presented back to the user. To date, Guardrails AI has built 64 validators that protect against unhelpful output, including but not limited to banning specific words, checking for protected-class biases, and detecting personally identifiable information (PII), secrets, and yes, even profanity. Using very simple functions from the Guardrails Python package, government agencies can easily apply critical checks to their generated output.
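As an illustration, here is a minimal sketch of an output check built on Detoxify; the threshold, example strings, and the is_safe helper are assumptions for demonstration, not recommended settings. Guardrails AI follows the same spirit but wires comparable checks into the Input and Output Guards of the deployment itself.

```python
# pip install detoxify
from detoxify import Detoxify

# Load the multilingual toxicity classifier once at startup.
classifier = Detoxify("multilingual")

def is_safe(text: str, threshold: float = 0.5) -> bool:
    """Return False if any toxicity category scores above the threshold."""
    scores = classifier.predict(text)  # e.g. {'toxicity': 0.01, 'insult': 0.002, ...}
    return all(float(score) < threshold for score in scores.values())

# Screen a generated response before it reaches the user.
draft = "Thank you for your inquiry. Your case has been forwarded for review."
if not is_safe(draft):
    draft = "This response was withheld pending human review."
print(draft)
```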
Minimizing hallucination: While Detoxify and Guardrails do a fantastic job of classifying harmful inputs and outputs for Generative AI deployment, they play a small part in the larger fabric of reducing the likelihood of unhelpful errors. Our partners in various government agencies are asked to adhere to policies and procedures; it is reasonable to require all output from Generative AI products to do the same. Just as agency policies are housed in an easily accessible location for federal employees, the same must be true for Large Language Models. For this reason, it may be no surprise that Retrieval Augmented Generation (RAG) is mentioned so frequently alongside any mention of LLM deployment. Simply stated, RAG is a framework in which an LLM is deployed alongside a document database. As part of response generation, relevant documents are retrieved from a vector database and supplied to the LLM to cite, forcing the response to draw on part or all of the retrieved reference material. So far, Analytica has had success using Chroma and Neo4j, two open-source technologies that are readily suited to supporting LLM use cases. Specifically, both tools are useful for experimenting in local environments – a requirement from many of our customers – and both enable the use of GraphRAG, a generalization of RAG that often improves retrieval. GraphRAG organizes the document database into a graph structure, tracking relationships between documents and between the portions of text (often called “chunks”) drawn from them. These connections give the LLM added context when referencing retrieved text, increasing the accuracy of the response.
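To make the retrieval half of a RAG concrete, below is a minimal sketch using Chroma’s Python client against a local, in-memory store; the collection name, documents, and metadata are placeholders, and in practice the retrieved chunks would be injected into the LLM’s prompt as citable context.

```python
# pip install chromadb
import chromadb

# An in-memory client is convenient for local experimentation;
# chromadb.PersistentClient can be used for an on-disk store.
client = chromadb.Client()
collection = client.create_collection(name="agency_policies")

# Placeholder policy chunks; in practice these come from a document pipeline.
collection.add(
    ids=["policy-001", "policy-002"],
    documents=[
        "All travel requests must be approved by a supervisor in advance.",
        "Records containing PII must be stored in approved systems only.",
    ],
    metadatas=[{"source": "travel_policy.pdf"}, {"source": "records_policy.pdf"}],
)

# Retrieve the chunks most relevant to a user prompt.
results = collection.query(
    query_texts=["Who has to approve employee travel?"],
    n_results=2,
)
for doc, meta in zip(results["documents"][0], results["metadatas"][0]):
    print(meta["source"], "->", doc)
```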
Optimize: While RAG improves prompt responses, it introduces new design concerns, chief among them efficient document retrieval. RAG can mitigate the problem of hallucinations, but it can also drastically increase response time. LangChain is one key open-source technology we have been using for our own internal demos. LangChain improves the performance of our RAGs by letting us apply algorithms on top of the database and design AI agents (i.e., Agentic RAG) that provide instructions for intelligent document retrieval. Specifically, Agentic RAG can decide which siloed database to query based on the user prompt; the scope of the search is thus reduced and response time improved. Another promising framework for optimizing RAGs is LightRAG. LightRAG builds upon nearest-neighbor database search techniques by performing two types of searches: low-level and high-level. By separating graph traversal into two levels of similarity, LightRAG can respond to prompts requiring specific answers while also handling broader topics with complex, overarching themes.
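The snippet below is a deliberately simplified, framework-agnostic sketch of the routing decision at the heart of Agentic RAG; in our demos this logic is expressed through LangChain agents, but the silo names and keyword rules here are illustrative assumptions standing in for an LLM-driven routing step.

```python
# A minimal sketch of Agentic-RAG-style routing across siloed collections.
# `collections` maps a silo name to a vector store collection (for example,
# Chroma collections like the one built above); the names are hypothetical.

def choose_silo(prompt: str) -> str:
    """Pick which siloed database to query for a given prompt.

    A production agent would delegate this decision to an LLM; simple
    keyword rules stand in for that call here.
    """
    text = prompt.lower()
    if any(word in text for word in ("leave", "benefits", "payroll")):
        return "hr_policies"
    if any(word in text for word in ("budget", "invoice", "procurement")):
        return "finance_policies"
    return "general_policies"

def retrieve_context(prompt: str, collections: dict, n_results: int = 3) -> str:
    """Query only the chosen silo, shrinking the search space and latency."""
    silo = choose_silo(prompt)
    hits = collections[silo].query(query_texts=[prompt], n_results=n_results)
    # Join the retrieved chunks into context for the downstream LLM call.
    return "\n".join(hits["documents"][0])
```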
Seemingly every day, an exciting new piece of Generative AI technology is released. Rapid change in this space can make it difficult to settle on a specific technology, but as technical consultants, it’s our job to make our customers aware of the exciting developments and recommend solutions that address their biggest pain points. The good news is that open-source software continues to be a viable path forward for all interested parties.
About Analytica:
As one of a select group of companies capable of bridging the gap between functional silos, Analytica specializes in providing a holistic approach to an organization’s financial, analytics, and information technology needs. We are an SBA-certified 8(a) small business that supports public-sector civilian, national security, and health missions. We are committed to ensuring quality and consistency in the services and technologies we deliver. We demonstrate this commitment through our appraisal at the Software Engineering Institute’s CMMI® V2.0 Maturity Level 3, ISO 9001:2015, ISO/IEC 20000-1:2018, ISO/IEC 27001:2013, and ITIL certification.
We have been honored as one of the 250 fastest-growing businesses in the U.S. for three consecutive years by Inc. Our ability to succeed and grow is credited to our people and the great work they do. We are an organization that embraces different ideas, perspectives, and people. Every one of us at Analytica offers a unique background and different characteristics that add to the quality of our work and help us better serve our clients. Interested in joining a team that enjoys working together and truly loves what they do? Visit our Careers page to check out employee testimonials, the benefits we offer, and our open positions!