Decoding AI agents

Why specialized AI agents are bringing more control, not chaos, to business operations, and a practical framework for getting started

Nov 26, 2024

Summary

While ChatGPT dazzled us with its general intelligence over the last two years, AI agents are emerging as the practical way forward. These specialized AI systems can reliably handle complex tasks by breaking them down and staying grounded in real data. Through examples from NeuralHQ and our own experiments, we'll explore how agents are delivering tangible business value today through specialized focus and good system design.

It’s been two years since ChatGPT was launched and took the business world by storm. It first amazed us with its brilliance and then frustrated us with its hallucinations and lack of precision.

And now, AI agents are emerging as the next big shift in the AI world. Unlike standalone GPT models that follow a fixed prompt-response paradigm, AI agents act autonomously, deciding whether to answer directly, search for data first, or break the task into manageable steps. They’re smarter, more adaptable, and better suited to real-world business challenges. For the technically curious, we've covered the building blocks of agents in our previous post.

We recently sat down with Amrit Anand, Co-Founder and CTO at NeuralHQ, to talk about agents. Between their experience building agents at NeuralHQ and our own experiments automating content workflows, we've spotted some patterns that might help you start unlocking tangible value from AI agents.

What is NeuralHQ?
NeuralHQ is a startup specializing in AI products and embedding models that help businesses leverage AI better. One of their products is an AI-powered shopping assistant that replicates the personalized experience of an in-store salesperson for e-commerce customers, guiding them to find the perfect products. We'll explore their development process in detail later.

In this post, we'll explore:

Divide and conquer: How agents work
Proof of the pudding: Two real-world stories
Agents in the wild: Recent breakthroughs
Reality check: Where agents work and where they don’t
Building with agents: A practical way forward

Let’s dive in! 🚀

Divide and conquer: How agents work

Imagine you're a strategy associate working on a competitor analysis. As with any complex task, you’d rarely complete it in one shot. You’d create a plan, gather data, analyze it and then write up your insights. Next, you’d review and refine a few times until everything clicks. Interestingly, an AI agent works the same way.

Now, consider this at a larger scale. Say, a consulting firm needs to create a 100-page market report. Instead of a single person, you'd have a team led by a manager to deliver this project. Each specialist would own different sections and the team would create a draft. The manager would review the draft and then the team would refine it. Several late evenings, espresso shots and iterations later, they would finalize the report. 😛

This is exactly how a multi-agent system operates. Multiple AI agents with specialized roles work in tandem, often under orchestration from a reviewer agent. "But wait," you say, "why not just have one clever AI do it all?" Just like humans, AI performs better and hallucinates less when handling specialized sub-tasks rather than trying to be a jack-of-all-trades. So more AI doesn’t have to mean more chaos. 🥳
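To make the team analogy concrete, here is a minimal sketch of a reviewer-led loop. It assumes stubbed specialist functions in place of real LLM calls, and every name in it (`market_sizing`, `competitor_scan`, `review`, `run_team`) is hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict

# A specialist takes (task, feedback) and returns its section of the report.
# In a real system each would wrap an LLM call; these are hypothetical stubs.
Specialist = Callable[[str, str], str]


def market_sizing(task: str, feedback: str) -> str:
    draft = f"Market sizing for {task}"
    # This stub only adds sources once the reviewer asks for them.
    return draft + " (with sources)" if "sources" in feedback else draft


def competitor_scan(task: str, feedback: str) -> str:
    return f"Competitor scan for {task} (with sources)"


def review(sections: Dict[str, str]) -> Dict[str, str]:
    """Reviewer agent: feedback per section; an empty dict means approved."""
    return {
        name: "add sources"
        for name, text in sections.items()
        if "sources" not in text
    }


def run_team(task: str, team: Dict[str, Specialist], max_rounds: int = 3) -> Dict[str, str]:
    sections: Dict[str, str] = {}
    feedback: Dict[str, str] = {}
    for _ in range(max_rounds):
        for name, agent in team.items():
            # Write new sections; rewrite only those the reviewer flagged.
            if name not in sections or name in feedback:
                sections[name] = agent(task, feedback.get(name, ""))
        feedback = review(sections)
        if not feedback:  # every section approved
            break
    return sections


report = run_team(
    "EV chargers",
    {"sizing": market_sizing, "competitors": competitor_scan},
)
print(report["sizing"])  # Market sizing for EV chargers (with sources)
```

Each specialist only ever sees its own sub-task and its own feedback, which is the "divide and conquer" idea in miniature. The `max_rounds` cap keeps the review cycle from running forever.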

Note: If you’re interested in the science behind this, check out Andrew Ng’s talk to learn how agentic workflows are outperforming standalone LLMs on industry benchmarks.

Proof of the pudding: Two real-world stories

By now, you’re probably wondering, “All the theory is OK, but does all this actually work?” Let's examine two implementations that demonstrate the impact of agent-based systems: a customer-facing e-commerce solution and an internal content creation workflow.

E-commerce product discovery assistant: NeuralHQ example

How do you replicate a helpful in-store experience on an e-commerce website? NeuralHQ tackled this challenge with a chatbot using a two-agent system.

NeuralHQ’s e-commerce product discovery assistant

When a customer asks for "Camera for wildlife photography", here's how it works:

A Conversation Agent handles all user interaction:
- It is designed with a deep contextual understanding of both user intent and store inventory
- Fine-tuned for specific retail domains, it engages customers in natural conversation and asks relevant follow-up questions to understand their preferences
A Query Builder Agent focuses solely on understanding the user’s query and retrieving relevant products from available inventory:
- For retrieval, NeuralHQ uses embeddings and hybrid search, ensuring the recommendations are grounded in the store inventory
The two agents, along with decision layers and state maintenance, make up the overall system that works reliably to help users with personalized product recommendations
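The split between the two agents can be sketched in a few lines. This is not NeuralHQ's implementation: the inventory, the hand-made `embed` function (a toy stand-in for a real embedding model), the `alpha` weighting and the one-question dialogue are all assumptions made purely for illustration.

```python
import math
from typing import Dict, List, Optional

# Hypothetical store inventory; recommendations can only come from this list.
INVENTORY = [
    {"name": "Alpha Z telephoto kit", "text": "mirrorless camera telephoto zoom fast autofocus", "price": 1800},
    {"name": "PocketCam", "text": "compact camera for travel vlog", "price": 400},
    {"name": "StudioPro", "text": "camera for portrait studio photography", "price": 1500},
]

# Toy stand-in for an embedding model: words mapped to three concept axes.
CONCEPTS = {"wildlife": 0, "telephoto": 0, "zoom": 0,
            "travel": 1, "vlog": 1, "portrait": 2, "studio": 2}


def embed(text: str) -> List[float]:
    vec = [0.0, 0.0, 0.0]
    for word in text.lower().split():
        if word in CONCEPTS:
            vec[CONCEPTS[word]] += 1.0
    return vec


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    terms = set(query.lower().split())
    return len(terms & set(text.lower().split())) / len(terms) if terms else 0.0


def query_builder_agent(query: str, max_price: Optional[float] = None,
                        alpha: float = 0.5, top_k: int = 2) -> List[str]:
    """Hybrid search: blend keyword overlap with vector similarity."""
    scored = []
    for item in INVENTORY:
        if max_price is not None and item["price"] > max_price:
            continue
        score = (alpha * keyword_score(query, item["text"])
                 + (1 - alpha) * cosine(embed(query), embed(item["text"])))
        scored.append((score, item["name"]))
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]


def conversation_agent(message: str, state: Dict) -> str:
    """Owns the dialogue: asks one follow-up, then delegates retrieval."""
    if "query" not in state:
        state["query"] = message
        return "What is your budget?"
    state["max_price"] = float(message)
    products = query_builder_agent(state["query"], state["max_price"])
    return "You might like: " + ", ".join(products) if products else "Nothing in stock matches."


state: Dict = {}
print(conversation_agent("camera for wildlife photography", state))
print(conversation_agent("2000", state))
```

In this toy data the telephoto kit shares only one keyword with the query, yet it ranks first because the vector half of the score links "wildlife" to "telephoto" and "zoom". That is the kind of match a hybrid score is meant to catch while still returning only items that exist in the inventory.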

Sidebar: Why use two agents?
Early attempts to build the product using a single agent didn’t work. The LLMs had limited context windows and were generally less reliable on edge cases. It was also difficult to introduce guardrails and simultaneously fine-tune LLMs to reliably perform tool calls and maintain the conversation flow. By dividing the tasks and using two agents, the overall system performed better and handled most edge cases.

Content at scale: Powered by agents

For a prototype, we used agents to cut our content creation process from a week-long effort to a 24-hour one.

Content Automation Agent — A prototype for an automated content engine to write blogs

Here's how we achieved this:

Phase 1: Research & Planning

A Researcher Agent analyzes customer discussions on Reddit and Quora, studies top-ranking competitor articles and creates a detailed research note
An Editor Agent reviews the research, compares it against the company's expertise, and creates a differentiated outline and title. It saves everything in an organized folder
A human reviewer refines and enhances the outline

Phase 2: Content Development & Quality Control

An Outline Writer Agent creates a section-wise plan based on the approved strategy
A Content Writer Agent produces the full article while maintaining brand voice
A Reviewer Agent performs quality checks and optimizes the article for SEO
After final human review, the Publisher Agent publishes the article on the blog CMS

Note: We added human checks at key decision points - for reviewing the outline and for editing the final draft. This way, agents handled the time-consuming research and writing, while humans controlled the strategic direction.
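The two-phase flow with its human checkpoints can be sketched as a simple pipeline. This is a simplified illustration, not our actual prototype: the agent functions are stubs standing in for LLM calls, the Outline Writer step is folded into the writer for brevity, and `approve` is a hypothetical callback representing the human reviewer.

```python
from typing import Callable, Optional

# Stub agents; in a real pipeline each would call an LLM with its own prompt.
def researcher(topic: str) -> str:
    return f"research note on {topic}"

def editor(note: str) -> str:
    return f"outline from [{note}]"

def writer(outline: str) -> str:
    return f"draft from [{outline}]"

def reviewer(draft: str) -> str:
    return f"{draft} + SEO pass"

def publisher(article: str) -> str:
    return f"PUBLISHED: {article}"

# approve(stage, artifact) returns the (possibly edited) artifact,
# or None to stop the pipeline at that checkpoint.
Approve = Callable[[str, str], Optional[str]]


def run_pipeline(topic: str, approve: Approve) -> Optional[str]:
    # Phase 1: research and planning, then the first human checkpoint.
    outline = approve("outline", editor(researcher(topic)))
    if outline is None:
        return None
    # Phase 2: writing and quality control, then the final human checkpoint.
    final = approve("final draft", reviewer(writer(outline)))
    if final is None:
        return None
    return publisher(final)


stages_seen = []

def auto_approve(stage: str, artifact: str) -> Optional[str]:
    stages_seen.append(stage)  # a real reviewer would read and edit here
    return artifact

article = run_pipeline("AI agents", auto_approve)
print(stages_seen)  # ['outline', 'final draft']
```

Nothing reaches the publisher unless both checkpoints return an artifact, so the humans keep control of direction while the agents do the legwork.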

Agents in the wild: Recent breakthroughs

It's still early days for agents in production, but we're seeing fascinating examples of agents at scale.

Microsoft Copilot stands out as the most relevant example for business professionals. By embedding agents across the Microsoft 365 suite, it transforms everyday work, from analyzing Excel data to creating presentations and drafting emails, while carrying context across applications.

The software development ecosystem has benefited most from agents. As non-technical folks, we can't overstate how game-changing this is. With Replit and Bolt.new, you can now prototype and deploy new ideas in minutes, with zero developer bandwidth. We were skeptical at first, but after trying it ourselves, we were amazed - you can literally build while stuck in traffic! Under the hood, these tools orchestrate multiple specialized agents for everything from architecture to deployment, while maintaining human oversight for critical decisions.

These examples point to a crucial evolution: we're moving from general-purpose AI assistants to domain-specific, specialized agents. However, as with any transformative technology, success lies not just in understanding their potential, but also in recognizing their limitations. Let's examine where agents truly shine – and where they might not be the right solution.

Reality check: Where agents work and where they don’t

The real power of agents lies in their ability to stay grounded, unlike standalone LLMs. Agents achieve this in two ways:

First, through data access. When an agent has access to your business data (like product catalogs or company policies), it references this information directly rather than making things up
Second, through specialized focus. A group of specialized agents often performs better at specific sub-tasks than one model trying to do everything

However, agents aren't a silver bullet. Here's what to watch out for:

Unnecessary complexity: For simple tasks, agents are overkill. The setup and maintenance overhead only makes sense for complex, multi-step processes
High cost: Multiple LLM calls can add up quickly. Smart architecture choices (like using smaller models for simpler subtasks) are crucial for cost control
Unclear boundaries: Without proper guardrails defining completion criteria and iteration limits, agent systems can get stuck in infinite loops
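A guardrail of this kind can be as small as a bounded loop with an explicit "done" check. The sketch below assumes a hypothetical `attempt` function (one agent iteration) and an `is_done` completion test; the numeric scores are stand-ins for whatever quality signal a real system would use.

```python
from typing import Any, Callable, Dict, Optional


def run_with_guardrails(
    attempt: Callable[[int, Optional[Any]], Any],
    is_done: Callable[[Any], bool],
    max_iterations: int = 3,
) -> Dict[str, Any]:
    """Run an agent step until it meets the completion criteria or hits the cap."""
    result = None
    for i in range(1, max_iterations + 1):
        result = attempt(i, result)  # the agent sees its previous output
        if is_done(result):
            return {"status": "done", "result": result, "iterations": i}
    # Cap reached without success: hand off to a human instead of looping.
    return {"status": "escalated", "result": result, "iterations": max_iterations}


# Stub agent whose quality score improves by 30 points per attempt.
def improve(iteration: int, previous: Optional[int]) -> int:
    return iteration * 30

ok = run_with_guardrails(improve, is_done=lambda score: score >= 80)
stuck = run_with_guardrails(improve, is_done=lambda score: score >= 100)
print(ok["status"], stuck["status"])  # done escalated
```

The important design choice is that the loop always ends in one of two named states, so a run that never converges becomes an escalation rather than a runaway bill.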

Building with agents: A practical way forward

We suggest three key principles for leveraging agents successfully:

Choose your battles wisely. Start with problems that:
1. Are repetitive, taking 3+ hours per task, but require multiple types of expertise
2. Are too complex for a single LLM but still well-defined
3. Would benefit from real-time data access
Invest in smart architecture:
1. Define explicit handoff points between agents
2. Identify decision points where agents must hand off to humans
3. Set maximum iterations for each agent (e.g., 3 attempts before escalation)
4. Implement automatic human escalation triggers
5. Evaluate each agent’s standalone performance vs. that of the overall system
Start small, scale smart:
1. Start with a small use case. Test the agent in parallel with existing workflows for 2-4 weeks
2. Cache common queries and responses
3. Cap the tokens per interaction
4. Use smaller models where possible to optimize cost
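The cost controls in the last list (caching, token caps and smaller models) can be combined in one thin wrapper. The sketch below is illustrative only: `BudgetedAgent` is a hypothetical class, the models are stub functions, word count is a crude proxy for tokens, and routing by query length is a deliberately naive heuristic.

```python
from typing import Callable, Dict


def count_tokens(text: str) -> int:
    # Crude whitespace proxy; a real system would use the model's tokenizer.
    return len(text.split())


class BudgetedAgent:
    def __init__(self, small_model: Callable[[str], str],
                 large_model: Callable[[str], str],
                 max_tokens: int = 50, simple_limit: int = 8) -> None:
        self.small_model = small_model
        self.large_model = large_model
        self.max_tokens = max_tokens      # cap on tokens per interaction
        self.simple_limit = simple_limit  # at or below this, use the small model
        self.cache: Dict[str, str] = {}
        self.calls = {"small": 0, "large": 0, "cache": 0}

    def ask(self, query: str) -> str:
        # Normalize case and spacing so trivial variants share a cache entry.
        key = " ".join(query.lower().split())
        if key in self.cache:
            self.calls["cache"] += 1
            return self.cache[key]
        # Enforce the token cap by truncating the prompt.
        prompt = " ".join(query.split()[: self.max_tokens])
        tier = "small" if count_tokens(prompt) <= self.simple_limit else "large"
        model = self.small_model if tier == "small" else self.large_model
        self.calls[tier] += 1
        answer = model(prompt)
        self.cache[key] = answer
        return answer


agent = BudgetedAgent(
    small_model=lambda p: "small: " + p,
    large_model=lambda p: "large: " + p,
)
agent.ask("What are your store hours?")
agent.ask("what are your   store hours?")  # served from cache
agent.ask("Compare the return policies for cameras, lenses and tripods "
          "bought during the holiday sale")
print(agent.calls)  # {'small': 1, 'large': 1, 'cache': 1}
```

Tracking `calls` per tier also gives you a simple number to watch during the 2-4 week parallel test: how much traffic the cache and the small model absorb.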

Remember that the best agent systems often mirror how human teams would handle the same process. So, mapping your mental models into your agent workflows could well become your competitive moat. Until next time! 👋

Business Meets AI
