That one sentence highlights the issues that organizations are having deploying agents into production. Large numbers of agents – even quite complex ones with a huge ROI – have been built but are stuck. There isn’t enough transparency to give executives the confidence to give them the green light.
Waymo vs Agentforce
Let’s think about Waymo. It is a remarkable feat of engineering, software, and AI. It is performing a high-risk, highly visible task where the implications of getting it wrong are huge. So let’s draw some parallels with getting Waymo operational vs an Agentforce agent. And then we can come back to the question – “Would you get in one? And what if the windows were all blacked out?”
Platform/foundations
Waymo: It requires established technologies: an electric car, a road system with signposts, rules of the road, training data on roads, bikes and pedestrians, and a mobile app with credit card billing.
Agentforce Agent: The foundations are documented metadata in the platform (Flow, Apex, Prompt Templates) and well-managed data (Customer360, Data Cloud, external systems).
Task/process
Waymo: This is well understood. Get from A to B using the best route while obeying the general traffic rules, local city rules, and posted signs. The scope is also very narrow. Waymo operates only in the San Francisco and Phoenix city areas.
Agentforce Agent: This could be almost any task, so the process needs to be tightly scoped and explained. The higher the risk and the visibility of the task, the tighter the scope and the more explicit the process and rules need to be.
Agents are digital labor. You wouldn’t hire a new employee, hand them a list of instructions and a set of actions, and expect them to figure out how to apply them without giving them some view of the approach (i.e., the process) you want them to follow.
Whilst this seems obvious, we are seeing many (most?) agents being built by simply writing instructions. When the agent is not performing, more and more instructions are added to counter each prompt or scenario where the agent failed. This leads to a huge list of instructions and actions that becomes impossible to reconcile, validate, or refine.
Training and deployment
Waymo: For months, Waymos drove around San Francisco with a driver monitoring what they were doing. And then there was a formal sign-off by the City.
BTW, in that time the locals became more comfortable that Waymos were safe, but they also worked out that a Waymo will never take risks. So now other cars can “take advantage” of them by pulling out in front or pushing in when there are lines of traffic. Consequently, a Waymo takes longer than an Uber or Lyft ride!
Agentforce Agent: Agents are so new and are evolving so quickly that there hasn’t been enough time to establish best practices for development and testing. Also, there seems to be an undue urgency to get high-profile, customer-facing agents live and into production. Some are not performing well, which is (unfortunately) knocking confidence in the Agentforce platform.
So, would you get in a Waymo?
Yes, and I have done. But I understand what it is doing, and I can monitor it. I can evaluate how it is driving and look on Google Maps to see the route it has chosen.
But what if all the windows were blacked out? How different would that feel? You get in, and the doors lock and only unlock when you get to the destination. That is how an agent feels to the business owners (and those who built it).
Agentforce – the confidence factor
There is a lack of confidence in the Agentforce agents that are being built, for two reasons. There is no visibility into what they are doing, i.e., no process. And they are not behaving reliably, i.e., the build approach is at fault.
At Elements.cloud, we don’t lack confidence. We are having huge success building agents and getting them deployed in days. Even very complex ones.
Deployed. In. Days.
The Agent Builder tooling means it is very quick to build an agent. But the true metric is not from “idea to agent built” but from “idea to live in production”. And that requires sign-off from the business stakeholders.
The reason so many agents that have been built are stuck in the “evaluate” phase and have not gone live is a lack of confidence due to a lack of transparency.
There is no visibility into what the agent is doing
The AID (Agent Instruction Diagram) provides a number of benefits that increase confidence:
- it lets you engage business users and understand what they want
- it is far easier to understand than a list of 40-100 instructions
- the diagram generates unambiguous instructions
- it is easy to see where actions vs instructions vs guardrails are required
- it is easier to explain agent behavior and get sign-off from business users
There is no repeatable way to get reliable complex agents
A process-led approach with the AID means it is quicker to identify unusual behavior and which area to change. The approach:
- speeds up the testing and evaluation
- provides consistency and reliability
- helps you understand when the agent is ready for deployment
The alternative is simply adding more and more instructions and hoping that it makes a difference. Every time it doesn’t, it knocks the confidence of the business users. It also means that there is no way to understand when the agent is ready for deployment.
There is no audit trail or governance
Whilst the agent metadata is versioned when you push it to production, this does not help you understand what the agent is doing during the build and testing phases. The AID has a change log during the development of the agent and then formal versioning and sign-off when the agent is deployed. The benefits:
- speeds up the testing and evaluation
- ensures that the compliance teams can sign off the agent
- makes monitoring and improvements easier to manage
Proven approach. Not yet widely adopted
This process-driven approach has been proven to deploy agents 90% faster, but it is taking time for organizations to adopt it. Adoption needs to be led by the consulting firms that are being engaged to build agents, and they have a vested interest in delivering reliable, complex agents more quickly. Even more important is the audit trail, which means they are protected if an agent’s performance changes after they have left the client.
Salesforce Professional Services recently presented at TDX25 and talked about how they used this approach to deliver internal agents significantly more quickly.
We have provided training on the approach to help others develop agents that the business can feel confident in deploying.
Ian Gotts
Founder & CEO