Webinar Recording
Five Questions Every HR Leader Should Ask About AI Agents
Two of Valence’s AI researchers, Chief AI Scientist Jeff Dalton and Director of Applied AI John Foley, sit down with Head of Content Das Rush to demystify AI agents for HR leaders. The session traces the history of agents from Shakey the Robot to modern multi-agent systems, defines the difference between a chatbot, an agent, and a coordinated agent harness, and unpacks how Nadia is architected as a multi-agent AI coaching system. The conversation closes with concrete guidance on how HR leaders can pressure-test vendor claims and avoid agentic sprawl in their HR tech stack.
Key Takeaways
- AI agents are not a new idea, but LLMs changed what they can do: Research on agents stretches back to the 1972 Shakey robot and the reinforcement-learning agents of the following decades. Large language models did not invent the agent paradigm; they unlocked a new substrate: agents that understand natural language, reason flexibly, and adapt to open-ended environments.
- A chatbot, an agent, and a multi-agent system are not the same thing: A chatbot is a customized layer on top of a foundation model. An agent operates in an environment, uses tools, and acts on a plan. A multi-agent system coordinates specialized agents through a harness that orchestrates which agent handles which task. The vendor claim that something is an "agent" should be tested against this definition.
- The most common enterprise mistake is thinking too small: Jeff and John see organizations starting with small, in-house models or trying to automate a single existing pipeline like a survey. The higher-leverage question is whether AI can replace the workflow entirely. Can the agent read the documents, observe the environment, and skip the survey step altogether?
- Nadia is architected as a coordinated multi-agent system, not a single chatbot: Inside Nadia, sub-agents are constantly working in the background: planning the conversation, tracking goals, watching for blind spots, deciding which underlying model is best suited to each moment. More agents run between conversations than during them, continuously updating the plan for the user.
- Pressure-test vendor claims by piloting the actual system: Anyone can claim guardrails, memory, and agentic architecture on a slide. What matters is whether those capabilities work in practice. Define your use cases, build internal benchmarks for what good looks like, and have end users actually use the system. Being an "AI whisperer" means spending time with the tool to find its jagged edges.
- Agentic sprawl is a real risk for HR leaders: As agents proliferate across the HR stack, the failure mode is friction: tools buried behind Workday, behind legacy systems, behind layers that make users give up. The guidance is to remove barriers to adoption, understand how agents are actually being used, and gather feedback through the work itself rather than through yet another survey.
Jeff Dalton
Chief AI Scientist
Key Points
Welcome and Speaker Introductions
[00:14] Das Rush: Alright. Hello. Can you hear me okay, everyone? Welcome. We're gonna give it a minute here just to let people trickle in. As we do, my name is Das, and I'll be your moderator today. We have a really terrific session for you. This is designed to be educational, and it's a chance for you to bring all your questions. We want it to be interactive. So as we get started, I want to invite people to use the chat: share your name, where you're calling from, and one thing that you really want to learn or better understand today about AI, and specifically about AI agents. We know it's something that everybody is hearing a lot about, but sometimes it's not a term that's clearly defined or completely understood, and we have two foremost experts here. Just to give you a quick overview, we'll have about twenty to thirty minutes of presentation, and then we'll have plenty of time at the end for Q&A. So please ask lots of questions. The more interactive, the more you'll get out of it. And with that, I want to introduce our experts. We have two AI researchers with us today who have been building agents and agent systems for decades. Jeff Dalton is our Chief AI Scientist at Valence. He's also a Turing Fellow and a professor at the University of Edinburgh, where he runs the GRILL Lab, a research group focused on conversational assistants as well as the future of foundation models and agents for information access. Before he was at Valence, he led language understanding for the Google Assistant and helped build the knowledge graphs that sit underneath the Google Search we've all used for years, and he's published over 100 research papers in the field. So just to say, Jeff is a heavy hitter, and I've learned a lot from him personally.
Joining him, we have John Foley, our Director of Applied AI at Valence, who leads the team behind Nadia. He has a PhD from UMass Amherst and decades of experience building production-ready conversational AI systems. And with that intro, I want to introduce a fun fact here, which is that Jeff and John actually met at a research lab at UMass. So as a little bit of a fun intro, Jeff, do you want to give a little context on this photo and how it connects to your time at UMass?
From Jeopardy to UMass: A Shared Background in AI Research
[02:46] Jeff Dalton: Thank you for the introduction. So what you'll see here is Jeopardy. You may be familiar with Jeopardy. I'll tell you a little bit of backstory about me and John and where we come from. We did our PhDs in information retrieval, also known as the science behind search and how search engines work. One of the fundamental questions there is how search can understand language, how agents can understand language, and how to make that useful. In this case, it was question answering: can you do Jeopardy? One of the underlying systems for Jeopardy is a specialized search system that was built with some of the technology from the lab that John and I both came from, the intelligent information retrieval lab at UMass. We both worked on some of the fundamental infrastructure for this. We didn't work on Watson specifically, but we worked on similar systems and architectures for the future of complex question answering and complex tasks, which was really foundational AI. What I would highlight here is that the Watson QA system is a highly specialized AI system built for one particular purpose: built just for Jeopardy, being really good at Jeopardy, being hyper-efficient at search, being able to search lots of knowledge bases really quickly, with lots of different specialized search systems as part of it, all going together to make a really successful bespoke product. And it was really hard to generalize to lots of new domains and applications, as I've been finding. That's starting to change in our current agentic world, and so we're gonna be talking a little bit about what that means. But, John, do you want to chime in and talk a little bit more about the history or UMass?
[04:29] John Foley: When I was 22 years old, I decided to go to grad school because I wanted to teach. I had a degree in electrical engineering and I had some idea of what computer science was. I joined this IR (information retrieval) class kind of on a whim. By the end of that semester, I was doing independent study. And by the end of that semester, I joined the lab, and Jeff was my mentor that summer on a grant-funded project for, actually, video retrieval, which is kinda interesting. So I tell people I sort of wandered accidentally into one of the highest-ranked search labs in the world, and I found out that our lab's search engine had been used for Jeopardy, and I always thought that was the coolest thing. So, yeah. Since then, I've obviously done a lot of things. But when I think back to how it started, I sort of wandered in and met Jeff, and it all took off from there. It's kinda fun.
[05:24] Das Rush: And it wasn't just that your lab kind of was part of the team behind IBM Watson's Jeopardy performance. I think it was a neighboring lab, which we wanted to talk through on this next one, that is really where a lot of the initial multi-agent systems that we hear about today were born. So as we get into presentation mode here, Jeff, John, can you give a little bit of context on the origins, what was happening at UMass, and what we're looking at here?
[05:54] Jeff Dalton: Sure. So what we're looking at here wasn't just at UMass. It actually started back at CMU, Carnegie Mellon University, as well as SRI, a research lab at the time, the University of Southern California, and others that we see here. This was really some of the first work in conversational systems: how can computers understand speech? They actually built a multi-agent system that combined lots of different speech systems together to make an uber speech system that they called Hearsay-II. So this was one of the first steps toward a multi-agent system, really for a single bespoke task. It was the first conversational system that really started to lay the foundations for future conversational systems, the ones that became the basis for Siri and for the first generation of assistants. What I want to highlight here is that what you have on the right-hand side is a kind of blackboard. It's a place where agents can talk to each other, a place where agents can put information that they can share with each other at different levels. In this case, it's speech. So it went from a very low-level syllable or segment level all the way up to phrases and chunks, and agents could hold different hypotheses about the world, model the world, and share information with each other. This architecture is actually still used for multi-agent coordination today. So this is 50 years old, and it's still impactful and still some of the state of the art for the current set of agents. What I want to emphasize here is that we have fifty years of research on building multi-agent systems: algorithms, data structures, and all the understanding. So this is not new. It is becoming a lot more popular as the agents become a lot more general and powerful. We're
The History of AI Agents: From Shakey the Robot to Today
[07:44] John Foley: Gonna talk
[07:46] Das Rush: We'll talk a little bit, I think, here about why, and how that history, what has and hasn't changed. So I think this was one of my favorites that you all had. So as we kind of finish up our history lesson and get into agents today, Jeff and John, can you tell me a little bit about Shakey?
[08:05] Jeff Dalton: Sure. So Shakey is really fun. Shakey was one of the first robots, one of the first robotic systems that you could talk to with natural language. The goal was that you could tell it what to do, and then it would go and actually do it. So you could tell it to go push this thing off the desk, and it could take actions in a very defined space, using its video camera to observe the world and see where it was, find the object that it could push, and then actually take an action in the world to move and shove something off the table. And it worked back in 1972, even if it was maybe a little primitive. But in many ways, this agent was actually much more advanced than a lot of what people are calling agents in the hype cycle that we're in today. So before we go further, we want to have a really crisp definition of what is an agent and what is not an agent. This is the fundamental textbook definition. If you look up the introduction to AI by Peter Norvig, you'll pull this out. It hasn't changed in many, many years, and it's foundational. So let's walk through each of the different parts of this so that we can understand it. First, an agent is a system that has a model of the world. It has a specific, discrete representation of what is in the world, where it is in the world, and what it can do in the world. That's called the state space. Then, once it has that world model, it formulates a plan. That plan has actions that it's going to take, the next step it's going to take in that world model. So whether it's moving or turning, or, for language models, generating the next word or the next sentence, it's building a plan that can then be acted on.
Then it's executing that plan. It's actually taking action in the world. It's not just thinking about it. It's manipulating its environment. It's making a step. It's generating text. It's making a tool call. It's performing actions that change the world. Then it's seeing what actually happened: not just taking an action, but reflecting back and observing the result. Where am I now? I meant to go here. Did I actually go here? Because actions don't always have the effects that we intend. So then I'm going to update my model of the world with where I am now. Is that a good place to be? Is that a bad place to be? Is that where I intended to be? And then it's going to actually learn from that behavior. What's the next thing that I should be doing? We call that a policy: it's how the agent figures out what next action it should take. And, ideally, most agents should actually adapt that policy and have it evolve over time, so the agents are getting smarter, the policies are getting better, the learning is happening, and the agents are evolving. Is there anything that I missed there, John, that you wanna add?
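The loop Jeff describes (world model, plan, act, observe, learn) can be sketched in a few lines of Python. This is a toy illustration only, not how Shakey or any production agent works: the one-dimensional `environment`, the `Agent` class, and the "jump over a gap" rule are all hypothetical names and assumptions invented for the example.

```python
from dataclasses import dataclass, field

# Hypothetical action set: how far each action tries to move the agent.
ACTIONS = {"step": 1, "jump": 2}


def environment(position: int, action: str) -> int:
    """Toy world (an assumption): cell 2 is a gap, so any move landing on it fails."""
    target = position + ACTIONS[action]
    return position if target == 2 else target


@dataclass
class Agent:
    goal: int
    position: int = 0                            # world model: where the agent believes it is
    policy: dict = field(default_factory=dict)   # learned mapping: state -> preferred action
    trace: list = field(default_factory=list)    # history of (state, action, observed state)

    def plan(self) -> str:
        # Plan: use what has been learned for this state, otherwise default to "step".
        return self.policy.get(self.position, "step")

    def run(self, max_steps: int = 10) -> list:
        for _ in range(max_steps):
            if self.position >= self.goal:
                break
            action = self.plan()                            # plan
            observed = environment(self.position, action)   # act, then observe the result
            if observed == self.position:                   # the action did not do what was intended
                self.policy[self.position] = "jump"         # learn: adapt the policy for this state
            self.trace.append((self.position, action, observed))
            self.position = observed                        # update the world model
        return self.trace


agent = Agent(goal=4)
trace = agent.run()
```

On the first run the agent tries to step into the gap, observes that it did not move, and records "jump" as the better action for that state. A second agent started with the same `policy` reaches the goal in fewer steps, which is the "policy evolving over time" part of the definition.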
[11:15] John Foley: No. I think agents have been around for a long time. People talk about reinforcement learning agents, which build upon that, and that's really where the policy comes in and how to learn from it. And this model of agents, where you have your state and you have what you're going to do, has existed for a long time. And as you're gonna get into, Jeff, recent technological improvements have made agents much more powerful. What actions agents can take, and how they understand the world and predict the next action, is what's dramatically changed.
[11:51] Jeff Dalton: Cool. Let's talk to that. Let's dive in.
What's Changed: Large Language Models and the Rise of Modern Agents
[11:54] Das Rush: Alright. And we've already got some great stuff coming in the chat here. As you go into this next one to explain that the definition hasn't changed, but what has, a couple of things that are coming in are really this question of how do we coordinate agents, especially as we see agents start to sprawl, and how do you build this kind of unified architecture when you have all these specialized systems. So we'll get into what has changed, and then we'll be touching on single-task versus multi-agent systems here. So, Jeff, go ahead.
[12:27] Jeff Dalton: Thank you. Those are great questions, and please keep those comments coming. We'll make sure that we get to them and have time to dig into them. I just want to continue, because we're going from here into how things have been evolving and where things are today. So going back, we saw Shakey in 1972 in a very fixed, simplistic world. Normally, that's a blocks world or a Wumpus world with a very small, fixed number of moves. We've probably all played video games, scrolling games where we can go up and down, left and right. Those have a very simple world model and a very simple action space. What's fundamentally shifted, as John alluded to, is that the world these agents are operating in, one agent or multiple agents, is now the information space. They have access to the information tools that we have: our communication platforms like Slack, our Granola notes for taking meeting notes, our Zoom meetings, our Drive integrations, our Teams integrations, and our docs. They can live and operate across that space. They can read from these tools, and they can also write to them. Protocols like MCP, the Model Context Protocol, which give agents standardized ways to interact with these systems, are becoming increasingly common across these agents. And that means the actions they can take are not just a handful of moves. It's writing a whole email. It's updating your calendar. It's writing a whole program or an app with code, with really detailed plans and orchestrations that work together and coordinate with other people. And that fundamentally means that the space the agent can work in is massively evolving, and it continues to evolve as these systems and agents become able to work on much longer and more complex problems.
[14:19] Das Rush: Great.
[14:24] Jeff Dalton: So speaking of what's changed: how is this possible? What's actually changing? The way I like to think about these agents: we talked about very simplistic systems in 1972 that were starting to understand language. If you look at things like Arthur Samuel in the nineteen fifties, he built a chess engine that had 33 bits for its world state, which had to fit into custom hardware that they built just for it. Very bespoke. Whereas what's shifted today is a lot of what we're able to do with general-purpose computing: the scaling laws that we see in the evolution of computing power and AI power have really been able to drive the use of neural-network-based approaches. The ability to parallelize these with GPUs, within one GPU and across multiple GPUs, is really transformative in terms of the algorithms that we can actually run on these systems. It makes feasible, for the first time, the neural networks that people are probably pretty familiar with. But let's keep going, and let's look at how that compute power has changed the underlying agents and brought them into the information age. Want to keep going, Das? So I was at Google in 2013. I was just graduating from my PhD at the time, if that tells you anything about me. This was Atari. This was DeepMind, for the first time, again playing video games. We see a lot of agents in video games. And the real innovation here was that all it was given access to was the screen. So it didn't have access to control, but it could watch the screen, understand what was happening on the screen, and then take actions and control. It was learned end to end with reinforcement learning. This was a really big feat at the time.
And now what we're actually seeing is that computer use, these agents using our computers and our screens, is the same underlying fundamental technology, just at scale and actually being deployed now, over a decade later, not just for games but for our everyday information tasks. And that's continued to evolve. What's underlying that has been scale, with scaling laws for language models. Transformers are a scalable way for us to train these models across huge corpora. We can now practically train on the entire web. And now we're moving into video and into simulated environments, because we've used up all the information on the web, and so now we have to go find new information. That level of training at scale has meant that the information available, and how these systems are able to operate and understand the world, is fundamentally different from what we had before, because they now understand language and speech and images and text, and now also how we use our computers in simulated worlds. So that's transformers and "Attention Is All You Need," with data and compute at scale to make that possible. Where the old models had a few bits, these models are now on the order of a trillion parameters, which means they're running on dozens or hundreds of GPUs spread out across different systems and machines so that we can use them fast and these agents can operate at scale with really rich capabilities. And we have lots of different small models and large models, all used for different tasks, and so we are gonna have different agents with different capabilities. What we're seeing here is how that has evolved: 2022, the launch of ChatGPT, then Claude Code for coding, and enterprise multi-agent systems of different sizes and capabilities.
It's just a really rich, fast-evolving ecosystem. So we want to get really down to the fundamentals. There are things that you need to know and questions that you want to ask. This is going back to, again, the definition of an agent. We want to be able to separate the hype from the actual capabilities of what the system and the agent can actually do. So the first is going back to the definition. What state or environment does this system operate in? Does it exist in your file system? Does it run on your local computer? Does it run in the cloud? What tools does it have access to? What capabilities is it connected to? And what does it have for its model of what the world is, and particularly your world? Because that fundamentally affects and shapes what it can do and how it thinks about the world. Once it has a model of the world, the next thing is: what can it actually do? Can it run commands locally on your computer? Can it just generate text? Can it generate rich HTML? Can it make tool calls? What artifacts are being created in the knowledge world that weren't there before? Each of these systems has different capabilities for taking those different actions, and it's how you configure them and how you build them that determines what you can actually do. And how does it actually plan? So, going back to the fundamentals, what is it actually doing? We're moving into a place where most of these models are called reasoning models. It's not actual reasoning, but these models operate by talking to themselves. Before they perform a task, and during the task that they're performing, they're saying: is this good? What's my plan? They make a plan. You might have plans.
Some of them have an explicit plan mode that they go into at the beginning, where they write a spec, and then they actually go and start working on it. They have sub-agents working on parts of the plan that are delegated to them. So what's fundamentally changed is the fact that the models can talk with natural language. They're talking to themselves, sometimes, and they're talking to other agents, communicating via different kinds of blackboard mechanisms or other forms of coordination mechanisms. And one of the fundamental parts of this is: what does it actually remember? You might have heard about memory systems. A fundamental thing about an agent is that it has a model of the world, but it also has a model of where it's been. It has the history, the trace of what the past was, so that it can learn from the past, and it can learn from what worked and what didn't work so that it can improve across time. With a chatbot, maybe the memory just resets at every point, and you're starting fresh. But in these agent systems, what we're actually seeing is evolving memory systems that have accumulated knowledge of what they need to know about you to be able to help accomplish the task. And under the hood, an underlying question that you need to be thinking about is: what is the actual architecture? Many of the things that are being called AI agents are not actual agents. Some of them are just old-school machine learning systems that are built for a single task. Maybe they're a classifier, but those aren't really agents. So is it a prompt? Is it a model? You have to actually dig in to understand the core underlying technology and map that back to the definitions of what we know about an agent and how it learns, to be able to say what the approach is. Is it a single task-based agent? Is it a multi-agent system? Is it a single specific agent?
And how do all of those work together? How are the weights updated? How is their context retrieved and updated over time?
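To make the memory contrast concrete, here is a minimal Python sketch, assuming a simple in-process store. `SessionChatbot`, `AgentMemory`, and the keyword-based `retrieve` are hypothetical names made up for illustration; real agent memory systems typically persist to a database and use richer retrieval, and nothing here describes how any particular product implements it.

```python
class SessionChatbot:
    """Chatbot-style memory: every session starts from an empty context."""

    def start_session(self) -> list:
        return []  # nothing carries over from earlier conversations


class AgentMemory:
    """Agent-style memory (toy version): facts accumulate per user across sessions."""

    def __init__(self) -> None:
        self._facts: dict[str, list[str]] = {}

    def record(self, user_id: str, fact: str) -> None:
        # Store each fact once so repeated observations do not pile up.
        facts = self._facts.setdefault(user_id, [])
        if fact not in facts:
            facts.append(fact)

    def retrieve(self, user_id: str, keyword: str) -> list[str]:
        # Pull back only what is relevant to the current task (naive keyword match).
        return [f for f in self._facts.get(user_id, []) if keyword.lower() in f.lower()]


memory = AgentMemory()
memory.record("user-1", "Prefers weekly check-ins")
memory.record("user-1", "Working toward a director promotion")
memory.record("user-1", "Prefers weekly check-ins")  # duplicate, ignored
relevant = memory.retrieve("user-1", "promotion")
```

The design question Jeff raises, "how is their context retrieved and updated over time," corresponds here to the `record` and `retrieve` methods: what gets written, and what gets pulled back into context for the next task.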
Defining the Difference Between a Chatbot, an Agent, and a Multi-Agent System
[22:17] Das Rush: Great. This is, I think, a really foundational layer to understand about agents and what they are. In a moment we're gonna talk a little bit about why we see coaching as a use case that really requires an agent approach, and the approach that we've taken to build Nadia. But before we do, Jeff and John, we've had a couple of questions come in, as I mentioned. So I want to pause and take some of the current questions before we dive into Nadia, because one of them has really been this question of agent coordination. Could you double-click on the architecture underneath? What is the difference in your mind between a chatbot, a single task-based agent, and a multi-agent system?
[23:10] John Foley: So I can take a stab at this first. A chatbot, to me, is something that's more or less a layer on top of Claude or ChatGPT, one of these fundamental LLMs, where it has specialized knowledge. Maybe it's answering support questions for you, or maybe it's giving you access to some kind of other knowledge base. So it's customized and domain specific and brings in a lot of useful information. This is where the RAG hype from a couple of years ago comes in, where maybe they're customizing the agent's answers with a particular knowledge base on the fly. And then for a task-based agent, I think the canonical example is Claude Code or Claude Cowork or deep research or one of these things, where you set the agent a task, and it might actually push back and help you form a plan to complete this task, or it runs with your goal. It does deep research, it'll search the web, it'll invoke many other subsystems, and it may take minutes or hours to respond depending on how much you ask it to invest. And then when you get to multi-agent systems, this is really the world of bespoke systems that are very complex. It's not just one agent going very deep on a task. You start getting into spawning sub-agents, where part of this task is gonna be solved by one agent that really understands the financial markets, and you have another agent that really understands engagement on social media. They're all reporting their findings back and sharing communication, and the agents sort of interoperate by asking each other questions. There are other ways to coordinate agents, but when you get into the extreme multi-agent systems, that's sort of what's happening. I think there's another level in and around there, where sometimes people think of multi-agent systems and they're thinking about, how do I get these products to communicate with each other?
And that's a little bit different, because to talk to Workday, you need the Workday API or something like that. Protocols like MCP can make this easier for agents, but that's not necessarily something that every, quote, unquote, agent product comes with out of the box. Jeff, feel free to... I'm just
[25:36] Jeff Dalton: Gonna jump in there. That was really great, John. I 100% agree with what you said. And I think there's hype there around harnesses. People are now saying, what's the harness around your agents, and what's the meta-harness around those multi-agent systems? And if you actually dig back, it's kind of what we started with at the beginning. There's a blackboard system that underlies one implementation, and they each have different coordination mechanisms. Many of them are proliferating, whether it's OpenClaw or Claude Code, and they all have different harnesses. The harness is what we're calling that system on top of the agents. And again, it's the same questions we've been asking. What's the environment that the multi-agent system is operating in? How does it evolve? What are the capabilities? What are the safety mechanisms that are built in so that we know it's operating as expected, with traceability and observability? Those are all part of that extra meta layer around our multi-agent system, to ensure that things are functioning as we expect them to.
What Is an Agent Harness?
[26:34] Das Rush: Great. You introduced a new term there, and I've got a few questions before we go into the next section. But I want to make sure that we've defined it crisply. You introduced this idea of a harness, which I think is becoming a word that people are hearing more and more as agents come in. What would be your crisp definition of what a harness is?
[26:54] Jeff Dalton: Good question. There are probably people who have really clear definitions, and I think the definition has been changing a little bit as the systems are evolving. But the harness is the operating point: the system that provides the orchestration for how the agents operate and how sub-agents operate within each other. So it's the mechanism that determines what tools they have access to. It's the environment that they can operate within. It's setting up how the agents coordinate with each other, how the sub-agents work, and how the meta-agents all work together. Those are all the meta system. That's effectively the harness definition.
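As a rough illustration of that definition, the sketch below shows a toy harness in Python that does three of the things mentioned: it decides which tools each agent may use, it owns a shared blackboard the agents coordinate through, and it keeps a trace for observability. The `Harness` class, the `planner` and `writer` agents, and the tool names are all hypothetical; this is a sketch under those assumptions, not a description of Nadia or of any named product's harness.

```python
from typing import Callable


class Harness:
    """Toy harness: tool permissions, a shared blackboard, run order, and an audit trail."""

    def __init__(self, allowed_tools: set[str]) -> None:
        self.allowed_tools = allowed_tools
        self.blackboard: dict[str, str] = {}
        self.agents: list[tuple[str, Callable[[dict], dict]]] = []
        self.audit_log: list[tuple[str, list[str]]] = []

    def register(self, name: str, tools: set[str], agent: Callable[[dict], dict]) -> None:
        # Safety check: refuse agents that ask for tools outside the sandbox.
        denied = tools - self.allowed_tools
        if denied:
            raise PermissionError(f"{name} requested disallowed tools: {sorted(denied)}")
        self.agents.append((name, agent))

    def run(self, task: str) -> dict[str, str]:
        self.blackboard = {"task": task}
        for name, agent in self.agents:
            updates = agent(dict(self.blackboard))          # each agent reads a copy of the blackboard
            self.blackboard.update(updates)                 # and posts its findings back
            self.audit_log.append((name, sorted(updates)))  # trace of who wrote which keys
        return self.blackboard


def planner(board: dict) -> dict:
    return {"plan": f"outline -> draft for: {board['task']}"}


def writer(board: dict) -> dict:
    return {"draft": f"Draft following [{board['plan']}]"}


harness = Harness(allowed_tools={"search", "docs"})
harness.register("planner", {"search"}, planner)
harness.register("writer", {"docs"}, writer)
result = harness.run("summarize onboarding feedback")
```

Here the agents never call each other directly; the writer only sees the planner's output because the harness put it on the blackboard. Running agents in a fixed order is the simplest possible coordination choice, and a real harness might instead schedule agents dynamically or run them in parallel.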
Security and Coordination Across Multiple Agents
[27:41] Das Rush: Great. And then, a couple of questions that have come in that I think are good before we get into some of the maybe coaching specific aspects of an, multi agent system. So we had a question here, a great one from, Jeff Dalton on securing a single agent seems straightforward, but what thoughts do you have on how you build a unified architecture so that as you have hundreds of specialized multi agent teams, you're not creating conflicts between them or security black holes?
[28:07] Jeff Dalton: Sure. So, I'll be asked, the I'm not a security expert. John is actually the security expert for some of these. And I think also that this is actually I think it's an open research area. So there is this is not something that is solved. There's there are active developments of research. I think perplexity just released an open source, again, environment that's effectively monitoring the multi agent systems. So the ability to have that extra safety layers, and observability layers are rapidly involved. And we're just talking with Fiddler, for example, who also has a whole observability and traceability stack. If you wanna check out our previous podcast, for some of the work that we did and talk about issues with them. But was there something that you wanted to add, John?
[28:52] John Foley: My instinct would be to say that securing agents is much the same as securing users. You need to think about sandboxing them. You need to think about where people are going to share files. Is it through Google Drive? Is it through something else? How are we going to structure our folders? With humans, you can have things that are just policy, and that's okay. With agents, sometimes you have to be a little more careful, because they won't necessarily remember the training while they're mid-task. But I think there's a lot you can do to sandbox, secure, and control how things are communicating. There are good security principles to follow, and there are obviously specifics that are going to matter here, but I think there are ways to design through this. It is a challenge that's growing, and everybody's feeling it. I was reading a blog post from GitHub the other day where they talk about their struggles with the massive increase in coding that we've been seeing. So this is a place where we're learning fast, and I think there's light on the horizon.
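One way to picture the difference between policy and enforcement is a file-access check that runs in code on every request, so it does not depend on the agent remembering a rule mid-task. The Python sketch below is only an illustration of that principle; the function name `resolve_in_sandbox` and the folder layout are assumptions, not anything described in the session.

```python
from pathlib import Path


def resolve_in_sandbox(sandbox_root: str, requested: str) -> Path:
    """Return the resolved path if it stays inside the sandbox, else raise.

    Hypothetical helper: a real deployment would pair this with OS-level
    isolation rather than rely on a path check alone.
    """
    root = Path(sandbox_root).resolve()
    target = (root / requested).resolve()
    # Path.is_relative_to requires Python 3.9 or newer.
    if not target.is_relative_to(root):
        raise PermissionError(f"{requested!r} escapes the agent sandbox")
    return target


# An agent confined to its own folder can reach files beneath it...
allowed = resolve_in_sandbox("/tmp/agent_a", "notes/agenda.md")

# ...but a request that climbs out of the folder is rejected.
try:
    resolve_in_sandbox("/tmp/agent_a", "../agent_b/secret.txt")
    escaped = True
except PermissionError:
    escaped = False
```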
Common Mistakes Enterprises Make When Adopting AI Agents
[30:07] Das Rush: Great. So that coordination is kind of a real frontier, I think, as we're talking about agents. We're going to get into how we've structured the coordination within Nadia as a multi-agent system in a minute. But I have two great questions here, and I'll make this a two-parter for you. This is from Moira, Linda, calling in from North Carolina. What early mistakes have you seen companies make with agentic AI adoption? And what's one thing from your areas of expertise in agentic AI that you don't think most HR leaders are thinking about right now?
[30:49] John Foley: Jeff, do you have an instinct here?
[30:51] Jeff Dalton: Let me start with the first question. One of the mistakes I've seen people make is that they start with small models they want to run in-house. I saw this up in Burton. "We're going to build our own models. We're going to build our own systems. We're going to do it ourselves." They then put a lot of time and resources into building that capability. Meanwhile, with the pace of change at the frontier labs, it turns out that a bigger model and scale win. You can do really simple things with the frontier models that just weren't possible before, and they're continuing to evolve really rapidly. So I've seen teams say, "I'm able to fine-tune a specialized model for x, y, z." But at the rate these systems are evolving, I think it's better to stay at the frontier and make sure you're at that forefront. Sometimes people are thinking too small. Sometimes they're thinking, "I just want a Q&A bot," when they could be rethinking an entire workflow in an AI-first way, with all the data connected, and really thinking big about what's possible. Instead of thinking about a small pilot, what does it mean to have an AI transformation that changes someone's role foundationally? A good example here is, "Oh, we gave people access to autocomplete in Cursor," so it's better code completion, versus the leap to Claude Code, where it's, "We now have this coding agent that's writing our software." That's the shift from thinking small to thinking big and taking an AI-first approach.
[32:37] Das Rush: Great. John, anything you want to add before we talk a little bit about what Nadia looks like as a multi-agent system and how she works?
[32:47] John Foley: I'll just briefly echo Jeff's point on thinking big. We see a lot of people looking to replace existing pipelines, or we see very simple surveys that people are looking to automate away. I think it's worth stepping back and asking: what are the problems we're facing that we can't tackle, and how could an agent make a difference there? Maybe it could be personalizing the survey, or following up with people and increasing engagement. So, thinking a little more outside the box. And I would agree with Jeff that the frontier models are getting so good, so fast, that it's hard to keep up with that in-house, and fine-tuning is a race that you sometimes lose.
[33:39] Jeff Dalton: And to add to this: people think about building a better survey, but really, do you need the survey at all with AI? Can it look at the documents? Can it look at the environment in the workspace and come back with, "Here's one question I don't know the answer to yet. Can you verify this?" That's rethinking what this looks like in an agent-first world.
[34:01] Das Rush: That's a great example, because I think it's this question of, do you need the survey, or can AI get you straight to what the survey was trying to give you to begin with? I think that's a good guiding question across a lot of these workflows. I want to make sure we get a chance to get into how we've architected Nadia. This has all been very theoretical and a good foundation, but to get into some specifics, let's talk a little bit here, Jeff. Why have we taken an agent approach with Nadia? And why do we feel that coaching in particular, enterprise AI coaching, requires an agent approach?
Inside Nadia: How a Multi-Agent Coaching System Works
[34:38] Jeff Dalton: On the left side here, you have a common task that people are looking at. You want to do something, and you might do it with your favorite task assistant: "Help me write this email." I'm a manager. I just started at a company. I'm staring at a blank screen. I'm facing a difficult situation. If you have a draft, an AI can help you work on that draft. But an agent-first approach is different, because it's going to look at your background information. It's going to look at the interaction patterns with your manager. It's going to have hypotheses about what's happening in the world and have a plan before you even sit down at the keyboard. It's actually going to ask, "Should I write that email? No, that shouldn't be an email. You should go have a one-on-one with that manager." So, again, a simple chatbot might be able to help you write that email a little bit faster, but an agentic approach fundamentally changes that workspace and that work mode to a different level of planning and of strategic and tactical work, so you can be much more effective. And if you look on the right-hand side, as a manager you're working with complex teams, you're coordinating across teams, and you're coordinating across different systems and complex relationships. So when it comes to how you're going to interact with an agent and what you expect of it, you should expect more. Yes, it should help you role-play that conversation with your manager before your one-on-one.
It's going to help you work through those key performance moments. It won't just help you write a slightly better performance review; it's going to take a data-first, agent-first approach that's connected to all of the people in your world, to give you an objective view of performance in a way that just wasn't possible before. And that's what we're building with Nadia.
[36:31] Das Rush: Great. With that, let's talk about how Nadia works, or some of how we've seen Nadia work. John, do you want to tell us a bit about who Maya Chen is and who she represents?
[36:44] John Foley: So this is Maya Chen. I joke that she used to be a stock photo model, and now she's a senior manager at Acme Corporation, which provides most of the technology for coyotes. But to get serious: coming into an enterprise job in a new senior management role, you have a bunch of new reports. You have to learn. You have to build trust with them. You have to listen. You need to meet your peers. You need to build political capital with them. You need to understand what's important to them and all the different projects your team is working on, and that list keeps growing. And you have to learn the company as well. It's not just how to be a good manager, but what is the culture of the company? How do you fit in with that? How do you help people meet their goals and excel on these metrics? And then, on top of all of that, it's still your first hundred days on the job. You're looking for that visible win. So if you're getting nervous about a one-on-one, you're not going to send deep research out to ask, "What should I do? I'm nervous." You don't want an answer four hours from now, or even minutes from now. So the thing we've spent a lot of time on, and think about a lot, is what happens when someone comes to Nadia with a concern. And, yeah, next slide now. They come in with, "I need to prepare for this difficult one-on-one. There's this direct report I'm really not clicking with yet. How do I make this work?" This is where the multi-agent system starts. As the message comes in, the very first thing we do is start thinking, almost at the design level, about which underlying models are best to use. We have not only small models and big models, but reasoning models and thinking models and different levels of reasoning now.
All of these things feed into how we handle the different components of the system and how we address Maya's stress and concern here. The first level of responding to a user's message is kicking off our situation analysis. What does Maya need here? Does she need a script? Does she need a sounding board? Is she writing an email? Is this a tough communication? Is she planning a step back? What's happening here? In this case, there's a little bit of urgency to it, and the situation analysis agent is going to notice that and prioritize it. It's not going to spend forever thinking before we get to value. The next agents that are firing and thinking are our memory agents. We have many types of memory inside Nadia. There's longitudinal memory: understanding Maya and her history, the different conversations she's had with Nadia, and the coaching interactions that have and have not worked for her. Then we also have relational knowledge, and we find this to be incredibly important in a coaching context, because it's about how you relate to your direct reports, your peers, and your managers. That's a very important layer of memory for our agents to understand what's going on. And then we, of course, have organizational knowledge. We know things about Acme that maybe Maya doesn't quite know yet or hasn't internalized yet, even though she's been to onboarding and different trainings. The next level of agent that we're always thinking about asks: are we using the right coaching tools for this task? Should we be pulling in the growth framework? Should we be suggesting role-playing? Should we be bringing in different parts of ICF knowledge here? What are the validated frameworks that are going to work? And, specifically, what modifications have we made to them to make them work better in an AI coaching framework?
Then, in addition to the immediate response our system gives a user when they come and talk to it, in that chat interface that everybody has, we're also thinking about what actions, reminders, and artifacts are needed here. Maybe we're proposing an agenda for this one-on-one. Maybe we're proposing, "Hey, we can remind you five minutes before it starts to take a deep breath, and here's the agenda again." We can set up those actions as reminders. We can set up a post-meeting debrief, and we can set up a planning session for next week so that we can continue those action items and make sure Maya is going into those meetings fresh. And then, going back to that security question a little bit, security is multidimensional. It's not just what access these agents have, but also what they are saying. There are legal and compliance risks in lots of ways. An AI agent shouldn't be making certain decisions, by law, in certain regions. That is a baseline level of trust and safety that you want, and that the upstream APIs don't always provide. But what other sorts of special care do we have to take here? Is this an instance of bias we need to be careful about? Is there some solicitation of personal information that we shouldn't do, even though the agent wants it in order to help? So this is a very important layer that we spend a lot of time on, and we work very closely with our organizations on it, because it is specific to field and domain. This is a level of trust that is instrumental in helping Maya with her challenges. She doesn't just want advice. She wants advice that is going to help her be compliant, legal, and effective. These are a lot of systems. We run many, many systems in parallel when users submit a message, and all of this happens very fast, in seconds. And then Maya gets her response.
In this case, the agent has chosen to suggest a role-play, based on past goals from the memory system and on the coaching knowledge. We think Maya would get the most benefit from prepping for this one-on-one by playing it out.
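The flow John walks through, with several specialized agents working on one message at the same time, can be sketched with Python's `asyncio`. This is a toy illustration under loud assumptions: the four coroutines, their keyword rules, and the `handle_message` function are hypothetical stand-ins, and the session does not describe how Nadia's agents are implemented.

```python
import asyncio
from typing import Dict


# Each coroutine stands in for one sub-agent. The names and rules are
# hypothetical placeholders; a real agent would call a model or a service.
async def situation_analysis(message: str) -> str:
    return "urgent" if "tomorrow" in message.lower() else "routine"


async def memory_lookup(user_id: str) -> str:
    return f"history for {user_id}"


async def pick_coaching_tool(message: str) -> str:
    return "role_play" if "one-on-one" in message.lower() else "reflection"


async def safety_check(message: str) -> bool:
    # Placeholder rule: flag messages that ask for someone's home address.
    return "home address" not in message.lower()


async def handle_message(user_id: str, message: str) -> Dict[str, object]:
    # Run the sub-agents concurrently rather than one after another.
    situation, memory, tool, safe = await asyncio.gather(
        situation_analysis(message),
        memory_lookup(user_id),
        pick_coaching_tool(message),
        safety_check(message),
    )
    return {"situation": situation, "memory": memory, "tool": tool, "safe": safe}


response = asyncio.run(
    handle_message("maya", "I need to prep for a difficult one-on-one tomorrow")
)
```

Running the agents concurrently is what keeps the total wait close to the slowest single agent rather than the sum of all of them, which matters when the user wants an answer in seconds.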
[43:05] Jeff Dalton: I was also just going to add something that was not on that screen, because this is not a single message, and we'll talk next about what it means to go beyond that across the journey. There's an underlying director. We have more agents working after the conversation than during the conversation. It's updating the plan for Maya. It's updating the coaching plans, seeing the patterns that are there, seeing whether that role-play was effective, and updating the whole model of what we know about what works and what doesn't. Making that effective end to end is part of Nadia's intelligence.
[43:43] John Foley: Exactly. That was a deep dive into our single-message response. But over time, we have some of Nadia's sub-agents thinking about goals, patterns, and trends. What are the blind spots, and where did coaching fail? Should the coach have called out this particular thing? Or, if there hadn't been a time crunch, should we have called out something else? When you first go to Nadia, we have a small number of data points about you. Maybe we know your job title. We know your reports and things from your HRIS system, like Workday or whatever. So we know relatively little about Maya. We know relatively little about her team, because we're respecting privacy. And we know a little bit about talent moments and how Maya's performance has gone. She is a senior manager, so that's a baseline level of performance that we expect. What we know about the organization is typically much more. So we know things about the org that, as I mentioned, Maya may not know herself. But then, as time passes, Maya comes to Nadia for more and more tasks: maybe to write an email, maybe to prep for a meeting, maybe to debrief, maybe to think through strategy and whether she's spending her time in the right place, or how to get more free time and focused time back onto her calendar. We learn more data points. We learn what Maya excels at. We learn what Maya struggles with. We learn more about her team. We see, perhaps, goal setting for the year, or a midyear review, and we start to understand more about the team, multidimensionally, and about Maya herself. By the time a year has passed, we have thousands and thousands of data points: messages shared, goals set, interactions, delays, role-plays. We have integrations. For example, we use the Granola note-taking software a lot, and we have an MCP integration from it into Nadia.
It's great to debrief meetings with the actual transcript and have that grounding in truth. We have calendar integration, which helps reveal the informal management structures you have around the company. The people you meet with aren't necessarily just the org chart surrounding you, which we know on day one; we learn over time who is important, who the stakeholders are, and who the influencers are. The really powerful thing here is what happens as you invest in a truly agentic system that has that memory, can learn, can take actions, can grow with you, and changes plans based on you. In month 12, when Maya comes with a concern similar to the one from week one, Nadia has all this context: what's worked for her in the past, the suggestions she's had, the commitments she's made, the challenges she's set for herself. And Nadia can move on from there. One of the things that really excites me about Nadia is what this does to an organization if all of the managers, all of the people, are working toward being better with Nadia.
[46:51] Jeff Dalton: Awesome. Thanks, John. What we think here is, obviously, we have Nadia for the employee as a trusted partner, a trusted assistant. And I think what's really exciting is the fundamental shift away from Nadia as just an assistant or a tool. Nadia is a thought partner. What we see over time is an arc: maybe it starts out as a tool. Then, as it builds trust, as we learn those data points, as it evolves, there's that shift. Some users are really quick and some take more time. But the shift to "this is now a thought partner," treating it that way and giving it rich information, is a fundamental and exciting shift to see. That really changes it from "here's a tool" to "this is now my coach." It's able to go deep on your goals, to be proactive, to challenge you, and to effect change for you in a positive way. What really excites me is to see that change happen in people over time as a result of using Nadia. And that can translate not just for you but across the organization, which is quite exciting. So it's not just Nadia for me; it's Nadia for the organization. And then there's democratizing that, giving it to everyone, so that it's a platform the company can build upon. It's a multi-agent system with capabilities that are growing and evolving, deeply integrated into your learning systems and into the tools you use to communicate, and able to take increasingly rich actions with the user's involvement. And there's the ability to see and control that in a way that is both transparent and customizable for everyone. So that's what we're really looking forward to seeing. We're just getting started there, and we have a lot to do, but we're really seeing how that can start to transform people, organizations, and the HR function.
Five Questions HR Leaders Should Ask When Evaluating AI Agents
[48:47] Das Rush: I know we've got about ten minutes left here for questions, as folks have them. One of the things I really want to emphasize is that part of the reason we have leading AI researchers like yourselves is that the problems that have to be solved to build this sort of multi-agent system for coaching are really at the frontier of AI. These questions of how you build a system that can remember things and pick the right information, the information retrieval that you all were working on in the lab fifteen years ago, are now such important questions. So with that in mind, and while others please chime in in the chat with anything you'd like to ask these two, I would love to say: a lot of HR leaders today have vendors coming to them making all sorts of claims that this is an agent or that is an agent. Unfortunately, as we pointed out at the top, a lot of things don't meet the formal definition of an agent. And maybe you don't always need a big multi-agent system. So what advice would you give to HR leaders who have all these different vendors and claims coming at them? What does honest verification look like for somebody who's not technical? How do they pressure-test and really see what the answers to those five questions are? And is this the system I need for the job I'm trying to get done?
[50:15] John Foley: I think a big part of that is just piloting. You have to use the system. You have to interact with it. You have to try things. And you have to think through, who's my user here, and what should they care about? If someone has great memory, that's really cool. When we were building our memory system, there were a lot of very exciting approaches: RAG, GraphRAG, fact extraction. The big providers use a variant of this. I think Jeff has referred to Claude as dreaming nowadays. One of the choices we made in building our memory system was to preserve as much verbatim user content as possible in the memory system. In coaching, it's not just what you said, or that you had a conflict with so-and-so, but literally the exact phrasing you used in that email you sent, or in that Slack message, or whatever. That matters for coaching. How you said it is almost as important as that you said it. And that's something that required bespoke system design. If you're looking at a simpler system, and I'm trying to think of existing things that HR folks would be evaluating, it depends on what you're going for. What's the value you get out of it when a human does it more manually? Is this tool going to preserve that? Is it going to enhance that? That's how I think about these things. But, of course, definitely try it. People can promise all sorts of things, but if you don't see it and touch it and use it, it's not always there.
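The design choice John describes, keeping the user's exact words alongside any extracted fact, can be illustrated with a small Python sketch. Everything here is assumed for illustration: the `MemoryRecord` and `VerbatimMemory` names, the naive keyword recall, and the sample message are invented and do not describe Nadia's actual memory system.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MemoryRecord:
    summary: str   # extracted fact, e.g. produced by a summarization step
    verbatim: str  # the user's exact words, stored unmodified
    source: str    # where the text came from, e.g. "email" or "slack"


class VerbatimMemory:
    """Hypothetical store that never discards the original phrasing."""

    def __init__(self) -> None:
        self._records: List[MemoryRecord] = []

    def remember(self, summary: str, verbatim: str, source: str) -> None:
        self._records.append(MemoryRecord(summary, verbatim, source))

    def recall(self, keyword: str) -> List[MemoryRecord]:
        # Naive keyword match over both fields; a real system would use retrieval.
        needle = keyword.lower()
        return [
            r for r in self._records
            if needle in r.summary.lower() or needle in r.verbatim.lower()
        ]


memory = VerbatimMemory()
memory.remember(
    summary="Conflict with a direct report about a missed deadline",
    verbatim="I can't keep covering for you when things slip.",  # invented example
    source="slack",
)
hits = memory.recall("deadline")
```

A fact-extraction-only store would keep just the `summary` field; keeping `verbatim` as well is what lets a coach comment on how something was said, not only that it was said.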
[51:56] Jeff Dalton: People will claim, "I have x, y, z output guardrails," or "I have x, y, z." But what really matters is, are they effective? Do they actually work? That means having the test cases. Have you actually defined what you want the agent to do? You should be able to have your own internal benchmarks that say, "These are my use cases. This is how I know what good looks like. These are the capabilities I expect." So it's not even just whether it has a connector, but does it actually use it at the right time? Is that tool call working reliably, or is that agent call working reliably? You don't want a really complex architecture for its own sake. What you actually want is to solve the problem the user has and solve it as fast as possible. That means having a clear library of use cases, having a clear set of abilities that you expect the agent to have in those circumstances, and being able to actually test and evaluate: is this meeting my needs? Are the guardrails, and all those different systems, actually working effectively? That takes a little deeper knowledge and deeper probing. The best way I can describe it is that you have to be a little bit of an AI whisperer. You have to spend time actually using the system to understand what its edges are, knowing that these AI systems have jagged edges. It's really good at some things, and then it forgets what day it is, because the models have date cutoffs. So being able to understand those limitations and capabilities is really important. Going back to what we were saying earlier about the definition of agents: what's the environment it's operating in? Is it running on my desktop computer? Is it running in the cloud? Is it running in a special sandbox? What does it have access to?
What data does it have access to? What tools does it have access to that allow it to work reliably for the tasks and use cases you intend? A lot of systems fall short if you actually dig into those different dimensions.
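An internal benchmark of the kind Jeff describes can start very small: a list of use cases, the behavior expected for each, and a pass rate. The Python sketch below is one possible shape, with everything in it assumed for illustration. `UseCase`, `pass_rate`, and `toy_agent` are hypothetical, and `toy_agent` merely stands in for whatever vendor system is being piloted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class UseCase:
    prompt: str
    expected_tool: Optional[str]  # tool the agent should call, or None for no tool


def pass_rate(agent: Callable[[str], Optional[str]], cases: List[UseCase]) -> float:
    """Fraction of use cases where the agent picked the expected tool."""
    if not cases:
        raise ValueError("need at least one use case")
    passed = sum(1 for case in cases if agent(case.prompt) == case.expected_tool)
    return passed / len(cases)


# Stand-in for the system under evaluation: returns the tool it chose to call.
def toy_agent(prompt: str) -> Optional[str]:
    return "calendar" if "meeting" in prompt.lower() else None


cases = [
    UseCase("When is my next meeting with Sam?", "calendar"),
    UseCase("Help me phrase tough feedback.", None),
    UseCase("What day is it today?", "clock"),  # probes the date-cutoff edge
]
score = pass_rate(toy_agent, cases)
```

Here the toy agent passes the first two cases and fails the third, because it never reaches for a clock. Failures like that are the jagged edges a pilot is meant to surface.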
[53:51] Das Rush: One of the conversations we've had, and this is reminding me of it, is about an experience probably most of us on this call have had. Three or even four years ago, when we were starting to use AI in these large language models, the real limitation, when I first used them, was that I couldn't get all the things I needed into the prompt window. I wanted to include this document and that document, and it wouldn't all fit. The context it could take was so small. Now we can put so much into it, and the challenge I often find is, how do you get the AI to pick the right pieces of information? So this question of memory seems like the underlying mechanism that's really important there, and I know that's something we've talked about. As we go into our last few minutes here, I have two final questions, and I'd love you to leave HR leaders with some really tangible advice. One of the biggest things that came through in the chat is this: how do you think about governing not just a single working agent, or even a single working multi-agent system, but an entire agent ecosystem, so that you avoid sprawl, duplication, and confusion about when to use each tool? And, related to that, if you zoom out from the specific use cases, what do you see as the most important two to three decisions that HR leaders have to make right now as they adopt agentic AI?
Avoiding Agentic Sprawl Across the HR Tech Stack
[55:25] Jeff Dalton: So
[55:27] John Foley: Yeah. I think agentic sprawl is definitely an issue. It's an issue we've had going back a long time. I can't tell you how many times I've logged into an HR landing page and found all these systems and all these tools. And tech companies have a habit of naming things in ways where you have no idea what they are until you click through and figure it out. So I think it's a new problem, but it's also not fully a new problem. A really tough question to ask is: is this new agentic tool delivering significant value over a standard LLM harness? Is this something people can do that way or not? I think there are different levels of comparative advantage and different levels of uniqueness of competencies in different agentic tools. So the question is, are these new tools bringing in something that's truly valuable, or are there ways to have a little bit more of a unified system? For us, a big part of what Nadia brings that other agentic systems don't is the memory system, the bias and trust work, and specifically that safe space for employees that is separate from your other tools. That's my first reaction. I think you asked two questions, but I've forgotten the second one.
[57:01] Jeff Dalton: I'm going to go to the top, and I'll try to be concise. I would say the first thing is: don't put barriers in the way of people using the tools. What we see is that people bury it behind Workday. They bury it behind deep, existing systems that add a lot of friction before users can get there. Getting to Nadia, or to whatever system, should be easy and simple, so reduce those barriers for people using the tools. Put them in a centralized place that people can see, with documentation that says, "This is where we go for this." It's really basic, but a lot of organizations are just not getting some of those basics right. Give people access to the different tools, but don't just give them access. Monitor how they use them. Are you tracking how many people use this every day? What are they using it for? Have those logs and that visibility into what people are doing with agents. Are they using it? How often are they using it? Are they coming back? Having that instrumentation built into the flow of work is really important for tracking and understanding what people are using different agents for.
[58:12] Jeff Dalton: Because we're going to have lots of different agents in different systems. So the important thing is to understand how they're being used, when they're being used, and which systems are most useful for people, and to gather that feedback in a way that's not just another survey.
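The questions Jeff lists (how many people use this every day, and are they coming back) can be answered from ordinary usage logs. The Python sketch below shows one hedged way to compute them; the event shape, the function names, and the sample rows are all made up for illustration and do not reflect any particular product's logging.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Set, Tuple

# (user_id, agent_name, day) tuples; the rows below are invented sample data.
UsageEvent = Tuple[str, str, date]


def daily_active_users(events: List[UsageEvent]) -> Dict[date, int]:
    """Count distinct users per day across all agents."""
    users_by_day: Dict[date, Set[str]] = defaultdict(set)
    for user, _agent, day in events:
        users_by_day[day].add(user)
    return {day: len(users) for day, users in users_by_day.items()}


def returning_users(events: List[UsageEvent]) -> Set[str]:
    """Users who showed up on more than one distinct day."""
    days_by_user: Dict[str, Set[date]] = defaultdict(set)
    for user, _agent, day in events:
        days_by_user[user].add(day)
    return {user for user, days in days_by_user.items() if len(days) > 1}


events = [
    ("u1", "coach", date(2025, 1, 6)),
    ("u2", "coach", date(2025, 1, 6)),
    ("u1", "scheduler", date(2025, 1, 7)),
]
dau = daily_active_users(events)
back = returning_users(events)
```

Grouping the same events by agent name instead of by day would show which tools are actually being used, which is one way to spot sprawl and duplication without sending another survey.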
Closing Thoughts
[58:27] Das Rush: Great. Wonderful. Thank you so much, Jeff and John. And I want to say thank you to everybody who joined us here today. We hope everybody is leaving with a little more of an understanding of agents and what they are, and, very specifically, of how you can make really informed decisions about the agents and the agentic design that you're bringing to your organization. So thank you, everybody, and we'll see you, hopefully, at our next one. Thanks, Jeff and John.
[58:58] John Foley: Thank you.
Key Takeaways
- AI agents are not a new idea, but LLMs changed what they can do: Research on agents stretches back to the 1972 Shakey robot and the reinforcement-learning agents of the following decades. Large language models did not invent the agent paradigm, they unlocked a new substrate: agents that understand natural language, reason flexibly, and adapt to open-ended environments.
- A chatbot, an agent, and a multi-agent system are not the same thing: A chatbot is a customized layer on top of a foundation model. An agent operates in an environment, uses tools, and acts on a plan. A multi-agent system coordinates specialized agents through a harness that orchestrates which agent handles which task. The vendor claim that something is an "agent" should be tested against this definition.
- The most common enterprise mistake is thinking too small: Jeff and John see organizations starting with small, in-house models or trying to automate a single existing pipeline like a survey. The higher-leverage question is whether AI can replace the workflow entirely. Can the agent read the documents, observe the environment, and skip the survey step altogether?
- Nadia is architected as a coordinated multi-agent system, not a single chatbot: Inside Nadia, sub-agents are constantly working in the background: planning the conversation, tracking goals, watching for blind spots, deciding which underlying model is best suited to each moment. More agents run between conversations than during them, continuously updating the plan for the user.
- Pressure-test vendor claims by piloting the actual system: Anyone can claim guardrails, memory, and agentic architecture on a slide. What matters is whether those capabilities work in practice. Define your use cases, build internal benchmarks for what good looks like, and have end users actually use the system. Being an "AI whisperer" means spending time with the tool to find its jagged edges.
- Agentic sprawl is a real risk for HR leaders: As agents proliferate across the HR stack, the failure mode is friction: tools buried behind Workday, behind legacy systems, behind layers that make users give up. The guidance is to remove barriers to adoption, understand how agents are actually being used, and gather feedback through the work itself rather than through yet another survey.