May 19, 2026 at 12:00 PM ET
An Agentic Research Administration Case Study
Julie Swaringim-Griffin built an AI agent for her sponsored programs team. It got the deadlines wrong. The pilot collapsed. The team came out stronger for it. Here is what happened, and why it matters for any research office building with AI right now.
Most AI case studies you read end with a tidy ROI number. This one does not. When Julie Swaringim-Griffin, Assistant Vice President for Central Sponsored Programs Administration at Oklahoma State University, gathered her pilot team three months in to review the AI agent she had spent weeks building, the verdict was almost unanimous: stop using it. What she learned in that conference room turned out to be more valuable than the tool itself.
This is the story she shared on our recent webinar, "Building AI, Building Trust." Two agents. One failure. One success. And a set of lessons that apply whether you are at a flagship research university or a community college just starting to think about what AI means for your office.
OSU runs a decentralized model. Six colleges each have their own sponsored programs office, plus a library office, a wellness office, and a central team that supports everyone else. Nine offices, nine sets of workflows, one institution. Any tool that is going to work has to work across all of them.
Julie, who has spent 11 years on both the proposal-writing and administration sides of research, started where many research administrators do: with the tools already paid for. In OSU's case, that meant Microsoft Copilot.
Her goal was modest and practical. She wanted an agent that could read a federal funding solicitation and produce two things: a checklist of required documents, and a quick summary of anything unusual the team should flag for faculty. The kind of work that takes a pre-award specialist 30 to 60 minutes per opportunity, repeated across hundreds of solicitations a year.
Platform: Microsoft Copilot · Pilot size: 9 people, 3 months · Outcome: Decommissioned
The agent worked. It produced the checklist. It flagged sticky requirements. It even drafted a templated email to send to faculty with internal routing timelines. By Julie's own assessment, the bones were good.
Then the deadlines started coming back wrong.
In one telling example, the agent confidently reported the NSF CAREER deadline as July 23rd. The actual deadline was July 22nd. A single day. In any other context, a rounding error. In pre-award, where a missed deadline can mean a year of lost faculty time, a single day is everything.
"Something as easy as a due date, that should be an easy, obtainable piece of information, but it was getting it wrong. That decreased confidence in the tool altogether." — Julie Swaringim-Griffin
Julie did what any reasonable pilot lead would do. She called the team together, put the agent's configuration up on the conference room screen, and started editing the prompt to fix it. And that is when someone on the team interrupted her.
"Julie. Let's not."
The team had lost confidence in the tool. They did not want it tweaked. They wanted to scrap it and try a different approach. As Julie put it on the call, her first reaction was heartbreak: she had championed this, built it, recruited the pilot team. She felt she had failed them.
Then she looked around the room.
Eight people from eight different offices, each with their own workflows and their own incentives, telling her, the central office, to stop. Without softening it. Without trying to make her feel better. Without saying "let's keep going, you worked so hard."
"It wasn't an AI agent that worked, but it did show us that we trust one another, that we can say the real thing that needs to be said, without fear of repercussion. In that moment, the AI agent did fail, but the team really didn't." — Julie Swaringim-Griffin
This is the part of the story that gets glossed over in most AI-in-research conversations, and it is the part that should not be. The hardest thing about deploying AI inside a research office is not technical. It is whether your people will tell you the truth about whether it is working. If they will not, you will end up with a tool that everyone says they use and no one actually does, which is worse than no tool at all.
The second agent Julie built, named Monroe after the road that cuts through OSU's Stillwater campus, took a very different path. And that difference is instructive.
Platform: Microsoft Copilot · Users: 2 contract reviewers · Outcome: In active production, V2 underway
Contracts are higher stakes than solicitations. Solicitations are public. Terms and conditions on a specific award are not. Before writing a single prompt, Julie went and got approval from IT, legal counsel, her direct supervisor, and the VP for Research. That alone took weeks. It was the right call.
Then she built Monroe with three deliberate design choices that made the difference:
Define the red flags explicitly. Julie did not assume the model knew what unacceptable contract language looked like for OSU. She fed it the institution's policies, the relevant Code of Federal Regulations, and a written list of language the institution cannot agree to.
Pin the output format. If you have built with LLMs, you know the output drifts. Julie uploaded a sample Word document showing exactly what the response should look like: a table with the red flag, where it appears in the contract, and what the issue is. Every time. No improvisation.
Stress-test before you ship. She wrote three fake contracts laced with deliberately glaring red flag language and ran them through. Monroe caught every one. Then she ran real, previously reviewed contracts and compared the output to her team's prior assessments. The agent matched.
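For offices that want to keep a record of this kind of stress test, the comparison can be sketched in a few lines of Python. This is a minimal illustration, not Julie's actual process: it assumes the agent's table has already been copied into rows by hand or by export, and the names `RedFlag` and `score_review`, along with the sample flags, are made-up placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedFlag:
    """One row of the pinned output table (hypothetical structure)."""
    flag: str      # short name of the red flag
    location: str  # where it appears in the contract
    issue: str     # why it is a problem


def score_review(expected: set[str], found: list[RedFlag]) -> dict[str, list[str]]:
    """Compare the flags planted in a test contract with the flags the agent reported."""
    found_names = {row.flag for row in found}
    return {
        "caught": sorted(expected & found_names),  # planted and reported
        "missed": sorted(expected - found_names),  # planted but not reported
        "extra": sorted(found_names - expected),   # reported but not planted
    }


if __name__ == "__main__":
    # Placeholder data for one fake contract; not OSU's real red flag list.
    planted = {"indemnification", "governing law"}
    agent_rows = [
        RedFlag("indemnification", "Section 4", "Placeholder issue text"),
        RedFlag("publication restriction", "Section 9", "Placeholder issue text"),
    ]
    for bucket, names in score_review(planted, agent_rows).items():
        print(f"{bucket}: {names}")
```

A non-empty "missed" list is the signal that matters before rollout. The "extra" list is worth a human look either way, since it can be a false alarm or something the test author overlooked.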
Once it passed those tests, she rolled it out to her two contract reviewers in the simplest possible way. She runs the agent herself when she assigns a contract, and hands them the output as a head start. They can scan the table, see how many red flags they are walking into, and budget their time accordingly. The agent is doing pre-reading. The humans are still doing the review.
Monroe is now in its second version. Julie is teaching it to suggest the alternative language OSU prefers when a red flag appears, so it can help with negotiation, not just identification.
Start with low-stakes data. Public solicitations are a safer first test than contracts or proprietary documents. Build your team's instincts on something where a mistake costs nothing.
Pin your output before you scale. Give the agent a sample of exactly what a good answer looks like. Format drift kills trust faster than hallucinations do, because it makes the tool feel unreliable even when it is right.
Build the human pilot, not just the agent. A diverse pilot team across units catches problems a single power user never will. Set the expectation, in writing, that honest feedback is the deliverable.
Tell the model what NOT to do. One of the sharpest tips from the webinar's Q&A: instruct your agent explicitly that if it is unsure of a specific fact like a date, it should say so rather than guess. Negative prompting is underused.
The agent is the tool. The team is the work. If your office is genuinely safe enough that people will tell you a tool is not working, you have something more valuable than any agent: a culture that can adopt the next ten you build.
For all the lessons in this story, Julie was candid that she has not solved the deadline problem yet. She has upgraded her Copilot license. She has tried instructing the model to look up deadlines directly. She has tried telling it not to include a deadline if it is uncertain. It still hallucinates, occasionally. This is honest, and it is worth saying.
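One guardrail that sits outside the prompt is to check the agent's reported deadline against the solicitation text itself and send anything unconfirmed to a person. The sketch below is an assumption-laden illustration, not something described in the webinar: it only recognizes dates written like "July 22, 2026", the function names are hypothetical, and the sample sentence and its year are invented for the example.

```python
import re
from datetime import date

MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
MONTH_NUM = {name: index + 1 for index, name in enumerate(MONTH_NAMES)}

# Matches dates such as "July 22, 2026" or "July 22nd 2026" (assumed format).
DATE_RE = re.compile(
    r"\b(" + "|".join(MONTH_NAMES) + r")\s+(\d{1,2})(?:st|nd|rd|th)?,?\s+(\d{4})\b"
)


def dates_in_text(text: str) -> set[date]:
    """Collect every recognizable calendar date that appears in the source text."""
    found = set()
    for month, day, year in DATE_RE.findall(text):
        try:
            found.add(date(int(year), MONTH_NUM[month], int(day)))
        except ValueError:
            continue  # skip impossible dates such as "February 31"
    return found


def check_deadline(reported: date, solicitation_text: str) -> str:
    """Return a status label; anything not found in the source goes to a human."""
    if reported in dates_in_text(solicitation_text):
        return "found in source"
    return "needs human verification"


if __name__ == "__main__":
    sample = "Full proposals are due July 22, 2026 by 5 p.m. local time."  # invented text
    print(check_deadline(date(2026, 7, 23), sample))
    print(check_deadline(date(2026, 7, 22), sample))
```

A check like this only catches a date that appears nowhere in the document. It will not catch the agent picking the wrong date from a solicitation that lists several, so it narrows what a human has to verify rather than replacing the verification.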
If you are working on this same problem in your own office, or have solved it, Julie would genuinely like to hear from you. She is on LinkedIn, and she takes the door-knocking joke about half-seriously.
At Atom, we work with research development and pre-award teams at universities, research hospitals, and independent research organizations. The question we hear most often right now is not "should we be using AI." It is "how do we use it without breaking the trust we have spent years building with our faculty and our staff." Julie's story is the best answer we have heard to that question. We are grateful she shared it.
Up next: Our next webinar features Chancellor Johnny and the VP of Research at Montana Tech on positioning a small institution against federal priorities, and how it opened international doors.