Aired:
July 24, 2025
Category:
Podcast

Teaching the Biopharma Workforce to Coexist with AI

In This Episode

In this thought-provoking episode of the Life Sciences DNA Podcast, host Nagaraja Srivatsan speaks with Sunil Talathi—Senior Director and Head of Data Insights, Data Sciences, and Analytics at BeOne Medicines. Together, they explore how AI is revolutionizing the biopharma workforce and the challenges that come with it. The conversation uncovers the role of AI in empowering workers, improving efficiency, and fostering collaboration, rather than replacing human expertise. Sunil shares his insights into the importance of upskilling the workforce to coexist with AI and the strategies required to ensure seamless integration in the biopharma industry.

Episode highlights
  • Predictability Over Risk Aversion: Sunil emphasizes that the biopharma industry isn't avoiding risks, but rather aiming for predictability in outcomes. AI is a tool that drives precision and consistency in operations, rather than disruption.
  • AI as a Teammate: Sunil talks about AI as a “digital teammate” in the workforce—working alongside humans to augment capabilities and decision-making. He shares the importance of treating AI like a co-worker, ensuring it's well-integrated into daily operations.
  • Humanizing AI: Learn how biopharma organizations are leveraging familiar interfaces and personalized digital agents, such as AI systems with designated roles, to ease the integration of AI into their workflows.
  • Change Management in AI Adoption: Sunil discusses the importance of starting with grassroots-level experiments and pilot projects within organizations, which can often provide more value than the final AI tools themselves.
  • Preparing the Workforce for AI: Hear why it’s crucial to upskill employees and create a culture where human expertise and AI work hand-in-hand to drive innovation in the biopharma space.
  • The Experimental Imperative: Sunil urges organizations to encourage hands-on experimentation with AI across teams. This approach enables future-ready businesses and fosters deeper understanding and trust in AI technologies.
  • AI in Clinical Trials: Sunil shares examples of how patient-centric organizations in biopharma are leading the way in AI-driven clinical trials and offering scalable models for AI adoption, particularly in rare disease research.

Transcript

Daniel Levine

The Life Sciences DNA podcast is sponsored by Agilisium Labs, a collaborative space where Agilisium works with its clients to co-develop and incubate POCs, products, and solutions. To learn how Agilisium Labs can use the power of its generative AI for life sciences analytics, visit them at labs.agilisium.com. We've got Sunil Talathi on the show today. Who is Sunil?

Nagaraja Srivatsan

Danny, Sunil is a senior director and head of data insights, data sciences, and analytics for BeOne Medicines, recently rebranded from BeiGene. Sunil is a visionary leader, an AI strategist, a technology expert, and an innovator with extensive experience in data science, analytics, clinical systems, operations, risk-based monitoring, and clinical data management across several life sciences companies. I'm really excited to have him on the show.

Daniel Levine

What are you hoping to discuss with Sunil today?  

Nagaraja Srivatsan

I think Sunil is a practical and pragmatic problem solver. He's driven automation use cases that can be taken into the enterprise and scaled from POC to production. So I really want to hear his journey with those use cases: what he faced, how he addressed it, and what the success metrics were in delivering them.

Daniel Levine

Before we begin, I want to remind our audience they can stay up on the latest episodes of Life Sciences DNA by hitting the subscribe button. If you enjoyed the content, be sure to hit the like button and let us know your thoughts in the comments section. With that, let's welcome Sunil to the show.

Nagaraja Srivatsan

Hey Sunil, so good to have you on the show. Really excited to talk about some good AI use cases. Sunil, why don't we jump right in: share with us what your journey has been with AI, along with some very good examples.

Sunil Talathi

Sounds good, Srivatsan. First of all, thank you so much for having me on the podcast. I've been following it, and I'm very excited to be part of one right now. I'll start with the journey before generative AI, because I think it's important to know the challenges we had before it. Back in 2016, we were working on protocol authoring, and protocol amendments were one of our biggest pain points. One of our goals was: how can we minimize protocol amendments? In those days we were talking about natural language processing, NLP, and it was still early days of NLP for us, trying to mine protocols across studies and across different protocol templates. We were there, but we were not successful enough to say, hey, here are some insights from mining all 100 or 200 protocols.

In those days we were also trying to build an AI chatbot, before generative AI. I know the nightmare we had to go through to build it, because when you keep a chatbot as open as the GPTs we have today, you ask any question and you get an answer. But before generative AI, you had to train your model, train your data, and handle the variation in the questions coming your way through NLP. When we kept it open, we realized our failure rate was way too high, so we quickly pivoted: let's not keep it open-ended, let's pre-define the prompts. That reduced our failure rate, but, including us, not many were very successful in building chatbots, which looked very futuristic at that point in time. Those were the days we lived in, where it took a lot to build the kind of chatbot we talk about casually today.

But the two cases I want to focus on today are much more mature. The first is a safety use case, which I'm deeply involved in, and it was one of our first AI use cases. We chose to go after safety, which not many people will do. Patient safety, people stay away from it: we don't trust AI, we don't know how to react to it. But we chose it, because it all depends on how leadership looks at the use case. We started with a problem statement around patient narratives. A patient narrative could be a page or two pages long. You read it, you get lost in it, and somehow you have to work out what the events were, the start date, the end date, what the patient was going through. I was thinking, my God, I'm glad I'm not the reviewer, because that would be daunting to review. Imagine a phase three trial: you could have 20,000 or 50,000 cases. Reviewing all of that at the end of the study would be a nightmare.

This is where we started. We already knew that generative AI was good at summarizing data, so the journey began with: let's summarize patient data quickly. We were surprised; it did a good job. We were able to summarize the data, provide key insights, and structure it into a format that was much more user friendly. But very quickly we thought, my God, let's see if AI can interpret the data better, and here we're talking about safety interpretation.

So we went on: let's see if AI, just looking at the data, can tell me what it thinks about the grading. We were not expecting AI to be anywhere close to what we were thinking; we were just testing it out. To our surprise, and I know it's going to be controversial when I say this, AI actually challenged a lot of our PI assessments. When the PI said it was a grade two, AI said it was a grade four. And we asked, why would AI say grade four when the PI says grade two? It had a rationale behind it, the chain of thought that led to the grade four. It looked at the narrative and said, hey, there are some labs here, your ALT and AST are way too elevated; if the labs are that elevated, there is no way the patient could be grade two, it has to be grade four. We were really surprised and impressed by the analysis it provided, and that slowly started building our trust in AI. And guess what? Today we are using AI not just for summarization but for interpretation.

Nagaraja Srivatsan

Interpretation, yes. Sunil, let's peel back that use case. You told us the final scene: AI is working very well and it's giving you that interpretation. You talked a little bit about chain of thought, and about the reliability and transparency of its output. But that's a journey, because as you started, there were skeptics. The PIs would not trust the AI. There were hallucination issues. So walk me through how you went step by step to this final stage, because the final stage is amazing, right? You're doing safety interpretation. It's reviewing cases. It's as good a peer as a PI, if not sometimes better, and at minimum a good sounding board. It's a true co-pilot, but it didn't start as a co-pilot. So walk me through: when it was a trainee, how did you take it from a trainee to a real co-pilot? That must have been quite the journey. What was the coursework you had to put it through to make the trainee a co-pilot?

Sunil Talathi

So let's walk through that journey. There was a lot of nervousness, right? We were using AI on safety cases, and the trust wasn't there yet. Even today a lot of people don't trust it. But the way we started was: let's run this as a parallel solution alongside what you're doing, and use the system as a referential tool. See what AI says and see what our safety science team says. When we did that comparison, that's where the trust within the safety science team came in. We actually did not release the tool into production. We said, this is going to sit in a qualified environment for you to use as a referential tool. We got the endorsement not to use it for production, but just as a reference. The moment our safety science team, which is a big team, started using it for a few studies, the confidence came in very quickly. And it served two purposes. One was to make the business users confident. Secondly, from a developer perspective, I was getting confident, because we can do all the testing we need to do, but unless you have real production verification, you never feel confident releasing your application. So for me, that sealed the deal: our users got confident with the AI analysis, and I was confident our model was working to expectation. That checked the box that took us to the production stage.

Nagaraja Srivatsan

So tell me, Sunil, were you using few-shot examples with ChatGPT, or were you building your own SLM? In the McKinsey Maker, Shaper, Taker continuum, where were you? And then also talk about how you were able to land it in that review workflow, because at the end of the day, people don't want to go into ten systems. So walk me through it. What is the recipe? What are the core ingredients? And then tell me how those ingredients were put together as a finished product that fit the workflow of what the safety reviewers were doing.

Sunil Talathi

When we started, again, we didn't expect a lot from AI, because we didn't know its power at that point in time. We knew it did good summarization, but we didn't expect AI to tell us the grading, the CTCAE grading, which is very domain specific. So we used RAG at a very early stage, providing a knowledge base: this is our CTCAE grading, this is what we want. Leverage the RAG and make the analysis, whether the grading is one to four or one to five. But as the models evolved, they became much more powerful. And I will tell you, just talking about GPT, the foundation model even a year back was strong enough to do the analysis. Today it's on a different scale; we're talking about small models, nano models. But I'm talking about the 3.5 version we were using at that point, which we have since replaced. 3.5 was actually good enough to do the interpretation. We were thinking we might have to refine the model, use more RAG, use a bigger knowledge base, but in our production prototype we actually removed the entire knowledge base, because it wasn't required; GPT by itself was making a very accurate assessment. And the accuracy was around 80 to 90%.
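To make that concrete, here is a minimal sketch of the "prompt plus few-shot examples, no RAG" pattern Sunil describes, written as a Python call to a chat-completions API. The model name, prompt wording, and example narratives are illustrative assumptions, not BeOne's actual implementation.

```python
# Minimal sketch (not BeOne's actual code): few-shot CTCAE grading with a
# foundation model only -- no RAG or external knowledge base.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM = (
    "You are assisting a safety science reviewer. Read the patient narrative, "
    "summarize the adverse event, and propose a CTCAE grade (1-5) with a "
    "step-by-step rationale, citing the labs or findings that drove the grade."
)

# Few-shot examples: (narrative, expected assessment) pairs used as guidance.
FEW_SHOT = [
    {"role": "user", "content": "Narrative: Patient reported mild nausea, resolved without intervention."},
    {"role": "assistant", "content": "Event: nausea. Grade 1 - mild, no intervention indicated."},
    {"role": "user", "content": "Narrative: ALT 12x ULN and AST 10x ULN after cycle 2; dosing interrupted."},
    {"role": "assistant", "content": "Event: transaminase elevation. Grade 3 - greater than 5-20x ULN per CTCAE."},
]

def grade_narrative(narrative: str) -> str:
    """Return the model's summary, proposed grade, and rationale for one case."""
    messages = [{"role": "system", "content": SYSTEM}, *FEW_SHOT,
                {"role": "user", "content": f"Narrative: {narrative}"}]
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=messages,
        temperature=0,
    )
    return resp.choices[0].message.content
```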

Nagaraja Srivatsan

Basically, are you saying you removed RAG completely and just went with a prompt, few-shot examples, and chain of thought directly against these GPT models? That's amazing, right? The classic pattern was: I'll build a RAG layer over my knowledge, use the vector store to get to the zone of influence, then go to the GPT, get the answer, and formulate it into a good response for the user. Now what you're saying is you cut all of that out, because the inherent model itself now possesses that grading knowledge, as accurately as if you had trained it on the whole body of your data. That's amazing.

Sunil Talathi

And again, we were surprised, right? Because we were thinking, this is a foundation model, it will do foundational stuff, nothing medical. We thought we might have to go to the medical LLMs of the world for that interpretation. But it turned out the foundation model was strong enough. So we dropped the entire knowledge base, which made it efficient for us to maintain just a few-shot example set. And truly, prompt engineering, providing few-shot examples, really helped us get what we wanted to accomplish. So yeah, that is basically how it started and how it ended up.

Nagaraja Srivatsan

So walk me through that journey, because where you've landed is a very good place. As you say, the reality is that these large language models actually have the capability to interpret core domain terms in medicine. But with that said, the efficiency of the prompt is what leads to the efficiency of the results. So did you outsource the prompts? How did you train your team on prompts? Is there a playbook you can share, which we could all use, on how you should be giving those few-shot guideline examples? Walk me through that, because prompting is a new skill everybody has to learn. How did you go about building that skill?

Sunil Talathi

Yeah, I think it was self-learning, honestly, Srivatsan. We normally take a build-versus-buy approach to see where we can build and where we should buy, and we mostly end up being a build org, especially given our requirements. So this became a build: ideation to prototype to production. And all we really needed was an advanced Python programmer. Let me start there. If you are a Python programmer, you can easily call the API. From there, you're learning the frameworks of prompt engineering, whether it's the RISE scheme or another; there are a lot of frameworks now, where you're looking at the response, the interpretation, the context, and the examples. When you write your prompts following a certain framework, you get better results. We were very clear about that. So it didn't take a lot for my data scientist, who was already an advanced Python programmer, to pick it up: call the foundation models, use the RISE framework, and implement it. Of course there was a lot of pivoting to reach the accuracy level we wanted, but from an upskilling perspective, from a resource perspective, if you have a Python foundation, I found it a little easier than for people who are not from a Python background.

Nagaraja Srivatsan

That's a wonderful approach, because everybody is trying to build these kinds of prompt capabilities, and you've democratized it. I love what you said: pick a framework. There are so many different frameworks available, and you picked the RISE framework. Tell us a little about what the RISE framework does, so people know what its components are and how few-shot examples fit into it.

Sunil Talathi

RISE spells out what you provide in the prompt: what context you can give, what the examples are. E stands for examples, C stands for context; I'd have to check what the R stands for. But the framework tells you how to write your prompt in a more systematic fashion, where the context is given and your examples are given. When you follow a framework for how to write prompts, you do a better job. If you just write a prompt without any context, without any interpretation, without any examples, you're never going to get anywhere. We tried it both ways: we started without RISE, then with RISE, and we saw the difference.
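As a rough illustration, a framework-structured prompt might look like the template below. Sunil doesn't spell out the acronym here; one common expansion of RISE is Role, Input, Steps, Expectation, so the section names in this sketch are an assumption rather than a quote from the conversation.

```python
# Hypothetical sketch of a framework-structured prompt for the safety use case.
# Section names follow the Role / Input / Steps / Expectation expansion of RISE.
def build_rise_prompt(narrative: str) -> str:
    return f"""
ROLE: You are a clinical safety reviewer assessing adverse events.

INPUT: Patient narrative:
{narrative}

STEPS:
1. Summarize the event, onset date, and resolution date.
2. Identify labs or findings relevant to severity (e.g., ALT/AST elevation).
3. Map the findings to a CTCAE grade and explain the mapping.

EXPECTATION: Return a short summary, a proposed CTCAE grade (1-5), and a
step-by-step rationale a reviewer can compare against the PI's assessment.
""".strip()
```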

Nagaraja Srivatsan

You're making a very important point, which I want to highlight. First: pick a framework, it doesn't matter which one, keep it consistent, build a library of these prompts, and then keep refining as the outputs come. That's one part. The second is that the way to improve efficiency is the few-shot examples you give it: the more you can give, the better it can learn and discern. And the third thing you said is: you could do all of this, but don't do your prompting in the chat window with GPT and others and stop there. Create an API framework so that it becomes part of the workflow, rather than using ChatGPT and then Ctrl-C, Ctrl-V into another place; that doesn't scale. Once you build an API infrastructure, the workflow calls the API to ask for the interpretation, and the output of the interpretation goes back into the workflow. Is that a fair way to describe what you built?

Sunil Talathi

One of our secret sauces is that we didn't go and write Python scripting directly: write a script, call the API, build the RISE framework, follow the process. What we normally do is play with ChatGPT first. We actually test our prompts there, with the RISE framework, before we even write a Python script. We write the whole prompt in ChatGPT and see how it responds. Only when we're fairly confident the responses are coming out well do we go and write the Python script to call the API. So this is a foundational practice we built up: don't start by writing the Python script, because that becomes rework in itself. You're better off using an interface to adjust your prompts and interpret the results, and once you feel you're getting good results, then use a Python script to call the API.

Nagaraja Srivatsan

So Sunil, one of the topics everybody has in mind is that as you start to write prompts in ChatGPT and move them into an API, cost is an issue. Token costs, I know, are coming down by 80%, but still, there are input context windows and output windows. So walk me through it: one side is the quality of the output, but is the cost of ownership a key consideration? How did you go about measuring it, what metrics, how much it costs? How are you doing from a cost standpoint? Was that a big deal? Did you do an evaluation of it? What are the different cost components in driving this?

Sunil Talathi

We did, because there was a perception that it's going to be expensive to use LLMs at the scale we're talking about. And specifically, we're talking about 40,000 or 50,000 patient cases, and each case's interpretation will cost us something. So there was a perception that it's going to be expensive to run the model, and then to maintain it. So we quickly did an evaluation: how much are we spending today to do this work? We were engaging our external FSP partners to do a lot of this work as well. So we said, this is the spend we have; now let's look at the cost of running this and the cost of maintaining it. And it came out to be about one-tenth of the cost.

Nagaraja Srivatsan

Okay, one-tenth of the human labor cost, that's what you're saying. If I paid an FTE X, then the AI interpretation of that PI work costs one-tenth of what that would be. Correct. Okay. But within that one-tenth, are there certain components of the cost that are higher, like compute or token cost? Is that something you're monitoring from a cost perspective, so that you don't suddenly have a big fat bill show up at your door from Azure or Amazon or Google or whoever you're using?

Sunil Talathi

Yeah, I think this is where we were smart enough to know how many cases we were going to process annually. We knew historically, from the last two years, how many cases we process, and we also estimated how many cases we would have, and of course we buffered it up. So we said, we're expecting 10,000 cases this year; let's plan for 15,000 and ask, would that cost still be acceptable for the function? And that cost was still acceptable. We knew it wasn't going to run over, because the average is around 10K cases; we planned for 15K, and it could never be 20K, because that wouldn't be good for the company. So we did that buffering, we knew what numbers to expect, and we did the costing based on that. So we knew we were never going to reach that state.
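A back-of-the-envelope version of that volume-buffered costing might look like the sketch below; every number in it is an illustrative assumption, not an actual BeOne figure or vendor price.

```python
# Sketch of the volume-buffered cost check Sunil describes: buffer the expected
# annual case load, estimate tokens per case, and see whether the resulting LLM
# spend is acceptable for the function. All figures are illustrative assumptions.
expected_cases = 10_000        # historical average per year
buffered_cases = 15_000        # plan above the expectation, as Sunil notes

tokens_in_per_case = 3_000     # narrative + few-shot prompt (assumed)
tokens_out_per_case = 500      # summary, grade, rationale (assumed)
price_in_per_1k = 0.005        # $ per 1K input tokens (assumed)
price_out_per_1k = 0.015       # $ per 1K output tokens (assumed)

cost_per_case = (tokens_in_per_case / 1000) * price_in_per_1k \
              + (tokens_out_per_case / 1000) * price_out_per_1k
annual_llm_cost = buffered_cases * cost_per_case

print(f"~${cost_per_case:.3f} per case, ~${annual_llm_cost:,.0f} per year "
      f"at {buffered_cases:,} cases")
# The full comparison Sunil describes (about one-tenth of the human/FSP spend)
# would also fold in development and maintenance, not just token charges.
```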

Nagaraja Srivatsan

Okay, fair enough. So one part is the cost. The other is that everybody is very concerned about this PI journey. You said you made them reviewers and all of that. Even when you have a champion-challenger setup, where the champion is the PI and the challenger is the AI, how did you handle the change management so that it became a co-pilot? Did you incent the PIs in a different way, did you land the co-pilot inside the workflow, did you reward every saving they made, or did you name the co-pilot a little Sunil so that they all felt comfortable using it? What strategies did you use to make this adoption happen? Change is the biggest challenge in adoption of AI, correct?

Sunil Talathi

Correct. Again, this tool was much more an internal tool for the safety science team. They look at the safety cases, and based on whatever the recommendation is, they work with the PIs. It was not exposed to the PI; it was exposed to the safety science team, who look at the safety cases after the PI. You know, for me, I've always felt this is never a bottom-up approach; it's a top-down approach. Your leader is already on board that this is going to create value. The leadership mindset was there; your leader had already seen the promise of the product, seen the value. And when the VP of the org is using the product and acknowledging the usage, the team tends to follow. I feel that's important for any org. If your leader doesn't believe in it, your team is not going to look at it. But when your leader believes in it, is applying it and utilizing it for safety reviews, and imagine he's using it with our R&D head and providing responses based on it, when your leader is using it, and he's one of the most frequent users of the product, we look at the usage and he's one of them, then, you know, good habits tend to trickle down. So initially they did not trust it; it was not easy for them to trust. But when we did the pilot for a few studies, the confidence came in. And again, adoption was 90% on day one. And guess what? One year down the road, the adoption still hovers around 80 to 90%. So we have sustained high adoption.

Nagaraja Srivatsan

It's an unbelievable adoption rate, and you hit upon it: leadership, the tone at the top, really drives what needs to be done. You lead by example, and then people follow. But you also made it easy and frictionless for them, through the API and the workflow, so they can get to it quickly. It's almost like a cheat sheet: they do their review, then check the cheat sheet and say, okay, I said a three and it said a four, then they reason through it, debate against the AI, and agree or disagree. It's that constant collaboration between the two. So the last part I wanted to probe in this case, before we go to the second one, is: how do you give feedback to the AI? What mechanism did you build? Everybody talks about reinforcement learning. Did those corrections go back in as few-shot examples each time the AI said four, they said three, and three was the right answer? Was there an override function and a feedback channel? How did you build that human in the loop? Because what you're doing is human in the loop: the human is learning from the AI, but what's the mechanism for the AI to learn from the human?

Sunil Talathi

Before I get to that question, I want to add two things to the previous one. First, the moment we talk about AI, we tend to want to create an AI product. I feel that mindset is not right. AI is not a product, it's a capability. So one of the things we did was integrate the capability into the existing product. I think it's important to mention that they were already using a product, a UI product, so we integrated the capability into it. It was much easier for them to use because they were already using something; now the capability just comes in. Secondly, it's about how you build your interface around it. Imagine you give them the raw columns and the AI-interpreted columns sitting side by side. They can look at the raw columns, what's coming from the raw data, see the AI interpretation next to it, and compare easily. So for me, it's also about how you build the interface. Two key points: it's not about building a separate AI product, because then I have two products to deal with; our strategy was always to integrate AI into the existing product, and that worked for us. And second, build the UI well; we talk about UI/UX, and the user experience is always going to be number one. It should never feel like, oh, this was AI; it should feel like part of the process itself.

Nagaraja Srivatsan

No, I mean, UI/UX in this new world of AI is so important, and I actually talk a lot about not having two siloed products, so thanks for bringing that up. People tend to say, hey, this is the AI product and this is my workflow, and then the human is comparing across them. What you're saying is to make it easier, from an experience standpoint, to compare and contrast. And now that question on feedback: how did the human give feedback back to the AI?

Sunil Talathi

So when we launched the product, we didn't have a thumbs up, thumbs down kind of button where the system could automatically flag which cases weren't right. So it was much more real-time feedback with the safety science team. We knew it was an important product, so we engaged with safety science to provide feedback on a weekly basis, at least for the first few weeks. Even after the pilot run, when we went live with all 40-odd studies on day one, we were working with the safety science team and getting real-time feedback: this AI assessment is not correct. We took that real feedback from real users, asking why it wasn't right, and we did some fixing on the fly, adding a few more examples to make it more concrete. So the prompt fine-tuning was happening in real time while we were working with the users. Now it's a little more sophisticated: we get that feedback on which assessments are right and which are not, and the thumbs up, thumbs down button is really helping us on that front. But in the initial days, our safety science team would say, hey, this is not giving me the right context, this is giving me extra information, this is something I would not expect. So the way we managed it was real-time feedback, real-time examples, and real-time fine-tuning of the prompt.
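A hypothetical sketch of that feedback loop: reviewer judgments (agreement or disagreement plus a corrected grade) are logged, and disagreements are folded back into the few-shot examples for the next prompt revision. The storage format and function names here are assumptions for illustration, not the actual system.

```python
# Sketch of a reviewer-feedback loop for prompt fine-tuning via examples.
import json
from datetime import datetime, timezone

FEEDBACK_LOG = "reviewer_feedback.jsonl"  # illustrative storage location

def record_feedback(case_id: str, narrative: str, ai_grade: int,
                    reviewer_grade: int, reviewer_note: str) -> None:
    """Append one reviewer judgment; a disagreement marks the case for prompt tuning."""
    entry = {
        "case_id": case_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "narrative": narrative,
        "ai_grade": ai_grade,
        "reviewer_grade": reviewer_grade,
        "agree": ai_grade == reviewer_grade,
        "note": reviewer_note,
    }
    with open(FEEDBACK_LOG, "a") as f:
        f.write(json.dumps(entry) + "\n")

def disputed_cases_as_few_shot(limit: int = 5) -> list[dict]:
    """Turn recent disagreements into extra few-shot messages for the next prompt revision."""
    examples = []
    with open(FEEDBACK_LOG) as f:
        for line in f:
            rec = json.loads(line)
            if not rec["agree"]:
                examples.append({"role": "user",
                                 "content": f"Narrative: {rec['narrative']}"})
                examples.append({"role": "assistant",
                                 "content": f"Grade {rec['reviewer_grade']} - {rec['note']}"})
    return examples[-2 * limit:]  # keep only the most recent corrections
```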

Nagaraja Srivatsan

It's funny you say that: how do you debug a prompt? Debugging a prompt is nothing more than going and giving it more examples. It's counter to coding, where you fix the code and then test it. You still have to test, of course, but the fix is giving more examples. It's almost like, if I don't understand what you're saying, you give me more examples so I can understand it better. Sunil, this was a good case to explore, and I know you have other cases to talk through. Walk me through another one: the same journey, before and after, and how you went through it.

Sunil Talathi

There are two other cases, but I'll speak about one right now, which came up at a conference I just attended. Risk-based quality management, RBQM, is a big space, with a lot of investment from the industry. One of the key pieces of RBQM is critical-to-quality factors, CTQs. You read the protocol and ask, which parts of this protocol are critical to quality? Your RBQM lead or clinical operations team looks at the protocol day in, day out, and says, for my study, these are the critical factors, and of course we have a CTQ library. You can imagine that no matter how smart or efficient your RBQM leads are, if you give two of them the same protocol, they will come up with slightly different CTQ factors, because it's quite subjective. One will look at the library and come up with ten; a different RBQM lead, for the same protocol, may come up with eleven. So for me there were two problems. One is the pain of going through the protocol to define the CTQ factors from the library, which was very manual work for the team. The second is the inconsistency around it. We thought, classic case, let's try it out. And it was very straightforward: we asked AI to read the protocol, which is much easier than what we used to do in the past through NLP, and then, using a RAG approach with the CTQ library as the knowledge base, we said, look at the CTQ library, look at the protocol, and tell me which of these factors are relevant for this protocol. And guess what? It did the job, not for one study but across all studies, very quickly and very easily. So critical-to-quality factors, which used to be identified by manually reading the protocol for every study with RBQM, which was painful, very inconsistent, and prone to error, this was a classic case where we said, let's use AI for risk identification. And that's why we're using AI there right now.
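As a rough sketch of that pattern, the snippet below passes a hypothetical CTQ library plus the protocol text to a foundation model and asks which factors apply. The library entries, model name, and prompt wording are illustrative assumptions, not BeOne's actual implementation.

```python
# Sketch of the CTQ use case: supply the CTQ library as context (a lightweight
# RAG/knowledge-base approach) and ask which factors apply to a given protocol.
from openai import OpenAI

client = OpenAI()

CTQ_LIBRARY = [
    "Eligibility criteria complexity",
    "Dose modification rules",
    "Primary endpoint assessment schedule",
    "Safety lab monitoring frequency",
    # the real library would hold the organization's full CTQ list
]

def identify_ctqs(protocol_text: str) -> str:
    """Ask the model which CTQ factors from the library apply to this protocol."""
    prompt = (
        "You are supporting an RBQM lead. Here is our critical-to-quality (CTQ) "
        "factor library:\n- " + "\n- ".join(CTQ_LIBRARY) +
        "\n\nRead the protocol below and list which CTQ factors from the library "
        "are relevant to this study, with a one-line justification quoting the "
        "protocol section that triggered each one.\n\nPROTOCOL:\n" + protocol_text
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content
```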

Nagaraja Srivatsan

No, this is a great example, because I've worked with several places, including our own, where we read the protocol and extract a lot of value from it. This seems very much in that realm of giving few-shot examples: you have 40 critical factors, or a library of 100, you tell the AI this is the universe of things, and then it can do the job. And it can do it much faster, because it doesn't have to think, and it can do it for 40 or 50 protocols. Then you can go back and reassess what the protocols person did before. So this is a great use case. Are you live with this now, or are you still in the build phase?

Sunil Talathi

The product is not live yet, because it's part of another, bigger product right now, so there's a hold. But I think by the end of August the official production launch will happen. We have tested it on at least seven studies so far and got the initial assessment from the RBQM lead on those. So it's a ready product; it's just waiting because it's part of a bigger product release.

Nagaraja Srivatsan

So Sunil, let's apply some of the lessons you talked about in the previous use case. In this journey, it seems very obvious based on what you did for the safety narratives, but were there any twists and turns? No movie is the same as another. Was there a little gotcha you didn't think would happen, and how did you go about solving for it?

Sunil Talathi

This time we were much better prepared, because of the learnings we took from the first case. We knew what to expect, and we were much clearer about how to write the prompts, so there was efficiency right from day one. The only thing that keeps coming up is, of course, user trust: how do we address that? We used much the same framework we did in the past. But secondly, a lot of our systems are GxP, regulated GxP systems, and it takes a lot to qualify for GxP. So it's always about working with compliance and quality on how you're going to use the application, because GxP rigor requires the whole nine yards of validation. So it's always about how you see the application being used and how compliance feels about it, and that journey is always a bit of a sticky point.

Nagaraja Srivatsan

Let's explore that, Sunil, because that's a very important thing people can benefit from hearing your way. The previous example was also GxP, but this one is smack in the middle of GxP, because you're making a quality assessment, and risk-based quality then defines which sites you monitor and where you send more monitoring. It has a whole set of impacts around that quality measure: the higher the risk, the more people you send, because there are issues. So what is the validation strategy for an LLM and a prompt? Because you can't go and validate ChatGPT itself. Tell me, what do we do? How do you build the use case, how do you prove auditability, and how do you show, in the CSV and GCP model, that the auditors and regulators can be comfortable you're doing everything by GxP? What does that process look like?

Sunil Talathi

Yeah, I think it's important for everyone to know that the business user or the development team doesn't make that decision. You can only provide your business case: how do you plan to use AI? Normally the process is computer system validation. You capture all the information about the tool: how it will be applied, what it will provide, how it will be used, what the end output is, and how you will use the output of the AI, and for what purpose. Is it going to a regulatory submission? Then it automatically qualifies as GxP. Are you going to make decisions based on what the AI is recommending? Are you going to use the decision for patient safety? Then it automatically becomes a GxP system. So it depends on what's going in and what's going out: what patient data is passing in, what the output is, and how it will be applied. That is the key driver of GxP versus non-GxP. So the computer system validation assessment, based on that information and the AI use case, is the evaluation process that decides, okay, this is GxP low, GxP medium, or GxP high. When it's GxP, it goes through a whole new level of rigor; non-GxP is a lower rigor. So our computer system validation evaluates whether it should be GxP or non-GxP. And once that's decided, we go for an official demo with the whole compliance and quality team, to say, here is our rigor on the quality side as well. It's not that once it's GxP we're done; of course there is going to be official validation. But they also want to see what the dev team is doing around it: our evaluation model, how we ensure the accuracy is around 80 to 90%, and what approach we take. So we use G-Eval evaluation. There's a lot of built-in evaluation we can do; we can interrogate GPT itself, asking it to evaluate its own responses, and G-Eval is a framework we've been using for that. So: computer system validation evaluates GxP versus non-GxP; then, whatever the decision from compliance and quality, we take it to the next step of how the system will be evaluated, what data goes in, and how the dev team's evaluation model shows the accuracy is at 80 or 90%. Because if it's 20% accuracy, it looks very different. But we say, we have 80% accuracy, and this is the reason why. That helps compliance and quality.

Nagaraja Srivatsan

So let's peel that open, because I just came from DIA and heard quite a bit of input from the FDA. What you're describing aligns very well with what the FDA calls the context of use. Once you define the context of use, and if that context of use is patient safety, regulatory submission, or endpoint collection, then it's not just GxP; it may carry a much higher burden of proof and transparency. But in many of these examples you always had a human in the middle. In the previous example, the PI actually looks at the data and makes the decision; it's the PI who says whether it's a three or a four. Here in RBQM, you're giving them an estimate of whether it's 11 or 14 or whatever the CTQ count is, but it's still the RBQM person who makes that call. Now, given that, were both of these assessed through CSV as GxP low, medium, or high, or as non-GxP? That was my first question. The second is: if it was low, medium, or high, what kind of documentation did you provide and what kind of test cases did you write? Because people are always wrestling with how to write a test case for a prompt. What types of things should they be doing? You mentioned 80 to 90%; tell me a little more about G-Eval and all of that. People are really struggling to understand what they should be doing in this area, so it would be good to break it down into smaller components and share that.

Sunil Talathi

We're talking about fine-tuning of the prompts. G-Eval is a classic framework for talking about prompt accuracy. Think of a classic test case, where you say, this is my expected result, and then you see what the application actually provides; in a test script framework, you test against the expected result. G-Eval is similar: you say, I need this kind of expected response from my prompt, and G-Eval gives you a scoring mechanism that tells you the accuracy level of your responses and your prompts. So basically, the scoring lets us know how accurate the prompts and responses are. And you're not doing the evaluation yourself; you're asking GPT, through the G-Eval evaluation, to score against the factors you give it, and it tells you the accuracy. The scoring helps you determine whether it's at 80 or 90%. Again, we give the expected results for the prompts, and when we send a prompt, the response is compared with our expected response to say whether it's 80% or 90% accurate. So it's a classic approach, taken from traditional testing, that we use with G-Eval as well right now.
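In the same spirit, a simplified LLM-as-judge check might look like the sketch below: a test case's expected answer is compared with the actual response and scored, so prompt changes can be gated on an accuracy threshold. This is only the core idea; the real G-Eval framework adds per-criterion, chain-of-thought scoring, and all names and thresholds here are illustrative assumptions.

```python
# Simplified sketch of an expected-vs-actual scoring check in the spirit of
# the evaluation Sunil describes (not the full G-Eval implementation).
from openai import OpenAI

client = OpenAI()

def judge(expected: str, actual: str, criterion: str = "factual consistency") -> int:
    """Ask the model to score the actual answer against the expected answer (1-5)."""
    prompt = (
        f"Score the ACTUAL answer against the EXPECTED answer for {criterion} "
        "on a 1-5 scale (5 = fully consistent). Reply with the number only.\n\n"
        f"EXPECTED:\n{expected}\n\nACTUAL:\n{actual}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return int(resp.choices[0].message.content.strip())

# Usage idea: run the grading prompt over a held-out test set and require, say,
# 80% of cases to score 4 or 5 before promoting a prompt change to production.
```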

Nagaraja Srivatsan

Okay, perfect. We've explored a lot of really exciting things, even just within these two use cases. So, Sunil, where's the future? Where are we going? Over the next two or three years, where is AI, and what you're doing, going to take us?

Sunil Talathi

I think, number one, we've realized the importance of AI. AI was always there, right, Srivatsan? But in the last two years, after the release of GPT, the teams and the leadership have realized the importance of AI, so there is a major reskilling and upskilling effort happening right now. I would say that in the next two years, your company, your org, will be much better set up to work with AI, because you'll have done reskilling and upskilling on a scale we have never seen before. Normally, when a technology change happens over a few years, you only retrain a certain team; here we're talking about function-wide upskilling and reskilling. And it's not just about building AI systems; it's about using AI systems too. In the next two years I expect every company, every org and function, to be finishing their upskilling and reskilling programs.

The second trend I see, on top of upskilling and reskilling, is the adoption of AI tools. Today, even though we talk about AI, there are still a lot of people in the functions who don't use AI, who don't even use GPT. That will change dramatically. Adoption of these AI tools, as simple as co-pilots or GPT, will go up, and for me, the moment adoption goes up, the simple efficiencies really come in just from using GPT itself.

Also, if you look at the last two or three years, some organizations have a lot of production-ready products now, but a lot of people are still in the POC phase, still evaluating generative AI. I feel that in the next year or two, all of them will have had a chance to finish the pilots, and people will be building enterprise-scale applications. We are on that journey too: we have done point solutions, we have proven they work, we have proven the value. Now it's not about point solutions; it's framework solutions, enterprise solutions, that we're going to build on. So over the next two years you'll see that trend move away from point solutions and toward framework solutions, and I would say trust will be much better.

And the key part is going to be this: you hear a lot about AI agents, and a lot about whether humans will still have jobs. I think the next two years will be important for seeing how humans and AI will coexist; we should have that answer. There is ambiguity around it right now: what will my role be in two years' time? I feel that in two years that clarity will be there, about how humans and AI coexist. And there are a lot of other use cases we could talk about. I feel programming will be much more efficient, and code creation is a big part of that. There will be less pure content creation, but a lot more content interpretation. I feel AI will become much, much better for decision making. Today we mostly look to it for content creation, but I believe the maturity and trust will build over the next year or so, and teams will go and use AI for decision making rather than just content creation.

Nagaraja Srivatsan

Now, this is awesome. I think what you said is spot on. We need to go from pilots to scale, and we need to really invest in reskilling. And I loved what you said: it's not AI or the human, it's how we work together. We now have a new worker in our workforce, and we need to make sure we're onboarding that worker and integrating it into our processes. Sunil, this has been a fascinating discussion. Thank you for your time, I really appreciate your deep insight into these two case studies, and I wish you well. Thank you, Sunil.

Sunil Talathi

Thank you for having me over here. It was fun.

Daniel Levine

Sri, what did you think?  

Nagaraja Srivatsan

Daniel, it was really good, because Sunil broke down complex problems into very simple components. He really talked about this journey of how you can use prompts to solve big problems. What was interesting is that we're actually in version two of how this journey is evolving. Initially, we would take our knowledge, store it in a vector repository, and use retrieval-augmented generation to work alongside large language models. What he said after that blew my mind: the large language models themselves are now capable enough to do safety interpretation. What he's doing now is teaching the model with what we call few-shot examples, guideline examples of what it should be doing, and it's doing the job 80 to 90% as well as a safety interpretation expert. That's fascinating.

Daniel Levine

Well, it's interesting, because we've talked a lot about data being critical for the AI to be trained properly, but we haven't talked a lot about prompts, which require training the user. How should companies think about the role of prompts and getting AI to do what they want it to do? And what do you think of the approach Sunil described?

Nagaraja Srivatsan

What Sunil described is that prompts are like code. You need to first have a framework for how you're writing them. It's not like search, where you just say, find out what the weather is. You have to give it structure: I am in New Jersey, this is the context, and this is what I'm looking for. So the first thing he said is pick a framework; he used the RISE framework, and there are several others. What that does is give a structured approach to building the prompt. The second thing he said is that as you build prompts in a clinical space, think about input and output; that's his validation comment. Decide how you will evaluate the output of the prompt, and set up that evaluation, G-Eval as he calls it, which is one of several evaluators, to tell you whether what you're getting as output is as good as what you want. So I love that: think of prompts like code, use a framework, make sure you're testing them appropriately, and then constantly refine and deliver. It was a great lesson he taught us on how to go about building a prompt project.

Daniel Levine

You also asked about balancing quality and cost. Any surprise in the way he talked about the cost components or how he thinks about that balance?

Nagaraja Srivatsan

There were three things he talked about. The first is the classic ROI. He said, I took the people-based effort, assessed the cost of what the AI teammate is doing, and found it was a tenth of the cost of the people-based effort. That's your classic ROI. But I think that will start to evolve as we build more of these prompts and guidelines, and the more you have, the more you need to start optimizing tokens: input tokens, which is what you ask the LLM to respond to, and output tokens, how it responds. So I think the cost side will continue to evolve, but the ROI here was so stark and clear that he didn't have to go into the micro-detail of all the different cost components and nuances. What he said that was very interesting wasn't about cost, though: it was that the output of the AI was actually built into the workflow. The reason I say that's an important part of the cost picture is that there's an integration cost to bring the AI output into the workflow, and once you pay it, the quality, the output, and the adoption benefits increase exponentially. So it's a low cost to integrate, but a high return in terms of value and quality.

Daniel Levine

Well, there was one other thing he said that really rang true, which was that AI is not a product, it's a capability. What did you think of that?

Nagaraja Srivatsan

As we start to bring this new workforce into the market, the AI teammates, you have to think about them as teammates and capabilities, not as finished, complete products. That means you're learning from the output of that teammate and giving learnings back to it. And he said it in a very fascinating way: over the next two years, it's about how we coexist in this new ecosystem, this new normal. Those who adapt to this new teammate are going to be significantly more productive than those who expect perfection from it.

Daniel Levine

Well, it was a great discussion and Sri, thank you as always.  

Nagaraja Srivatsan

Thank you, Danny.

Daniel Levine

Thanks again to our sponsor, Agilisium Labs. Life Sciences DNA is a bi-monthly podcast produced by the Levine Media Group with production support from Fullview Media. Be sure to follow us on your preferred podcast platform. Music for this podcast is provided courtesy of the Jonah Levine Collective. We'd love to hear from you. Pop us a note at danny at levinemediagroup.com. Life Sciences DNA, I'm Daniel Levine.

Thanks for joining us.  

Our Host

Senior executive with over 30 years of experience driving digital transformation, AI, and analytics across global life sciences and healthcare. As CEO of endpoint Clinical and former SVP & Chief Digital Officer at IQVIA R&D Solutions, Nagaraja champions data-driven modernization and eClinical innovation. He hosts the Life Sciences DNA podcast, exploring real-world AI applications in pharma, and previously launched strategic growth initiatives at EXL, Cognizant, and IQVIA. He has been recognized twice by PharmaVOICE as one of the “Top 100 Most Inspiring People” in life sciences.

Our Speaker

Sunil Talathi is a Senior Director and Head of Data Insights at BeOne Medicines, a forward-thinking biotechnology company focused on advancing precision medicine. With extensive experience in data analytics, data engineering, and operational excellence, Sunil has led key functions in data science, clinical research, and system operations at organizations such as Amgen and Cognizant Technology Solutions. Sunil holds an MBA in Innovation and Entrepreneurship from HEC Paris, an MBA in Clinical Research from the Indian School of Management & Studies, and a Bachelor's degree in Computer Science from Ramnarain Ruia College. Currently, he is furthering his education at the Massachusetts Institute of Technology, deepening his expertise in cutting-edge technologies and data-driven solutions in the life sciences field.