Automated Identification of Key Steps in Robotic-Assisted Radical Prostatectomy Using Artificial Intelligence - Abhinav Khanna

May 29, 2024

Ruchika Talwar interviews Abhinav Khanna about his work on AI and prostate cancer surgery. Dr. Khanna discusses his team's development of an AI platform that can identify key surgical steps during robotic prostatectomy with high accuracy. By analyzing video footage from minimally invasive surgeries, this technology aims to turn raw surgical video data into a valuable resource for training, quality benchmarking, and real-time video labeling. Dr. Khanna highlights how this AI model, trained on nearly 500 prostatectomy videos, achieves a 93% accuracy rate, comparable to human experts. The technology, already integrated into clinical practice through a partnership with Theator, has potential applications in resident education, efficiency logistics, and predicting surgical complications. Dr. Talwar and Dr. Khanna emphasize the importance of surgeons leading the adoption of such technology to ensure ethical integration and maximize its benefits.


Abhinav Khanna, MD, MPH, Urologic Oncologist and Robotic Surgeon, Mayo Clinic, Rochester, MN

Ruchika Talwar, MD, Urologic Oncology Fellow, Department of Urology, Vanderbilt University Medical Center, Nashville, TN

Read the Full Video Transcript

Ruchika Talwar: Hi everyone. Welcome back to UroToday's Health Policy Center of Excellence. As always, my name is Ruchika Talwar, and today I'm joined by Dr. Abhinav Khanna, who is from the Mayo Clinic. He'll be sharing some of his recent work in the fields of AI and prostate cancer surgery. Dr. Khanna, we really appreciate your time.

Abhinav Khanna: Yeah, Dr. Talwar, thank you so much for having me, and thank you to UroToday as well. It's really an honor to be here and to talk about some of this work that we're really excited about. So I'll be speaking today about automated identification of key surgical steps during robotic prostatectomy, using a novel AI platform for computer vision. So the background here is that minimally invasive surgery generates vast amounts of surgical video footage. And as urologists, many of our practices are MIS-based and are, in some form, scope-based, so we are very much part and parcel of this. Most of our practices are potentially capturable sources of data that are essentially going to waste. And so I think many of us would agree that surgical video footage is perhaps one of the richest, but also one of the greatest untapped, sources of data in surgery. Not just in urology, but probably in all of minimally invasive surgery.

I've listed just a few potential applications of video data capturing and structuring here, and we'll talk about some of these in more detail. But I think the idea is, we would probably all agree that video footage is a tremendously powerful resource if we can tap into it. And historically, we've not really had the power to do that. And so most of this raw video footage that we generate by doing endoscopy, laparoscopy, robotics, most of that video footage is essentially lost because it's not in usable form. And so our goal with this project was a very fundamental and elemental question, which is, simply put, can we teach a computer to see surgery the way you or I would? So you and I can sit down and look at a video of a robotic prostatectomy, and we are at the stage in our surgical careers that we could very quickly tell what's happening in any given moment of that video.

But that might not be the case for, let's say, a more junior resident or a medical student or even a mid-level resident. And so can we perhaps automate that process? Can we get a computer to see what we see as experts in the field? The formal aim, to put it more eloquently, is to develop a computer vision model for automated identification of key surgical steps during robotic prostatectomy. But again, to boil this down into very simple terms, can we teach a computer to see a prostatectomy and to understand what's happening, on video, the way a urologist would? So in order to do this, we gathered robotic prostatectomy videos, full-length surgical videos over the course of a year. We took all of the videos and labeled them manually. So from start to finish, every single frame in the video was labeled with what step was happening on film.

And the table I've outlined here from the manuscript provides the criteria for each step. But these are things that I think would come very naturally and very intuitively to most urologists. Things like, are they doing a lymph node dissection at this point in time? Are they developing a space of Retzius? Is it bladder neck dissection and anastomosis? Things that, again, we might take for granted at this point in our careers, but that may not necessarily be readily appreciated by a lay eye. And the goal is to get a computer to be able to see these things the way we do. And so once we had our full library of videos manually annotated, that was then used to train a computer vision algorithm to start to see if we could have the computer recognize those same steps automatically. So I won't get too technical, but this is our machine learning pipeline.

So this goes from bottom up, this figure. So the bottom of the figure is showing, again, each string of video is broken up into micro frames. And you might imagine, in each frame there are very specific temporal relationships. What's the anatomy that's happening? What was the anatomy five frames ago? What does the anatomy look like five frames down? What are the instruments? What are the maneuvers? And we extract all those features from the frame using something called a VTN architecture. So this is a video transformer network, and transformer networks have become quite popular recently. But the goal is to extract the key features that are being seen on video. And then we use a second-order algorithm called an LSTM network to then provide a prediction. So for each individual frame, what is the algorithm's best guess in terms of what step is being performed?
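As a rough illustration of the two-stage shape described above (per-frame features first, then a recurrent network that emits one step prediction per frame), here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: `extract_features` is a placeholder standing in for the video transformer, `TinyLSTM` is a single LSTM cell with random, untrained weights, and the `STEPS` list simply reuses step names mentioned in the talk. It is not the study's model, and its predictions are meaningless beyond showing the data flow.

```python
import math
import random

# Hypothetical step names, borrowed from examples mentioned in the talk.
STEPS = ["lymph node dissection", "space of Retzius",
         "bladder neck dissection", "anastomosis"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def extract_features(frame):
    """Placeholder for the video transformer (assumption): a frame is
    already a short list of numbers here, so it is returned unchanged."""
    return list(frame)


class TinyLSTM:
    """One LSTM cell with random, untrained weights (bias terms omitted)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        # One weight matrix per gate, applied to the concatenation [x, h].
        self.gates = {name: rand_matrix(n_hidden, n_in + n_hidden)
                      for name in ("input", "forget", "output", "candidate")}
        self.readout = rand_matrix(n_out, n_hidden)
        self.n_hidden = n_hidden

    def predict(self, feature_sequence):
        h = [0.0] * self.n_hidden  # hidden state carried between frames
        c = [0.0] * self.n_hidden  # cell state carried between frames
        predictions = []
        for x in feature_sequence:
            z = list(x) + h
            i = [sigmoid(v) for v in matvec(self.gates["input"], z)]
            f = [sigmoid(v) for v in matvec(self.gates["forget"], z)]
            o = [sigmoid(v) for v in matvec(self.gates["output"], z)]
            g = [math.tanh(v) for v in matvec(self.gates["candidate"], z)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            scores = matvec(self.readout, h)
            best = max(range(len(scores)), key=scores.__getitem__)
            predictions.append(STEPS[best])  # one best guess per frame
        return predictions


# Six toy "frames", each a 3-number feature vector (illustrative only).
frames = [[0.1 * t, 1.0 - 0.1 * t, 0.5] for t in range(6)]
features = [extract_features(frame) for frame in frames]
model = TinyLSTM(n_in=3, n_hidden=4, n_out=len(STEPS))
predicted_steps = model.predict(features)
```

The point of the sketch is only that the recurrent state carries information from earlier frames into each new prediction, which is how the temporal context described above can inform the guess for any single frame.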

And so we did this for almost 500 prostatectomy videos. We used the bulk of those, almost 80% of those videos, to train and internally validate the model. And then, as is common for these types of studies, we had a completely separate holdout test set. So out of the 474 videos, we had 113 that the model never saw. It never learned on them and never had any exposure to them. Those are purely just to test the accuracy of the model, in almost an external validation, although those videos are from the same sort of institution. These are our top-line results. So this is something called a confusion matrix, and it basically plots each step of the surgery or whatever we're trying to have the model predict. And so that blue diagonal line you see going down is the individual accuracy for each particular step.
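The holdout split described above can be sketched as follows. This is a minimal sketch under stated assumptions: the video IDs, the random seed, and the `split_videos` helper are hypothetical, and the only figures taken from the talk are the 474 total videos and the 113 held out. The key idea it shows is that the split is made per video, so no frame from a holdout video can leak into training.

```python
import random


def split_videos(video_ids, n_holdout, seed=0):
    """Split whole videos (not individual frames) into a development set
    (training plus internal validation) and a holdout test set."""
    ids = sorted(video_ids)
    random.Random(seed).shuffle(ids)
    holdout = set(ids[:n_holdout])
    development = set(ids[n_holdout:])
    return development, holdout


# Hypothetical IDs; 474 and 113 are the counts quoted in the talk.
video_ids = [f"video_{i:03d}" for i in range(474)]
development, holdout = split_videos(video_ids, n_holdout=113)

dev_fraction = len(development) / len(video_ids)
print(len(development), len(holdout), round(dev_fraction, 3))
```

With these counts, 361 videos remain for training and internal validation, which is about 76% of the 474.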

And so you can see most of the steps' accuracy is probably 90% or greater. There are a couple of steps where it hovers in the 80s range. And for most of those, wherever the model is potentially misfiring, it's usually an adjacent step. So you see the very, very subtle light blue adjacent to the step that's happening. For instance, maybe it's the anterior bladder neck: the human reviewed that video and said at that particular frame, it's the anterior bladder neck, and maybe the algorithm called it posterior, or something temporally adjacent, which often makes sense. Again, the top-line result here is that the AI achieved a 93% accuracy as compared to the human annotations. When we take that into context, if you had multiple humans review the same video, there's a little bit of leeway. Inter-observer variability is going to be about 5%. Achieving the 93% accuracy is pretty close to on par with a multi-reader human assessment of these types of videos, which is really exciting.
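To make the confusion matrix and the accuracy figures above concrete, here is a small sketch of how frame-level accuracy could be computed from paired human and model labels. The ten toy labels below are invented for illustration and are not study data; the function names are also assumptions.

```python
from collections import Counter


def confusion_counts(human, model):
    """Count (human label, model label) pairs; the diagonal holds matches."""
    return Counter(zip(human, model))


def overall_accuracy(human, model):
    """Fraction of frames where the model agrees with the human label."""
    matches = sum(h == m for h, m in zip(human, model))
    return matches / len(human)


def per_step_accuracy(human, model):
    """Accuracy within each human-labeled step (the matrix diagonal)."""
    totals = Counter(human)
    correct = Counter(h for h, m in zip(human, model) if h == m)
    return {step: correct[step] / totals[step] for step in totals}


# Toy frame labels: the model switches to the adjacent step one frame early.
human = (["anterior bladder neck"] * 4
         + ["posterior bladder neck"] * 4
         + ["anastomosis"] * 2)
model = (["anterior bladder neck"] * 3
         + ["posterior bladder neck"] * 5
         + ["anastomosis"] * 2)

matrix = confusion_counts(human, model)
accuracy = overall_accuracy(human, model)
by_step = per_step_accuracy(human, model)
```

In this toy case the single error is an anterior bladder neck frame called posterior, which is the kind of adjacent-step confusion described above. Applying `overall_accuracy` to two human annotators' labels instead would give an agreement rate of the sort that inter-observer variability refers to.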

And so that's the basic framework. So we asked this question: can we teach a computer to see surgery and to see a prostatectomy and understand what's happening at any given moment? And I think the answer is yes, at least on this dataset, we were able to do that. And so I think the bigger question, and perhaps the more exciting question, is, what do we do with that? So now you've got the ability to structure and turn raw video footage into a data source. And what can we do with that? And I think there are a lot of potentially exciting applications of this. I'm excited to chat more with you about that. We did this study in partnership with a computer vision surgical intelligence startup called Theator. And so what's exciting about this is that this work is already integrated into their platform, which we've rolled out into our practice.

For instance, one potential application of this technology is real-time video labeling. If we do a robotic prostatectomy, by the time the patient's hitting PACU, that video is already fully annotated by the AI model. And you can see those annotations here on the right side. So if we wanted to go back in between cases and say, "Let me take another look at that bladder neck, it looked a little funny," because these are again labeled automatically, we're not sifting through and clicking through and seeing where was that particular step. We can jump right to it. And so we start to then reframe how we think about these surgical cases and how we think about surgical video. Instead of a catalog of prostatectomy videos, you could imagine we have now a queryable catalog of individual steps. And I think this has potential applications in resident education, resident and fellow education, and surgeon training.
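The idea of a queryable catalog of steps can be sketched by collapsing per-frame labels into timed segments and then looking a step up by name. This is a minimal sketch, assuming labels arrive as one string per frame at a known frame rate; the function names, the frame rate, and the toy labels are all hypothetical and do not describe the Theator platform.

```python
from itertools import groupby


def frames_to_segments(frame_labels, fps):
    """Collapse per-frame labels into (step, start_seconds, end_seconds)."""
    segments = []
    index = 0
    for step, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append((step, index / fps, (index + length) / fps))
        index += length
    return segments


def find_step(segments, step):
    """Return the (start, end) times of every segment with the given step."""
    return [(start, end) for name, start, end in segments if name == step]


# Toy labels at 2 frames per second (illustrative only).
labels = (["space of Retzius"] * 4
          + ["anterior bladder neck"] * 6
          + ["posterior bladder neck"] * 2)

segments = frames_to_segments(labels, fps=2)
bladder_neck_times = find_step(segments, "anterior bladder neck")
```

Here `bladder_neck_times` holds the start and end of the anterior bladder neck segment in seconds, which is what lets a viewer jump straight to that step instead of scrubbing through the whole video.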

We step away from the paradigm of, "I'm going to learn to do a prostatectomy start to finish," which is a complex task and not really pragmatic to learn start to finish in one go. You can say, "Okay, as a trainee, I'm going to focus and really lean into a specific step." So let's say I'm going to lean into the anterior bladder neck; I'm going to focus on that step. We can now plug the residents into a catalog of anterior bladder necks, which has not previously been doable, to bring that particular skill within the overarching goal of learning prostatectomy. I think there are exciting applications in quality benchmarking. Most of what we look at, in terms of quality metrics in surgery, are usually surrogates for what actually happens in the operating room. So things like length of stay or readmissions or things like that, they are valuable endpoints, but they don't necessarily capture what's happening when the knife goes to skin.

And so here, this is just an example again from prostatectomy. Certain fields of surgery, I think, have been a little bit more forward in terms of the idea of quality metrics and critical views of safety. For instance, laparoscopic cholecystectomy literature is, I'd say, littered with examples of critical views of safety. And that's been widely adopted as a safety metric. And I don't think we have the same analogous concepts in prostatectomy, although I think there's an opportunity for us to do that as a field. But here's just an example of something we think might be analogous, which is, when you're doing a node dissection, do you view the obturator nerve before you put the clip on? And if you ask those who do it, we would all say, "Yeah, we do this 100% of the time." But now, again, when we've got video footage, we can actually drill down and say, "Okay, do we actually do it as much as we think we do? How do we compare to our peers? What's our department average?"

And really empower us to drive quality improvement at a ground level. Similarly, we can think, again, about efficiency and logistics metrics. Surgical administrators are often thinking about OR times, but they're thinking about it in a very umbrella way, start to finish, or in-the-room to out-of-the-room. And now that you've cataloged steps, you can start to think about those on an individual step basis. How does any given surgeon compare to their peers for a particular step? And where might there be outliers? This will be the last example, and then we'll hopefully chat more about this. But this is, again, from a logistical standpoint, when you're trying to allocate resources in the operating room, perhaps even intra-day, there may be a charge nurse who's trying to guess, based on when a case started and what Epic thinks is going to be the case time, how long it's going to take.
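The per-step comparison between surgeons mentioned above could look something like the following sketch. The records, the surgeon labels, and the 1.5x threshold are all invented for illustration; a real analysis would need case-mix adjustment and far more data.

```python
from collections import defaultdict
from statistics import mean, median


def surgeon_step_means(records):
    """Average minutes per (surgeon, step) from (surgeon, step, minutes)."""
    grouped = defaultdict(list)
    for surgeon, step, minutes in records:
        grouped[(surgeon, step)].append(minutes)
    return {key: mean(values) for key, values in grouped.items()}


def flag_outliers(records, ratio=1.5):
    """Flag surgeons whose average time for a step exceeds `ratio` times
    the median of all surgeons' averages for that step (assumed rule)."""
    means = surgeon_step_means(records)
    by_step = defaultdict(list)
    for (surgeon, step), value in means.items():
        by_step[step].append(value)
    flagged = []
    for (surgeon, step), value in sorted(means.items()):
        step_median = median(by_step[step])
        if value > ratio * step_median:
            flagged.append((surgeon, step, value, step_median))
    return flagged


# Hypothetical step durations in minutes (not study data).
records = [
    ("A", "anastomosis", 20.0), ("A", "anastomosis", 22.0),
    ("B", "anastomosis", 21.0), ("B", "anastomosis", 19.0),
    ("C", "anastomosis", 40.0), ("C", "anastomosis", 36.0),
]

flagged = flag_outliers(records)
```

In this toy data, surgeon C averages 38 minutes against a median of 21 across surgeons, so C is the only one flagged for that step.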

But now, again, as you can see these particular steps happening in real-time, the algorithm can actually predict, "Okay, we're on the anterior bladder neck. That's a little bit ahead of schedule or behind schedule, and here's the estimated time to completion," just to facilitate some of that planning. And so I think the take-home here is that, yes, at a fundamental level, we can teach an AI algorithm to see a prostatectomy and glean the key steps that are happening with high accuracy almost on par with humans.
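The estimated time to completion described above can be illustrated with a deliberately simple calculation: whatever is left of the current step, plus the typical duration of every step still to come. The step order and the typical durations below are hypothetical placeholders, not figures from the study, and a deployed system would presumably use a learned model rather than fixed averages.

```python
# Hypothetical step order and typical durations in minutes (assumptions).
STEP_ORDER = [
    "space of Retzius",
    "anterior bladder neck",
    "posterior bladder neck",
    "anastomosis",
]
TYPICAL_MINUTES = {
    "space of Retzius": 15.0,
    "anterior bladder neck": 10.0,
    "posterior bladder neck": 10.0,
    "anastomosis": 25.0,
}


def estimate_remaining(current_step, minutes_into_step):
    """Estimate minutes left: the rest of the current step plus the
    typical duration of every later step."""
    position = STEP_ORDER.index(current_step)
    left_in_current = max(TYPICAL_MINUTES[current_step] - minutes_into_step, 0.0)
    later_steps = sum(TYPICAL_MINUTES[s] for s in STEP_ORDER[position + 1:])
    return left_in_current + later_steps


# Four minutes into the anterior bladder neck: 6 + 10 + 25 minutes remain.
remaining = estimate_remaining("anterior bladder neck", minutes_into_step=4.0)
```

Comparing an estimate like this against the scheduled case length is one way a case could be called ahead of or behind schedule.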

I think what's exciting here is that this is purely based on video footage. So we're not relying on any input from the instruments or the robot or any kinematic data, which means that this is potentially applicable to non-robotic surgery. So we've done this, it's not the topic of our conversation today, but we've done this in laparoscopy. We've done this in endoscopy, cystoscopy, that sort of thing. There are a number of applications of this type of technology that I've gone through. And I think this is exciting because this type of work is really the first step towards the datification and ushering in the digital era of surgery, which I'm personally really excited about.

Ruchika Talwar: Really, really fascinating work. The possibilities of applications here truly are endless. But I'll focus on a couple of things that you touched upon in terms of where we see this work going. So first of all, I think a really interesting application that comes to mind when I listen to your work is surgeons taking a look at their own surgical skill and where to improve. There's always room to improve, always room to adopt new techniques. But the thing that came to mind for me was unrecognized complications because it's something that we face a lot. And so tell me a little bit about your thoughts on how potentially AI may be able to catch something being off before we even are able to do so.

Abhinav Khanna: Yeah, that's a great question. So we explored a little bit of that in this particular dataset in terms of prostatectomy. For better or worse, prostatectomy is not a surgery that has a terribly high complication rate in most centers. So I think the luxury of that is that our patients do well. But the downside is when we're trying to train a model to recognize complications or predict outcomes or adverse events, you're limited to things like bleeding events or hemorrhage or that sort of thing, which we did explore. And we have some other data coming out soon, looking at some things like that. I think the other way to think of this, to your point, is, are there certain things that might be surgery agnostic? So when we're doing robotic surgery anywhere in the abdomen, are there things such as bleeding events or proximity of cautery to small bowel, things that we can potentially recognize across surgery types, whether you're in the foregut or working in the retroperitoneum or in the pelvis or whatever.

And so those are some of the things that we've started to explore a little bit, which is exciting. And I think that's absolutely where the future of this lies: thinking beyond just a particular surgery type. We use prostatectomy as a prototype because it has very discrete steps. But to your point, absolutely, I think prediction of adverse events or near misses or that sort of thing is where you really leverage the power of this, when you start to think beyond just one surgery, become a little bit surgery agnostic, and think more about the common themes that you could see across different surgery types.

Ruchika Talwar: And similarly, the applications in terms of quality assessments as you alluded to, can be used in multiple surgery types, as well. However, I know there is some criticism out there about the ethics behind that or what the ramifications for physician autonomy may be. Share a little bit of your thoughts on perhaps the other side of this.

Abhinav Khanna: Yeah, I totally agree. That's probably one of the biggest questions facing our field. And I think that this technology is here and it's only going to get better as time goes on. And I think we, as surgeons and as urologists, have to be a little critical and introspective in terms of how do we adopt this technology? And how do we want to adopt this technology? And I think the point I would emphasize the most is that, as surgeons, we have to be leading the charge here.

This technology is not going to go away. And I think our options are to have a seat at the table or otherwise. And so I think we should be leading the way. We should be engaged in these efforts. There are those of us who are engaged in this type of thing from an academic standpoint. We've obviously done this as an academic endeavor, but we are also hoping to build this into our practice from a real-world standpoint by partnering with Theator. And I think that's key: surgeons should be at the core of this and should be very much engaged in these efforts.

Ruchika Talwar: Yeah, I couldn't agree with you more on that point. It's important that we pave the way forward so that we're not stuck dealing with the ramifications on the backend. But honestly, congratulations on this work. It is so innovative, so interesting, and I'm excited we had the opportunity to share it with our UroToday audience. So thanks again for joining us. We appreciate your time.

Abhinav Khanna: Thank you so much for having me.

Ruchika Talwar: Yeah, and we'll definitely have to bring you back on some of those follow-up studies. We look forward to seeing what results are forthcoming. To our audience, thank you so much for joining us, and we'll see you next time.