The Right Way to Launch an AI Initiative


HANNAH BATES: Welcome to HBR On Strategy—case studies and conversations with the world’s top business and management experts, hand-selected to help you unlock new ways of doing business.

How did it go the last time you started an artificial intelligence project at your company? Chances are, some of your colleagues expressed confusion or apprehension—and they never engaged with what you built. Or maybe the whole initiative went sideways after launch—because the AI didn’t work the way you thought it would. If any of that sounds familiar, you’re not alone. Harvard Business School assistant professor and former data scientist Iavor Bojinov says around 80% of AI projects fail. He talked with host Curt Nickisch on HBR IdeaCast in 2023 about why that is—and the best practices leaders should follow to ensure their projects stay on track.

CURT NICKISCH: I want to start with that failure rate. You would think that with all the excitement around AI, there's so much motivation to succeed. Somehow, though, the failure rate is much higher than for past IT projects. Why is that? What's different here?

IAVOR BOJINOV: I think it begins with the fundamental difference that AI projects are not deterministic like IT projects. With an IT project, you know pretty much the end state and you know that if you run it once, twice, it will always give you the same answer. And that’s not true with AI. So you have all of the challenges that you have with IT projects, but you have this random, this probabilistic nature, which makes things even harder.

With algorithms, you may give them the same input and get different predictions. Think of something like ChatGPT: you and I can write the exact same prompt and it will actually give us two different answers. So this adds a layer of complexity and uncertainty, and it also means that when you start a project, you don't actually know how good it's going to be.

So when you look at that 80% failure rate, there are a number of reasons why these projects fail. Maybe they fail in the beginning, where you just pick a project that is never going to add any value, so it fizzles out. But maybe you pick a reasonable project and actually go ahead and build it: you could spend months getting the right data and building the algorithms, and then the accuracy could be extremely low.

So for example, if you’re trying to pick which of your customers are going to leave you so you can contact them, maybe the algorithm you build is really not able to find people who are going to leave your product at a good enough rate. That’s another reason why these projects could fail. Or for another algorithm, it could do a really good job, but then it could be unfair and it could have some sort of biases. So the number of failure points is just so much greater when it comes to AI compared to traditional IT projects.

CURT NICKISCH: And I suppose there’s also that possibility where you have a very successful product, but if the users don’t trust it, they just don’t use it and that defeats the whole purpose.

IAVOR BOJINOV: Yeah, exactly. This is actually one of the things that motivated me to leave LinkedIn and join HBS: I had built what I thought was a really nice AI product for doing some really complicated data analysis. When we tested it, it cut analysis time that used to take weeks down to maybe a day or two. And then when we launched it, we had this really nice launch event. It was really exciting, there were all these announcements, and a week or two after it, no one was using it.

CURT NICKISCH: Even though it would save them a lot of time.

IAVOR BOJINOV: Massive amounts of time. And we tried to communicate that and people still weren’t using it and it just came back to trust. People didn’t trust the product we had built. So this is one of those things that’s really interesting, which is if you build it, they will not come. And this is a story that I’ve heard, not just with LinkedIn in my own experience, but time and time again. And I’ve written several cases with large companies where one of the big challenges is they build this amazing AI, they show it’s doing a really, really good job, and then no one uses it. So it’s not really transforming the organization, it’s not really adding any value. If anything, it’s just frustrating people that maybe there’s this new tool that now they have to find a way to avoid using and find reasons why they don’t want to use it.

CURT NICKISCH: So through some of those painful experiences yourself in practice, through some of the consulting work you do, and through the research you do now, you have some ideas about how to get a project to succeed. The first step seems obvious but is really important: selecting the right project or use case. Where do people go wrong with that?

IAVOR BOJINOV: Oh Curt, they go wrong in so many different places. It sounds like a really obvious no-brainer. Every manager, every leader is consistently prioritizing projects. They're consistently sequencing projects. But when it comes to AI, there are a couple of unique aspects that need to be considered.

CURT NICKISCH: Yeah. In the article, you call them idiosyncrasies, which is not something business leaders like to hear.

IAVOR BOJINOV: Exactly. But I think as we transition into this more AI-driven world, these will become the standard things that people consider. What I do in the article is break them down into feasibility and impact, and I always encourage people to start with impact first. Everyone will say this is a no-brainer: it's really about strategic alignment. And you might be thinking, okay, that's straightforward, I know what my company wants to do. But typically, when it comes to AI projects, it's the data science team that's actually picking what to work on.

And in my experience, data scientists don’t always understand the business. They don’t understand the strategy, and they just want to use sort of the latest and best technology. So very often there’s this misalignment between the most impactful projects for the business and a project that the data scientist just wants to do because it lets them use the latest and best technology. The reality is with most AI projects, you don’t need to be using the latest and the cutting edge. That’s not necessarily where the value is for most organizations, especially for ones that are just starting their AI journey. The second portion of it is really the feasibility. And of course you have things like, do we have the data? Do we have the infrastructure?

But the one other piece that I want to call out here is: what are the ethical implications? There's this whole area of responsible AI and ethical AI, which, again, you don't really have with IT projects. Here, you have to think about privacy, you have to think about fairness, you have to think about transparency, and these are things you have to consider before you start the project. Because if you try to do it halfway through the build, as a bolt-on, the reality is it will be really costly. It could almost require restarting the whole thing, which greatly increases the cost and the frustration of everyone involved.

CURT NICKISCH: So the easy way ahead is to tackle the hard stuff first. That gets back to the trust that’s necessary, right?

IAVOR BOJINOV: Exactly. And you should have thought about trust at the beginning and all the way through, because in reality there are several different layers to trust. You have trust in the algorithm itself, which is: Is it free from bias? Is it fair? Is it transparent? And that's really, really important. But in some sense, what's more important is: do I trust the developers, the people who actually built the algorithm? If I'm an intended user, I want to know that this algorithm was designed to work for me, to solve the problems that I care about, and in some sense that the people designing the algorithm actually listened to me. That's why, when you're beginning, you need to know who your intended user is going to be, so you can bring them into the loop.

CURT NICKISCH: Who is the you in this situation if you need to know who the users are? Is this the leader of the company? Is this the person leading the developer team? Where’s the direction coming from here?

IAVOR BOJINOV: There’s basically two types of AI projects. You have external facing projects where the AI is going to be deployed to your customers. So think like the Netflix ranking algorithm. That’s not really for the Netflix employees, it’s for their customers. Or Google’s ranking algorithm or ChatGPT, these things are deployed to their customers, so those are external facing projects. Internal facing project on the other hand are deployed to the employees. So the intended users are the company’s employees.

So for example, this would be like a sales prioritization tool that basically tells you, okay, call this person instead of this person, or it could be an internal chatbot to help your customer support team. Those are all internal-facing products. So the first step is to really just figure out: who is the intended audience? Who is going to be the customer of this? Is it going to be the employees or is it going to be your actual customers? Very often, for most organizations, internal-facing projects are called data science, and they fall under the purview of a data science team.

Whereas external-facing projects tend to fall under the purview of an AI or a machine learning team. Once you figure out whether this is going to be internal or external, you know who's going to be building it, and very often you know the amount of interaction you can have with the intended customers. Because if it's your internal employees, you probably want to bring those people in the room as much as possible, even at the beginning, even at the inception, to make sure you're solving the right problem and that it's really designed to help them do their job.

Whereas with your customers, of course, you’re going to have focus groups to figure out if this really is the right thing, but you’re probably going to rely more on experimentation to tweak that and make sure your customers are really benefiting from this product.

CURT NICKISCH: One place where difficulty arises for big companies is this tension between speed and effectiveness. They want to experiment quickly, they want to fail faster and get to successes sooner, but they also want to be careful about ethics. They’re very careful about their brand. They want to be able to use the tech in the most helpful places for their business. What’s your recommendation for companies that are kind of struggling between being nimble and being most effective?

IAVOR BOJINOV: The reality is you need to keep trying different things so that you can improve the algorithm. So for example, in one study that I did with LinkedIn, we basically showed that when you leverage experimentation, you can improve your final product by about 20% when it comes to key business indicators. So that notion of we tried something, we used that to learn, and we incorporated the learnings can have substantial boosts on the final product that’s actually delivered. So really for me, it’s about figuring out what is the infrastructure you need to be able to do that type of experimentation really, really rapidly, but also figuring out how can you do that in a really safe way.

One way of doing that safely is basically having people opt into these more experimental versions of whatever it is you are offering. A lot of companies have ways for you to sign up to be an alpha tester or a beta tester, and then you get the latest versions, realizing that maybe they'll be a little bit buggy, not the most polished thing. But maybe you're a big fan and that doesn't really matter; you just want to try the new thing. So that's one thing you can do: create a pool of people you can experiment on, so you can try new things without really risking that brand image.
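A minimal sketch of what that opt-in pool could look like in code, assuming a hash-based 50/50 split; the `opted_into_beta` flag, the experiment name, and the field names here are illustrative assumptions, not any particular company's system.

```python
import hashlib

def in_experimental_pool(user: dict) -> bool:
    # Only users who explicitly opted in are eligible for experimental features.
    return user.get("opted_into_beta", False)

def assign_variant(user_id: str, experiment: str) -> str:
    # Hash (experiment, user_id) for a stable 50/50 split with no stored state.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"

def feature_enabled(user: dict, experiment: str) -> bool:
    # Experimental features reach only opted-in users in the treatment arm.
    if not in_experimental_pool(user):
        return False
    return assign_variant(user["id"], experiment) == "treatment"

beta_user = {"id": "u123", "opted_into_beta": True}
regular_user = {"id": "u456", "opted_into_beta": False}
print(feature_enabled(beta_user, "new_ranker"))     # stable per user
print(feature_enabled(regular_user, "new_ranker"))  # always False: not opted in
```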

CURT NICKISCH: So once this experiment is up and running, how do you recognize when it’s failing or when it’s subpar, when you’ve learned things, when it’s time to change course? With so many variables, it sounds like a lot of judgment calls as you’re going along.

IAVOR BOJINOV: Yeah. The thing I always advocate here is to really think about the hypothesis you are testing in your study. There’s a really nice example, and this is from Etsy.

CURT NICKISCH: And Etsy is an online marketplace for a lot of independent or small creators.

IAVOR BOJINOV: Exactly. So a few years back, folks at Etsy had this idea that maybe they should build an infinite scroll feature. Basically, think of your Instagram feed or Facebook feed, where you can keep scrolling and it just keeps loading new things. You never have to click next page.

And what they did was spend a lot of time on it, because it actually required re-architecting the user interface, and it took them a few months to work this out. So they built the infinite scroll, then they started running the experiment, and they saw that there was no effect. And then the question was, well, what did they learn from this? It cost them, let's say, six months to build. If you look at this, it's actually two hypotheses being tested at the same time. The first hypothesis is: what if I showed more results on the same page?

If I showed more products on the same page, maybe 50 instead of 20, then you might be more likely to buy things. That's the first hypothesis. The second hypothesis this is also testing is: what if I was able to show you the results quicker? Because why do I not like multiple pages? Well, it's because I have to click next page and it takes a few seconds for that next page to load. At a high level, those are the two hypotheses. Now, there actually was a much easier way to test them.

Instead of having 20 results on one page, they could have just displayed 50. And they could have done that in, I don't know, like a minute, because this is just a parameter; it required no extra engineering. The show-results-quicker hypothesis is a little bit trickier, because it's hard to speed up a website. But you could do the reverse: you could slow things down artificially, just make things load a little bit slower. If you understood those two hypotheses, you would know whether or not you needed to build infinite scroll and whether it was worth making that investment.

So what they did in a follow-up study is they ran those two experiments, and they showed that there was very little effect of showing 20 versus 50 results on the page. And the other finding, which was counterintuitive compared to what most other companies have seen, but makes sense given the description you gave of Etsy, is that adding a small delay doesn't make a huge difference there. Because Etsy is a bunch of independent producers of unique products, it's not that surprising if you have to wait a second or two to see the results.

So the high-level thing is: whenever you are running these experiments and developing these AI products, you want to think not just about the minimum viable product, but really about what hypotheses are underlying its success, and whether you are effectively testing those.
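As a minimal sketch of that decomposition, here is how the two cheap experiments might be wired up, assuming a simple seeded per-user assignment; the parameter values, delay size, and function names are hypothetical, not Etsy's actual setup.

```python
import random
import time

def results_per_page(user_id: str) -> int:
    # Hypothesis 1: more results per page. A pure parameter change, no
    # re-architecting: half of users see 50 results instead of 20.
    rng = random.Random(f"h1:{user_id}")  # stable per-user assignment
    return 50 if rng.random() < 0.5 else 20

def artificial_delay_seconds(user_id: str) -> float:
    # Hypothesis 2: page speed. You can't easily speed a site up, but you
    # can cheaply slow it down for a random half and measure the harm.
    rng = random.Random(f"h2:{user_id}")
    return 0.5 if rng.random() < 0.5 else 0.0

def render_search_page(user_id: str, results: list) -> list:
    time.sleep(artificial_delay_seconds(user_id))  # slowdown arm only
    return results[: results_per_page(user_id)]

# If neither arm moves purchases, the expensive infinite-scroll build,
# which bundles both hypotheses, is unlikely to pay off either.
page = render_search_page("u123", [f"item{i}" for i in range(100)])
print(len(page))  # 20 or 50, depending on this user's hypothesis-1 arm
```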

CURT NICKISCH: That gets us into evaluation. That’s an example of where it didn’t work and you found out why. How do you know that it is working or working well enough?

IAVOR BOJINOV: Yeah, absolutely. I think it's worth first answering the question of why you do evaluation in the first place. You've developed this algorithm, you've tested it, and you've shown it has good predictive accuracy. Why do you still need to evaluate it on real people? Well, the answer is that most products have either a neutral or a negative impact on the very same metrics that they were designed to improve. This is very consistent across many organizations, and there are a number of reasons why it's true for AI products. The first one is that AI doesn't live in isolation.

It usually lives in a whole ecosystem. So when you make a change or deploy a new AI algorithm, it can interact with everything else that the company does. For example, say you have a new recommendation system. That recommendation system could move your customers away from high-value activities to low-value activities for you, while increasing, say, engagement. And here you realize that there are all these different trade-offs, so you don't really know what's going to happen until you deploy the algorithm.

CURT NICKISCH: So after you’ve evaluated this, what do you need to pay attention to? When this product or these services are adopted, whether they’re externally facing or internal to the organization, what do you need to be paying attention to?

IAVOR BOJINOV: Once you’ve successfully shown in your evaluation that this product does add enough value for it to be widely deployed, and you’ve got people actually using the product, then you sort of move to that final management stage, which is all about monitoring and improving the algorithm. And in addition to monitoring and improving, that’s why you need to actually audit these algorithms and check for unintended consequences.

CURT NICKISCH: Yeah. So what’s an example of an audit? An audit can sound scary.

IAVOR BOJINOV: Yeah, audits can absolutely sound scary. And I think firms are very scared of audits, but they all have to do them, and you need an independent body to come look at things. That's essentially what we did with LinkedIn. One of the most important algorithms at LinkedIn is the People You May Know algorithm, which basically recommends which people you should connect with.

And what that algorithm is trying to do is increase the probability, the likelihood, that if I show you this person as a potential connection, you will invite them to connect and they will accept. That's all the algorithm is trying to do. So the metric, the way you measure the success of this algorithm, is by looking at how many people users invited to connect and what percentage of those invitations were actually accepted.

CURT NICKISCH: Some sort of conversion metric there.
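For concreteness, that conversion metric as a minimal sketch, with illustrative numbers rather than LinkedIn's real data:

```python
def invitation_acceptance_rate(invites_sent: int, invites_accepted: int) -> float:
    # Success metric for a People You May Know-style recommender: of the
    # invitations users sent to suggested connections, what share were
    # accepted? Higher means the suggestions were better.
    if invites_sent == 0:
        return 0.0
    return invites_accepted / invites_sent

# Illustrative numbers only: 200 invites sent from suggestions, 80 accepted.
print(invitation_acceptance_rate(200, 80))  # 0.4
```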

IAVOR BOJINOV: Exactly. And you want that number to be as high as possible. Now, what we showed, which is really interesting and very surprising in this study that was published in Science, and I have a number of co-authors on it, is that a year down the line, this was actually impacting what jobs people were getting. And in the short term, it was also impacting sort of how many jobs people were applying to, which is really interesting because that’s not what this algorithm was designed to do. That’s an unintended consequence. And if you sort of scratch at this, you can figure out why this is happening.

There's this whole theory of weak ties that comes from the sociologist Mark Granovetter. What this theory says is that the people who are most useful for getting new jobs are arm's-length connections: people who maybe are in the same industry as you, say five or six years ahead of you, at a different company. People you don't know very well, but you have something in common with them. This is exactly what was happening: the algorithm was increasing the proportion of weak ties among the connections people were suggested. They were seeing more information, they were applying to more jobs, and they were getting more jobs.

CURT NICKISCH: Makes sense. Still kind of amazing.

IAVOR BOJINOV: Exactly. And this is what I mean by these ecosystems. It’s like you’re doing something to try to get people to connect to more people, but at the same time, you’re having this long-term knock-on effect on how many jobs people are applying to and how many jobs people are getting. And this is just one example in one company. If you scale this up and you just think about how we live in this really interconnected world, it’s not like algorithms live in isolation. They have these types of knock-on effects, and most people are not really studying them.

They're not looking at these long-term effects. And I think it was a great example of how LinkedIn sort of opened the door. They were transparent about this, they let us publish the research, and then they actually changed their internal practices: in addition to looking at those short-term metrics about who's connecting with whom and how many people are accepting, they started to look at longer-term effects, like how many jobs people are applying to. And I think that's a testament to how powerful these types of audits can be, because they give you a better sense of how your organization works.

CURT NICKISCH: A lot of what you've outlined, and of course the article is very detailed on each of these steps, is just how, I don't know, cyclical this process is. It's almost like you get to the end and you're starting over again, because you're reassessing and then potentially seeing new opportunities for new tweaks or new products. So to underscore all this, what's the main takeaway for leaders?

IAVOR BOJINOV: I think the main takeaway is to realize that AI projects are much harder than pretty much any other project that a company does. But also the payoff and the value that this could add is tremendous. So it’s worth investing the time to work on these projects. It’s not all hopeless. And realizing that there’s sort of multiple stages and putting in infrastructure around how to navigate each of those stages can really reduce the likelihood of failure and really make it so that whatever project you’re working on turns into a product that gets adopted and actually adds tremendous value.

CURT NICKISCH: Iavor, thanks so much for coming on the show to talk about these insights.

IAVOR BOJINOV: Thank you so much for having me.

HANNAH BATES: That was HBS assistant professor Iavor Bojinov in conversation with Curt Nickisch on HBR IdeaCast. Bojinov is the author of the HBR article “Keep Your AI Projects on Track”.

We’ll be back next Wednesday with another hand-picked conversation about business strategy from the Harvard Business Review. If you found this episode helpful, share it with your friends and colleagues, and follow our show on Apple Podcasts, Spotify, or wherever you get your podcasts. While you’re there, be sure to leave us a review.

And when you’re ready for more podcasts, articles, case studies, books, and videos with the world’s top business and management experts, find it all at HBR.org.

This episode was produced by Mary Dooe and me—Hannah Bates. Curt Nickisch is our editor. Special thanks to Ian Fox, Maureen Hoch, Erica Truxler, Ramsey Khabbaz, Nicole Smith, Anne Bartholomew, and you – our listener. See you next week.


