Minimizing AI adoption costs + Upskilling data talent

DTP #9: Q&A with Yiqiao Yin

We spoke to Yiqiao Yin, Senior Data Scientist/ML Engineer at Labcorp, about stakeholder communication challenges and solutions, minimizing the costs of adopting AI, and generalist versus specialist data science roles.

Quotations have been lightly edited for concision and readability.

What challenges do you face on a day-to-day basis in your role as a data scientist?

“There's quite a lot. Number one thing I will say as a part of the challenge is communication with leadership. As data scientists, [our] job is to make sure that the model is doing what it's supposed to do. The code is nicely written, it can go into production, all that stuff, but that's all about the implementation.”

Stakeholder management requires a mindset shift for data scientists (and others in more technical roles):

“Whether the implementation is going in the right direction depends on communication with the stakeholder. So I will say the number one challenge is: how do you conduct a good and well-informed meeting with stakeholders such that the data scientist is doing exactly what is helpful for the company and what drives business value?”

It seems like stakeholders would be primarily interested in hearing about potential results. Would you agree with that?

“Yeah, I definitely agree with that. I think when you communicate with the stakeholder, it's results-driven. It's not so much how you get from A to B; it's more, when you get to B, what does B do for the business, and how is value added to the business if you do B right? So I would probably temporarily ignore how I get from A to B, but I will definitely focus on what exactly B does. Like, what is the functionality of B such that it provides business value?”

Why do you think it's hard to do that for data scientists? Do you think there are things that can help them be better at stakeholder communication?

“Yeah. I think one potential reason is [that] stakeholders and data scientists don't necessarily come from the same background: specifically, the same education, the same training, or the same trajectory. So that fundamentally creates that difference or shift of focus.”

Stakeholders may need to put in the effort to learn the more technical aspects of their data science team's work:

“So I will say right off the bat, there is an information imbalance before you even start the meeting [and] talk to the stakeholder. Stakeholders don't necessarily know what the code is doing, so we really need to tone things down and then explain in very understandable language what this code is doing, and specifically how it's helpful.”

What are your thoughts on the split between a focus on technical skills and a focus on soft skills (for businesses)?

“I think there are different layers to the word business. If business is referring to leadership, I think most of the leadership is actually focusing on management (of investment, people, R&D). [In terms of] looking into the technical jargon and improving technical skills, there's definitely a gap there. If referring to the nature of the business [such as healthcare or finance], I believe these industries are actually getting more technical, meaning that on a year-to-year basis they are hiring more data scientists [and] senior data scientists. They are really trying to put more money into developing more code. It's really about data automation, and I believe that the nature of business as an entity is indeed getting more technical.”

Is making space for data scientists to learn on the clock something that you find employers are doing?

“So from my experience, data science as a career requires employees to be learning at the same time as they're doing their job. I do believe continuous learning is in the job description by nature, meaning that it's very difficult for me to come up with a new model or new code repository if [I] stop learning for five years. If I have to do everything from scratch, that makes my job just very inefficient and usually does not serve the purpose.”

Learning is by necessity an integral part of a data scientist’s work:

“Even though the company may or may not list that on paper, I do believe learning is part of the job description of a data scientist. [For example] different versions of packages are coming out on a day-to-day basis, right? You have to really keep yourself up to date, and that is fundamentally learning by itself. So from that perspective I do believe it is definitely encouraged for you to step out of your comfort zone and [commit to] constant learning. Absolutely.”

Companies seem to have been looking into adopting LLMs over the past year. What do you think are the barriers to entry for companies that want to use these tools?

“Outside of Google, Amazon, Microsoft, if we're just talking about [the] average company [where] their business is not necessarily in developing these large language models, then I do believe there is a barrier to entry. Typically, it comes down to two things. The first is how do you hire the right talent? Because not everyone with a master's degree or even a PhD in computer science or statistics can immediately pick up a large language model unless their dissertation specifically targeted large language models. I will say there's definitely a learning curve there. So [finding] the correct talent is definitely one big barrier to entry for regular companies to get into this field. The second thing is money.”

These factors, combined with the amount of infrastructure required, make adopting AI an uphill task:

“So what that means is for a company to get into large language models, you have two ways. You can of course develop your own large language model, meaning that you have to collect your own data. You have to train your own model and then you have to set up an API for your internal colleagues, other departments and enterprise level, and build a chatbot around the cause. Take care of the back end. Take care of the front end. Build that user interface. You really have to take care of the full stack of development, and that's not something easy to do.”

On the first point you mentioned on hiring the right kind of people, what kind of roles do you see being relevant to AI in particular?

“I don't think there's a clear definition in that regard, of what is a data scientist, or what is a machine learning engineer. I think that line is very blurry. It depends on the nature of the company and how long the company has been involved with AI. [So] every company has to come up with their own job description of what it means.”

At this point, nothing is set in stone for AI-specific roles:

“So a data scientist at a regular company could be doing SQL and Python or data visualisation [and not] really touch modeling. There's definitely a wide range of definitions [for] the role of data scientist. I think it's up to the company's interpretation and what the team requires at that time.”

Do you find that being a data scientist involves taking on a lot of responsibility that you might not want to take on because of the lack of a clear definition of these roles?

“Personally, I don't mind taking on a lot of responsibility. I feel like it's part of the learning curve because I didn't go into a job expecting, you know, to spend 8 hours coding. There's definitely a good amount of time spent communicating with stakeholders, talking to different teams, doing code review, doing peer review on other people's projects. So these are all related responsibilities in addition to just coding or just doing the modeling.”

These additional responsibilities can help data scientists build a well-rounded skillset:

“So personally I definitely agree, and I think I actually encourage data scientists to not just do coding [and] go out of their way to take on more responsibility, whether it's helping junior data scientists or interns, or talking to stakeholders or talking to your team, or just doing some extracurricular learning yourself. Just so that you are, you know, a well-rounded data scientist [and know] a little bit of everything.”

Do you see a path for people who want to be specialists as well, or do you think it's the right thing to do to try to be a generalist?

“I'm actually debating that myself as well. I don't think I want to be a specialist yet in terms of where I am in my career. I think there's definitely still plenty of things to learn. But in terms of a particular domain, I definitely see the need to become a specialist, because sometimes human experience can really go beyond machine learning, especially when you have been thinking about the problems in that domain.”

Starting as a generalist and then progressing into a specialist is always an option:

“So if you know this is your passion and you found that domain that you're interested in, and then you also are doing data science in that domain, then if all those things are checked, I will actually say perhaps it is time for you to be a specialist in that domain. To answer your question, yes, there is definitely eventually an opportunity for me to say, hey, you know, let's be a specialist, but probably not in the beginning of my career or any data science career, but probably in the middle or in the end of someone's career as a data scientist. I will say that would probably be a good trajectory.”

You mentioned that implementing AI is pretty expensive. Do you think there are cost-friendly ways for small to medium-sized companies that want to adopt it?

“I think there's definitely some ways to do that. The more rudimentary that you unpack a piece of code, the cheaper it gets. For example, if you want to use a data visualization software such as Power BI, you need to pay for a subscription, right? Because that package is taken care of by another company. That's the type of situation where I will say it will probably rack up your cost very fast, depending on how many employee accounts you're paying for [and] the quotas that you're using.”

The key to minimizing AI adoption costs in this instance would be identifying the exact functionalities a business needs for its use case:

“The cost will basically go up pretty fast from there, but if you break things down to, let's say, all of the Power BI functionalities: scatter plot, QQ plot [etc.]. Say you strip away some of these functionalities and then you hire a data scientist to build that at a rudimentary level in Python. Then I would say that's definitely much cheaper. So if cost cutting is the main agenda, I will say your best bet is to strip a software package down to each particular component, and then you figure out a way to piece them together by yourself, using your own talent. So this way you get there with the scientists on your team and then you basically increase your human capital, and you also don't have to pay for that software service anymore.”
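
To make the idea of building one stripped-down component in Python a little more concrete, here is a minimal sketch of our own (not code from Yiqiao or Labcorp) that computes the points of a normal QQ plot using only the Python standard library. The function name qq_points and the (i - 0.5) / n plotting positions are illustrative assumptions; other plotting-position conventions exist, and a real in-house tool would still need a charting layer on top.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal QQ plot.

    Hypothetical helper for illustration: the observed values are
    standardized, and the theoretical quantiles come from the standard
    normal distribution at plotting positions (i - 0.5) / n.
    """
    ordered = sorted(sample)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")

    mu = mean(ordered)
    sigma = stdev(ordered)
    if sigma == 0:
        raise ValueError("all observations are identical")

    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(ordered, start=1):
        p = (i - 0.5) / n                      # assumed plotting position
        theoretical = std_normal.inv_cdf(p)    # standard normal quantile
        observed = (value - mu) / sigma        # standardized sample value
        points.append((theoretical, observed))
    return points


if __name__ == "__main__":
    for theoretical, observed in qq_points([2.1, 3.4, 1.9, 5.0, 4.2]):
        print(f"{theoretical:+.3f}  {observed:+.3f}")
```

If the sample is roughly normal, the pairs fall close to a straight line. Feeding them to any scatter plot routine gives the chart itself, which is the "piece them together by yourself" step described above.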

Among other things, our conversation with Yiqiao Yin highlights:

  • The need for stakeholders to take time to learn the technical aspects of their data team's work for better communication.

  • The benefits of specialist and generalist data science roles for talent and leaders.

  • How identifying stripped-down functionalities can help small to medium-sized businesses adopt custom AI tools.

See you next week,
Mukundan

Do you have a unique perspective on developing and managing data science and AI talent? We want to hear from you! Reach out to us by replying to this email.