👩💻 Preparing Data Scientists for Real World Scenarios
DTP #36: Q&A with Marcelo Guerra Hahn
We spoke to Marcelo Guerra Hahn, engineering leader and educator at Lake Washington Institute of Technology, on how to make sure data scientists are prepared to enter the workforce, communication in the corporate world, and the impact of AI in e-commerce.
Quick summary:
Emphasis on understanding the context in data analysis.
Avenues for sourcing real-world examples.
How the corporate environment introduces challenges in explaining results.
AI in e-commerce, impacting businesses with predictive analytics and evolving features like chatbots.
Trend towards generalist data scientists as tools evolve.
🌐 AI Weekly News Roundup
of AI. Over 400 federal departments sought Chief AI Officers in 2023. However, some experts caution that the rapid evolution of AI may eventually lead to a more integrated approach across various roles.
Microsoft, once at risk of irrelevance in the smartphone era, has experienced a remarkable turnaround. Reporting its fifth consecutive quarter of record revenue, reaching $62 billion, it surpassed a $3 trillion market capitalization.
What ideas do students have about data science in the real world, and how do you make sure they're prepared to enter the workforce?
There are three types of students. First, there are students with no data science background, who are closest to what you are asking about: they're trying to get into the field and don't have a strong preference for a particular industry.
Then you have students with a field, and data analysis has come into their field. These students are interested in advancing their careers or want to do more in their current jobs; data analysis can be part of that.
Finally, you have a set of students who already have a job and are doing data analysis, but they want to get to the next level of data analysis.
I always tell people that data analysis is only easy to do with context. You give me a data set without the variable names. I can do a lot of math with it and come up with some ideas. However, if I know what the variables mean, I will be able to give you a reasonable answer. Then, you look at things like correlations or other models. The findings depend a lot on the context.
In this context, we want to help them get the data analysis part right, and then they can take that back and combine it with their knowledge of the context.
If they already have a job and are already doing data analysis, the application is much simpler because they’re just trying to apply more complicated models to data they already have and are familiar with. In the second category, where they have the context and are learning data analysis, we try to get them to apply the analysis to that context.
If they are a student who is not working in any particular field, then we have to show them examples and say, here is an example from this field, and here are examples from other fields, and then you play with all of them and maybe you find one of those that you like best.
So, the context varies by industry, but the methodologies are what you teach them so they can apply them anywhere.
Yes, especially the earliest stage methodologies, like everything that has to do with correlations and regressions. Those are very cross-industry.
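As a rough illustration of why correlations and regressions travel across industries, here is a minimal Python sketch. The function names `pearson_r` and `fit_line` and the two columns of numbers are made up for this example; they do not come from the interview. The calculation is identical whether the columns are ad spend and revenue or study hours and grades, and only the context tells you what the result means.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical, unlabeled columns: invented numbers, not real data.
col_a = [1, 2, 3, 4, 5]
col_b = [3, 5, 7, 9, 11]

r = pearson_r(col_a, col_b)
slope, intercept = fit_line(col_a, col_b)
print(f"r={r:.2f}, slope={slope:.2f}, intercept={intercept:.2f}")
```

The numbers alone say the two columns move together. Whether that is a useful finding, a coincidence, or an artifact of how the data was collected is a question only the context can answer.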
You mentioned providing examples for students who may lack experience. How do you formulate those examples to match a real problem?
There are three primary sources. The first is my own experience: I've seen a lot of data in my life, so I use examples that I have come across and liked in my work.
The second is other students, who are very good at providing data sources. When I meet a student from a field I have yet to see, or one who has an interesting data set, I can take the concept and then create some data to match it.
The third one is the Internet. Sites like Kaggle, a popular data repository, are also very useful. After teaching for a while, you start liking certain datasets because certain datasets are simpler for teaching a concept than others.
Do you find that, other than the context [for data], there are other things that students may not learn at university but experience for the first time in a corporate environment?
The trickier part in the corporate environment, especially in data analysis, is explaining your results. In the context of a class, say, they have to do a project. When you explain the project, you explain it to the class and the professor, which means we all know what you are doing because we've been doing this kind of analysis for a while.
In a business context, by contrast, you're explaining this to people who lack context on the data analysis. They don't necessarily know much about your project, and you likely have only a little time. Mainly, this is about learning how to present to high-level people.
If you are presenting your analysis to, let’s say, your vice president, it's a very different dynamic than what you do in the class. That is what people end up more surprised about. How do you tell someone the whole story in 2 minutes versus in the class, where we ask you to create a 10-15 minute presentation?
The communication aspect of it takes some adjustment.
Specifically, it is the communication with people who are not your cohort. When you're in the class and talking to other students, you are all speaking the same language in a way - you are all taking the same class, reading the same book, and doing the same homework.
Then you go out into a world where most people lack that context. And now, you have to convince people of ideas using data and share your findings. People don't want a long series of charts or too much explanation. In the class, we want the opposite: you are trying to explain what you learned, so you make the chart complex and describe every detail of how you came up with each part. In the real world, it's different. The audience is thinking: I believe you. You can do the analysis. Now tell me your findings.
You’ve also spoken about AI in relation to e-commerce. From your perspective, has it started producing actual impact for businesses?
Yes, but in an exciting way. Amazon has been ahead in this area for a long time.
Recommendations on Amazon have been around since its bookselling era, along with many other features such as suggested delivery times.
Now, with more AI tools available, more people can do what Amazon does. If I want to do predictive analytics for my company, it's not just an Amazon thing now. It's much easier for me to do it. There is a movement towards democratizing the things that the big companies already have.
Other dimensions are still evolving, like chatbots. Where in the past you had to have a person answering the phone all the time, now you can have a chatbot. This is still very early, though. If you have interacted with a chatbot, you can tell that many people end up typing “representative” or “help” because, for the most part, people need help. However, AI is learning about both customer behavior and the company and is likely to get to a point where it is very usable.
Concerning skill development for data scientists: it is becoming increasingly necessary for data scientists to be skilled in everything from coding to analysis to visualization. Is it necessary to develop all those skills, or is it still possible to be a specialist in a role?
It is still possible to be a specialist in an area, because each of these components involves very specialized ideas. It’s one thing for me to draw a simple scatter plot in a tool like Tableau and a very different one for me to do complex analysis using advanced features such as level of detail expressions. There is still space for experts.
What we're seeing, though, is that there is also space for generalists now. In the past, if you wanted to create a visualization, like a map, you had to know a lot about drawing and pictures and latitudes and longitudes. You would have to study quite a bit to do it, whereas now there are a lot of tools that do that very cleanly and make it easier for the user. What is happening is that the tools have evolved so that one person could theoretically do all the pieces of the analysis, as long as they can judge whether the output makes sense.
💼 AI in Business
6 Ways AI Could Disrupt Your Business
In navigating the transformative landscape of AI, boards of directors must proactively engage with six critical scenarios that could significantly impact their businesses. An article from Harvard Business Review expands:
Gains Through Granularity:
Opportunities or threats as AI scales complexity in managing variables driving EBITDA.
AI's impact on personalization, testing, and generating optimal options for each customer/channel.
A Reshaped Partner Ecosystem:
Changes in partner ecosystems, collaboration nature, and power balance in the AI era.
Examples from the auto industry, emphasizing partnerships evolving due to autonomous driving and AI.
Snowballing Risk and Expansive Regulatory Regimes:
Challenges of expanding risks from AI and the costs of mitigating them.
Key risks include data privacy/security, bias and unfairness, transparency/explainability, regulatory compliance, and de-risking dependency.
Radical Cost Transformation:
AI's impact on cost structures, potentially moving towards software economics.
Disruption in professional service firms' economics, affecting billing structures and talent management.
Value Proposition Redefinition:
AI challenging core business assumptions, prompting reevaluation of value propositions.
Shifting value propositions in health systems, emphasizing proactive wellness through AI-powered virtual health assistants.
Obsolescence:
AI's potential to make core products and offerings obsolete.
Example of offshore IT firms facing existential threats as AI automates coding and testing, shifting developers' roles.
These scenarios provide a structured framework for boards to assess the impact of AI on their business models and develop proactive strategies.
💻 Platform Highlight
Kore.ai - Recently raised $150m to drive AI-powered customer and employee experiences.
TextQL - Raised $4.1m in funding for AI development in automating the data science lifecycle.
DXwand - Enterprise-focused conversational AI platform. Raised $4m in funding.
What differentiates a junior, mid, or senior level data scientist? - A Reddit thread
“My takeaways from attending WEF at Davos last week:” – A tweet
🤖 Prompt of the week
Act as a natural language processing expert. I have a text dataset [describe dataset]. Please help me extract named entities using SpaCy.
See you next week,
Mukundan