Why being a data scientist 'feels like being a magician'
- 06 October, 2016 23:00
The data scientist role was thrust into the limelight early this year when it was named 2016's "hottest job," and there's been considerable interest in the position ever since. Just recently, the White House singled data scientists out with a special appeal for help.
Those in the job can expect to earn a median base salary of roughly $116,840 -- if they have what it takes. But what is it like to be a data scientist? Read on to hear what three people currently on the front lines had to say.
How the day breaks down
That data scientists spend a lot of time working with data goes without saying. What may be less obvious is that meetings and face-to-face time are also a big part of the picture.
"Typically, the day starts with meetings," said Tanu George, an account manager and data scientist with LatentView Analytics. Those meetings can serve all kinds of purposes, she said, including identifying a client's business problem, tracking progress or discussing reports.
By midmorning the meetings die down, she said. "This is when we start doing the number crunching," typically focused on trying to answer the questions asked in meetings earlier.
The afternoon is often spent in collaborative meetings aimed at interpreting the numbers, followed by sharing analyses and results via email at the end of the day.
Roughly 50 percent of George's time is taken up in meetings, she estimates, with another 20 percent in computation work and 30 percent in interpretation, including visualizing and putting data into actionable form.
Meetings with clients also represent a significant part of the day for Ryan Rosario, an independent data scientist and mentor at online education site Springboard. "Clients explain the problem and what they'd like to see for an outcome," he said.
Next comes a discussion of what kinds of data are needed. "More times than not, the client actually doesn't have the data or know where to get it," Rosario said. "I help develop a plan for how to get it."
Much of data science is not working with the data per se but rather trying to understand the big picture of "what does this mean for a company or client," said Virginia Long, a predictive analytics scientist at healthcare-focused MedeAnalytics. "The first step is understanding the area -- I'll spend a lot of time searching the literature, reading, and trying to understand the problem."
Figuring out who has what kind of data comes next, Long said. "Sometimes that's a challenge," she said. "People really like the idea of using data to inform their decisions, but sometimes they just don't have the right data to do that. Figuring out ways we can collect the right data is sometimes part of my job."
Once that data is in hand, "digging in" and understanding it comes next. "This is the flip side of the basic background research," Long said. "You're really finding out what's actually in the data. It can be tedious, but sometimes you'll find things you might not have noticed otherwise."
Long also spends some of her time creating educational materials for both internal and external use, generally explaining how various data science techniques work.
"Especially with all the hype, people will see something like machine learning and see just the shiny outside. They'll say, 'oh we need to do it,'" she explained. "Part of every day is at least some explaining of what's possible and how it works."
Best and worst parts of the job
Meetings are George's favorite part of her day: "They make me love my job," she said.
For Rosario, whose past roles have included a stint as a machine learning engineer at Facebook, the best parts of the job have shifted over time.
"When I worked in Silicon Valley, my favorite part was massaging the data," he said. "Data often comes to us in a messy format, or understandable only by a particular piece of software. I'd move it into a format to make it digestible."
As a consultant, he loves showing people what data can do.
"A lot of people know they need help with data, but they don't know what they can do with it," he said. "It feels like being a magician, opening their minds to the possibilities. That kind of exploration and geeking out is now my favorite part."
Long's favorites are many, including the initial phases of researching the context of the problem to be solved as well as figuring out ways to get the necessary data and then diving into it headfirst.
Though some reports have suggested that data scientists still spend an inordinate amount of their time on "janitorial" tasks, "I don't think of it as janitorial," Long said. "I think of it as part of digging in and understanding it."
As for the less exciting bits, "I prefer not to have to manage projects," Long said. Doing so means "I often have to spend time managing everyone else's priorities while trying to get my own things done."
Rosario, who was trained in statistics and data science, prefers to de-emphasize systems building and software engineering.
Preparing for the role
It's no secret that data science requires considerable education, and these three professionals are no exception. LatentView Analytics' George holds a bachelor's degree in electrical and electronics engineering along with an MBA, she said.
Rosario holds a BS in statistics and math of computation as well as an MS in statistics and an MS in computer science from UCLA; he's currently finishing his PhD in statistics there.
As for MedeAnalytics' Long, she holds a PhD in behavioral neuroscience, with a focus on learning, memory and motivation.
"I got tired of running after the data," Long quipped, referring to the experiments conducted in the scientific world. "Half of your job as a scientist is doing the data analysis, and I really liked that aspect. I also was interested in making a practical difference."
The next frontier
And where will things go from here?
"I think the future has a lot more data coming," said George, citing developments such as the internet of things (IoT). "Going forward, all senior and mid-management roles will incorporate some aspect of data management."
The growing focus on streaming data means that "a lot more work needs to be done," Rosario agreed. "We'll see a lot more emphasis on developing algorithms and systems that can merge together streams of data. I see things like the IoT and streaming data being the next frontier."
Security and privacy will be major issues to tackle along the way, he added.
Data scientists are still often expected to be "unicorns," Long said, meaning that they're asked to do everything single-handedly, including all the coding, data manipulation, data analysis and more.
"It's hard to have one person responsible for everything," she said. "Hopefully, different types of people with different skill sets will be the future."
Words of advice
For those considering a career in data science, Rosario advocates pursuing at least a master's degree. He also suggests trying to think in terms of data.
"We all have problems around us, whether it's managing our finances or planning a vacation," he said. "Try to think about how you could solve those problems using data. Ask if the data exists, and try to find it."
For early portfolio-building experience, a common recommendation is to find a data set on a site such as Kaggle and then figure out a problem that can be solved with it.
"I suggest the inverse," Rosario said. "Pick a problem and then find the data you'd need to solve it."
"I feel like the best preparation is some sense of the scientific method, or how you approach a problem," said MedeAnalytics' Long. "It will determine how you deal with the data and decide to use it."
Tools can be mastered, but "the sensibility of how to solve the problem is what you need to get good at," she added.
Of course, ultimately, the last mile for data scientists is presenting their results, George pointed out.
"It's a lot of detail," she said. "If you're a good storyteller, and if you can weave a story out of it, then there's nothing like it."