What does it take to be a data scientist at Sentiance?
Our VP Chief Data Scientist, Vincent Spruyt, is heading up the application of machine learning methods for context modeling and behavioral profiling based on mobile sensor data.
With his background in computer vision and machine learning, he laid the foundations of several AI solutions that quickly made Sentiance one of the strongest competitors in the field. He is responsible for growing a data science team to further cultivate these early successes. Today, almost three years later, the data science team has grown to include 12 hand-picked and extremely talented people of five different nationalities and with diverse backgrounds ranging from PhDs in physics or mathematics to academic and industrial backgrounds in computer science, robotics and neuro-computing.
One of his most essential achievements is the sourcing of a stellar data science team that is eager to work closely together and to deliver production-ready AI solutions that outperform existing approaches described in academic literature.
We sat down with Vincent to talk about what it takes to become a data scientist at Sentiance.
BEFORE DIGGING INTO THE TECHNICAL REQUIREMENTS, CAN YOU TELL US WHAT THE PROS ARE OF WORKING AS A DATA SCIENTIST AT SENTIANCE?
First, Sentiance is an extremely dynamic research environment where there is always someone who knows more than you do and who is eager to share their knowledge with the team.
Out of our 50 Sentiance employees, about 45 are data scientists, data engineers and software developers. The whole tech team represents 14 different nationalities and all kinds of backgrounds, ranging from PhDs in machine learning to fresh computer science graduates, with a median age of 30.
Second, because of the nature of our product, we need to be at the forefront of innovation in the domains of machine learning, signal processing and big data architectural design. There aren’t many companies out there that focus on each of these fields simultaneously.
Continuous innovation is at our core. Staying up to date with the state-of-the-art literature is considered standard practice. Improving upon that literature and outperforming what is on the market today is each and every one’s mission from a technical perspective.
WHEN YOU’RE HIRING A DATA SCIENTIST, WHAT QUALITIES DO YOU LOOK FOR?
Our senior data scientists are experts in three related fields:
A junior data scientist at Sentiance is expected to have a strong background in at least one of these domains, whereas a data scientist is an expert in two of them.
For new hires, this usually boils down to two requirements: a strong software engineering background, and a mathematical understanding of machine learning concepts such as Bayesian theory, dimensionality reduction, kernel methods, etc.
As it is getting increasingly difficult to find these unicorns, a third requirement is the ability to work with team members of different backgrounds and with different skill sets. At Sentiance we work in small, cross-functional teams consisting of data scientists, data engineers, mobile developers and solution architects.
Finally, a crucial part of the hiring process is to check if there is a fit with our company culture. Work hard, play hard. A constant striving for perfection, and a realization that the electric light did not come from the gradual improvement of candles. We need people for whom starting at Sentiance is just a continuation of their hobby.
WHAT ARE THE DAY-TO-DAY RESPONSIBILITIES OF A DATA SCIENTIST AT SENTIANCE?
I don’t think there is a typical day in the life of a data scientist. There are weeks where you would mostly be reading papers and doing research, and there are weeks where you would be testing and debugging production Python code for most of the day. There are times when you feel more like a salesperson, trying to explain your algorithms to investors or management, and there are times when you geek out with your colleagues about the latest deep learning paper on arXiv.
One day-to-day responsibility that is shared by all of us is innovation: you don’t come to Sentiance to implement a set of given requirements. We don’t hire the brightest minds just to tell them what to do. Instead, we need them to bring us to the next level, to come up with the best ideas and to actually bring these ideas to production.
WHICH SKILLS AND PROGRAMMING LANGUAGES DOES A DATA SCIENTIST AT SENTIANCE MOST FREQUENTLY USE?
Our programming language of choice is Python. The Python ecosystem contains a huge number of libraries and tools for both signal processing and machine learning. For us, Python represents the perfect balance between the need for rapid prototyping and research iterations on the one hand, and delivering production-ready code on the other hand. On top of that, important distributed computing frameworks such as Spark support Python out of the box.
Although Python is the language we speak, we expect every data scientist to have a strong software engineering background, allowing him or her to play around with other languages such as C++, Java or even Lua if needed.
On the research front, data scientists should be able to quickly scan, interpret, understand and assess academic papers. This means being familiar with mathematical notations, linear algebra, and academic lingo.
Finally, a crucial skill of any data scientist is the ability to match business requirements with research goals and to understand the bigger picture. Do we really need to use deep learning for every project or is a simple logistic regression approach sufficient for a specific task? Should we maximise accuracy or computational efficiency? Could we reuse a specific approach for other projects in the near future? etc.
ANY ADVICE FOR PEOPLE WHO WANT TO WORK AT SENTIANCE?
Hack around and start some machine learning projects in your spare time. Don’t simply focus on following MOOCs or completing Kaggle competitions using XGBoost. What we need are not just software engineers or mathematicians, but rather entrepreneurs with an incredibly strong background in these two fields.
Maybe it would be helpful to list what we are definitely not looking for:
- Academic researchers without any software engineering background apart from occasional Matlab scripting.
- Self-proclaimed deep learning experts without any general machine learning background or experience.
- Citizen data scientists who know how to use machine learning libraries but don’t know how they work internally.
Let me explain each of these three in a bit more detail. The first category is a tough one. We often talk to very talented and intelligent people from the academic world, with backgrounds in math or physics, who never learned how to program. The problem is that software engineering is not something you can quickly pick up by reading a book or two; it should be second nature, obtained by means of experience.
The second category is a rather recent one. Although we obviously love deep learning, and apply it at Sentiance where appropriate, it does not make sense to call yourself a deep learning expert if you just picked up a few MOOCs and completed a TensorFlow tutorial. Before thinking about deep learning, think first about SVMs and kernel methods, Bayesian techniques, Gaussian Mixture Models, Hidden Markov Models, topic modeling, dimensionality reduction, matrix factorization, etc. Consider Bishop’s book on pattern recognition to be the bible, and make sure to truly understand the fundamental concepts explained therein. Deep learning then just becomes a matter of having a specific type of experience as a seasoned machine learning expert.
The third category contains good software engineers who learned how to use libraries such as scikit-learn or even Weka as black box solutions. Coping with noisy data and building state-of-the-art solutions at Sentiance requires a deep understanding of, and the ability to make changes to, well-known machine learning concepts. This in turn means that a strong mathematical and analytical background is crucial.
CONSIDERING THE APPLICATION PROCESS, DO YOU HAVE ANY ADVICE FOR APPLICANTS ABOUT THEIR CV AND JOB APPLICATION?
Be open, be honest. Make sure to list your spare time projects, and the algorithms you know well. Don’t just list a set of hyped keywords; everyone else is doing that already. And, although many machine learning adepts consider it less sexy than algorithmic stuff, definitely cite your software engineering skills as well.
WHAT PROJECT ARE YOU EXCITED ABOUT? ARE THERE NEW EXCITING PROJECTS COMING UP?
At Sentiance we have a healthy mix of traditional machine learning where appropriate, and deep learning based solutions where they really add value. We recently found out that our deep learning based event prediction model is able to make incredibly accurate and specific predictions, and we are now extending this framework to actually capture latent variables underlying a user’s actions.
In the form of embeddings, these models closely resemble what Geoffrey Hinton calls ‘thought vectors’, representing a user’s intent. Without giving away too much information for now, let’s just say that our short-term innovation roadmap will result in some superhuman-like capabilities, allowing our platform to reason about human behavior on a much deeper level.
That is a great insight! Thank you, Vincent, for the information on the requirements and the advice!