Five Qualities of an Effective Data Scientist
Government organizations recognize the value of data in helping them operate more efficiently and serve the public more effectively. As a consequence, they’re increasingly hiring data scientists to help them gather, manage, understand, and act on their information resources.
Data scientists, along with people in similar roles such as data architects, engineers, analysts, and administrators, can play a crucial role in your organization, but they can also be challenging to attract and retain. So it pays to approach their recruitment thoughtfully to bring in individuals with the right acumen, knowledge, and skills to advance your agency’s mission.
Based on GCOM’s long experience as a data-centric organization that fields top data-science talent, here are five traits to look for in a data scientist:
A solid educational and technical foundation
Good data scientists come from many backgrounds, and there’s no single path to success. When vetting candidates for GCOM’s advanced analytics team, we typically look for people with college degrees in data science, computer science, math, statistics, or economics. We also find successful candidates with degrees in social sciences or natural sciences such as physics, astronomy, or chemistry. These programs tend to have a strong quantitative foundation and similar approaches to problem-solving.
In addition, we look for experience with statistical software tools and open source programming languages and frameworks such as Python, R, Scala, or PySpark (the Python interface to Apache Spark). Many of the software libraries we rely on for computation and data analysis, such as pandas and scikit-learn, are Python libraries.
Aspiring data scientists in college or considering a career change should start with programming languages and the basics of relational databases. Knowledge of SQL is crucial because it allows data scientists to query data and join it with other datasets to see how they relate. Because data seldom starts out in the format needed, a basic understanding of data wrangling (cleaning and reshaping raw data) is also useful.
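For readers new to these skills, here is a minimal sketch of what basic data wrangling followed by a SQL join can look like, using Python's built-in sqlite3 module. The table names, column names, and sample values are hypothetical, made up purely for illustration, and are not drawn from any GCOM project.

```python
import sqlite3

# Hypothetical raw records: IDs and counts arrive as untidy strings.
raw_offices = [(" 1", "north "), ("2", "South")]
raw_requests = [("1", "12"), ("1", " 8"), ("2", "5")]


def clean_offices(rows):
    """Wrangling step: strip whitespace, cast IDs to int, normalize region names."""
    return [(int(office_id.strip()), region.strip().title()) for office_id, region in rows]


def clean_requests(rows):
    """Wrangling step: strip whitespace and cast both fields to int."""
    return [(int(office_id.strip()), int(count.strip())) for office_id, count in rows]


def requests_per_region(offices, requests):
    """Load both datasets into an in-memory database and join them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE offices (office_id INTEGER PRIMARY KEY, region TEXT)")
        conn.execute("CREATE TABLE requests (office_id INTEGER, request_count INTEGER)")
        conn.executemany("INSERT INTO offices VALUES (?, ?)", offices)
        conn.executemany("INSERT INTO requests VALUES (?, ?)", requests)
        # The JOIN relates each request row to its office, then totals by region.
        query = """
            SELECT o.region, SUM(r.request_count) AS total
            FROM offices AS o
            JOIN requests AS r ON r.office_id = o.office_id
            GROUP BY o.region
            ORDER BY o.region
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


totals = requests_per_region(clean_offices(raw_offices), clean_requests(raw_requests))
print(totals)
```

The cleaning functions come first because the join only matches rows whose keys agree exactly; stray whitespace or inconsistent casing in the raw data would otherwise produce missing or duplicated groups.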
A knack for problem-solving and pattern recognition
Data scientists need strong logic and problem-solving skills to identify errors in data models and find opportunities for improvements. Double-checking work and second-guessing results are best practices.
Good data scientists can also see patterns and similarities between current projects and completed work. Leveraging past data models can accelerate work on new challenges.
Such attention to detail has been invaluable, for example, in GCOM’s work with the IRS to identify and mitigate causes of the tax gap, the difference between taxes owed and taxes paid voluntarily and on time, which approaches $500 billion a year. GCOM has deployed a variety of capabilities to help the IRS close this gap. These include:
- Machine learning (ML) and advanced analytics to detect and prevent identity theft and fraud
- Behavioral analytics to help agents identify high-value cases and the best enforcement strategies
- Performance monitoring to measure the effectiveness of enforcement
- Program evaluation to improve internal operations and drive system modernization
- Research to examine how taxpayers interact with the IRS and improve taxpayer experiences
All these efforts are supported by data scientists.
A flair for visualization
A picture is worth a thousand words, and visualizations are often crucial to understanding data outputs. Data scientists who are good at generating visualizations add tremendous value.
Data visualizations have been central to the Virginia Analysis System for Trafficking (VAST). GCOM worked with the Commonwealth to develop data-driven approaches to combating human trafficking in the state. VAST provides the Department of Criminal Justice Services and health and human services (HHS) organizations with insights into victims, offenders, and factors that increase incidence.
Stakeholders can view color-coded maps and breakdowns of incidents by type of trafficking.
They also benefit from charts and graphs that show victim and perpetrator demographics, locations, and related issues such as drug possession. Such visualizations, enabled by our advanced analytics team, can reveal hidden correlations and lead to effective interventions.
The ability to translate for a nontechnical audience
Data analysis produced by technical experts is typically consumed and acted on by nontechnical stakeholders. Good data scientists keep in mind that data science is more than code, and technical information must often be presented to policy and program decision-makers.
Clear understanding of data has been key to the success of the Framework for Addiction Analysis & Community Transformation (FAACT). A collaboration of GCOM and several Virginia agencies, FAACT is a data-sharing platform that’s helping the Commonwealth address opioid use. For instance, one FAACT dashboard plots the locations of overdose incidents and medical facilities, allowing local officials to intelligently deploy resources.
FAACT was enabled by a data trust, a legal framework that defines roles and responsibilities for the organizations that share data. Gaining stakeholder buy-in and equipping participants to consume shared information involved translating technical requirements and benefits into terms leadership could understand and act on. The payoff is a highly successful initiative that has benefited the public and been extended for multiple use cases.
A love of learning
Data science technologies, methodologies, and computational paradigms change fast. For instance, GPU processing, which can accelerate compute-intensive operations, is still new to many experienced data scientists. Aspiring data scientists must be willing to keep up with industry changes and continually learn new technologies and approaches.
Don’t underestimate the value of online training and bootcamps. GCOM includes team members who leveraged such training to make successful mid-career switches from social work, higher education, and scientific research to data science.
Data scientists should also check out Kaggle.com, a data science community. Kaggle regularly runs competitions, and it’s interesting to see the business problems presented and the solutions the community proposes in response. Comparing a wide variety of approaches to the same problem is a great way to learn from experts.
Agencies will continue to find opportunities to leverage data to operate more efficiently and serve constituencies more effectively. Recruiting data scientists with the traits described here can help them transform their data-driven visions into reality.