Category | Junior Data Analyst | Data Analyst | Data Scientist
Responsibilities | Reporting and Interpretation | Data curation and contextual interpretation | Definition, curation, interpretation, and prediction
Data Timeframe | Past: what happened? | Past and Present: what happened and what is changing? | Past, Present, and Future: what happened and what should we expect?
Interpretation Mindset | Quantitative: Statistical | Quantitative and Qualitative: Statistical and Industry-Context | Quantitative, Qualitative, Intuitive: Statistical, Industry-Context, Possibilities
Independence | Requires specific instruction and hand-holding | Works independently and generates new ideas | Works independently, generates new ideas and direction
Relational Database
Data Definition Language
Has used one or more SQL systems

Has created simple tables with CREATE TABLE or SQL IDE
Has implemented databases according to best practices

Is aware of normal forms (e.g., BCNF, 4th normal form) and can describe them in plain English

Can map schemas in an Entity-Relationship diagram

Understands one SQL implementation well (e.g., MySQL, PostgreSQL, DB2)

Understands foreign key relations and indices

Understands the implications of table and index design for query performance
Understands multiple SQL implementations well

Knows how to design systems that move data between multiple DBs, including data governance, timeliness, and robustness

Understands replicas and how they can reduce load and ensure adequate data flow to clients

Understands partitioning, sharding, and other advanced DB topics
Relational Database Data Manipulation Language Has used SQL SELECT and JOIN statements

Leverages knowledge of more senior developers when writing queries

Understands aggregates, indexes, foreign keys, uniqueness, joins

Knows basic ETL and data cleansing: can export and import CSV files from SQL tools

Can ensure data in a file is unique before uploading it

Knows how to utilize built-in SQL functions for strings, arrays, datetimes, casting, etc.
Knows advanced SQL: LEFT/INNER JOINs, subqueries, temporary tables, joins between databases, indices, triggers, views

Knows how to deal with uniqueness, find missing data, etc.

Understands how to analyze query effectiveness using EXPLAIN/ANALYZE, as well as tools for interpreting the resulting plans (e.g., https://explain.depesz.com for PostgreSQL)
Understands multiple SQL implementations well

Understands Extract, Transform, Load (ETL), data cleansing, data masking, data encryption, data hashing, etc.

Understands how to address performance, including when to re-think the schema, sharding, choice of database systems, etc.
Relational Database Transactions Understands CRUD (create, retrieve, update, delete) but has no experience with transactions Understands transactions, atomic updates, and two-phase locking

Knows the performance impact of row, table, and system locks

Knows when to use read-uncommitted, read-committed, repeatable read or serializable
Has implemented distributed transactional systems

Knows how to reconcile transactional systems and how to build consistent data between systems and a data warehouse
NoSQL Database
Data Manipulation Language
Has no NoSQL experience

Does not understand the NoSQL use-case
Has worked with NoSQL (e.g., BigQuery, Redis, Mongo, Hadoop, Cassandra)

Can describe NoSQL at a high level

Understands when to use NoSQL vs. relational DBs
Is familiar with BigQuery, Hadoop, Hive, Pig, Impala, Spark, HUE and/or other Big Data systems

Can design complete Big Data systems including corporate data flow, data governance, data lifetime management, etc.

Understands normalizations and Big Data denormalization trade-offs

Understands how to analyze NoSQL query effectiveness and optimize NoSQL queries to make better use of resources
Modeling Data Requests Implements what is requested

Identifies the data needed to satisfy a request
Refines the request until it is clear to both the requestor and the implementor: ensures the results are targeted at the question and ensures the results are actionable

Cleanses data before starting work: works to eliminate (or at least identify) bad data, holes, biases, etc.

Develops additional queries to supplement knowledge
Clearly documents the sources of data and limitations of reports
Designs new ways of looking at the data

Develops new terms and items of interest

Identifies whether the data has impact on the corporate view of the world

Works to build common models that can be applied to other cases
Reporting on Data and Segments Delivers reports and data collections (e.g., segments) as requested Assesses scale and determines how best to build reports/queries

Understands cognitive biases and knows what to do about them

Assesses results for anomalies, flags bad data, identifies interesting opportunities

Draws conclusions, identifies trends
Sets up recurring reports as appropriate

Builds and reports confidence metrics
Designs reports and queries and identifies new segments of interest (e.g., engagement segments)

Identifies ways to improve confidence and reduce error

Reviews analyses for bias, and trains others

Develops models to correct data based on seasonal or industry trends

Builds forecasts and extrapolations (with confidence assessments)
Visualizations Uses Excel to build line, bar, pie and other visualizations

Uses appropriate labels
Knows R, Tableau, SAS, or other visualization tools and builds advanced visualizations

Develops standards to ensure clarity and consistency of reports

Clearly states issues, unknowns and limitations

Highlights the takeaways

Can defend methodology used for collecting and analyzing the data
Knows many visualization tools and can teach how best to use them

Designs new visualizations to clarify the data

Documents long-term and external trends and uses them to clarify visualizations and results
Context Implements what is requested  Clarifies requests until both the requestor and the analyst agree on what is needed

Ensures results will answer the question and are actionable

Cleanses data before starting work: identifies and eliminates bad data, holes, biases, etc.

Understands external factors that affect analyses, such as seasonality, industry norms, competing effects, etc.

Clearly documents the sources of data and limitations of reports
Identifies common questions and underlying assumptions

Develops additional queries to supplement knowledge

Is an industry expert and helps develop norms and common understanding

Foresees future problems and opportunities and moves to address them 
Statistics Understands common descriptive statistics, including populations, sampling, probability, outliers, standard deviation, variance, etc.

Can run standard deviation and variance calculations

Knows the difference between correlation and causation
Understands inferential statistics, including z-test, t-test, etc.

Can compute p-values and other confidence metrics and can determine whether an analysis is significant

Measures and evaluates correlation and causation and can make a case for causation when appropriate
Understands serial A/B testing

Is familiar with Bayesian statistics and can compute probabilities based on prior knowledge
Can teach advanced descriptive and inferential statistics

Is expert in confidence tests and can explain which to use

Identifies new approaches for determining significance and confidence

Can apply Bayesian statistics to create meaningful metrics and to evaluate hypotheses
Programming Understands how scripts may be used for file transformation, database loading, etc.

Has some experience with languages like Python or Java but may not be comfortable writing programs

Relies on others to develop solutions that include programming
Understands scripts and programs for file transformation, database loading, etc.

Has worked with statistics in multiple programming languages, such as Python, Scala, R, or Matlab and can write short scripts as needed

Works with developers to design custom solutions
Understands computing solutions for a wide range of Big Data projects

Understands the challenges of writing reliable software in high-level languages like Python and Java, and when to use statistically-focused languages like R or Matlab

Understands how to work with software developers as part of a team
Machine Learning Has no experience with machine learning Has tried machine learning at least once with a toolkit like scikit-learn or TensorFlow

Understands the concepts, how to define the problem, and how to run and evaluate the solution

Can build a business case for the use of machine learning
Is familiar with many machine learning tools, such as scikit-learn, TensorFlow, Mahout, Amazon Machine Learning, H2O, etc.

Creates proof-of-concept projects that prove business value of machine learning solutions

Can build a training corpus, implement a machine learning algorithm, and test the resulting solution

Knows how to get buy-in from senior management
Hypotheses Works on the tasks as given Develops new hypotheses to test, in response to specific business cases, e.g., adding a new segment will increase the CTR of a particular campaign

Organizes and runs the tests

Controls the results for external factors

Evaluates the results and makes recommendations
Develops new hypotheses to test, based on broad business concerns, e.g., all campaigns should include blacklists

Coordinates testing with other teams

Evaluates the strength of the results and whether they can be accepted as a new norm
Testing Presents results without testing

Consults with AdOps to coordinate a pre-defined testing campaign
Understands the role and challenges of A/B Testing

Designs testing campaigns

Understands the use of Serial A/B Testing

Provides feedback: what did we learn?
Defines goals of testing

Defines parameters of testing: sample sizes, end conditions, significance, etc.

Summarizes and evaluates overall testing strategy and suggests improvements
Trends and Predictions Focuses only on the project at hand Maintains long-term view of performance

Evaluates against industry averages

Identifies significant trends and socializes the results
Spots differentiators and develops models of trends that can be leveraged

Uses forecasts to predict future behavior

Socializes suggestions for better results
Industry-specific Knowledge Knows what the business does (revenue streams)

Is conversant with terms needed to work with Sales and AdOps
Understands campaign life cycle, ad units, placements, relationships to clients

Knows about industry topics including fraud, viewability, targeting, and how they relate to different advertising channels
Understands how the ecosystem fits together

Quickly suggests adjustments when the technology stack is changed or expanded

Has a deep understanding of industry topics, including various types of fraud and how they are typically combatted
System Design Has not done system-level design Understands how data flows in a system

Can decouple dependencies to avoid issues with changes in data flow

Identifies missing data or data match opportunities and escalates them
Understands how systems interact and can design architecture to integrate disparate systems

Designs and documents workflows, metrics and team interactions

Able to effectively communicate architecture design including specific examples
Written and Oral Communication Communicates effectively with peers and business stakeholders

Prepares presentations on results and answers questions clearly, in the business context
Can effectively describe technical approaches or problems in plain English

Communicates up / escalates when needed

Builds standard formats for common questions

Gives presentations to ensure everyone is on the same page

Provides context around results, comparing them to company and business norms
Communicates effectively at all levels

Discreetly gauges the audience's level of understanding and adapts to the situation as needed

Presents forecasts and predictions to corporate leaders, with appropriate context and caveats
Process Participates in planning / prioritization and wrap-ups / reviews Helps to enforce good process amongst more junior analysts

Has good estimation abilities during planning / prioritization

Identifies areas where process can be improved and escalates this to management / senior analysts
Helps to improve process where it already exists or implement new process where it does not
Agility Requires multiple days of investigation before implementation begins for complex tasks. E.g., implementing something in a new framework.

Requires several further days for implementation.

Requires several further days for incremental improvements.
Requires 1-2 days of investigation before implementation begins for complex tasks. E.g., implementing something in a new framework.

Requires multiple days for implementation.

Requires multiple further days for incremental improvements.
Requires <1 day of investigation before implementation begins for complex tasks. E.g., implementing something in a new framework.

Requires multiple days for implementation.

Requires a few days for incremental improvements.
Adaptability Works hand-in-hand with a subject-matter expert on new tasks.

Able to take previous learnings and adapt them to new situations.
Able to understand new and complex tasks and research each component to come up with solutions (to be approved by a subject-matter expert).

Can easily pick up new frameworks for a language in which they are strong.
Able to drop into new frameworks and languages in familiar paradigms.
Thought Leadership Focuses only on the project at hand Investigates upcoming technologies

Shows an interest outside of work in statistics, analysis, and technology

Identifies problems that may affect the organization and escalates them to management / senior analysts
Makes technology recommendations to the organization

Creates trainings and presentations for internal and potential technologies

Acts as a subject-matter expert, writing on a subject or subjects of expertise (being published somewhere is preferable)
Leadership Focuses only on the project at hand Works with junior analysts to help them to understand difficult topics

Is able to guide junior analysts with respect to problem size estimation
Works with data analysts to help them to understand complex topics and architecture

Provides problem-solving strategies and learning opportunities

Is able to assign work based on skill-level
Interpersonal Skills Works well with others

Takes direction from senior analysts and management

Asks questions respectfully

Shares opinion openly and honestly, but respectfully

Escalates interpersonal issues to leadership for aid with mediation
Works to clarify needs from management and business stakeholders effectively

Disagreements with colleagues are managed with respect and purely on the merits of the issue

Actively builds relationships across the organization
Aids in conflict resolution using analysis and information vs. opinion

Helps to resolve conflict in a timely manner

Executes decisions made by the team and/or management with a positive attitude, regardless of personal opinion

Independently resolves conflict

Manages expectations with stakeholders, identifying and informing key stakeholders at the right times

Asks the right questions of stakeholders to ensure successful execution of projects
Self-Awareness Internalizes feedback.

Can take constructive criticism and use it to improve.
Understands shortcomings and actively seeks to improve with the help of manager / senior analysts.

Actively solicits feedback both professionally and interpersonally to improve skills.
Actively and quickly course-corrects in response to negative feedback (perceived or actual).

Understands shortcomings and self-improves through individual learning, coursework, or otherwise.
Career Ownership Self-motivated

Takes an active part in own career path, asking questions and taking guidance from management / senior analysts
Defines own quarterly goals / MBOs

Takes accountability for delivery and communication of their projects and/or components
Actively learns new skills, languages, and tools to enhance technical and business acumen

Takes accountability for delivery and communication of cross-functional projects (even when dependent on other resources for execution)

Makes plans for development and executes effectively
Problem Ownership Tasks are assigned by manager / team lead.

Completes tasks to the best of their ability.

Requires help from subject-matter experts to overcome obstacles encountered during delivery.

Delivers tasks on time.
Takes accountability for problems / tasks.

Identifies solutions and their timelines for problem resolution.

Raises issues related to the problem and/or its delivery early and communicates them to tech lead and/or management.

Takes responsibility for failings, perceived or real.

Identifies more than one way to resolve a problem, but requires the help of a subject-matter expert to make decisions.
Organizes tasks to buffer for potential issues, setting clear expectations with consumers / management.

Course corrects quickly when problems arise, taking responsibility and working towards a resolution without hesitation.

Identifies more than one way to solve a problem and is able to communicate the cost/benefits of each one and why they chose one solution over another.