To Be And To Last As A Data Analyst

This is a living document summarizing some of the practices I’ve personally found to enable (1) peaceful / calm work practices that are (2) effective at accomplishing difficult projects which (3) have measurable, positive impact on myself, my team, and my customers. Your mileage may vary, so take with a large grain of salt. :)

Preface 1: Why?

To be and to last? Like many fields, parkour attracts two types of people: those who want to do everything now and those who patiently build for the long-term. Simply rushing “to be” inevitably results in consequences (physical injury in parkour, technical debt in coding, etc). Better to do things the slow way - humbly studying the basics, internalizing proper thinking, and building on a strong foundation that will last.

My intention in this document is to explore some of those foundational skills that separate good from great (aka, “career capital”), and then outline a few approaches to deliberately practice.

  1. Career capital: The finite game approach to employment is to negotiate salaries and maximize rewards today. The infinite game approach is to maximize value add, or “career capital”, by building a set of rare and valuable skills.

  2. Deliberate practice: Would you rather have 30 years of experience or 1 year of experience 30 times? Deliberate practice is about explicitly pushing just outside one’s current skill level in a certain area in order to accelerate growth.

  3. Deep work: Particularly in our interruption-oriented world, those who set aside time for deep work (eg, >1 hour of continuous focus) achieve disproportionate returns.

Preface 2: Ideal Time Allocation

Before I jump into the building blocks, a brief note on ideal time allocation. Jim Collins discussed how he found that some of the people he most respected in academia spent their time 50 / 30 / 20 between new creative work / teaching / other necessary stuff.

This will obviously vary over the course of a career, but the 50% focus on creative work appears ideal at all levels. The 30% on teaching actually becomes increasingly important even outside academia, as much of the core role of a data analyst is education: educating stakeholders on business opportunities or educating other data analysts on various approaches.

What would this look like if it were easy? Perhaps a simple spreadsheet tracking creative hours, with a goal of >2 hours a day of creative deep work (segments under 1 hour don’t count).
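
As a minimal sketch of that tracking (assuming a hypothetical log file, deep_work_log.csv, with one row per uninterrupted focus segment and columns date and minutes), something like this could tally the qualifying hours:

```python
import pandas as pd

# Hypothetical log: one row per uninterrupted focus segment.
# Assumed columns: date (YYYY-MM-DD) and minutes (length of the segment).
log = pd.read_csv("deep_work_log.csv", parse_dates=["date"])

# Segments under 1 hour don't count toward the goal.
qualifying = log[log["minutes"] >= 60]

daily_hours = qualifying.groupby("date")["minutes"].sum() / 60
goal_met = daily_hours > 2  # goal: >2 hours of creative deep work per day

print(daily_hours.describe())
print(f"Days meeting the goal: {goal_met.sum()} of {len(goal_met)}")
```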

A note on ethics

Data analysts have great potential to do good in our world, but also great potential to compound inequity and other failings of our human civilizations on this planet.

In my experience, bad outcomes from data analysis result from two primary error types:

  1. Negligence: When ignorance or mistakes result in inaccurate modeling of the world, resulting in damaging decisions.

  2. Malpractice: When people knowingly misuse their skills to subvert the truth (see How To Lie With Statistics).

The work before you humbly tries to assist those eager to avoid errors of negligence, but makes no attempt to address the broader ethical questions behind errors of malpractice. Malpractice is unfortunately very common due to a wide array of misaligned incentives, so I would encourage all producers and consumers of data analysis to practice healthy skepticism.

1. Analytical Skills

A T-shaped data analyst might eventually go deeper in facilitation or teaching, but all must reach a base level of skill in the core analytical skill sets. Without some ability in this area, nothing else matters.

Data Sourcing / Cleaning

Test: Are you able to find or create the data sets needed to answer the questions that matter?

Universities focus on the heady work of data science, and journalists write that we live in the age of “big data”. Yet in reality most analysts will spend 80% of their time on data sourcing and cleaning.

Success in this area depends on a combination of technical sophistication, grit, and curiosity. You’ll need to be able to wade through partially documented databases, play detective with emails, decks, or other resources you might find, and motivate random people across the company to help point you in the right direction.

Many good analysts find themselves left behind at this stage, as they are content to be “blocked” when their first few attempts at usable data sources reach a dead end. Great analysts show a level of persistence that often leads to unexpected breakthroughs.

Data Manipulation

Test: Can you accurately translate raw data into meaningful numbers and insights?

Whether you’re using SQL, R, or some other language, you should be comfortable converting complex data sets into the structure and outputs needed to answer relevant questions. This could be as complex as combining six different data sets to run a multiple linear regression to identify primary factors in sales growth, or it could be as simple as automating a data pipeline to display daily transaction volume.

You should understand a wide range of conceptual approaches, how to conduct basic operations, and where to figure out more complicated operations as needed.
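
As an illustrative sketch of the simple end of that spectrum (assuming a hypothetical transactions.csv export with created_at and amount columns), a daily transaction volume rollup in pandas might look like this:

```python
import pandas as pd

# Hypothetical raw export: one row per transaction, with a timestamp and an amount.
transactions = pd.read_csv("transactions.csv", parse_dates=["created_at"])

# Aggregate to daily transaction volume, the kind of output a dashboard would consume.
daily_volume = (
    transactions
    .assign(day=transactions["created_at"].dt.date)
    .groupby("day")
    .agg(transaction_count=("created_at", "size"),
         total_amount=("amount", "sum"))
    .reset_index()
)

daily_volume.to_csv("daily_transaction_volume.csv", index=False)
```

In practice the same rollup might live in SQL or a scheduled job; the point is a repeatable transformation from raw rows to the numbers people actually look at.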

Dashboarding / Visualization

Test: Do people need a PhD to understand your results, or are you able to visualize so clearly that a busy executive can understand your data outputs at a glance?

If your job is to enable better decision making through data, then you have two potential approaches available: First, you could train everyone at your company in data analysis so they can understand your world. Second, you could learn how to translate data into their world through visualization. Do the second.

Effective dashboarding or visualization tends to follow two primary principles: (1) apply basic mental models that people are accustomed to understanding (eg, line charts, bar charts), and (2) follow good data design principles (eg, denote units, apply consistent axis limits, write “read as” guides, etc).
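
A small matplotlib sketch of those two principles (a familiar chart type, explicit units, fixed axis limits, and a “read as” hint in the title), using entirely made-up figures:

```python
import matplotlib.pyplot as plt

# Hypothetical monthly sales figures, in thousands of USD.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales_k_usd = [120, 135, 128, 150, 162, 158]

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(months, sales_k_usd, marker="o")  # a familiar mental model: the line chart

ax.set_title("Monthly sales, Jan-Jun (read as: higher is better)")
ax.set_ylabel("Sales (thousand USD)")  # denote units explicitly
ax.set_ylim(0, 200)  # keep axis limits consistent across related charts

plt.tight_layout()
plt.savefig("monthly_sales.png")
```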

Experiment Design

Test: Can you structure proper, statistically-sound experiments that generate meaningful results?

Everyone loves experimentation, yet far too often even smart people build experiments that fail to deliver meaningful results (e.g., because one arm gets too much weight). Or, perhaps even worse, they share “results” that simply aren’t accurate (e.g., by running a so-called “before and after” experiment).

A great data analyst should be a student of science and the scientific approach, while nurturing deep skepticism of anything that smells of scientism. Often this comes down to doing the difficult and uncomfortable work of learning how to accurately apply statistics, and then pleasantly yet firmly holding oneself and others accountable to a standard of accuracy.
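
One pattern that satisfies this test is a randomized two-arm comparison evaluated with a two-proportion z-test, rather than a before-and-after snapshot. Here is a minimal sketch with made-up conversion counts, computing the test by hand via scipy’s normal distribution:

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical A/B test: conversions out of randomly assigned visitors, per arm.
conversions = {"control": 480, "variant": 530}
visitors = {"control": 10_000, "variant": 10_000}

p_control = conversions["control"] / visitors["control"]
p_variant = conversions["variant"] / visitors["variant"]

# Pooled two-proportion z-test (two-sided).
p_pooled = sum(conversions.values()) / sum(visitors.values())
se = sqrt(p_pooled * (1 - p_pooled) * (1 / visitors["control"] + 1 / visitors["variant"]))
z = (p_variant - p_control) / se
p_value = 2 * norm.sf(abs(z))

print(f"control={p_control:.2%}, variant={p_variant:.2%}, z={z:.2f}, p={p_value:.3f}")
```

The randomly assigned arms and the explicit p-value are the point; the numbers themselves are placeholders.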

Predictive Modeling

Test: Can you predict the future with a level of proven accuracy necessary for the job at hand?

Humans are great at prediction (just watch any baseball player predict a ball’s trajectory and then bring their bat to the exact place at the perfect time), yet humans are also horrible at prediction (hello 2008 recession). What gives?

As it turns out, we’re pretty good at predicting things that follow a standard Gaussian distribution (baseballs pretty much do the same thing every time), but the same skill set that makes us good at that actually hurts us when predicting things that can be impacted by Black Swans (in 2008, US subprime mortgage practices and rising interest rates led to a doubling in delinquency rates, with ripple effects across the global economy).

So a large part of data analysis is understanding the limitations of predictive modeling, while an equally large portion hinges on building predictive models that are useful despite those limitations. This could be as simple as a linear forecast for sales growth, or as complex as a machine learning model for the performance of local economies.
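
On the simple end, a linear forecast can be just a few lines of numpy; the sales history below is hypothetical, and the further out you extrapolate, the less the point estimate deserves your trust:

```python
import numpy as np

# Hypothetical monthly sales history (units sold), oldest first.
sales = np.array([210, 225, 240, 238, 260, 275, 290, 288, 310, 325, 330, 345])
months = np.arange(len(sales))

# Fit a straight line: sales ~ slope * month + intercept.
slope, intercept = np.polyfit(months, sales, deg=1)

# Forecast the next three months.
future_months = np.arange(len(sales), len(sales) + 3)
forecast = slope * future_months + intercept

print(f"trend: {slope:.1f} units/month")
print("forecast:", np.round(forecast, 1))
```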

Documentation

Test: If you were beamed up by aliens, would your team be able to continue using your work with no issues?

Doing something is easy. Doing something in a way that you or others can replicate years later is much more difficult.

Best case, poor documentation will cause more work for you down the line when someone starts asking questions about how you did something. Worst case, poor documentation leads to people misinterpreting your results and making a costly decision.

Documenting well tends to come down to three elements: (1) writing clear notes in your code, spreadsheets, and decks; (2) linking everything back to the replicable source data and code; and (3) making sure that the right people/groups have access to those files and code repositories.
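
As a hedged sketch of elements (1) and (2), a script header can carry the notes and the links back to source data and code; every name and link below is a placeholder:

```python
"""Weekly churn summary.

What this does:
    Aggregates raw subscription events into a weekly churn rate per plan.

Source data and code (for replication; names are placeholders):
    - Raw events:   warehouse table analytics.subscription_events
    - This script:  the team's shared analytics repository
    - Output deck:  linked from the team's analysis index

Access:
    Readable by the analytics and product groups; ask the data team otherwise.
"""


def weekly_churn_rate(cancellations: int, active_at_start: int) -> float:
    """Churn rate = cancellations during the week / subscribers active at its start."""
    return cancellations / active_at_start
```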

2. Facilitation and Decision Making

Brilliant analysis only matters if others know it exists, understand it, and can make better decisions because of it. That is what these skills unlock.

Thought Partner / Stakeholder Management

Test: Do the right decision makers come to you for advice? Can they trust that you will proactively raise meaningful thoughts?

Success as a data analyst means directly improving the decisions being made in an organization. A substantial part of this is about having accurate and valuable perspectives (especially “do no harm”), but an equally important part is about building relationships with cross-functional decision makers.

The key ingredients? Proactively building relationships through regular sharing moments (meetings, emails, or other venues) and prioritizing quality over quantity. Over time, some stakeholders will start reaching out themselves, at which point you can take the relationship to the next level by reacting promptly and helpfully.

Reversal: The only thing more useless than a VHS player is a data analyst who gets things wrong. Always be willing to say “let me get back to you on that” rather than sharing unfinished analysis. All analysts are allowed to make a mistake once in their careers.

Ownership

Test: Does anyone else wake up at night worrying about a project you’re leading?

This is true for all careers, but still worth noting. As a data analyst, an important part of your job is project management. If you put your name on a project, that should mean something to everyone involved. It should mean that everyone involved will have clarity on their roles. It should mean that stakeholders will be kept informed. And it should mean that you will do whatever is needed to complete the project successfully (see technical sophistication).

Problem Structuring / Modeling

Test: Can you take a complex problem space (eg, the role of financial instruments in consumer retention) and simplify the relevant elements into something a 5-year-old can understand?

As with much in life, a large part of any data analyst’s job is teaching. It’s your job to dig deep into an important topic, master all the ins and outs, and then write the Cliff’s Notes guide. Doing this effectively often involves taking an amorphous problem space and structuring it with the help of some sort of model (the classic 2x2, or something else entirely).

Providing others with this structure will help them rapidly understand the problem space, and it will also force you to think with more clarity and intentionality on the topic.

The book Thinking In Systems provides a great introduction to this area, and this Coursera session will add practicality.

Elegant Solutions

Test: Like Alexander the Great and the Gordian knot, are you able to find simple, even boring, solutions to impossibly complex problems?

It’s easy to design a crazy complex solution that might look impressive in a slide deck. A trained mouse can do that. What’s less easy is taking the time to really understand the complex problem, and then find a simple solution.

Most of the time, finding an elegant solution really means spending more time on properly structuring the problem and really understanding the system at play. Think deeply about the second-order effects and follow the problem tree to really understand how the system currently works and what impact various possible changes would have.

Artifact Hygiene

Test: If a random person comes across your analysis, can they misunderstand it? Would they be able to understand how you arrived at your conclusions?

This area feels almost unnecessarily tactical, yet I’m repeatedly shocked how often poorly structured outputs get published by perfectly good data analysts. Data shared without a linked source, charts structured with multiple interpretations, recommendations made without clear supporting reasons: all of these lead to low fidelity data transfer and sloppy decision making.

Spoken Communication

Test: Do your stakeholders feel like you’re communicating to them or at them?

Avoid any meetings that don’t require your active participation; just read the notes later. For those meetings where you are actively participating, embrace the opportunity.

Know your audience, and speak to people in their language. Most people speak from their own context, implicitly demanding that their audience do the work of translating. Instead, show respect for your audience by seeking to walk in their shoes and communicating where they already are.

And take time to review your own speaking habits, whether with a coach or simply practicing consciousness of your communication habits. Most of the best podcast hosts spend hours listening to their own shows, embracing the discomfort to improve their craft.

Enable Intelligent Decision Making

Test: Are decision makers relying on your work to guide their choices? Is your analysis positively impacting those choices?

The worst thing a data analyst can do is be wrong. As Mark Twain probably didn’t say, “It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.” As a data analyst, you must be a harbinger of truth. If you start cutting corners and getting things wrong, you absolutely can have a negative impact on the quality of decisions being made.

The second worst thing a data analyst can do is be right, when it no longer matters. If you’re unable to contribute until after a decision must be made (and remember, no decision is also a decision) then it really doesn’t matter how perfect your work might eventually be.

If it’s your job to improve decision making, then you must manage the difficult balance of building accurate but also timely perspectives. Often this involves error bars, yet be wary of “directional” results that are really just dart throwing.

Institutional Knowledge

Test: Can decision makers find relevant artifacts when they need them?

Far too often, organizations treat strategy work like a social media newsfeed: ordered chronologically and quickly forgotten. Rather than crossing your fingers and hoping someone is still around to remember a relevant study when it’s needed again years later, a great analyst should proactively build an organizational memory system to make previous studies accessible.

This takes two forms: First is that of building an actual system that others can use and access. Ideally this is a simple electronic reference card system that people can easily search by keyword, requiring very little maintenance. Second is that of making yourself available as a sort of “chief memory executive” to connect the dots for the people you work with.
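
As a rough illustration of the first form, a keyword-searchable index can start as nothing more than a small table of cards; the entries, fields, and links here are placeholders:

```python
# Minimal sketch of a searchable "reference card" index for past analyses.
cards = [
    {"title": "2021 pricing elasticity study",
     "keywords": ["pricing", "elasticity", "revenue"],
     "link": "https://example.com/pricing-elasticity"},
    {"title": "Checkout funnel drop-off analysis",
     "keywords": ["checkout", "conversion", "funnel"],
     "link": "https://example.com/checkout-funnel"},
]


def search(term: str) -> list[dict]:
    """Return cards whose title or keywords contain the search term."""
    term = term.lower()
    return [c for c in cards
            if term in c["title"].lower() or any(term in k for k in c["keywords"])]


for card in search("conversion"):
    print(card["title"], "->", card["link"])
```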

3. Domain Knowledge

Analytical skills hold value only so long as they are grounded in reality. That grounding is done through domain, or contextual, knowledge.

Technical Awareness

Test: Are your proposals realistic within the context of your product, engineering, or infrastructure constraints? Or do they require significant revision before holding any practical value?

The core job of a data analyst is to help improve decision making by ingesting and processing all relevant inputs. This is commonly understood to include customer- or partner-facing analysis, but it must equally involve understanding the constraints of the overall ecosystem.

Recommending the removal of an unnecessary step in a purchase flow might make sense from a customer perspective, but that recommendation must also take into account constraints and costs from your front-end dev team, your commerce team, and even any external regulatory constraints.

A great data analyst must embrace a certain amount of technical awareness in order to have any practical value to an organization.

Subject Matter Knowledge

Test: Can you clearly communicate the root causes and variables at play within your product or industry? Or are you only equipped to share surface-level facts?

Identifying that, say, people in mid-sized towns purchase more soap than those in big cities might be a fun fact (if true). Knowing what drives this behavior could lead to improved decision making.

Translating data into insights often requires a certain level of subject matter expertise: about the industry, about cultural contexts, about customer lifestyles and choices, and more.

You’ll never be a subject matter expert in everything, so subject matter knowledge is less about knowing the answers and more about knowing who to ask.

Strategic Thinking

Test: If your company did just one thing for the next year, do you know which one thing would make everything else easier or irrelevant?

Finding opportunities is easy; the world is full of them. Finding the right opportunity is very difficult.

Great data analysts should invest deep thought into clearly prioritizing opportunities, and knowing which very few recommendations will have outsized impact relative to their difficulty. With the right technical awareness and subject matter knowledge, this is often as simple as a Y-axis of “potential impact” (e.g., sales) plotted against an X-axis of “difficulty” (e.g., money or time cost).
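
A minimal sketch of that prioritization view, with entirely hypothetical opportunities and scores, is a simple scatter of impact against difficulty:

```python
import matplotlib.pyplot as plt

# Hypothetical opportunities scored by the team: impact (e.g., projected sales lift)
# against difficulty (e.g., money or time cost).
opportunities = {
    "Simplify checkout": {"impact": 8, "difficulty": 3},
    "New loyalty program": {"impact": 6, "difficulty": 7},
    "Rebuild data warehouse": {"impact": 4, "difficulty": 9},
    "Fix top support issue": {"impact": 7, "difficulty": 2},
}

fig, ax = plt.subplots()
for name, score in opportunities.items():
    ax.scatter(score["difficulty"], score["impact"])
    ax.annotate(name, (score["difficulty"], score["impact"]),
                textcoords="offset points", xytext=(5, 5))

ax.set_xlabel("Difficulty (money / time cost)")
ax.set_ylabel("Potential impact (e.g., sales)")
ax.set_title("Aim for high impact, low difficulty")
plt.tight_layout()
plt.savefig("prioritization.png")
```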

The second critical element of strategic thinking is that of winning today without sacrificing tomorrow. Cutting product quality might improve profitability today, but it could very well come at an incredible cost to future profitability. Only through really understanding the full system can an analyst attempt to consider indirect and long-term effects.

Meta Learning

Test: Can you ingest ambiguous and scattered information on a subject, and rapidly distill the actually important elements into a structured form that can be easily understood by others?

Despite literal decades of schooling, most of us are remarkably bad at learning. As it turns out, effective learning is less about working hard at “learning something” and more about working smart by learning how to learn (aka, meta learning).

While you have plenty of great models to choose from, Tim Ferriss’ simple DiSSS model tends to cover most of the important elements:

  1. Deconstruction: Break the skill down into its smallest learnable units.

  2. Selection: Identify the 20% of those units that deliver 80% of the results.

  3. Sequencing: Decide the most effective order in which to learn them.

  4. Stakes: Set up real consequences to make sure you follow through.

4. Professional Self-Awareness

While the above skills are primarily externally-facing, growth and performance as a data analyst depends on continual work to be self-aware. This awareness should be introspectively focused on both yourself and your analyses/tools.

Understanding Strengths and Limitations

Test: When evaluating a problem, are you able to identify the strengths and weaknesses of different approaches or tools? Or are you so skilled with a hammer that you pretend every problem is a nail?

Work tends to become easier and more fun as we move up the ladder of mastery. Unfortunately, this can lead to self-reinforcing biases toward certain skills/tools/approaches. The better you are with a tool, the more you want to use it for everything, and the more you use it, the better you become with that tool.

Best case, an imbalance between skill ladders can lead to pain and frustration when switching to a less developed ladder. Worst case, a data analyst might become so comfortable with a certain ladder that they apply it regardless of the problem. Before you know it, you might find yourself trying to hammer in a screw.

Equally, foster an awareness of the biases and pitfalls inherent in all approaches. Nothing will ever be perfect, so be intentional in understanding elements like the Black Swan effect (e.g., unexpected outliers have outsized impact) or intervention bias (e.g., doctors are paid to do something, yet sometimes the best path forward is to do nothing).

Second-Order Effects

Test: Can you clearly communicate the second-order causes or effects at play?

First-order thinking can generally be accomplished through data alone (e.g., “sales are up because purchase conversion rates have improved”), but these surface level facts rarely illuminate real opportunities.

Second-order thinking gets to the secondary causes or effects (e.g., “conversion rates are up at the end of every month because that’s when most people get paid”).

Understanding second-order causes (or the “why” behind the “what”) equips you to begin identifying possible second-order effects of pulling various levers. Yet as with most things, an enduring skepticism makes for healthy decisions, as one must be very cautious to avoid arrogance in believing second-order effects can be accurately predicted in complex systems.

360 Feedback

Test: Do you know how others view your current strengths and weaknesses?

As humans, we constantly tell ourselves stories about what other people are doing and thinking, particularly as it relates to us. We’re nearly always wrong.

Direct feedback from colleagues gives us the gift of new perspectives. Granted, these perspectives will also be subject to innumerable biases and distortions, yet while they won’t tell us what’s really happening, they will tell us what other people are perceiving - providing new vantage points to help triangulate reality.

Meta Analysis / Thinking Time

Test: Do you spend time analyzing yourself and your approaches, or are you like a constantly running hamster who never steps back to see where the wheel leads?

“Measure twice, cut once.” “Give me six hours to chop down a tree and I will spend the first four sharpening the axe.” We love adages about time spent planning, analyzing, and adjusting, yet few of us actively apply this to our own professional journeys.

Set aside regular time to go deep into assessing your goals, your current state, and the actions to adjust. Ideally you should aim to walk away with just 1-2 clear actions that will move you closer to your long-term goal. More is not better; take the time to find the 1-2 changes that matter most.

5. Teaching / Sharing

Passing the torch will scale the impact of your work AND free more of your time for growth instead of just maintenance.

Mentoring / 1:1 Teaching

Test: Do people come to you with intelligent questions? Are you able to teach them how to fish, instead of just fishing for them?

Watch one, do one, teach one. Instead of simply doing things for people and then complaining that people ask you boring questions, take the time to teach and equip your colleagues so they can do things themselves.

One important element of this is teaching in a clear and understandable way: whether recording screencasts, writing documentation, or pair programming.

An equally important element though is respecting your company’s time by teaching technical sophistication and holding people to a high standard of self-learning. You should be educating on questions that aren’t already covered by documentation or a quick search. If people repeatedly come back with the same questions they should be able to answer without you, then you should not encourage that unhealthy reliance.

Courses / Structured Teaching

Test: Do you share your expertise in a structured format (course, writing, etc)?

Teaching a subject at scale will be rewarding in three significant areas: (1) you’ll be able to raise the decision making ability of your wider org, (2) you’ll see a decrease in one-off questions as you can direct common queries to your recorded talk or writing, and (3) you’ll find yourself more structured and clear in your thinking as teaching provides a fantastic forcing function for learning.