The Agile Data (AD) Method

The Agile Database Techniques Stack: The Dev Side of DataOps

Modern software development processes, including SAFe, Extreme Programming (XP), and Scrum are all evolutionary if not agile in nature. The implication is that if data professionals are going to be effective members of such teams, then they need to adopt tools and techniques which enable them to do so. There is nothing special about the data aspects of your IT ecosystem: they can be developed in an evolutionary manner just like non-data aspects. Even data warehouse and business intelligence solutions can be developed in an evolutionary manner (and quite frankly this is the preferred approach).This article overviews what we call the agile database techniques stack, a collection of strategies required by modern data development professionals. It is organized into the following topics:
  1. Agile database techniques stack
  2. Why is this a stack?
  3. Why adopt the agile database techniques stack?
  4. What is the best way to adopt the agile database techniques stack?
  5. Can you adopt individual techniques?
  6. Agile database techniques and DataOps

1. The Agile Database Techniques Stack

The techniques stack is overviewed in Figure 1 below.

Figure 1. The Agile Database Techniques Stack (click to enlarge).

Agile Database Techniques stack

These strategies of the agile database techniques stack are:

  1. Thin slicing. A fundamental agile development technique is to thinly slice functionality into small, consumable pieces that may be potentially deployed into production quickly. These vertical slices are completely implemented – the analysis, design, programming, and testing are complete – and offer real business value to stakeholders. Thin slicing is completely applicable, and highly desirable, in data development.
  2. Clean data architecture and clean data design. A clean data architecture strategy enables you to develop and evolve your data assets at a pace which safely and effectively supports your organization – in short, to be agile. Similarly, a clean database design enables you to evolve specific data assets in an agile manner.
  3. Agile data modeling. With an evolutionary approach to data modeling you model the data aspects of a system iteratively and incrementally. With an agile data modeling approach you do so in a highly collaborative and streamlined manner.
  4. Database refactoring and data repair. A database refactoring is a small change to your database schema which improves its design without changing its semantics (e.g. you don’t add anything nor do you break anything). The process of database refactoring is the evolutionary improvement of your database schema so as to improve your ability to support the new needs of your customers. A data repair is a small fix to data that addresses a data quality problem. The process of data repair is the evolutionary improvement of your data quality.
  5. Automated database regression testing. You should ensure that your database schema actually meets the requirements for it, and the best way to do that is via testing. With a test driven development (TDD) approach you write a unit test before you write production database schema code, the end result being that you have an automated regression test for your database schema. to ensure data quality.
  6. Continuous database integration (CDI). Continuous integration (CI) is the automatic invocation of the build process of a system. As the name implies, continuous database integration (CDI) is the database version of CI.
  7. Configuration management. Your data models, database tests, test data, and so on are important artifacts which should be put under configuration management just like any other artifact.

 

2. Why is This a Stack?

We call it a stack because each technique relies on you being able to perform the ones below it. For example, Figure 2 shows that continuous database integration requires you to have a configuration management strategy in place.

Figure 2. How the techniques rely on each other (click to enlarge).

Agile Data Techniques Stack - Relationships

 

3. Why Adopt The Agile Database Techniques Stack?

Any given technique has advantages and disadvantages, including the ones overviewed in this article. Every given practice works well in some situations, and may even be the “best” that you can do in those situations, but doesn’t work well in others. So practices are contextual in nature and should be presented as such. To present something as a “best practice” is deceptive in my opinion. To prove my point, the advantages and disadvantages of each strategy of the agile database techniques stack are summarized in the following table.

Technique Advantages Disadvantages
Thin slicing
  • Deliver high-value functionality sooner.
  • Reduces risk.
  • Enables opportunity for feedback.
  • Requires clean architecture, design, and implementation.
  • Requires “full stack” data capability within a team to develop and evolve the solution.
Clean Data Architecture
  • Easier to understand.
  • Easier to evolve, thereby enabling agility.
  • Easier to validate.
  • Requires investment to keep clean, including in architectural modeling and architectural refactoring.
  • Existing legacy architectures often have significant technical debt that needs to be addressed before your architecture is sufficiently clean.
Clean Database Design
  • Easier to understand and evolve.
  • Easier to test.
  • Increases ability to evolve data hosting strategy.
  • Requires investment to keep clean, including in agile design modeling and database refactoring.
  • Existing legacy designs often have significant technical debt that needs to be addressed before your design is sufficiently clean.
Agile Data Modeling
  • Enables evolutionary exploration of both your problem and solution spaces.
  • Enables decisions at the most responsible moment.
  • Integrates into other agile ways of working.
  • Traditional data modelers struggle with agile data modeling strategies at first, particularly when they don’t yet have full stack agile database skills.
  • Requires ability to safely refactor whatever is being modeled.
Database Refactoring
  • Enables safe evolution of a data source.
  • Supports removal of data debt.
  • Enables evolutionary database development.
  • Although possible to implement by hand, you really want to invest in database refactoring tooling.
  • Not possible for data sources that do not support executable functionality, stored procedures for example, such as CSV files.
  • Requires capability to schedule, and then execute upon, removal of refactoring scaffolding once systems that access the data source are updated.
Data Repair
  • Enables you to reduce overall technical debt as you will no longer need to address the data quality problem wherever you use the data.
  • Often straightforward to implement
  • Increases the understandability of your data through greater consistency
  • Changes to data sources that are accessed by multiple clients may impact one or more of those clients, requiring corresponding updates there. Without automated regression test suites in place for those clients this can be risky.
  • Some data repairs require a deprecation period where the original and new versions are supported in parallel, increasing overall complexity.
Automated Database Regression Testing
  • Enables inclusion of data sources in automated testing strategy.
  • Enables automatic enforcement of data standards and conventions.
  • Support executable specification strategy.
  • Requires people with test thinking and testing skills.
  • Requires an understanding that data sources are enterprise assets AND must be treated as such.
  • Traditional data groups are often unwilling at first to include testing as one of their responsibilities.
  • Lack of automated tests for existing legacy data sources requires investment to develop them.
Continuous Database Integration (CDI)
  • Automates much of the drudgery around the overall build process.
  • Increases consistency and predictability of database evolution work.
  • Brings database evolution up to common software engineering practice.
  • Enables greater visibility into the work and work products of database evolution via automated logging of results.
  • Requires investment in automation of the database development infrastructure, potentially including new tooling.
  • Requires investment in the creation of automated tests.
  • May require training and coaching of data engineers in agile development techniques.
Configuration Management
  • Enables you to manage versions of databases, and the assets they are comprised of, across environments.
  • Enables you to rollback portions of a database to previous versions.
  • Enables you to modify work products in parallel and then merge the work.
  • Makes it easier to identify the source of defects injected due to the change history maintained about assets.
  • Requires database developers to be trained in CM-related skills.
  • May require new tooling, in particular the CM system itself.

4. What is the Best Way to Adopt the Agile Database Techniques Stack?

There isn’t a single, “best” way. It depends on the context of the situation that you face. Here are some potential strategies to consider:

  1. Small, incremental improvements. Treat the techniques stack like an improvement target, making small changes to work towards it. You’re very likely doing data architecture, database design, data modeling, some database testing, configuration management of some assets, and may even be improving the implementation of some of your legacy data sources. So start evolving your current way of working (WoW) to leverage ideas captured in the various techniques of the stack. I highly suggest that anyone interested in how to effectively improve via small steps to take a look at PMI’s Guided Continuous Improvement (GGI).
  2. Take a top-down approach. When you adopt a technique, you’ll find that you need to adopt at least some aspects of the technique that it immediately relies on. This continues recursively until you reach the bottom of the stack. In some ways you’ll be adopting “vertical slices” of the overall techniques stack. What I mean by this is that you’ll adopt just enough of each technique to get some value, then adopt some more of it, and so on.
  3. Take a bottom-up approach. This makes sense from a technical point of view, and it’s certainly easier, it tends to be difficult from a management point of view. The techniques towards the top of the stack tend to have the best short-term payback whereas the techniques towards the bottom have a longer-term payback. Starting bottom up, you’re effectively starting with the hardest strategies to justify (at least in organizations struggling to operate with a value-driven mindset rather than a cost-driven one).
  4. Take a middle-out approach. Some people choose to start with the more technically interesting techniques, particularly database refactoring and automated database regression testing, often because those are most likely to be new to them and to their organization. Other people will focus on improving what they’re already doing, data modeling and database design, but adopting more effective agile approaches. Either way, you will still need to swiftly adopt techniques lower on the stack.
  5. Adopt everything at once. This can be chaotic because in effect it’s a large change for your data group. As pointed out above, the top-down strategy tends to quickly evolve into this one anyway, although perhaps via “vertical slices” of the overall technique stack.

The list above is ordered by my personal preference, which is driven by what I have found to work in practice. But it really does depend on your situation, one approach does not work in all situations. There are no best practices.

5. Can You Adopt Individual Techniques?

Yes.

Each technique offers value on its own. However, because they build on each other, as you saw in Figure 2, and in the previous section, they are more effective when adopted together.

 

6. Agile Database Techniques Stack and DataOps

Figure 3 depicts the DataOps lifecycle, the mobius loop, with a collection of data and database techniques mapped to its activities. The figure includes several techniques that are not called out by the agile database techniques stack, such as data lineage and data security. This is because the agile database techniques stack is a subset of the agile data techniques available to you, capturing the ones most critical to the development side of loop.

Figure 3. Mapping data techniques to the DataOps lifecycle (click to enlarge).

DataOps/Data DevOps

 

7. Related Resources

 

Recommended Reading

Choose Your WoW! 2nd Edition
This book, Choose Your WoW! A Disciplined Agile Approach to Optimizing Your Way of Working (WoW) – Second Edition, is an indispensable guide for agile coaches and practitioners. It overviews key aspects of the Disciplined Agile® (DA™) tool kit. Hundreds of organizations around the world have already benefited from DA, which is the only comprehensive tool kit available for guidance on building high-performance agile teams and optimizing your WoW. As a hybrid of the leading agile, lean, and traditional approaches, DA provides hundreds of strategies to help you make better decisions within your agile teams, balancing self-organization with the realities and constraints of your unique enterprise context.

 

I also maintain an agile database books page which overviews many books you will find interesting.