The Agile Database Techniques Stack: The Dev Side of DataOps
- Agile database techniques stack
- Why is this a stack?
- Why adopt the agile database techniques stack?
- What is the best way to adopt the agile database techniques stack?
- Can you adopt individual techniques?
- Agile database techniques and DataOps
1. The Agile Database Techniques Stack
The techniques stack is overviewed in Figure 1 below.
Figure 1. The Agile Database Techniques Stack.
The techniques of the agile database techniques stack are:
- Thin slicing. A fundamental agile development technique is to thinly slice functionality into small, consumable pieces that may be potentially deployed into production quickly. These vertical slices are completely implemented – the analysis, design, programming, and testing are complete – and offer real business value to stakeholders. Thin slicing is completely applicable, and highly desirable, in data development.
- Clean data architecture and clean data design. A clean data architecture strategy enables you to develop and evolve your data assets at a pace which safely and effectively supports your organization – in short, to be agile. Similarly, a clean database design enables you to evolve specific data assets in an agile manner.
- Agile data modeling. With an evolutionary approach to data modeling you model the data aspects of a system iteratively and incrementally. With an agile data modeling approach you do so in a highly collaborative and streamlined manner.
- Database refactoring and data repair. A database refactoring is a small change to your database schema which improves its design without changing its semantics (i.e. you neither add anything nor break anything). The process of database refactoring is the evolutionary improvement of your database schema so as to improve your ability to support the new needs of your customers. A data repair is a small fix to data that addresses a data quality problem. The process of data repair is the evolutionary improvement of your data quality.
- Automated database regression testing. You should ensure that your database schema actually meets the requirements for it, and the best way to do that is via testing. With a test driven development (TDD) approach you write a unit test before you write production database schema code, the end result being that you have an automated regression test suite for your database schema that also helps to ensure data quality.
- Continuous database integration (CDI). Continuous integration (CI) is the automatic invocation of the build process of a system. As the name implies, continuous database integration (CDI) is the database version of CI.
- Configuration management. Your data models, database tests, test data, and so on are important artifacts which should be put under configuration management just like any other artifact.
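To make the lower half of the list more concrete, here is a minimal sketch of how a data repair, a database refactoring, an automated regression test, and a database build might fit together. Everything in it is an assumption made for illustration: the `customer` table, the `fname` and `first_name` columns, the sample rows, and the function names are hypothetical, and SQLite (via Python's standard `sqlite3` module) stands in for whatever database you actually use. The tests are shown after the change scripts for brevity, although with a TDD approach you would write them first.

```python
import sqlite3

# Hypothetical change scripts, applied in order. In practice these would be
# files kept under configuration management.
CHANGE_SCRIPTS = [
    # Starting schema plus test data; the second row has a data quality problem.
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, fname TEXT NOT NULL)",
    "INSERT INTO customer (id, fname) VALUES (1, 'Ada')",
    "INSERT INTO customer (id, fname) VALUES (2, '  Grace ')",
    # Data repair: strip stray whitespace without touching the schema.
    "UPDATE customer SET fname = TRIM(fname) WHERE fname <> TRIM(fname)",
    # Database refactoring: introduce a better-named column and backfill it.
    # The old column stays in place, so existing queries keep working.
    "ALTER TABLE customer ADD COLUMN first_name TEXT",
    "UPDATE customer SET first_name = fname",
]


def build_database(scripts=CHANGE_SCRIPTS):
    """Build a fresh in-memory database by applying every change script."""
    conn = sqlite3.connect(":memory:")
    for sql in scripts:
        conn.execute(sql)
    conn.commit()
    return conn


def regression_failures(conn):
    """Run the regression checks and return a list of failure messages."""
    failures = []
    expected = [(1, "Ada"), (2, "Grace")]
    # Schema check: the old and new columns must both return the expected data.
    for column in ("fname", "first_name"):
        rows = conn.execute(
            f"SELECT id, {column} FROM customer ORDER BY id"
        ).fetchall()
        if rows != expected:
            failures.append(f"{column}: expected {expected}, got {rows}")
    # Data quality check: no name may carry leading or trailing whitespace.
    dirty = conn.execute(
        "SELECT COUNT(*) FROM customer WHERE fname <> TRIM(fname)"
    ).fetchone()[0]
    if dirty:
        failures.append(f"{dirty} untrimmed name(s)")
    return failures


def integrate():
    """The step a CI job would invoke: rebuild from scratch, then test."""
    conn = build_database()
    try:
        return regression_failures(conn)
    finally:
        conn.close()
```

Calling `integrate()` returns an empty list when the build is healthy. If you drop the data repair script from the list, the same checks report the untrimmed name, which is the kind of feedback that continuous database integration is meant to provide automatically.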
2. Why is This a Stack?
We call it a stack because each technique relies on you being able to perform the ones below it. For example, Figure 2 shows that continuous database integration requires you to have a configuration management strategy in place.
Figure 2. How the techniques rely on each other.
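The "relies on the ones below it" idea can be sketched in a few lines of code. This is only an illustration under a stated assumption: the top-to-bottom order is taken from the list in section 1 and treated as strictly linear, whereas Figure 2 may show finer-grained relationships. The names `STACK`, `relies_on`, and `adoption_order` are hypothetical.

```python
# Techniques from the top of the stack to the bottom. The order is an
# assumption taken from the list in section 1; Figure 2 may show
# finer-grained dependencies than this linear model.
STACK = [
    "Thin slicing",
    "Clean data architecture and clean data design",
    "Agile data modeling",
    "Database refactoring and data repair",
    "Automated database regression testing",
    "Continuous database integration (CDI)",
    "Configuration management",
]


def relies_on(technique):
    """Return the techniques below `technique` in the stack, nearest first."""
    position = STACK.index(technique)  # raises ValueError for unknown names
    return STACK[position + 1:]


def adoption_order(technique):
    """Return `technique` and what it relies on, listed from the bottom up."""
    return list(reversed([technique] + relies_on(technique)))
```

For example, `relies_on("Continuous database integration (CDI)")` returns only `"Configuration management"`, which matches the example given in the text above.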
3. Why Adopt The Agile Database Techniques Stack?
Any given technique has advantages and disadvantages, including the ones overviewed in this article. Every given practice works well in some situations, and may even be the “best” that you can do in those situations, but doesn’t work well in others. So practices are contextual in nature and should be presented as such. To present something as a “best practice” is deceptive in my opinion. To prove my point, the advantages and disadvantages of each strategy of the agile database techniques stack are summarized in the following table.
Technique | Advantages | Disadvantages |
Thin slicing | | |
Clean Data Architecture | | |
Clean Database Design | | |
Agile Data Modeling | | |
Database Refactoring | | |
Data Repair | | |
Automated Database Regression Testing | | |
Continuous Database Integration (CDI) | | |
Configuration Management | | |
4. What is the Best Way to Adopt the Agile Database Techniques Stack?
There isn’t a single, “best” way. It depends on the context of the situation that you face. Here are some potential strategies to consider:
- Small, incremental improvements. Treat the techniques stack like an improvement target, making small changes to work towards it. You’re very likely doing data architecture, database design, data modeling, some database testing, configuration management of some assets, and may even be improving the implementation of some of your legacy data sources. So start evolving your current way of working (WoW) to leverage ideas captured in the various techniques of the stack. I highly suggest that anyone interested in how to improve effectively via small steps take a look at PMI’s Guided Continuous Improvement (GCI).
- Take a top-down approach. When you adopt a technique, you’ll find that you need to adopt at least some aspects of the technique that it immediately relies on. This continues recursively until you reach the bottom of the stack. In some ways you’ll be adopting “vertical slices” of the overall techniques stack. What I mean by this is that you’ll adopt just enough of each technique to get some value, then adopt some more of it, and so on.
- Take a bottom-up approach. Although this makes sense from a technical point of view, and it’s certainly easier, it tends to be difficult from a management point of view. The techniques towards the top of the stack tend to have the best short-term payback whereas the techniques towards the bottom have a longer-term payback. Starting bottom up, you’re effectively starting with the hardest strategies to justify (at least in organizations struggling to operate with a value-driven mindset rather than a cost-driven one).
- Take a middle-out approach. Some people choose to start with the more technically interesting techniques, particularly database refactoring and automated database regression testing, often because those are most likely to be new to them and to their organization. Other people will focus on improving what they’re already doing, data modeling and database design, by adopting more effective agile approaches. Either way, you will still need to swiftly adopt techniques lower on the stack.
- Adopt everything at once. This can be chaotic because in effect it’s a large change for your data group. As pointed out above, the top-down strategy tends to quickly evolve into this one anyway, although perhaps via “vertical slices” of the overall techniques stack.
The list above is ordered by my personal preference, which is driven by what I have found to work in practice. But it really does depend on your situation; one approach does not work in all situations. There are no best practices.
5. Can You Adopt Individual Techniques?
Yes.
Each technique offers value on its own. However, because they build on each other, as you saw in Figure 2 and in the previous section, they are more effective when adopted together.
6. Agile Database Techniques Stack and DataOps
Figure 3 depicts the DataOps lifecycle, the mobius loop, with a collection of data and database techniques mapped to its activities. The figure includes several techniques that are not called out by the agile database techniques stack, such as data lineage and data security. This is because the agile database techniques stack is a subset of the agile data techniques available to you, capturing the ones most critical to the development side of the loop.
Figure 3. Mapping data techniques to the DataOps lifecycle.
7. Related Resources
- Clean Database Design
- Data Debt: Understanding Enterprise Data Quality Problems
- Guided Continuous Improvement (GCI)
- Introduction to DataOps: Bringing Databases Into DevOps
- Questioning the Concept of “Best Practices”: Practices are Contextual, Never “Best”