Introduction – “Feature Hell”

If you have ever done enterprise-level development on large data integration projects, the term “feature hell” is not foreign to you. I’m sure you’ve come across a case where separate teams work in parallel on a few features of a larger initiative: “Yellow Elephant”, “Pink Snake”, and “Green Hippo”, for example. You’ve been working hard and making good progress for a few months, and the deadline is quickly approaching. The code base is stable and the number of issues is slowly decreasing. Suddenly, out of the blue, management decides that to better align with ever-changing business goals and objectives, “Yellow Elephant” won’t be released this time, while “Pink Snake” and “Green Hippo” should be slightly modified. This falls on your head like a brick, as all the features have been developed in the same trunk and they all heavily depend on each other. Separating the features is a Sisyphean task that will require huge effort and personal sacrifices from the team to save the project/release.

After repeatedly seeing these scenarios, we at Coherent Solutions decided to take a step back and see whether we could find processes, procedures, methods and tools that would let us thrive in such situations, rather than constantly relying on the heroics of our team members to pull us out at the last minute. While the techniques for dealing with these challenges are relatively well established in application development projects, they are less mature in the Business Intelligence/Data Integration sphere.

We at Coherent Solutions are not newbies to Agile methodologies (Agile is the de-facto development process at Coherent), and we have worked extensively with Scrum and Kanban implementations. We fully realize that the key to each methodology is “tailoring” it to a specific organization and ensuring support at all levels. “Tailoring” isn’t only about applying small tweaks to processes and procedures; it is also about making sure an appropriate infrastructure is in place. Although being agile is important to effectively resolving issues like the ones mentioned above, our experience says that Scrum and Kanban take data integration projects only so far without compromising those processes’ underlying principles.

Thus, we decided to take on an R&D initiative to attempt to perfect the details.

The type of integration work that our teams face mainly has to do with feature development and implementation. Features vary in size from something close to a “story” to something slightly bigger than an “epic”, with estimates ranging from a few weeks to a few months to implement. Much of our work leans toward Microsoft technologies, with SQL Server and SSIS as primary development tools, so the characteristics of these tools were a key consideration in the approach we took.

After reviewing various flavors of Agile, we decided to start with Feature Driven Development (FDD) as the process to manage our development efforts. The main reason for our decision is that this approach is inherently feature-based, so it fits well with the work environment described above. Given the size of the features we work with, we definitely did not want to go down the path of splitting them into smaller stories, implementing and merging those into the main trunk, and making other teams deal with half-baked deliveries (note that the definition of “done” is definitely a topic of discussion and will be covered in future blog posts). On data integration projects, dealing with anything half-baked is painful and time consuming, so it is preferable to keep feature development isolated until it is done and only then merge it into the main trunk.

Thus, we needed to consider a branch management strategy that could be applied for data integration projects. This presented us with the following options:

  • “branching by feature”. See more in Martin Fowler’s article or the Git branching model.
  • “branching by release”. Everyone’s tried and true friend – create a branch per release to support it.
  • “branching by abstraction”. A single branch holds everything, and features are turned on/off by feature toggles. See Martin Fowler’s articles on Branch By Abstraction & Feature Toggle.

On the surface, “branching by abstraction” seemed like the best choice, especially for SSIS, which is considered “un-mergeable”. However, when one considers a complex, layered architecture with many moving parts (e.g. a BI tool such as Tableau or MicroStrategy with a development cycle of its own), with data crossing the borders between different layers, data marts and BI platforms, it becomes very apparent that a decision to use “branching by abstraction” can only be made after extensive infrastructure and architectural analysis of a specific environment. Furthermore, substantial investment would be needed to refactor the code to fit the “abstraction” paradigm. The “feature toggle” paradigm adds extra complexity: a toggle is very difficult to implement because it needs to be applied to many layers of the architecture. Consequently, the complex business rules and layered data architectures typical of data integration projects make the implementation and support effort for “feature toggles” go through the roof.
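To make the cost concrete, here is a minimal sketch in Python of a feature toggle applied inside a single load step. All names (`FEATURE_TOGGLES`, `is_enabled`, `load_customer_mart`) are hypothetical, invented purely for illustration; in a real environment the toggle state would typically live in a shared configuration table, and the same check would have to be repeated in SSIS packages, data mart loads and BI-layer extracts alike.

```python
# Hypothetical toggle registry; in practice this would be a shared
# configuration table consulted by every layer of the architecture.
FEATURE_TOGGLES = {
    "yellow_elephant": False,  # pulled from this release
    "pink_snake": True,
    "green_hippo": True,
}

def is_enabled(feature: str) -> bool:
    """Return the toggle state, defaulting to 'off' for unknown features."""
    return FEATURE_TOGGLES.get(feature, False)

def load_customer_mart(rows):
    """Hypothetical load step: each toggled feature adds its own branch,
    and the same checks must be repeated in every downstream layer."""
    out = []
    for row in rows:
        if is_enabled("pink_snake"):
            row = {**row, "segment": row.get("segment", "default")}
        if is_enabled("yellow_elephant"):
            # Dead code path this release, but it still ships and must be tested.
            row = {**row, "loyalty_tier": "computed"}
        out.append(row)
    return out
```

Each additional layer that must honor a toggle multiplies the number of places where both flag states have to be tested, which is exactly the support cost described above.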

Therefore, we decided to pursue the path of least resistance and explore “branching by feature” to manage branches. This option isn’t optimal either, as some difficulties remain, such as the “un-mergeable” nature of SSIS, but all things considered it offered the most promise.

In our subsequent blog posts, we will cover the results of a number of R&D projects on making “branching by feature” work, including resolving the “un-mergeability” problem of SSIS, branching and merging “non-code” artifacts such as source-to-target mapping documentation, and selecting and fine-tuning tools and process methodologies for FDD with the “branching by feature” strategy on data integration projects.