Apr 23, 2020 - On the architecture for unit testing

Automated testing is an integral part of any major software project as a means for improving quality, productivity and flexibility. As such, it’s vital that system architecture is designed in a way to facilitate the development and execution of automated tests.

Quality improves because automated test execution allows us to find and solve problems early in the development cycle, long before a product change is deployed to production and becomes available to end-users.

Productivity increases because the earlier a problem is found in the development cycle, the cheaper it is to fix, and it’s easy to see why. If a software developer can run an automated test suite before integrating his code changes into the main repository, he can quickly discover newly introduced bugs and fix them on the spot. However, if such a test suite is not available, newly introduced bugs may only surface in a manual testing phase later on, or even worse, be reported by end-users, requiring developers to step out of the regular development workflow to investigate and fix them.

Flexibility improves because developers feel more confident refactoring code, upgrading packages and modifying system behavior when required, as they can rely on a test suite with a high level of coverage to assess the impact of their code changes.

When discussing automated testing I also like to bring risk management into the conversation. As a lead software engineer, risk management is a big part of my job, and it involves mentoring the development team in practices and processes that reduce the risk of technical deterioration of the product. From the benefits listed above it’s clear that an adequate automated testing strategy fits right in, helping to mitigate risks in a software project.

Moving forward, we can divide automated tests into at least three types according to the strategy for implementing and running them, as shown in the famous test pyramid below:

Testing Pyramid

Unit tests are cheap to develop and cheap to run with respect to the time and resources employed, and are focused on testing individual system components (e.g., business logic) in isolation from external dependencies.

Integration tests take one step further, and are developed and run without isolating external dependencies. In this case we are interested in verifying that all system components interact as expected when put together and faced with integration constraints (e.g., networking, storage, processing, etc).

Finally, at the top of the pyramid, GUI tests are the most expensive to automate and execute. They usually rely on UI input/output scripting and playback tools for mimicking an end-user’s interaction with the system’s graphical user interface.

In this article we will be focusing on the foundation of the test pyramid, the unit tests, and on system architecture considerations for promoting their adoption.

Properties of an effective unit test

First, let’s enumerate what constitutes an effective, well-crafted unit test. Below is a proposition:

  • Short, having a single purpose
  • Simple, clear setup and tear down
  • Fast, executes in a fraction of a second
  • Standardized, follows strict conventions

Ideally a unit test should display all of these properties. Below, I elaborate on why.

If a unit test isn’t short enough, it becomes harder to read and to understand its purpose, i.e., exactly what it’s testing. For that reason unit tests should have a clear objective and evaluate one thing only, instead of trying to perform multiple evaluations at the same time. This way, when a unit test breaks, a developer can more easily and quickly assess the situation and fix it.

If unit tests require a lot of effort to set up their test context, and to tear it down afterwards, developers will start questioning whether the time invested in writing these tests is worth it. Therefore, we need to provide an environment for writing unit tests that takes care of managing all the complexity of the test context, such as injecting dependencies, preloading data, clearing caches, and so forth. The easier it is to write unit tests, the more motivated developers will be to create them!

If executing a suite of unit tests takes a lot of time, developers will naturally execute it less often. The danger lies in the suite becoming so lengthy that running it turns impractical, and developers start skipping it, or running it selectively, reducing its effectiveness.

Lastly, if tests aren’t standardized, before too long your test suite will start looking like the wild west, with different and sometimes conflicting coding styles being used to write unit tests. Hence, pursuing design coherence is as valid in the scope of unit testing as it is for the overall system.

Once we agree on what constitutes effective unit tests we can start defining system architecture guidelines that promote their properties, as described in the following sections.

Software complexity

Software complexity arises, among other factors, from the growing number of interactions between components within a system and the evolution of their internal states. As complexity rises, the risk of unintentionally interfering with intricate webs of component interactions increases, potentially leading to the introduction of defects when making code changes.

Furthermore, it’s common sense that the higher the complexity of a system, the harder it is to maintain and test, which leads to a first (general) guideline:

Keep an eye on software complexity and follow design practices to contain it

A practice worth mentioning for managing complexity while improving testability is to employ Pure Functions and Immutability whenever possible in your system design. A pure function is a function that has the following properties [1]:

  • Its return value is the same for the same arguments (no variation with local static variables, non-local variables, mutable reference arguments or input streams from I/O devices).
  • Its evaluation has no side effects (no mutation of local static variables, non-local variables, mutable reference arguments or I/O streams).

From these properties it’s clear that pure functions are well suited to unit testing. Their usage also removes the need for many of the complementary practices discussed in the following sections for handling, mostly, stateful components.

Immutability plays an equally important role. An immutable object is an object whose state cannot be modified after it is created. Immutable objects are simpler to interact with and more predictable, contributing to lower system complexity by disentangling global state.
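
To make this concrete, here’s a minimal C# sketch of both ideas together. The Money type and the test are made up for illustration, not taken from any real codebase; the point is how trivially a pure function on an immutable value lends itself to unit testing:

using Microsoft.VisualStudio.TestTools.UnitTesting;

// An immutable value type: its state is fixed at construction.
public sealed class Money
{
    public decimal Amount { get; }
    public string Currency { get; }

    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    // A pure function: the result depends only on its inputs and
    // nothing is mutated; a new instance is returned instead.
    public Money Add(Money other)
    {
        return new Money(Amount + other.Amount, Currency);
    }
}

[TestClass]
public class MoneyTests
{
    // No setup, no teardown, no isolation concerns.
    [TestMethod]
    public void AddingMoneyTest()
    {
        var total = new Money(10m, "USD").Add(new Money(5m, "USD"));
        Assert.AreEqual(15m, total.Amount);
    }
}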

Isolating dependencies

By their very definition unit tests are intended to test individual system components in isolation, since we don’t want the result of a component’s unit tests to be influenced by one of its dependencies. The degree of isolation varies according to the specifics of the component under test and the preferences of each development team. I personally don’t worry about isolating lightweight, internal business classes, since I see no value in replacing them with test-targeted components that would display pretty much the same behavior. Be that as it may, the strategy here is straightforward:

Apply the dependency inversion principle in component design

The dependency inversion principle (DIP) states that both high-level and low-level objects should depend on abstractions (e.g., interfaces) instead of specific concrete implementations. Once a system component is decoupled from its dependencies, we can easily replace them in the context of a unit test with simplified, test-targeted concrete implementations. The class diagram below illustrates the resulting structure:

Isolated Dependencies

In this example the component under test depends on Repository and FileStore abstractions. In production we might inject a concrete SQL-based implementation for the repository class and an S3-based implementation for the file store component, for storing files remotely in the AWS Cloud. When running unit tests, however, we will want to inject simplified functional implementations that don’t rely on external services, such as the “in memory” implementations painted in green.
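
As a rough C# sketch of this structure (the interface shapes, the Document entity and the DocumentService name are illustrative assumptions, since the diagram doesn’t prescribe exact signatures), the component under test receives its dependencies through constructor injection:

using System;

public class Document { } // placeholder entity for the sketch

public interface IRepository
{
    void Save(Document document);
    Document Load(Guid id);
}

public interface IFileStore
{
    void Write(string path, byte[] content);
    byte[] Read(string path);
}

// The component under test knows only the abstractions; whether
// they are backed by SQL/S3 or by in-memory fakes is decided by
// whoever constructs it (production wiring or a unit test).
public class DocumentService
{
    private readonly IRepository repository;
    private readonly IFileStore fileStore;

    public DocumentService(IRepository repository, IFileStore fileStore)
    {
        this.repository = repository;
        this.fileStore = fileStore;
    }
}

In production we would wire it with the SQL and S3 implementations; in a unit test, with the in-memory ones.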

If you’re not familiar with the DIP, I have another article that goes through a practical overview on how to use it in a similar context that you may find helpful: Integrating third-party modules.

The Mocks vs Fakes debate

Notice that I’m not referring to these “in memory” implementations as “mocks”, which are simulated objects that mimic the behavior of real objects in limited, controlled ways. I do this deliberately, since I favor fully compliant “fake” implementations over mock objects: fakes give us more flexibility for writing unit tests, and can be reused across several unit test classes more reliably than setting up mocks.

To get into more detail, suppose we are writing a unit test for a component that depends on the FileStore abstraction. In this test the component adds an item to the file store but isn’t really concerned with whether the operation succeeds or fails (e.g., a log file), and hence we decide to mock that operation in a “dummy” way. Now suppose that later on requirements change, and the component needs to ensure the file is created, reading from the file store before proceeding, forcing us to update the mock’s behavior for the test to pass. Then imagine requirements change yet again and the component needs to write to multiple files (e.g., one for each log level) instead of only one, forcing another improvement to our mock object’s behavior. Can you see what’s happening? We are slowly evolving our mock into something resembling a concrete implementation. What’s worse, we may end up with dozens of independent, half-baked mock implementations scattered throughout the codebase, one for each unit test class, resulting in more maintenance effort and less cohesion within the testing environment.

To address this situation I propose the following guideline:

Rely on Fakes for implementing unit tests instead of Mocks, treating them as first class citizens, and organizing them in reusable modules

Since fake components implement business behavior, they’re inherently a more costly initial investment than setting up mocks, no doubt about that. However, their long-term return is definitely higher, and more aligned with the properties of effective unit tests.
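
For illustration, a fake for the FileStore abstraction sketched earlier can be as simple as a dictionary wrapper (again, a sketch under the assumed IFileStore interface, not a prescription). Because it honors the full contract, it survives the requirement changes described above without any per-test adjustments:

using System.Collections.Generic;

// A fully compliant fake: anything written can be read back, so
// tests that merely log files and tests that verify file contents
// can both reuse this same implementation, unchanged.
public class InMemoryFileStore : IFileStore
{
    private readonly Dictionary<string, byte[]> files =
        new Dictionary<string, byte[]>();

    public void Write(string path, byte[] content)
    {
        files[path] = content;
    }

    public byte[] Read(string path)
    {
        return files[path];
    }
}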

Coding style

Every automated test can be described as a three-step script:

  1. Prepare test context
  2. Execute key operation
  3. Verify outcome

It’s logical to consider that, given an initial known state, when an operation is executed, then it should produce the same expected outcome, every time. For the outcome to turn out different, either the initial state has to change, or the operation’s implementation itself.

You’re probably familiar with the words given, when and then used above. If not, they represent the popular Given-When-Then pattern for writing unit tests in a way that favors readability and structure. The idea here is simple:

Define and enforce a single, standardized coding style for writing unit tests

The Given-When-Then pattern can be adopted in a variety of ways. One of them is to structure the unit test as a main method that delegates to three distinct step methods. For instance, consider a password strength test:


[TestMethod]
public void WeakPasswordStrengthTest()
{
    var password = GivenAWeakPassword();
    var score = WhenThePasswordStrengthIsEvaluated(password);
    ThenTheScoreShouldIndicateAWeakPassword(score);
}

private string GivenAWeakPassword()
{
    return "qwerty";
}

private int WhenThePasswordStrengthIsEvaluated(string password)
{
    var calculator = new PasswordStrengthCalculator();
    return (int)calculator.GetStrength(password);
}

private void ThenTheScoreShouldIndicateAWeakPassword(int score)
{
    Assert.AreEqual((int)PasswordStrength.Weak, score);
}

Using this approach, the main test method becomes a three-line description of the unit test’s purpose that even a non-developer can understand with ease just by reading it. In practice, unit tests’ main methods end up becoming low-level documentation of your system’s behavior, providing not only a textual description but also the possibility to execute the code, debug it and find out what happens internally. This is extremely valuable for shortening the system architecture learning curve as new developers join the team.

It is important to highlight that when it comes to coding style, there’s no single right way of doing it. The example I presented above may please some developers and displease others for, say, being verbose, and that’s all right. What really matters is coming to an agreement within your development team on a coding convention for writing unit tests that makes sense to you, and sticking to it.

Managing test contexts

Unit test context management is a topic that is not discussed often enough. By “test context” I mean the entire dependency injection and initial state setup required for successfully running unit tests.

As noted before, unit testing is more effective when developers spend less time worrying about setting up test contexts and more time writing test cases. We derive our last guideline from the observation that a few test contexts can be shared by a much larger number of test cases:

Make use of builder classes to separate the construction of test contexts from the implementation of unit test cases

The idea is to encapsulate the construction logic of test contexts in builder classes, referencing them in unit test classes. Each context builder is then responsible for creating a specific test scenario, optionally defining methods for particularizing it.

Let’s take a look at another illustrative code example. Suppose we are developing an anti-fraud component for detecting suspicious location changes by mobile application users. The test context builder might look like this:


public class MobileUserContextBuilder : ContextBuilder
{
    public override void Build()
    {
        base.Build();

        /*
            The build method call above is used for
            injecting dependencies and setting up generic
            state common to all tests.

            After it we would complete building the test
            context with what's relevant for this scenario
            such as emulating a mobile user account sign up.
        */
    }

    public User GetUser()
    {
        /*
            Auxiliary method for returning the user entity
            created for this test context.
        */
    }

    public void AddDevice(User user, DeviceDescriptor device)
    {
        /*
            Auxiliary method for particularizing the test
            context, in this case for linking another
            mobile device to the test user's account
            (deviceType, deviceOS, ipAddress, coordinates, etc)
        */
    }
}

The test context created by this MobileUserContextBuilder is generic enough that any test case which needs to start from a state in which the application already has a registered mobile user can use it. On top of that, it defines the AddDevice method for particularizing the test context to fit our fictitious anti-fraud component’s testing needs.

Consider that this anti-fraud component is called GeolocationScreener and is responsible for checking whether or not a mobile user’s location changed too quickly, which would indicate that he’s probably faking his real coordinates. One of its unit tests might look like the following:


[TestClass]
public class GeolocationScreenerTests
{
    [TestInitialize]
    public void TestInitialize()
    {
        context = new MobileUserContextBuilder();
        context.Build();
    }
    
    [TestMethod]
    public void SuspiciousCountryChangeTest()
    {
        var user = GivenALocalUser();
        var report = WhenTheUserCountryIsChangedAbruptly(user);
        ThenAnAntiFraudAlertShouldBeRaised(report);
    }
    
    [TestCleanup]
    public void TestCleanup()
    {
        context.Dispose();
    }
    
    private User GivenALocalUser()
    {
        return context.GetUser();
    }
    
    private SecurityReport WhenTheUserCountryIsChangedAbruptly(User user)
    {
        var device = user.CurrentDevice.Clone();
        device.SetLocation(Location.GetCountry("Italy").GetCity("Rome"));
        context.AddDevice(user, device);

        var screener = new GeolocationScreener();
        return screener.Evaluate(user);
    }
    
    private void ThenAnAntiFraudAlertShouldBeRaised(SecurityReport report)
    {
        Assert.AreEqual(ReportType.Geolocation, report.Type);
        Assert.IsTrue(report.AlertRaised);
    }
    
    private MobileUserContextBuilder context;
}

Notice that the amount of code dedicated to setting up the test context in this sample test class is minimal, since it’s almost entirely contained within the builder class, preserving code readability and organization. The amortized time spent setting up test contexts becomes very short as more and more test cases take advantage of the available library of test context builders.
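
The ContextBuilder base class itself isn’t shown in this article; as a minimal sketch, assuming the builders own both setup and cleanup, it could look something like this:

using System;

public abstract class ContextBuilder : IDisposable
{
    // Concrete builders override Build for scenario-specific state,
    // calling base.Build() first so that generic concerns (injecting
    // fakes, preloading common data) are handled in a single place.
    public virtual void Build()
    {
        // Inject test-targeted implementations and set up state
        // common to all tests.
    }

    public virtual void Dispose()
    {
        // Discard state created by Build so the next test starts
        // from a clean slate.
    }
}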

Conclusion

In this post I covered the topic of unit testing, providing five major guidelines for addressing the challenge of maintaining effectiveness in an ever-growing base of test cases. These guidelines have important ramifications in system architecture, which should, from the beginning of a software project, take unit testing requirements into account in order to promote an environment in which developers see value in, and are motivated to write, unit tests.

Unit tests should be regarded as a constituent part of your system architecture, as vital as the components they test, and not as second-class citizens that the development team merely writes to tick boxes in managerial reports or feed metrics.

In closing, if you’re working on a legacy project with few or no unit tests, one that doesn’t employ the DIP, this post may not contain the best strategy for you, since I intentionally avoided talking about sophisticated mocking frameworks which, in the context of legacy projects, become a viable option for introducing unit tests to tightly coupled code.


Notes

  • I decided to incorporate the “Software complexity” section into the article only after receiving feedback in a Reddit comment. For reference you can find the original article here.

Sources

[1] Bartosz Milewski (2013). “Basics of Haskell”. School of Haskell. FP Complete. Retrieved 2018-07-13.

Mar 21, 2020 - A strategy for effective system modularization

Back in 1972, almost half a century ago, David Lorge Parnas published an iconic paper entitled “On the Criteria to Be Used in Decomposing Systems into Modules” [1]. In it he discusses modularization as a mechanism for improving the flexibility and comprehensibility of a system while allowing the shortening of its development time, and also presents a criterion for effectively carrying out the decomposition of a system into modules.

When I first read this paper I was impressed by how relevant and practical it is. I stumbled on it while reading through an online discussion on the topic of object oriented programming. Yet again someone had published a post criticizing the fundamental concepts of OOP, and in the discussion a comment pointed out that the author had misinterpreted the concept of encapsulation in his critique, linking to the paper.

It was in this paper that the concept of information hiding, closely related to encapsulation, was first described. This concept plays a central role in the strategy for effective system modularization, as I’ll describe in the following sections.

Benefits of an effective modularization

Modularization is the division of a system or product into smaller, independent units that work alongside each other to implement said system or product requirements. In software projects, it is applied to a system at its higher levels of abstraction (e.g., microservices architecture) and at its lower levels as well (e.g., object oriented class design).

When performed effectively, modularization brings many benefits, among them:

  • Managerial: Development time should be shortened because separate teams would work on each module with little need for communication
  • Flexibility: It should be possible to make drastic changes to one module without a need to change others
  • Comprehensibility: It should be possible to study the system one module at a time, and the whole system can therefore be better designed because it is better understood

A strong indicator that a system has not been properly modularized is failing to reap the benefits listed above. The challenge, then, is to define and follow a criterion that leads the system towards an effectively modularized structure.

According to D.L. Parnas, one might choose between two distinct criteria for breaking a system into modules:

  1. The Procedural Criterion: Make each major step in the processing a module, typically beginning with a rough flowchart and moving from there to a detailed implementation
  2. The Information Hiding Criterion: Every module is characterized by its knowledge of a design decision which it hides from all others

In the next section we will use an example system to demonstrate how following the second criterion leads to a much more effective modularized structure than following the first, and why the first criterion should not be followed alone unless there’s a strong motivation to do so.

Example system: Scheduling Calendar

Consider a Scheduling Calendar system that implements the following features:

  • As an organizer, I want to create a scheduled event so that I can invite guests to attend it
  • As an organizer, I want to be informed of conflicting guests schedules so that I’m able to propose a valid event date
  • As a participant, I want automatic reminders to notify me of upcoming events I should attend so that I don’t miss them

Let’s exercise both criteria for sketching this system’s modularized structure. Notice that I will not be using class diagrams, so as not to induce an OOP bias in this exercise.

Using the procedural criterion

A straightforward procedure for implementing the event creation feature is:

  1. Read JSON input with the proposed event information (Date, Title, Location, Participants)
  2. Validate against a user_schedules database table that all participants can attend this new event
  3. In case one or more participants can’t attend, throw an exception saying so, otherwise proceed
  4. Insert an entry in an events table and one entry for each participant in a user_schedules table

We also need to define a procedure for implementing the automatic notification feature:

  1. Set up a notifier task that continuously polls the user_schedules table
  2. Select all user_schedules whose notification_date column is due and notified column is false
  3. For each resulting entry, send an e-mail reminder message to the corresponding event participant
  4. Then, for each resulting entry, set the notified column value to true

The database schema is loosely defined since discussing it is not the central point here. It’s sufficient to say that, considering a relational database and the third normal form, three tables would satisfy the storage needs of this exercise: events, users and user_schedules.

Based on these two procedures, we might define the following modules for the Scheduling Calendar:

Procedural Modularization

Naturally, following this criterion leads to modules with several responsibilities. The scheduler module is parsing input, validating data, querying the database and inserting new entries. The notifier module is also querying the database, modifying entries, and preparing and sending e-mail messages.

Using Information Hiding as a criterion

Information hiding is the principle of segregation of the design decisions in a system that are most likely to change, thus protecting other parts of the system from extensive modification if a design decision is indeed changed. The protection involves providing a stable interface which isolates the remainder of the system from the implementation.

To apply this principle we start with the system requirements and extrapolate them, anticipating all the possible improvement/change requests we can think of that our users, or any stakeholder really, might ask for:

  • Handle different input formats (e.g., JSON, XML)
  • Allow the addition and removal of participants after the creation of an event
  • Allow users to customize the frequency of event reminders (e.g., single vs multiple notifications per event)
  • Implement different notification types (e-mail, SMS, push notification)
  • Support a different storage medium (e.g., SQL database, NoSQL database, in memory - for testing purposes)

Hopefully most of them will make sense, but it’s always a good idea to have a colleague validate them before making a design decision that might be expensive to change later on.

Now the challenge is to define a system structure that isolates these possible changes to individual modules. Here’s a proposition:

Information Hiding Modularization

As you can see, several specialized modules appeared (in blue), segregating system responsibilities.

The InputParser module hides the knowledge of which input format is being used, converting the JSON data into an internal representation. If we are required to support XML instead, it’s just a matter of implementing another kind of InputParser and plugging it into our system.

The Repository modules hide the knowledge of the storage medium from the Scheduler and Notifier modules. Again, if we are required to move persistence to another kind of database, we can do so without ever touching the Scheduler and Notifier modules. On top of that, the specialized repository modules can assimilate data modification and querying responsibilities, making it easier to implement functional changes to events and user schedules.

A MessageSender module is employed to hide the knowledge of how to send specific notification types. It receives a standardized message request from the Notifier module and sends the corresponding e-mail reminder. If we need to start sending SMS reminders, we just have to implement a new kind of MessageSender and plug it into the output of the Notifier.
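
In code, the seam provided by the MessageSender module could be sketched as follows (C# for consistency with my other articles; the MessageRequest shape and the type names are illustrative assumptions, not prescribed by the diagram):

// A standardized message request, as produced by the Notifier.
public class MessageRequest
{
    public string Recipient { get; set; }
    public string Subject { get; set; }
    public string Body { get; set; }
}

// The stable interface the Notifier depends on.
public interface IMessageSender
{
    void Send(MessageRequest request);
}

public class EmailMessageSender : IMessageSender
{
    public void Send(MessageRequest request)
    {
        // Compose and dispatch an e-mail reminder.
    }
}

// Supporting SMS reminders later means writing a new implementation
// and plugging it into the Notifier; the Notifier itself never changes.
public class SmsMessageSender : IMessageSender
{
    public void Send(MessageRequest request)
    {
        // Dispatch an SMS message through a gateway.
    }
}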

With the extraction of these specialized modules, the original Scheduler and Notifier modules become thinner and take on a new role, acting as higher level services orchestrating lower level modules to implement system operations. D.L. Parnas reasoned about the hierarchical structure that forms while decomposing the system, pointing out that it favors code reuse, boosting productivity. He also warned against lower level modules making use of higher level modules, as that would break the hierarchical structure.

Conclusion

In this exercise I tried to demonstrate how using the information hiding criterion naturally leads to an improved system structure when compared to using the procedural criterion. The latter results in fewer modules that aggregate many responsibilities, while the former promotes the segregation of responsibilities into several specialized modules. These specialized modules become the foundation of a hierarchical system structure that improves not only comprehension of the system but also its flexibility.

The proposed strategy for effective system modularization is then to:

  1. Enlist all operations the system is required to implement
  2. Anticipate possible improvement/change requests for these operations
  3. Identify design decisions likely to change, prioritizing them if necessary
  4. Extract specialized modules that encapsulate these design decisions
  5. Establish and maintain a clear hierarchical structure within the system

The first two steps help visualize what the system design decisions are, upon which the information hiding criterion (third and fourth steps) is applied. Depending on the number of enlisted design decisions susceptible to change, a prioritization step may come in handy for directing development efforts and maximizing value:

Prioritization matrix

In closing I would like to add another quote from D.L. Parnas’ own conclusion pertaining to this strategy’s fourth step, in which specialized modules are extracted from the system:

Each module is then designed to hide such a decision from the others. Since, in most cases, design decisions transcend time of execution, modules will not correspond to steps in the processing. To achieve an efficient implementation we must abandon the assumption that a module is one or more subroutines, and instead allow subroutines and programs to be assembled collections of code from various modules.

For me this quote captures the main paradigm shift from procedural to object oriented programming.


Sources

[1] Parnas, D.L. (December 1972). “On the Criteria To Be Used in Decomposing Systems into Modules” (PDF)

Feb 25, 2020 - A mindset for improving the product development workflow

Software developers are responsible for carrying out tasks from a product backlog to deliver increments of value in a regular manner. The mindset we adopt towards work can have a direct impact on the quality and speed of our output. In this article I briefly discuss how adopting a “consultant mindset” internally within your company can help improve the product development workflow.

Backlog Refinement

In an ideal world software developers would only work on tasks from a perfectly refined product backlog. Task requirements, user stories, user flows, interface designs, edge cases, etc would be thoroughly detailed, allowing for maximum productivity during development since all required information would be readily available and easy to access.

In practice this is seldom the case. Backlog refinement is an important but often neglected process. Below is a usual definition of what it is:

Backlog refinement is the ongoing process of reviewing product backlog items and checking that they are appropriately prepared and ordered in a way that makes them clear and executable for teams once they are planned for development

It can be regarded as a specification process somewhat perpendicular to software development, i.e., backlog refinement is driven by the product vision, shaped by technical constraints, and performed concurrently with and in anticipation of development:

Product Backlog

Typical signs of a deficient product backlog are:

  1. Lack of user stories detailing what a feature should do
  2. Missing interface designs defining what a feature should look like
  3. Absence of relevant non-functional requirements (e.g., performance, technology stack, etc)
  4. Variability of feature requirements

In the absence of information, developers are faced with two choices: either halt development and switch to another task, or fill in the gaps themselves to make completing the task feasible. The latter approach can be dangerous if not handled properly, since strategic product decisions could be left open and made on the fly without oversight.

Here enters the consultant mindset, which I propose as a solution to mitigate the risks posed by insufficiently refined backlog items and improve development productivity.

The Consultant Mindset

A consultant-minded developer is expected to work collaboratively within the organization, taking part not only in a software project’s technical scope but also getting involved in its business scope. He/she is genuinely interested in helping people understand and solve problems, acting as more than merely an executor of technical tasks.

Product Scopes

Larger organizations usually have the resources to build fully featured specialized business and technical teams. A deficient product backlog in this case will most likely be the result of problematic internal processes or underperforming professionals.

However, medium and small organizations, especially early stage startups, may not have the resources to maintain these fully featured teams, and team members are often required to hold multiple responsibilities, spanning both business and technical scopes.

In this context consultant-minded developers are most valuable. Put simply, they are software developers who take backlog items to work on and double check that these items i) capture business value, ii) have clear acceptance criteria and iii) are technically feasible before starting development.

Upon finding specification issues with the task at hand, they actively collaborate with stakeholders, seeking more information to understand their perspective, giving expert advice on technical and user experience design matters (which most developers naturally acquire over years of working on different projects), and proposing practical solutions for moving forward.

Agile software development frameworks (e.g., Scrum) dictate that developers should be shielded from external influences and interruptions so as to protect their productivity. While this is true, it doesn’t mean the development team should be isolated in a silo, prevented from reaching out to stakeholders, internal or external, when needed in the context of backlog items to clear the path for development. As a bonus, software developers employing this mindset will grow their vision of the product and potentially perceive their work as more useful and meaningful.

Finally, this simple proactive behavior promotes a culture of action and an environment of open and transparent collaboration that benefits the company. And even though the motivation for proposing it comes from a scenario of a deficient product backlog, I see no reason why it wouldn’t also add value where backlog refinement is performed appropriately.