May 17, 2020 - Sandboxing front-end apps from GitHub with Docker

I’m often required to evaluate simple take home projects from candidates applying to front-end developer positions at our company. In these projects candidates are asked to implement a few UI components that take user input and produce an output, nothing fancy, but enough to get a grasp on the candidate’s familiarity with modern front-end frameworks and to ask follow up questions about his design decisions in the interview that follows.

It’s open to the candidate to choose the front-end framework that he/she is most comfortable with (eg: React, Angular, Vue) for this exercise, and we also ask him to share his solution in a GitHub repository as to facilitate the reviewing process.

So after reviewing a few dozens of these projects I started using a simple sandboxing approach that I’m sharing in this article in order to quickly an easily build and run these apps and assess them in a controlled manner.

Sandboxing requirements

In short, these were the implicit system requirements I took into account:

  • The reviewer (myself!) shall be able to spin up the front-end app with a single command
  • The system shall provide access to app compilation/runtime errors in case of failure
  • The system shall isolate the untrusted front-end app execution for security reasons
  • The system shall support different front-end development environments/configurations


The solution was implemented with Docker and is available on GitHub: GitRunner (not a very creative name, I know 🙈). It provides a few Docker images for building and running front-end projects from GitHub, and here’s how it works:

1) First, build the base Alpine Linux image:

cd /images/alpine-git-node
docker build . -t alpine-git-node

2) Then, build the target platform image, for instance:

cd /images/alpine-git-npm
docker build . -t alpine-git-npm

3) Finally execute this docker command pointing to the target GitHub repository:

docker run -d \
 -e GIT_REPO_URL="" \
 -e COMMAND="npm install && npm start" \
 -p 4100:4100 \
 --name sandbox alpine-git-npm

4) And optionally attach to the command screen within the container to see the terminal output:

docker exec -it <CONTAINER_ID> sh
# screen -r


(In order to leave the secondary screen back to the container primary shell, type CTRL + A + D.)


The first two steps for building the target platform Docker image will only need to be executed once per configuration. Every once in a while it may be required to build a new image due to framework updates, for supporting a new configuration.

The third step is the actual “single command” that spins up the front-end app, and receives two custom variables:

  • GIT_REPO_URL: Url of the GitHub repository containing the front-end app source code
  • COMMAND: The front-end app startup command, typically bringing up a development web server

These variables are fed into a simple script for cloning the repository and then running the startup command in a secondary screen:

set -e

git clone $GIT_REPO_URL repo
cd repo

screen -dmS daemon sh -c "$COMMAND" >> logfile.log

sleep infinity

The secondary terminal screen is adopted for running the startup command because development web servers are typically terminated if the compilation fails, or the app crashes, which would bring the container down were they run as the container’s main process.

At the end of the entry point script there’s an unconventional sleep infinity command. This way, if the startup command fails, it holds the container up, allowing us to start an iterative bash session within the container to rerun the startup command and assess errors.

Lastly, proper isolation is only achieved when executing this container in a separate virtual machine, since container solutions don’t guarantee to provide complete isolation. According to the Docker documentation1:

One primary risk with running Docker containers is that the default set of capabilities and mounts given to a container may provide incomplete isolation, either independently, or when used in combination with kernel vulnerabilities.

That’s it! After assessing the front-end app the container can be stopped and removed for disposing allocated resources. I’ve been successfully using this straightforward approach regularly, and it’s been saving me some time and effort. I hope this comes useful to you as well.


[1] Docker security. Retrieved 2020-05-16.

Apr 23, 2020 - On the architecture for unit testing

Automated testing is an integral part of any major software project as a means for improving quality, productivity and flexibility. As such, it’s vital that system architecture is designed in a way to facilitate the development and execution of automated tests.

Quality is improved because automated testing execution allows us to find and solve problems early in the development cycle, much before a product change is deployed to production and becomes available to end-users.

Productivity increases because the earlier a problem is found in the development cycle, the cheaper it’s to fix it, and it’s easy to see why. If a software developer is able to run an automated test suite before integrating his code changes to the main repository he can quickly discover newly introduced bugs and fix them in the act. However, if such test suite is not available, newly introduced bugs may only appear in a manual testing phase later on, or even worse, reported by end-users, requiring developers to step out of the regular development workflow for investigating and fixing them.

Flexibility is improved because developers feel more confident for refactoring code, upgrading packages and modifying system behavior when required as they rely on a test suite with high level of coverage for assessing the impacts of their code changes.

When discussing automated testing I also like to bring up the topic of risk management to the conversation. As a lead software engineer risk management is a big part of my job, and it involves mentoring the development team in practices and processes that reduce the risks of technical deterioration of the product. From the benefits listed above it’s clear that the employment of an adequate automated testing strategy fits right in, helping to mitigate risks in a software project.

Moving forward, we can divide automated tests into at least three different types according to the strategy for implementing and running them, which are shown in the famous test pyramid below:

Testing Pyramid

Unit tests are cheap to develop and cheap to run with respect to time and resources employed, and are focused on testing individual system components (eg: business logic) isolated from external dependencies.

Integration tests take one step further, and are developed and ran without isolating external dependencies. In this case we are interested in evaluating that all system components interact as expected when put together and faced with integration constraints (eg: networking, storage, processing, etc).

Finally, on the top of the pyramid, GUI tests are the most expensive to automate and execute. They usually rely on UI input/output scripting and playback tools for mimicking an end-user’s interaction with the system’s graphical user interface.

In this article we will be focusing on the foundation of the test pyramid, the unit tests, and on system architecture considerations for promoting their adoption.

Properties of an effective unit test

Fist, let’s enumerate what constitutes an effective, well-crafted unit test. Below is a proposition:

  • Short, having a single purpose
  • Simple, clear setup and tear down
  • Fast, executes in a fraction of a second
  • Standardized, follows strict conventions

Ideally a unit test should display all of these properties, below I elaborate why.

If the unit test isn’t short enough it will be harder to read it and understand its purpose, i.e., exactly what it’s testing. So for that reason unit tests should have a clear objective and evaluate one thing only, instead of trying to perform multiple evaluations at the same time. This way, when a unit test breaks, a developer will more easily and quickly assess the situation and fix it.

If unit tests require a lot of effort to setup their test context, and tear it down afterwards, developers will often start questioning whether the time being invested in writing these tests is worth it. Therefore, we need to provide an enviroment for writing unit tests that takes care of managing all the complexity of the test context, such as injecting dependencies, preloading data, clearing up caches, and so forth. The easier it is to write unit tests, the more motivated developers will be for creating them!

If executing a suite of unit tests takes a lot of time, developers will naturally execute it less often. The danger here lies in having such a lengthy unit test suite that it becomes impractical, and developers start skipping running it, or running it selectively, reducing its effectiveness.

Lastly, if tests aren’t standardized, before too long your test suite will start looking like the wild west, with different and sometimes conflicting coding styles being used for writing unit tests. Hence, pursuing system design coherence is as much as valid in the scope of unit testing as it is for the overall system.

Once we agree on what constitutes effective unit tests we can start defining system architecture guidelines that promote their properties, as described in the following sections.

Software complexity

Software complexity arises, among other factors, from the growing number of interactions between components within a system, and the evolution of their internal states. As complexity gets higher the risk of unintentionally interfering in intricated webs of components interactions increases, potentially leading to the introduction of defects when making code changes.

Furthermore, it’s common sense that the higher the complexity of a system, the harder it is to maintain and test it, which leads to a first (general) guideline:

Keep an eye for software complexity and follow design practices to contain it

A practice worth mentioning for managing complexity while improving testability is to employ Pure Functions and Immutability whenever possible in your system design. A pure function is a function that has the following properties:1

  • Its return value is the same for the same arguments (no variation with local static variables, non-local variables, mutable reference arguments or input streams from I/O devices).
  • Its evaluation has no side effects (no mutation of local static variables, non-local variables, mutable reference arguments or I/O streams).

From its properties it’s clear that pure functions are well suited to unit testing. Their usage also removes the need for much of the complementary practices which are discussed in the following sections for handling, mostly, stateful components.

Immutability plays an equally important role. An immutable object is an object whose state cannot be modified after it is created. They are more simple to interact with and more predictable, contributing for lowering the system complexity, disentangling global state.

Isolating dependencies

By their very definition unit tests are intended to test individual system components in isolation, since we don’t want the result of a component’s unit tests to be influenced by one of its dependencies. The degree of isolation varies according to specifics of the component under test and preferences of each development team. I personally don’t worry for isolating lightweight, internal business classes, since I see no value added in replacing them by a test targeted component that will display pretty much the same behavior. Be that as it may the strategy here is straightforward:

Apply the dependency inversion pattern in component design

The dependency inversion pattern (DIP) states that both high-level and low-level objects should depend on abstractions (e.g. interfaces) instead of specific concrete implementations. Once a system component is decoupled from its dependencies we can easily replace them in the context of a unit test by simplified, test targeted concrete implementations. The class diagram below illustrates the resulting structure:

Isolated Dependencies

In this example the component under test is dependent on Repository and FileStore abstractions. When deployed to production we might inject a concrete SQL based implementation for the repository class and a S3 based implementation for the file store component, for storing files remotely in the AWS Cloud. Nevertheless, when running unit tests we will want to inject simplified functional implementations that don’t rely on external services, such as the “in memory” implementations painted in green.

If you’re not familiar with the DIP, I have another article that goes through a practical overview on how to use it in a similar context that you may find helpful: Integrating third-party modules.

The Mocks vs Fakes debate

Notice that I’m not referring to these “in memory” implementations as “mocks”, which are simulated objects that mimic the behavior of real objects in limited, controlled ways. I do this deliberately, since I’m against the usage of mock objects in favor of fully compliant “fake” implementations that give us more flexibility for writing unit tests, and can be reused across several unit test classes in a more reliable way than setting up mocks.

To get into more detail suppose we are writing a unit test for a component that depends on the FileStore abstraction. In this test the component adds an item to the file store but isn’t really worried whether the operation succeeds or fails (eg: a log file), and hence we decide to mock that operation in a “dummy” way. Now suppose that later on requirements change, and the component needs to ensure that the file is created by reading from the file store before proceeding, forcing us to update the mock’s behavior in order for the test to pass. Then, imagine requirements change yet again and the component needs to write to multiple files (eg: one for each log level) instead of only one, forcing another improvement of our mock object behavior. Can you see what’s happening? We are slowly improving our mock making it more similar to a concrete implementation. What’s worse is that we may end up with dozens of independent, half-baked, mock implementations scattered throughout the codebase, one for each unit test class, resulting in more maintenance effort and less cohesion within the testing environment.

To address this situation I propose the following guideline:

Rely on Fakes for implementing unit tests instead of Mocks, treating them as first class citizens, and organizing them in reusable modules

Since Fake components implement business behavior they’re inherently a more costly initial investment when compared to setting up mocks, no doubt about that. However, their return in the long-term is definitely higher, and more aligned with the properties of effective unit tests.

Coding style

Every automated test can be described as a three-step script:

  1. Prepare test context
  2. Execute key operation
  3. Verify outcome

It’s logical to consider that, given an initial known state, when an operation is executed, then it should produce the same expected outcome, every time. For the outcome to turn out different either the initial state has to change, or the operation implementation itself.

You’re probably familar with the words marked in bold above. If not, they represent the popular Given-When-Then pattern for writing unit tests in a way that favors readability and structure. The idea here is simple:

Define and enforce a single, standardized coding style for writing unit tests

The Given-When-Then pattern can be adopted in a variety of ways. One of them is to structure a unit test method as three distinct methods. For instance, consider a password strength test:

public void WeakPasswordStrengthTest()
    var password = GivenAWeakPassowrd();
    var score = WhenThePasswordStrengthIsEvaluated(password);

private string GivenAWeakPassowrd()
    return "qwerty";

private int WhenThePasswordStrengthIsEvaluated(string password)
    var calculator = new PasswordStrengthCalculator();
    return (int)calculator.GetStrength(password);

private void ThenTheScoreShouldIndicateAWeakPassword(int score)
    Assert.AreEqual((int)PasswordStrength.Weak, score);

Using this approach the main test method becomes a three-line description of the unit test’s purpose that even a non-developer can understand with ease just by reading it. In practice, unit tests main methods end up becoming a low level documentation of your system’s behaviour providing not only a textual description but also the possibility to execute the code, debug it and find out what happens internally. This is extremely valuable for shortening the system architecture learning curve as new developers join the team.

It is important to highlight that when it comes to coding style, there’s no single right way of doing it. The example I presented above may please some developers and displease others for, say, being verbose, and that’s all right. What really matters is coming to an agreement within your development team on a coding convention for writing unit tests that make sense to you, and stick to it.

Managing test contexts

Unit test context management is a topic that is not discussed often enough. By “test context” I mean the entire dependency injection and initial state setup required for successfully running unit tests.

As noted before unit testing is more effective when developers spend less time worrying about setting up test contexts and more time writing test cases. We derive our last guideline from the observation that a few test contexts can be shared by a much larger number of test cases:

Make use of builder classes to separate the construction of test contexts from the implementation of unit test cases

The idea is to encapsulate the construction logic of test contexts in builder classes, referencing them in unit test classes. Each context builder is then responsible for creating a specific test scenario, optionally defining methods for particularizing it.

Let’s take a look at another illustrative code example. Suppose we are developing an anti-fraud component for detecting mobile application users suspicious location changes. The test context builder might look like this:

public class MobileUserContextBuilder : ContextBuilder
    public override void Build()

            The build method call above is used for
            injecting dependencies and setting up generic
            state common to all tests.

            After it we would complete building the test
            context with what's relevant for this scenario
            such as emulating a mobile user account sign up.

    public User GetUser()
            Auxiliary method for returning the user entity
            created for this test context.

    public void AddDevice(User user, DeviceDescriptior device)
            Auxiliary method for particularizing the test
            context, in this case for linking another
            mobile device to the test user's account
            (deviceType, deviceOS, ipAddress, coordinates, etc)

The test context created by this MobileUserContextBuilder is generic enough that any test case required to start from a state in which the application already has a mobile user registered can use it. On top of that it defines the AddDevice method for particularizing the test context to fit our fictitious anti-fraud component testing needs.

Consider that this anti-fraud component is called GeolocationScreener and is responsible for checking wether or not a mobile user’s location changed too quickly, which would indicate that he’s probably faking his real coordinates. One of its unit tests might look like the following:

public class GeolocationScreenerTests
    public void TestInitialize()
        context = new MobileUserContextBuilder();
    public void SuspiciousCountryChangeTest()
        var user = GivenALocalUser();
        var report = WhenTheUserCountryIsChangedAbruptly(user);
    public void TestCleanup()
    private User GivenALocalUser()
        return context.GetUser();
    private SecurityReport WhenTheUserCountryIsChangedAbruptly(User user)
        var device = user.CurrentDevice.Clone();
        context.AddDevice(user, device);

        var screener = new GeolocationScreener();
        return screener.Evaluate(user);
    private void ThenAnAntiFraudAlertShouldBeRaised(SecurityReport report)
        Assert.AreEqual(RetportType.Geolocation, report.Type);
    private MobileUserContextBuilder context;

It’s visible that the amount of code dedicated to setting up the test context in this sample test class is minimal, since it’s almost entirely contained within the builder class, preserving code readability and organization. The amortized time taken for setting up the test context becomes very short as more and more test cases take advantage of the available library of test context builders.


In this post I have covered the topic of unit testing providing five major guidelines for addressing the challenge of mantaining effectiveness in an ever growing base of test cases. These guidelines have important ramifications in system architecture, which should, from the begining of a software project, take unit testing requirements into account in order to promote an environment in which developers see value in and are motivated to write unit tests.

Unit tests should be regarded as a constituent part of your system architecture, as vital as the components they test, and not as second class citizens that the development team merely writes for the purpose of filling up managerial reports check boxes or feeding up metrics.

In closing, if you’re working in a legacy project with few or none unit tests, that doesn’t employ the DIP, this post may not contain the best strategy for you, since I intentionally avoided talking about sophisticated mocking frameworks that, in the context of legacy projects, become a viable option for introducing unit tests to extremely coupled code.


  • I have decided to incorporate the “Software complexity” section into the article only after receiving a feedback in a reddit comment. For reference you can find the original article here.


[1] Bartosz Milewski (2013). “Basics of Haskell”. School of Haskell. FP Complete. Retrieved 2018-07-13.

Mar 21, 2020 - A strategy for effective system modularization

Back in 1972, almost half a century ago, David Lorge Parnas published an iconic paper entitled “On the Criteria to Be Used in Decomposing Systems into Modules” 1. In it he discusses modularization as a mechanism for improving the flexibility and comprehensibility of a system while allowing the shortening of its development time, and also presents a criterion for effectively carrying out the decomposition of a system into modules.

When I first read this paper I was impressed by how relevant and practical it is. I stumbled on it while reading through an online discussion on the topic of object oriented programming. Yet again someone had published a post criticizing the fundamental concepts of OOP and in the discussion there was a comment pointing out to the author that he had misinterpreted the concept of encapsulation in his critique, linking to the paper.

It was in this paper that the concept of information hiding, closely related to encapsulation, was first described. This concept plays a central role in the strategy for effective system modularization, as I’ll describe in the following sections.

Benefits of an effective modularization

Modularization is the division of a system or product into smaller, independent units that work alongside each other for implementing said system or product requirements. In regard to software projects, it is applied to a system at its higher levels of abstraction (ex: microservices architecture) and at its lower levels as well (ex: object oriented class design).

When performed effectively, modularization brings many benefits, among them:

  • Managerial: Development time should be shortened because separate teams would work on each module with little need for communication
  • Flexibility: It should be possible to make drastic changes to one module without a need to change others
  • Comprehensibility: It should be possible to study the system one module at a time, and the whole system can therefore be better designed because it is better understood

A strong indicator that a system has not been properly modularized is not reaping the benefits listed above. The challenge is then to define, and follow, a criterion that leads the system towards an effective modularized structure.

According to D.L. Parnas, one might choose between two distinct criteria for breaking a system into modules:

  1. The Procedural Criterion: Make each major step in the processing a module, typically begining with a rough flowchart and moving from there to a detailed implementation
  2. The Information Hiding Criterion: Every module is characterized by its knowledge of a design decision which it hides from all others

In the next section we will use an example system to demonstrate how following the second criterion leads the system towards a much more effective modularized structure than following the first one, and why the first criterion should actually never be followed alone unless there’s a strong motivation to do so.

Example system: Scheduling Calendar

Consider a Scheduling Calendar system that implements the following features:

  • As an organizer, I want to create a scheduled event so that I can invite guests to attend it
  • As an organizer, I want to be informed of conflicting guests schedules so that I’m able to propose a valid event date
  • As a participant, I want automatic reminders to notify me of upcoming events I should attend so that I don’t miss them

Let’s exercise both criteria for sketching this system’s modularized structure. Notice that I will not be using class diagrams as not to induce an OOP bias in this exercise.

Using the procedural criterion

A straightforward procedure for implementing the event creation feature is:

  1. Read JSON input with the proposed event information (Date, Title, Location, Participants)
  2. Validate against a user_schedules database table that all participants can attend to this new event
  3. In case one or more participants isn’t able to attend, throw an exception informing it, otherwise proceed
  4. Insert an entry in an events table and one entry for each participant in a user_schedules table

We also need to define a procedure for implementing the automatic notification feature:

  1. Setup a notifier task that continuously polls the user_schedules table
  2. Select all user_schedules whose notification_date column is due and notified column is false
  3. For each resulting entry, send an e-mail reminder message to the corresponding event participant
  4. Then, for each resulting entry, set the notified column value with true

The database schema is being loosely defined since it’s not the central point here to discuss it. It’s sufficient to say that, considering a relational database and the third normal form, three tables would suffice the storage necessities of this exercise: events, users and user_schedules.

Based on these two procedures, we might define the following modules for the Scheduling Calendar:

Procedural Modularization

Naturally, following this criterion leads to modules with several responsibilities. The scheduler module is parsing the input, validating data, querying the database and inserting new entries. The notifier module is also querying the database, modifying entries, preparing and sending e-mail messages.

Using Information Hiding as a criterion

Information hiding is the principle of segregation of the design decisions in a system that are most likely to change, thus protecting other parts of the system from extensive modification if a design decision is indeed changed. The protection involves providing a stable interface which isolates the remainder of the system from the implementation.

To apply this principle we start with the system requirements and extrapolate them, anticipating all possible improvement/change requests we can think of that our users, or any stakeholder actually, might ask:

  • Handle different input formats (ex: JSON, XML)
  • Allow the addition and removal of participants after the creation of an event
  • Allow users to customize the frequency of event reminders (ex: single vs multiple notifications per event)
  • Implement different notification types (E-mail, SMS, Push Notification)
  • Support a different storage medium (ex: SQL Database, NoSQL Database, In Memory - for testing purposes)

Hopefully most of them will make sense, but it’s always a good idea to involve a colleague to validate them before making a design decision that might be expensive to change later on.

Now the challenge is to define a system structure that isolates these possible changes to individual modules. Here’s a proposition:

Information Hiding Modularization

As you can see, several specialized modules appeared (in blue), segregating system responsibilities.

The InputParser module hides the knowledge of what input format is being used, converting the JSON data into an internal representation. If we are required to support XML instead, it’s just a matter of implementing another kind of InputParser and plug it into our system.

The Repository modules hide the knowledge of the storage medium from the Scheduler and Notifier modules. Again, if we are required to change persistence to another kind of database we can do so without ever touching Scheduler and Notifier modules. On top of that the specialized repository modules can assimilate data modification and querying responsibilities, making it easier to implement functional changes to events and user schedules.

A MessageSender module is employed for hiding the knowledge of how to send specific notification types. It receives a standardized message request from the Notifier module and sends the corresponding e-mail reminder. If we need to start sending SMS reminders we just have to implement a new kind of MessageSender and plug it to the output of the Notifier.

With the extraction of these specialized modules the original Scheduler and Notifier modules become thinner and take on a new role acting as higher level services, orchestrating lower level modules for implementing system operations. D.L. Parnas reasoned about this hierarchical structure that is formed while decomposing the system, pointing out that it favors code reuse, leveraging productivity. He also warned against lower level modules making use of higher level modules, as it would break the hierarchical structure.


In this exercise I tried to demonstrate how using the information hiding criterion naturally leads to an improved system structure when compared to using the procedural criterion. The latter results in less modules that aggregate many responsibilities, while the former promotes the segregation of responsibilities into several specialized modules. These specialized modules become the foundation of a hierarchical system structure that not only improves comprehension of the system but also it’s flexibility.

The proposed strategy for effective system modularization is then to:

  1. Enlist all operations the system is required to implement
  2. Anticipate possible improvement/change requests for these operations
  3. Identify design decisions likely to change, prioritizing them if necessary
  4. Extract specialized modules that encapsulate these design decisions
  5. Establish and maintain a clear hierarchical structure within the system

The first two steps will help visualize what the system design decisions are, upon which the information hiding criterion (third and fourth steps) is applied. Depending on the scale of enlisted design decisions susceptible to change a prioritization step may come in handy for directing development efforts and maximizing value:

Prioritization matrix

In closing I would like to add another quote from D.L. Parnas own conclusion pertaining this strategy’s third step, in which specialized modules are extracted from the system:

Each module is then designed to hide such a decision from the others. Since, in most cases, design decisions transcend time of execution, modules will not correspond to steps in the processing. To achieve an efficient implementation we must abandon the assumption that a module is one or more subroutines, and instead allow subroutines and programs to be assembled collections of code from various modules.

For me this quote captures the main paradigm shift from procedural to object oriented programming.


[1] Parnas, D.L. (December 1972). “On the Criteria To Be Used in Decomposing Systems into Modules” (PDF)