Jun 27, 2020 - Disabling TLS 1.0 and TLS 1.1

Following recommendations from RFC-7525, this year all major browsers are disabling TLS versions 1.0 and 1.1. The table below presents the approximate deadlines for that to happen [1][2][3][4]:

Browser Name      | Date
Google Chrome     | July 2020
Microsoft Edge    | July 2020
Microsoft IE11    | September 2020
Mozilla Firefox   | March 2020
Safari/Webkit     | March 2020 §

Originally scheduled for the first half of 2020, this planned change has since been postponed in light of current global circumstances.

Due to the pandemic, Mozilla reverted the change disabling TLS 1.0 and TLS 1.1 for an undetermined amount of time to better enable access to critical government sites sharing COVID-19 information.

§ Release date for Safari Technology Preview.

So why is this happening? Well, these protocol versions were designed around two decades ago and they don’t support a lot of the recent developments in cryptography. Below is a copy of the rationale included in RFC-7525:

  • TLS 1.0 (published in 1999) does not support many modern, strong cipher suites. In addition, TLS 1.0 lacks a per-record Initialization Vector (IV) for CBC-based cipher suites and does not warn against common padding errors.

  • TLS 1.1 (published in 2006) is a security improvement over TLS 1.0 but still does not support certain stronger cipher suites.

In the next sections I provide more information on how this may affect your web applications, and how to handle this transition, based on my recent experience.

Security Scan

To be honest, I only found out about browsers dropping support for TLS 1.0 and TLS 1.1 after running an SSL Server Test against one of the web applications I manage and seeing a drop in its overall rating:

SSL Scan grade B

Since this application is required by contract to achieve grade “A” in this particular test, I started digging into the subject to understand the difficulty involved in complying with this requirement. Fortunately, in my case it was a no-brainer, as you’ll see.

Evaluating Impact

My first action was to evaluate how many of my web application’s users would be impacted by this change. Given that my application already supported TLS 1.2, and that fewer than 0.5% of users were connecting with older TLS versions, the impact was evaluated as minimal.

For a broader picture, the following table shows, for each browser, the percentage of connections to SSL/TLS servers on the world wide web made using TLS 1.0 or TLS 1.1, from an analysis conducted in late 2018 [5]:

Browser/Client Name     | Percentage (%) – TLS 1.0 and 1.1 combined
Google Chrome           | 0.5%
Microsoft IE and Edge   | 0.72%
Mozilla Firefox         | 1.2%
Safari/Webkit           | 0.36%
SSL Pulse Nov. 2018 [6] | 5.84%

These figures are from two years ago and by now we can expect them to be even lower, meaning the transition should be transparent to most web application users on the internet.

Third-Party Integrations

Moving forward, I checked all of my web application’s third-party integrations in order to confirm that they also supported TLS 1.2. To name a few:

  • Amazon DynamoDB (data storage)
  • Amazon Elasticsearch Service (text indexing and search)
  • SendGrid (e-mail delivery)

During testing I experienced the following error in some of them, apparently caused by having TLS protocol versions 1.0 and 1.1 enabled in the source code (C#) while at the same time disabled at the web server level:

‘ConnectionError’ —> The client and server cannot communicate, because they do not possess a common algorithm.

This error was easily solved by forcing TLS 1.2 in my application source code, replacing this:

ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls12;

By this:

ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;

With this configuration in place I rebuilt my application and ran integration tests for each external service to guarantee the issue had been solved.
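
If you don’t have a proper integration test suite at hand, even a small console smoke test can help confirm that each dependency still negotiates a connection once TLS 1.2 is forced. Here is a minimal sketch (the endpoints below are purely illustrative, substitute your own):

using System;
using System.Net;

class Tls12SmokeTest
{
    static void Main()
    {
        // Force TLS 1.2, mirroring the production configuration.
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;

        var endpoints = new[]
        {
            "https://dynamodb.us-east-1.amazonaws.com",
            "https://api.sendgrid.com"
        };

        foreach (var url in endpoints)
        {
            try
            {
                var request = (HttpWebRequest)WebRequest.Create(url);
                using (var response = (HttpWebResponse)request.GetResponse())
                    Console.WriteLine($"{url} -> HTTP {(int)response.StatusCode}");
            }
            catch (WebException ex)
            {
                // HTTP errors and handshake failures both land here; a protocol
                // mismatch typically shows up as the "common algorithm" error above.
                Console.WriteLine($"{url} -> {ex.Message}");
            }
        }
    }
}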

Luckily, all of my application’s integrations supported TLS 1.2. If that’s not your case and your web server doesn’t give you the option to make exceptions for these external services, you’ll have to reach out to their technical support and possibly open a request ticket asking for TLS 1.2 support. Another (less secure) option is to create a proxy between your application and the external service that talks TLS 1.2 with your app and TLS 1.1 or TLS 1.0 with the external service.

Remote Connection

If you’re solely using SSH to connect to remote Virtual Machines (VMs) you can skip this section. However, if, like me, you also manage some Windows Server VMs and connect to them using the RDP protocol, make sure RDP is running on TLS 1.2 or above before disabling TLS 1.0 and 1.1 on these machines, otherwise you will lock yourself out of the VM!

To be on the safe side, besides confirming that the RDP host on the server was configured to accept TLS 1.2 connections, I also disabled the older TLS versions on my notebook, connected to the remote machine and sniffed packets using Wireshark:

Wireshark sniffing

This result gave me enough confidence to proceed, knowing that I was indeed connecting using TLS 1.2 and I’d not lose access to the remote server.
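
If you want to narrow a capture down to the negotiated protocol version, a display filter along these lines helps by matching ServerHello messages that advertise TLS 1.2 (on older Wireshark builds the field prefix is ssl rather than tls):

tls.handshake.type == 2 && tls.handshake.version == 0x0303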

Disabling the Protocol

The exact steps you’ll have to follow for disabling TLS 1.0 and TLS 1.1 in your web application will obviously depend on your technology stack. Below I provide references for the three most used web servers (Apache, IIS and Nginx).
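
For Apache and Nginx, the change usually amounts to restricting the accepted protocol versions in the TLS configuration. A rough illustration (directive names from mod_ssl and the Nginx SSL module; adapt to your setup and test before rolling out):

# Apache (mod_ssl): accept only TLS 1.2
SSLProtocol -all +TLSv1.2

# Nginx: accept only TLS 1.2 (add TLSv1.3 if your build supports it)
ssl_protocols TLSv1.2;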

If you’ve looked at the instructions for disabling the protocol in IIS, you may have noticed it’s the least straightforward of them all! It requires a system-wide configuration change by modifying a few Registry keys.
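
For reference, the keys in question live under the SCHANNEL hive. A minimal example disabling TLS 1.0 for both the server and client roles looks roughly like this (TLS 1.1 follows the same pattern; back up your Registry before applying anything):

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.0\Server]
"Enabled"=dword:00000000
"DisabledByDefault"=dword:00000001

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.0\Client]
"Enabled"=dword:00000000
"DisabledByDefault"=dword:00000001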

Since I had not one but multiple Windows Server instances to configure I decided to make my life easier and write a simple console app for querying the Windows Registry and making required changes. I made this tool available on GitHub in case you’re interested: TLSConfig.

TLSConfig

As with all Windows Registry changes you will have to restart your VM for them to take effect 😒

Important: In case you follow the steps for disabling the protocol only to realise nothing happened, it could be that your application is sitting behind a load balancer or a proxy server, which you will have to configure as well.

Results

After using the TLSConfig tool to disable TLS versions 1.0 and 1.1 in all of my application servers I ran the security scan again and… success!

SSL Scan grade A

DISCLAIMER: This blog post is intended to provide information and resources on this subject only. Most of the recommendations and suggestions here are based on my own experience and research. Follow them at your own will and responsibility, evaluate impact at all times before proceeding, test changes in a separate environment before applying them to production, schedule maintenance to low usage hours and be prepared for things to go wrong.


Sources

[1] Google Developers. Deprecations and removals in Chrome 84. Retrieved 2020-06-27.

[2] Windows Blogs. Plan for change: TLS 1.0 and TLS 1.1 soon to be disabled by default. Retrieved 2020-06-27.

[3] Mozilla. Firefox 74.0 Release Notes. Retrieved 2020-06-27.

[4] WebKit Blog. Release Notes for Safari Technology Preview 98. Retrieved 2020-06-27.

[5] Qualys Blog. SSL Labs Grade Change for TLS 1.0 and TLS 1.1 Protocols. Retrieved 2020-06-27.

[6] Qualys. SSL Pulse is a continuous and global dashboard for monitoring the quality of SSL / TLS support over time across 150,000 SSL- and TLS-enabled websites, based on Alexa’s list of the most popular sites in the world.

May 29, 2020 - Data replication in random regular graphs

Graph theory is extensively studied, experimented on and applied to communications networks 📡. Depending on a communication network’s requirements it may benefit from adopting one or another network topology: Point to point, Ring, Star, Tree, Mesh, and so forth.

In this post I analyze a network topology based on unweighted random regular graphs and evaluate its robustness for data replication amid partial network disruption. First I present the definition of this kind of graph and describe its properties. Then I implement and validate a random regular graph generation algorithm from the literature. Finally, I simulate data replication under varying degrees of partial network disruption and assess the topology’s effectiveness.

Supporting source code for this article can be found in this GitHub repository.

Definition

A regular graph is a graph where each vertex has the same number of neighbors; i.e. every vertex has the same degree or valency. The image below shows a few examples:

regular-graphs

These sample graphs are regular since we can confirm that every vertex has exactly the same number of edges. The first one is 2-regular (two edges per vertex) and the following two are 3-regular (three edges per vertex).

Even though 0-regular (disconnected), 1-regular (two vertices connected by single edge) and 2-regular (circular) graphs take only one form each, r-regular graphs of the third degree and upwards take multiple distinct forms by combining their vertices in a variety of different ways.

More broadly, we can denote by G_{n,r} the probability space of all r-regular graphs on n vertices, where 3 ≤ r < n. We then define a random r-regular graph as the result of randomly sampling G_{n,r}.

Properties of random regular graphs

There are at least two main properties that are worth exploring in this article. It is possible to prove that as the size of the graph grows the following holds asymptotically almost surely:

  • A random r-regular graph is almost surely r-connected; thus, maximally connected
  • A random r-regular graph has diameter at most d, where d has an upper bound of Θ(log_{r−1}(n log n)) [1]

The connectivity of a graph is an important measure of its resilience as a network. Qualitatively, we can think of it as how tolerant the topology is to vertex failures. In this case, the graph being maximally connected means it’s as fault tolerant as it can be with regard to its degree.

Also relevant for communication networks is the graph diameter, which is the greatest distance between any pair of vertices, and hence is qualitatively related to the complexity of message routing within the network. In this case, the graph diameter grows slowly (roughly logarithmically) as the graph size gets larger.

Graph generation algorithm

A widely known algorithm for generating random regular graphs is the pairing model, first proposed by Bollobás [2]. It’s a simple algorithm to implement and it works fast enough for small degrees, but it becomes slow when the degree is large because, with high probability, we will obtain a multigraph with loops and multiple edges, and then have to abandon this multigraph and start the algorithm again [3].

The pairing model algorithm is as follows:

  1. Start with a set of n vertices.
  2. Create a new set of n*k points, distributing them across n buckets, such that each bucket contains k points.
  3. Take each point and pair it randomly with another one, until (n*k)/2 pairs are obtained (a perfect matching).
  4. Collapse the points, so that each bucket (and thus the points it contains) maps onto a single vertex of the original graph. Retain all edges between points as the edges of the corresponding vertices.
  5. Verify if the resulting graph is simple, i.e., make sure that none of the vertices have loops (self-connections) or multiple edges (more than one connection to the same vertex). If the graph is not simple, restart.

You can find my implementation of this algorithm in the following source file: RandomRegularGraphBuilder.cs.
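
For illustration, here is a condensed sketch of the same idea (this is not the repository code; names, structure and the adjacency-list output are my own):

// Pairing model sketch: returns an adjacency list of an r-regular graph
// on n vertices, retrying whenever the sampled pairing yields loops or
// multiple edges (i.e., the multigraph is not simple).
using System;
using System.Collections.Generic;
using System.Linq;

static class PairingModelSketch
{
    public static List<int>[] Generate(int n, int r, Random rng)
    {
        if ((n * r) % 2 != 0)
            throw new ArgumentException("n * r must be even");

        while (true)
        {
            // n*r points shuffled at random; point p belongs to bucket p / r.
            var points = Enumerable.Range(0, n * r).OrderBy(_ => rng.Next()).ToArray();
            var adjacency = Enumerable.Range(0, n).Select(_ => new List<int>()).ToArray();
            var simple = true;

            // Pair consecutive points and collapse buckets into vertices.
            for (int i = 0; i < points.Length && simple; i += 2)
            {
                int u = points[i] / r, v = points[i + 1] / r;
                if (u == v || adjacency[u].Contains(v))
                    simple = false; // loop or multi-edge: restart from scratch
                else
                {
                    adjacency[u].Add(v);
                    adjacency[v].Add(u);
                }
            }

            if (simple)
                return adjacency;
        }
    }
}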

Now that the algorithm is implemented, let’s verify that the properties described earlier actually hold.

Evaluating graph connectivity

In order to calculate a graph’s connectivity we need to find the minimum number of elements (vertices or edges) that must be removed to separate the remaining vertices into isolated subgraphs.

We start from the observation that by selecting one vertex at will and then pairing it with each of the remaining vertices in the graph, one at a time, to calculate their edge connectivity (i.e., the minimum number of edge cuts that partitions that pair of vertices into two disjoint subsets), we are guaranteed to eventually stumble upon the graph’s own edge connectivity:

public int GetConnectivity()
{
    var min = -1;
    for (int i = 1; i < Vertices.Length; i++)
    {
        var c = GetConnectivity(Vertices[0], Vertices[i]);
        if (min < 0 || c < min)
            min = c;
    }
    return min;
}

The code above was taken from my Graph class and does just that over the array of vertices that compose the graph. The algorithm used in this code sample for calculating the edge connectivity between two vertices is straightforward, but a bit more extensive. Here’s what it does at a higher level:

  1. Initialize a counter at zero.
  2. Search for the shortest path between source vertex and destination vertex (using Depth First Search).
  3. If a valid path is found increment the counter, remove all path edges from the graph and go back to step 2.
  4. Otherwise finish. The counter will hold this pair of vertices’ edge connectivity.

It’s important to emphasize that this algorithm is intended for symmetric directed graphs, where all edges are bidirected (that is, for every arrow that belongs to the graph, the corresponding inverse arrow also belongs to it).
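
Under that assumption, a condensed sketch of the pairwise routine described above might look like this (names and the adjacency-set representation are mine, not the repository’s exact code; note that it consumes the adjacency structure it is given):

// Counts edge-disjoint paths between source and target: repeatedly find a
// path with a depth-first search, remove its edges (in both directions)
// and count it, until no path remains. Requires System.Collections.Generic.
static int CountEdgeDisjointPaths(Dictionary<int, HashSet<int>> adj, int source, int target)
{
    int count = 0;
    while (true)
    {
        // DFS recording each visited vertex's predecessor.
        var parent = new Dictionary<int, int> { [source] = source };
        var stack = new Stack<int>();
        stack.Push(source);

        while (stack.Count > 0 && !parent.ContainsKey(target))
        {
            var u = stack.Pop();
            foreach (var v in adj[u])
                if (!parent.ContainsKey(v))
                {
                    parent[v] = u;
                    stack.Push(v);
                }
        }

        if (!parent.ContainsKey(target))
            return count; // no more paths: count is this pair's edge connectivity

        // Remove the edges along the path that was found.
        for (int v = target; v != source; v = parent[v])
        {
            adj[parent[v]].Remove(v);
            adj[v].Remove(parent[v]);
        }
        count++;
    }
}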

Once we are able to calculate a graph’s connectivity, it’s easy to define a unit-test for verifying that random regular graphs are indeed maximally connected:

[TestMethod]
public void RRG_Connectivity_Test()
{
    var n = (int)1e3;
    Assert.AreEqual(3, BuildRRG(n, r: 3).GetConnectivity());
    Assert.AreEqual(4, BuildRRG(n, r: 4).GetConnectivity());
    Assert.AreEqual(5, BuildRRG(n, r: 5).GetConnectivity());
    Assert.AreEqual(6, BuildRRG(n, r: 6).GetConnectivity());
    Assert.AreEqual(7, BuildRRG(n, r: 7).GetConnectivity());
    Assert.AreEqual(8, BuildRRG(n, r: 8).GetConnectivity());
}

This unit test is defined in the RandomRegularGraphBuilderTests.cs source file and, as theorized, it passes ✓

Evaluating graph diameter

The algorithm for calculating the diameter is much easier, particularly in the case of unweighted graphs. It boils down to calculating the maximum shortest path length from each vertex, and then taking the maximum value among them:

public int GetDiameter()
{
    return Vertices.Max(
        v => v.GetMaxShortestPathLength(Vertices.Length)
    );
}

This maximum shortest path length method receives an expectedVerticesCount integer parameter, which is the total number of vertices in the graph. This way the method is able to compare the number of vertices traversed while searching for the maximum shortest path with the graph’s size, and in case they differ, return a special value indicating that certain vertices are unreachable from the source vertex.
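
For reference, a hypothetical sketch of such a method, assuming a Vertex class exposing a Neighbors collection (the actual class in the repository may differ), could be a plain breadth-first search:

// Breadth-first search from this vertex over an unweighted graph. Returns
// the largest shortest-path distance found; if fewer vertices than expected
// were reached, a sentinel value flags that some vertices are unreachable.
public int GetMaxShortestPathLength(int expectedVerticesCount)
{
    var distance = new Dictionary<Vertex, int> { [this] = 0 };
    var queue = new Queue<Vertex>();
    queue.Enqueue(this);
    var max = 0;

    while (queue.Count > 0)
    {
        var current = queue.Dequeue();
        foreach (var neighbor in current.Neighbors)
        {
            if (distance.ContainsKey(neighbor))
                continue;
            distance[neighbor] = distance[current] + 1;
            max = Math.Max(max, distance[neighbor]);
            queue.Enqueue(neighbor);
        }
    }

    return distance.Count == expectedVerticesCount ? max : int.MaxValue;
}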

So after implementing this method I ran it against random regular graphs of varying sizes and degrees. The results are plotted below:

diameters

We can confirm that the graph’s diameter grows only slowly (roughly logarithmically) as the graph’s size increases, and flattens further as the degree increases.

Since we are dealing with random graphs, instead of simply calculating the diameter for a single sample I generated one hundred samples per size and degree, and took their average. It’s worth noting that the inferred diameter standard deviation started reasonably small (less than 0.5) and diminished to insignificant values as the graph size and degree increased.

Data replication

In this study I simulated the propagation of information across random regular graphs accounting for disruption percentages starting from 0% (no disruption) up to 50% of the graph’s vertices. The simulation runs in steps, and at each iteration active vertices propagate their newly produced/received data to their neighbors.

At the beginning of each simulation vertices are randomly selected and marked as inactive up to the desired disruption percentage. Inactive vertices are able to receive data, but they don’t propagate it. The simulation runs until 100% of vertices successfully receive data originated from a source vertex chosen arbitrarily, or until the number of iterations becomes greater than the number of vertices, indicating that the level of disruption has effectively separated the graph into two isolated subgraphs.

Here’s the method that I implemented for running it for any given graph size and degree:

private static void SimulateReplications(int n, int r)
{
    Console.WriteLine($"Disruption\tIterations");

    for (double perc = 0; perc < 0.5; perc += 0.05)
    {
        var reached = -1;
        int iterations = 1;
        int disabled = (int)(perc * n);

        for (; iterations < n; iterations++)
        {
            var graph = new RandomRegularGraphBuilder().Build(n, r);

            var sourceVertex = 50; // Any
            var item = Guid.NewGuid().GetHashCode();
            graph.Vertices[sourceVertex].AddItem(item);

            graph.DisableRandomVertices(disabled, sourceVertex);
            graph.Propagate(iterations);

            reached = graph.Vertices.Count(v => v.Items.Contains(item));
            if (reached == n || iterations > n)
                break;
        }

        Console.WriteLine($"{perc.ToString("0.00")}\t{iterations}");
    }
}

The results for running it with parameters n = 1000 and r = [4, 8, 12] are given in the chart that follows:

simulation iterations

We can verify that the larger the graph’s degree, the less significant the effects of the disruption levels are, which makes sense since, intuitively, there are many more path options available for the information to replicate through.

In the simulation run with parameters [n=1000, r=12], only two additional iterations (6 in total) were required at 50% disruption for the source data to reach all graph vertices, compared with the base case in which all vertices were active.

For the lowest graph degree tested, [n=1000, r=4], however, the effects of disruption were quite noticeable, spiking to a total of 42 iterations at 25% disruption and failing to reach the entirety of the vertices for disruption levels above that (which is why it isn’t even plotted in the chart).

After some consideration, I realized that the spike in the number of iterations required for reaching all of the graph’s vertices in the simulation seems to occur when the chances of accidentally cutting the graph into isolated subgraphs (while deactivating vertices) increase considerably. This probability can be roughly estimated as 1 − (1 − disruption^r)^n, i.e., the probability of deactivating all neighbors of at least one graph vertex.
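
As a quick sanity check of this estimate (values chosen to match the r = 4 case above), the expression can be evaluated directly:

// Probability that at least one vertex has all of its r neighbors disabled
// when a fraction `disruption` of the n vertices is deactivated.
static double CutOffProbability(int n, int r, double disruption) =>
    1 - Math.Pow(1 - Math.Pow(disruption, r), n);

// CutOffProbability(1000, 4, 0.25) ≈ 0.98, in line with the iteration spike
// observed at 25% disruption for r = 4.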

The following contour plot displays these probability estimates for graphs of size n=1000 for given disruption levels and graph degrees:

disruption contour

Now by analyzing the simulation iteration spikes on top of these probabilities we find that they started occurring when p neared 0.9. It’s important to highlight that as the probability of cutting off graph vertices increases, the number of simulation iterations required for reaching the totality of graph vertices becomes more volatile since, the way my code was devised, a new random regular graph is sampled and the current disruption level is randomly applied at each retrial. Nonetheless, as p nears 1.0, we are certain to end up with at least one disconnected vertex, meaning that we won’t be able to assess a valid number of simulation iterations for the replication to reach the entire graph.

Conclusion

This analysis has shown that, from a theoretical point of view, random regular graph topologies are particularly well suited for data replication in applications where communication agents are faulty or cannot be fully trusted to relay messages, such as sensor networks or consensus protocols in multi-agent systems.

As a result of being maximally connected, with proper scaling high levels of network disruption can be tolerated without significantly affecting the propagation of data among healthy network agents.

Lastly, the topology’s compact diameter favors fast and effective communication even for networks comprised of a large number of participating agents.


Sources

[1] B. Bollobás & W. Fernandez de la Vega, The diameter of random regular graphs, Combinatorica 2, 125–134 (1982).

[2] B. Bollobás, A probabilistic proof of an asymptotic formula for the number of labelled regular graphs, Preprint Series, Matematisk Institut, Aarhus Universitet (1979).

[3] Pu Gao, Models of generating random regular graphs and their short cycle distribution, University of Waterloo (2006).

May 17, 2020 - Sandboxing front-end apps from GitHub with Docker

I’m often required to evaluate simple take-home projects from candidates applying to front-end developer positions at our company. In these projects candidates are asked to implement a few UI components that take user input and produce an output, nothing fancy, but enough to get a grasp of the candidate’s familiarity with modern front-end frameworks and to ask follow-up questions about their design decisions in the interview that follows.

Candidates are free to choose the front-end framework they are most comfortable with (e.g. React, Angular, Vue) for this exercise, and we also ask them to share their solution in a GitHub repository so as to facilitate the reviewing process.

So, after reviewing a few dozen of these projects, I started using a simple sandboxing approach, which I’m sharing in this article, in order to quickly and easily build and run these apps and assess them in a controlled manner.

Sandboxing requirements

In short, these were the implicit system requirements I took into account:

  • The reviewer (myself!) shall be able to spin up the front-end app with a single command
  • The system shall provide access to app compilation/runtime errors in case of failure
  • The system shall isolate the untrusted front-end app execution for security reasons
  • The system shall support different front-end development environments/configurations

Implementation

The solution was implemented with Docker and is available on GitHub: GitRunner (not a very creative name, I know 🙈). It provides a few Docker images for building and running front-end projects from GitHub, and here’s how it works:

1) First, build the base Alpine Linux image:

cd /images/alpine-git-node
docker build . -t alpine-git-node

2) Then, build the target platform image, for instance:

cd /images/alpine-git-npm
docker build . -t alpine-git-npm

3) Finally execute this docker command pointing to the target GitHub repository:

docker run -d \
 -e GIT_REPO_URL="https://github.com/gothinkster/react-redux-realworld-example-app" \
 -e COMMAND="npm install && npm start" \
 -p 4100:4100 \
 --name sandbox alpine-git-npm

4) And optionally attach to the command screen within the container to see the terminal output:

docker exec -it <CONTAINER_ID> sh
# screen -r

compiled-successfully-screen

(To detach from the secondary screen and return to the container’s primary shell, type CTRL + A, then D.)

Overview

The first two steps, for building the target platform Docker image, will only need to be executed once per configuration. Every once in a while it may be necessary to build a new image due to framework updates or to support a new configuration.
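
For context, these images are essentially Alpine Linux plus git, Node.js, npm and screen, with the entry point script (shown further down) baked in. A hypothetical sketch of such an image, not the actual Dockerfiles from the GitRunner repository, could look like:

FROM alpine:3.12
RUN apk add --no-cache bash git nodejs npm screen
COPY docker-entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
ENTRYPOINT ["docker-entrypoint.sh"]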

The third step is the actual “single command” that spins up the front-end app, and receives two custom variables:

  • GIT_REPO_URL: Url of the GitHub repository containing the front-end app source code
  • COMMAND: The front-end app startup command, typically bringing up a development web server

These variables are fed into a simple docker-entrypoint.sh script for cloning the repository and then running the startup command in a secondary screen:

#!/bin/bash
set -e

git clone $GIT_REPO_URL repo
cd repo

screen -dmS daemon sh -c "$COMMAND" >> logfile.log

sleep infinity

The secondary terminal screen is adopted for running the startup command because development web servers are typically terminated if the compilation fails, or the app crashes, which would bring the container down were they run as the container’s main process.

At the end of the entry point script there’s an unconventional sleep infinity command. This way, if the startup command fails, it holds the container up, allowing us to start an interactive bash session within the container to rerun the startup command and assess errors.

Lastly, proper isolation is only achieved when executing this container in a separate virtual machine, since container solutions don’t guarantee complete isolation. According to the Docker documentation [1]:

One primary risk with running Docker containers is that the default set of capabilities and mounts given to a container may provide incomplete isolation, either independently, or when used in combination with kernel vulnerabilities.

That’s it! After assessing the front-end app, the container can be stopped and removed to dispose of the allocated resources. I’ve been successfully using this straightforward approach regularly, and it’s been saving me some time and effort. I hope it is useful to you as well.


Sources

[1] Docker security. Retrieved 2020-05-16.