Jan 15, 2020 - Likelihood of autoregression in stock returns

The standard rule for calculating a stock returns monthly volatility is to multiply the daily volatility by the square root of the number of business days in that month, for instance:

Monlthly volatility

This rule is derived from an independent log-normal daily returns model. However, if the stochastic process is considered autoregressive, i.e, having its output variable linearly dependent on its own previous values, then this rule is likely to give us inaccurate volatilities.

Autoregressive behaviour can express itself as a “momentum”, perpetuating the effect of an upward or downward movement, or as a “reversal”, acting against it. The first case can lead to a higher volatility than that calculated by the standard rule, and the later case can lead to a lower volatility.

In this post I evaluate if an autoregressive Gaussian model better describes the behavior of observed daily stock returns, when compared to a non autogressive control model.

The code I share in this article is written in the R programming language, using the following packages: plotly, ggplot2, sfsmisc.

Data Set

Just like in the previous article, I picked three years worth of PETR4 historical log returns to work on, which is the most actively traded stock in B3, the largest stock exchange in Brazil. You can find the source data in the links below:

To generate these files I used a simple parser I built for B3 historical quotes, freely available at their website from 1986 up to the current year, which you can find here.

The Autoregressive (AR) Model

Our AR model is defined as:

AR model


  • μ → is the Gaussian distribution mean
  • σ → is the Gaussian distribution standard deviation
  • ki → is the AR parameter of order “i”
  • p → is the AR model order

Let’s take a moment to analyze it. It is composed of two terms, a stochastic term and an AR term. If p, the model order, is zero, than it becomes a simple non autoregressive Gaussian model, similar to the one I used in my previous post. If p is greater than zero than past outputs are taken into account when calculating its current output, weighted by the ki parameters.

Model Fitting

Since my last post I have sharpened my R skills a bit 🤓, and learned how to use optimization functions for fitting parameters more efficiently (instead of using a brute-force approach). First let me define the autoregressive model likelihood function that will be minimized:

# calculates likelihood for the AR model, and inverts its signal
# k: model parameters ([σ, μ, kp, kp-1, ... , k1 ])
# x: stock returns samples
likelihood <- function(k, x) {
  lke <- 0
  offset <- 21
  order <- length(k) - 2
  for(i in offset:length(x)) {
    xt <- x[i]
    if (order > 0) {
      xt <- xt - sum(k[3:length(k)] * x[(i-order):(i-1)])
    p <- dnorm(xt, sd=k[1], mean=k[2])
    lke <- lke + log(p)
  return (-lke)

Now let’s read the data set and optimize our model’s parameters for varying p orders, starting at zero (non autoregressive control case) up to a twenty days regression:

x <- read.table("PETR4_2019.txt")[["V1"]]

max_order <- 20

res <- lapply(0:max_order, function(n) {
  return (optim(c(0.01, 0.0, rep(-0.01, n)), function(k) likelihood(k, x), method="BFGS"))

The resulting res list will hold, for each model order analyzed, the optimized parameter set and its corresponding likelihood. Let’s check the results, below is the code for plotting the likelihood of each model order analyzed, which will help us find the best performing one:

df <- data.frame(
  x = seq(0, max_order), y = sapply(res, function(e) (res[[1]][[2]]-e[[2]]))

p <- ggplot() +
  ggtitle("Order Analysis") +
  xlab("Order (p)") +
  ylab("Log-likelihood ratio") + 
  geom_line(data = df, aes(x=x, y=y), color='red')

p analysis

Interestingly enough it seems that AR models of higher orders yielded much higher likelihood than lower orders, possibly indicating that stock daily returns is under the influence of an AR effect, but let’s not jump into any conclusion just yet!

So the best performing model was found to have p=20, log-likelihood ratio of 7.0685184 and the following set of parameters:

  • σ → 0.0178999192
  • μ → 0.0010709814
  • [kp, kp-1, … , k1] → [-0.0210373765, 0.0038455320, -0.0233608796, 0.0329653224, -0.0824597966, 0.0177617412, 0.0142196315, 0.0313809394, 0.0519001996, -0.0562944497, 0.0034231849, 0.0255535975, 0.0824235508, -0.0832489175, -0.0863261198, -0.0008716017, -0.0996273827, -0.0920698729, -0.0613492897, -0.0640396245]

Which raises the question:

Is this result statistically significant to confirm an AR effect in the analyzed data?

Well, we will find that out in the next section.

Probability Value

In statistical hypothesis testing, the probability value or p-value is the probability of obtaining test results at least as extreme as the results actually observed during the test, assuming, in our case, that there is no autoregressive effect in place. If the p-value is low enough, we can determine, with resonable confidence, that the non autoregressive model hypothesis, as it stands, is false.

In order to calculate the p-value initially we need to estimate the observed log-likelihood ratio probability distribution under the hypothesis that our process follows a non autoregressive Gaussian model (control process). So let’s replicate the analysis above under this assumption:

iterations <- 1000   # number of analysis per order
max_order <- 20      # same number of orders as before
num_samples <- 247   # replicating observed data length (1 year)
wiener_std <- 0.018  # Control model standard deviation

res <- lapply(1:iterations, function(z) {
  # generates independent normally distributed samples
  x <- rnorm(num_samples, 0, wiener_std)
  y <- lapply(0:max_order, function(n) {
    return (optim(c(0.01, 0.0, rep(-0.01, n)), function(k) likelihood(k, x), method="BFGS"))
  return (sapply(y, function(e) c(-e[[2]], e[[4]])))

Beware that this code may take a couple of hours to complete running, or more, dependeding on your notebook configuration. It generates 1000 log-likelihood ratio samples for each model order. The box plot below gives us a qualitative understanding of this output:

df <- lapply(0:max_order, function(k) {
  vec <- t(sapply(1:length(res), function(l) {
    return (c(k, res[[l]][1, k + 1] - res[[l]][1, 1]))
  return (data.frame(
    period = vec[,1],
    ratio = vec[,2]

df <- do.call(rbind, df)
df$period <- as.factor(df$period)

ggplot(df, aes(x=period, y=ratio)) +
  ggtitle("Control process analysis") +
  xlab("Order (p)") +
  ylab("Log-likelihood ratio") + 

control process likelihood

This result took me by surprise. As the order of the AR model grows, the better it becomes in fitting samples generated by a non AR model! It sure looks counterintuitive at first, but after a second thought it’s not that surprising at all, since as we add more parameters to the AR model we are actually making it easier for it to mimic this set of generated samples.

Moving forward, I defined a function that calculates the one-tailed p-value from this resulting res list. A margin of error of 3% (95% confidence interval) is expected since 1000 samples were used:

# caluates the one-tailed p-value
# res: pre-processed log-likelihood ratioss
# n: the model order
# lke: log-likelihood ratio whose p-value should be obtained

p_value <- function (res, n, lke) {
  vec <- sapply(1:length(res), function(l){
       return (res[[l]][1, n + 1] - res[[l]][1, 1])
  d_lke <- density(vec)
  i <- findInterval(lke, d_lke[["x"]])
  l <- length(d_lke[["x"]])
  if (i == l) {
    return (0.00)
  return (integrate.xy(d_lke[["x"]][i:l], d_lke[["y"]][i:l]))

Thus, calling this function like so p_value(res, 20, 7.0685184) for our original “best fit” yields a p-value of 0.814. This figure is nowhere near the typical 0.05 threshold required for rejecting the non AR model hypothesis.

Also notice that the log-likelihood ratio alone is not sufficient for determining the AR model order that best outperforms the control model, as we need to look at the p-value instead. I ran the analysis for the previous years as well and the complete p-value results (2017, 2018, 2019) are displayed below:

Order (p) 2017 2018 2019
1 0.000 0.069 0.399
2 0.002 0.195 0.502
3 0.005 0.005 0.420
4 0.000 0.011 0.336
5 0.001 0.000 0.465
6 0.001 0.000 0.338
7 0.003 0.000 0.295
8 0.005 0.000 0.296
9 0.008 0.001 0.360
10 0.014 0.002 0.452
11 0.011 0.001 0.456
12 0.019 0.001 0.479
13 0.024 0.002 0.540
14 0.028 0.001 0.614
15 0.043 0.002 0.661
16 0.014 0.003 0.617
17 0.013 0.004 0.670
18 0.011 0.005 0.725
19 0.016 0.005 0.775
20 0.007 0.005 0.814

Despite of the weak results for 2019, after taking the margin of error into account 17 out of 20 model orders passed the significance test for the year of 2017 and 18 out of 20 model orders passed the significance test for the year of 2018.


I’ve got mixed results in this analysis. If I were to look to the stock’s daily returns in the year of 2019 alone I might conclude that the control model hypothesis couldn’t be rejected in favor of the AR model. However, the results for the years of 2017 and 2018 seem strong enough to conclude otherwise. This may indicate an intermittent characteristic of the AR behavior intensity. Further investigation is required.

Dec 23, 2019 - Stock option pricing inference

I’m a stock market enthusiast 📈, so lately I have spent my spare time on the topic of stock option pricing, and I was curious to find out whether or not I could build a script that accurately reflected real market option prices. It turns out I came close, and learned a lot along the way, which I share in this article.

My analysis uses the Monte Carlo method for pricing options, which from all available pricing models in the literature is the most straight forward, practical one for a software developer to start working with, when compared to closed form, analytic models, such as the Black–Scholes model.

The code I share in this article is written in the R programming language, using the following packages: sfsmisc, plotly, ggplot2.

Data Set

I had to pick a stock to work on as the starting point, so I decided to use PETR4 which is the most actively traded stock in B3, the largest stock exchange in Brazil, and also the stock with the most actively traded options.

On December 17, 2019, while PETR4 was priced at R$ 29.71, I spoted the following european call option prices:

Symbol Strike Bid Ask
PETRA232 22.81 6.92 7.09
PETRA239 23.81 5.93 6.09
PETRA252 24.81 4.98 5.09
PETRA257 25.31 4.48 4.64
PETRA261 25.81 4.01 4.15
PETRA266 26.31 3.57 3.68
PETRA271 26.81 3.12 3.22
PETRA277 27.31 2.67 2.76
PETRA281 27.81 2.24 2.33
PETRA285 28.31 1.88 1.90
PETRA289 28.81 1.52 1.54
PETRA296 29.31 1.19 1.21
PETRA301 29.81 0.92 0.93
PETRA307 30.31 0.69 0.71
PETRA311 30.81 0.52 0.53
PETRA317 31.31 0.38 0.39
PETRA325 32.31 0.20 0.22
PETRA329 32.81 0.16 0.17
PETRA333 33.31 0.12 0.13
PETRA342 33.81 0.10 0.11
PETRA343 34.31 0.07 0.09
PETRA351 34.81 0.06 0.07
PETRA358 35.81 0.04 0.05
PETRA363 36.31 0.03 0.04
PETRA368 36.81 0.02 0.04
PETRA373 37.31 0.02 0.03
PETRA378 37.81 0.02 0.03
PETRA383 38.31 0.01 0.02
PETRA388 38.81 0.01 0.02
PETRA393 39.31 0.01 0.02
PETRA398 39.81 0.01 0.02

The maturity of these options is the same, January 20, 2020, i.e., 20 business days from the recorded prices date. I saved this data into a CSV formatted file that you can find here.

Besides that I also used six months worth of daily PETR4 log returns from 17/Jun/2019 to 16/Dec/2019 (126 samples), which you can find here.

Distribution of Stock Returns

Under the Monte Carlo method, in order to calculate the price of a stock option, first we need a probability distribution model of the stock returns. Let’s take a look at the probability density estimate for our stock:

x <- read.table("PETR4_Returns.txt")[["V1"]]

d_real <- density(x)
df <- data.frame(x = d_real[["x"]], y = d_real[["y"]])

p <- ggplot() +
  ggtitle("PETR4 daily returns density estimate") +
  geom_line(data = df, aes(x=x, y=y), colour="black") 

PETR4 desnity

Because the number of samples is small, we don’t get that fine-looking symmetric density estimate we would like. Instead, what we get is a density estimate somewhat sensitive to trends and macroeconomical events of this six month period. If we were to look at the previous six month period this observed density estimate would certainly show differences.

The next step is to find a model that reasonably fits this observed density. While there isn’t an agreement on which probability density function best describes stock returns, I decided to turn to two notable, widely adopted, basic functions, the Gaussian distribution and the Cauchy Distribution.

Guassian Model

Let’s try to fit a Gaussian density function to our market data, and see what it looks like. First I will define a couple of auxiliary functions:

# generates a truncated gaussian density
dgauss <- function(std, mean, min, max) {
  d.x <- (0:511) * (max - min) / 511 + min
  d.y <- dnorm(d.x, mean, sd = std)
  d.y <- d.y / integrate.xy(d.x, d.y)
  f <- data.frame(x = d.x, y = d.y)
  return (f)

# interpolates data frame
interpolate <- function(df, min, max, n) {
  step <- (max - min) / n
  pol <- data.frame(
         approx(x, y, xout = seq(min, max, by = step))
  return (pol)

# simple numeric maximum likelihood estimation
# (resolution reffers to that of daily log returns)
mle <- function(dist, x) {
  min_x <- min(x)
  max_x <- max(x)
  resolution <- 10000
  n = (max_x - min_x) * resolution
  rev <- interpolate(data.frame(x = dist$x, y = dist$y), min_x, max_x, n)
  indexes = sapply(x, function (x) { return ((x - min_x) * resolution + 1) })
  return (log(prod(rev$y[indexes])))

With these functions I developed a simple brute force script to find the Gaussian distribution standard deviation that best fits the observed density estimate:

max_mle <- -Inf
d_norm <- c()

for(k in seq(0.001, 0.1, by = 0.0001)) {
  dist <- dgauss(k, mean(x), -6 * sd(x), 6 * sd(x))
  val <- mle(dist, x)
  if (val > max_mle) {
    d_norm <- dist
    max_mle <- val

df1 <- data.frame(x = d_real[["x"]], y = d_real[["y"]])
df2 <- data.frame(x = d_norm[["x"]], y = d_norm[["y"]])

p <- ggplot() +
  ggtitle("Probability densities") +
  scale_colour_manual("", values = c("red", "blue")) +
  geom_line(data = df1, aes(x=x, y=y, colour="Observed")) +
  geom_line(data = df2, aes(x=x, y=y, colour="Gaussian"))

Gaussian density

Both densities display a similar shape, but it’s perceptible that the Gaussian density decreases marginally faster than the observed density.

Despite the fact that the Gaussian distribution is widely used in fianacial models, it has some well known pitfalls, namely its inability to encompass fat tails observed in historical market data. As a result it will fail accurately describe extreme volatility events, which can lead to possibly underpriced options.

Cauchy Model

The Cauchy distribution, just like the Gaussian distribution, is a stable distribution, but one with fat tails. Over the years several academic papers have looked into it as an alternative to best describe these extreme volatility events.

So let’s see how it performs. Initially, I defined the following auxiliary function in order to fit it to the observed density.

# generates a truncated cauchy density
dcauchy <- function(s, t, min, max) {
  d.x <- (0:511) * (max - min) / 511 + min
  d.y <- 1 / (s * pi * (1 + ((d.x - t) / s) ^ 2))
  d.y <- d.y / integrate.xy(d.x, d.y)
  f <- data.frame(x = d.x, y = d.y)
  return (f)

The brute force fitting script is the same as the previous one, replacing the call to dgauss with dcauchy:

max_mle <- -Inf
d_cauchy <- c()

for(k in seq(0.001, 0.1, by = 0.0001)) {
  dist <- dcauchy(k * (sqrt(2/pi)), mean(x), -6*sd(x), 6*sd(x))
  val <- mle(dist, x)
  if (val > max_mle) {
    d_cauchy <- dist
    max_mle <- val

df1 <- data.frame(x = d_real[["x"]], y = d_real[["y"]])
df2 <- data.frame(x = d_cauchy[["x"]], y = d_cauchy[["y"]])

p <- ggplot() +
  ggtitle("Probability densities") +
  scale_colour_manual("", values = c("red", "blue")) +
  geom_line(data = df1, aes(x=x, y=y, colour="Observed")) +
  geom_line(data = df2, aes(x=x, y=y, colour="Cauchy"))

Cauchy density

Intuitively, by looking at this plot, it feels that the Cauchy density isn’t as much as a good fit as the Gaussian density. Indeed, the log-likelihood ratio of the Gaussian Model to the Cauchy Model is 7.5107, quite significant.

The general consensus for the Cauchy distribution is that it has too fat tails, which in our case may lead to overpriced options. Nevertheless, it can be very useful for defining a high volatility scenario for risk analysis.

Now that we have our probability distribution models we can go ahead and calculate option prices based on them.

Option Pricing

The technique applied to calculate the fair price of an option under the Monte Carlo method is:

(1) to generate a large number of possible, but random, price paths for the underlying stock via simulation, and (2) to then calculate the associated exercise value (i.e. “payoff”) of the option for each path. (3) These payoffs are then averaged and (4) discounted to today. This result is the value of the option.1

To accomplish that I defined, yet again, more auxiliary functions:

# generates random samples for a probability density
dist_samples <- function(dist, n) {
  samples <- (approx(
  samples[is.na(samples)] <- dist$x[1]
  return (samples)

# calculates the expected value of a probability density
exp_val <- function(dist) {
  v <- dist[["x"]] * dist[["y"]]
  return (sum(v) / sum(dist[["y"]]))

# converts a single day probability density to a n-day period
# (removes any drift from the input density)
dist_convert <- function(dist, n) {
  dd <- data.frame(x = dist$x - exp_val(dist), y = dist$y)
  dn <- density(sapply(1:100000, function(x) (sum(dist_samples(dd, n)))))
  return (dn)

# calculates the price of a call option from a probability density
call_price <- function(dist, strike, quote) {
  v <- (((exp(1) ^ dist[["x"]]) * quote) - strike) * dist[["y"]]
  v <- pmax(v, 0)
  return (sum(v) / sum(dist[["y"]]))

And here’s the final script for calculating option prices based on the fitted Gaussian and Cauchy distribution, for each strike price, and then plot the result against actual (Bid) prices:

stock_quote <- 29.71
days_to_maturity <- 20

df1 <- read.csv("PETR4_Calls.txt", header = TRUE, sep = ";")
strikes <- df1[["Strike"]]

dn <- dist_convert(d_norm, days_to_maturity)
y2 <- sapply(strikes, function(x) call_price(dn, x, stock_quote))
df2 <- data.frame(x = strikes, y = y2)

dn <- dist_convert(d_cauchy, days_to_maturity)
y3 <- sapply(strikes, function(x) call_price(dn, x, stock_quote))
df3 <- data.frame(x = strikes, y = y3)

p <- ggplot() +
  ggtitle("Option Prices") +
  xlab("Strike Price") +
  ylab("Option Price") +
  scale_colour_manual("", values = c( "black", "red", "blue")) +
  geom_line(data = df1, aes(x=Strike, y=Bid, colour="Actual")) +
  geom_line(data = df2, aes(x=x, y=y, colour="Gaussian")) +
  geom_line(data = df3, aes(x=x, y=y, colour="Cauchy"))

Pricing models

As speculated, actual market prices are slightly above the Gaussian model prices (underpriced), and below the Cauchy model prices (overpriced). The Gaussian model came really close to the actual option prices, which makes sense, since it’s likelihhod to the observed density was much higher.

So after analysing this plot I had an idea, since this final script can take any probability density to calculate option prices, what if I used the observed density estimate instead, would it yield better, more accurate option prices?

stock_quote <- 29.71
days_to_maturity <- 20

df1 <- read.csv("PETR4_Calls.txt", header = TRUE, sep = ";")
strikes <- df1[["Strike"]]

dn <- dist_convert(d_real, days_to_maturity)
y2 <- sapply(strikes, function(x) call_price(dn, x, stock_quote))
df2 <- data.frame(x = strikes, y = y2)

p <- ggplot() +
  ggtitle("Option Prices") +
  xlab("Strike Price") +
  ylab("Option Price") +
  scale_colour_manual("", values = c( "black", "red")) +
  geom_line(data = df1, aes(x=Strike, y=Bid, colour="Actual")) +
  geom_line(data = df2, aes(x=x, y=y, colour="Empirical"))

Empirical model

In fact, to my surprise, it did. Visually the lines appear to be on top of each other. The table below displays the standard deviation from the models estimates to the actual market prices:

Model σ
Gaussian 0.0463
Cauchy 0.1369
Empirical 0.0290


The “empirical” model turned out to be the one that best approximates market prices, for this particular stock (PETR4) in the B3 stock exchange. Market option prices are determined by automated trading systems, usually employing much more sophisticated techniques than the one presented here, in real-time, almost instantaneously reflecting price shifts of the underlying stock. Even so, this simplified approach yielded great results, worth exploring more deeply.


  • I submitted this article on Hacker News and it received a lot of attention there. One comment correctly pointed out that the standard technique for fitting probability distributions is to use maximum likelihood estimation (MLE), instead of the least squares regression approach that I originally employed. So I ran the analysis again using MLE, observed improved results for both models, and updated the code and plots.
  • I also fixed a minor issue in the function that generates random samples for probability densities, and recalculated the “empirical” model standard deviation.
  • For reference you can find the original article here.


[1] Don Chance: Teaching Note 96-03: Monte Carlo Simulation

Nov 20, 2019 - Quantum computing, for programmers

Recent claims from Google of achieving “Quantum Supremacy” brought a lot of attention to the subject of Quantum Computing. As a software developer all of the sudden every non-developer person I know started asking me questions about it, what it meant, and if it’s the start of skynet 🤖

I knew only the basic stuff from college, general concepts from quantum mechanics and superposition, but barely grasped how it was actually applied to computing and algorithms. So I decided to study it more deeply and develop a weekend project along the way.

I’m sharing my learnings in this post focusing on the programmer’s perspective of quantum computing, and demystifying the almost supernatural facet the general media gives to it. It’s basically just linear algebra.

My weekend project turned out to be a pet Quantum Simulator, which I made available on github.

The Qubit

Qubit is the basic unit of quantum information. You can think of it as a variable that when measured gives you either true or false, i.e., a boolean value. For instance:

var q = new Qubit(true);


As a variable, you can perform operations on qubits, such as applying the “X gate” to negate its value:

var q = new Qubit(true);

new XGate().Apply(q);


Now things get more interesting, the quibit’s value, or state, can also be in a coherent superposition, being both true and false at the same time, with 50% ~ 50% probability (adding up to 100%). To achieve that we apply an “H gate” (Hadamard gate) to it:

var q = new Qubit(false);

new HGate().Apply(q);

// will print 'true' half of the time, 'false' the other half

To understand what this means we need to take a look at the Qubit’s internal state. It’s actually an array of complex numbers. Considering a single qubit system we can have:

  • [1, 0] → 100% chance of being measured 0 (false)
  • [0, 1] → 100% chance of being measured 1 (true)
  • [1 / sqrt(2), 1 / sqrt(2)] → 50% chance of being measured 0 and 50% chance of being measured 1

OBS: The correct mathematical notation of a quantum state of “n” qubits is a vector of a single column and 2n rows. Here I’m describing how to implement it in code, using arrays.

The first position of the array holds a complex number describing the probability of the qubit being “0”, and the second position of it being “1”. The probability for each outcome is calculated as the complex number magnitude squared.

Each quantum gate is a matrix that operates on the quantum system state. For example, I implemented the “H Gate” as a UnaryOperation in my project:

public class HGate : UnaryOperation
    protected override Complex[,] GetMatrix()
        return h_matrix;

    private readonly Complex[,] h_matrix = new Complex[,]
        { 1 / Math.Sqrt(2), 1 / Math.Sqrt(2) },
        { 1 / Math.Sqrt(2), -1 / Math.Sqrt(2) }

So the “H gate” took the qubit from the [1, 0] state to the [1 / sqrt(2), 1 / sqrt(2)] state through a matrix multiplication, implemented as follows:

public static Complex[] Multiply(Complex[,] matrix, Complex[] vector)
    if (matrix.GetLength(0) != vector.Length)
        throw new InvalidOperationException();

    var r = new Complex[vector.Length];

    for (int i = 0; i < matrix.GetLength(0); i++)
        r[i] = 0;
        for (int j = 0; j < matrix.GetLength(1); j++)
            r[i] += vector[j] * matrix[i, j];

    return r;

You can define different 2x2 matrices to operate on a single qubit, but there’s a catch, quantum gates must be reversible (due to physical world constraints), i.e., applying it twice should reverse its effects:

var q = new Qubit(false);

new HGate().Apply(q);
new HGate().Apply(q);


This reversibility is also required for more complex quantum gates, operating on multiple qubits as we’ll see in the next section.

Quantum Entanglement

The potential of quantum computing becomes more apparent when we start using gates that operate on two or more qubits. Perhaps the most famous binary quantum gate is the “CNOT gate” (CXGate class in my project), which will negate the target qubit if a control qubit is true, otherwise it preserves the target qubit:

var control = new Qubit(true);
var target = new Qubit(false);

new CXGate().Apply(control, target);


The “CNOT gate” defines a 4x4 matrix that is applied to the two qubit system state. Here’s how I implemented it:

public class CXGate : BinaryOperation
    protected override Complex[,] GetMatrix()
        return cx_matrix;

    private readonly Complex[,] cx_matrix = new Complex[,]
        { 1, 0, 0, 0 },
        { 0, 1, 0, 0 },
        { 0, 0, 0, 1 },
        { 0, 0, 1, 0 }

As expected the two qubit quantum system state vector will have length “4” (Length = 2n), representing four exclusive values, and superpositions of them:

  • [1, 0, 0, 0] → 100% chance of being measured 0
  • [0, 1, 0, 0] → 100% chance of being measured 1
  • [0, 0, 1, 0] → 100% chance of being measured 2
  • [0, 0, 0, 1] → 100% chance of being measured 3

To derive the four position state vector from the two qubits, simply perform their tensor product, for instance:

  • [1, 0] ⦻ [0, 1] = [0, 1, 0, 0]

Which is equivalent to |0⟩ ⦻ |1⟩ = |01⟩ using Dirac notation.

Now supose that, before applying the “CXGate”, we apply the “HGate” to the control bit, what would happen?

var control = new Qubit(true);
var target = new Qubit(false);

new HGate().Apply(control);
new CXGate().Apply(control, target);

// What to expect for:
// control.Measure() ?
// target.Measure() ?

We’ve seen before that the control qubit will be in a coherent superposition state, then it will be applied to the “CXGate”, modifying the target qubit. From a classical point of view there are two valid scenarios:

  • Control qubit is 0 and target qubit is preserved as 0
  • Control qubit is 1 and target qubit is flipped to 1

Which one will be? Actually both of them!

If you measure one qubit (control or target - you choose), you have 50% of chance for each outcome, either 0 or 1. But after that the quantum state collapses, and we will have 100% certainty of the next measurement outcome, since there are only two valid states for this system. The value of one qubit dictates the value of the other. The qubits are entangled.

Algebraically this entangled state is describe as:

  • [1/sqrt(2), 0, 0, 1/sqrt(2)] → 50% chance of being measured 0 (|00⟩) and 50% chance of being measured 3 (|11⟩)

There is no possible outcome where values 1 (|01⟩) or 2 (|10⟩) are measured.

Quantum Collapse

Measuring, or “collapsing”, a single quibit is easy, as explained above it has only two possible outcomes, each one with a defined probability. It behaves as a random variable.

As more qubits are added to the quantum system and become entangled, measuring one quibit can have a direct impact on other quibits, collapsing their probability distribution. A practical implementation for measuring a single qubit in a multiple qubit system, extracted from my Qstate class, follows:

int val = Sample();
bool m = BinaryUtility.HasBit(val, pos);
Collapse(pos, m);

Initially it samples the quantum system state vector in full, without changing it, getting an outcome for the 2n system. If we wanted to measure all system qubits at once, we could simply collapse the entire state vector from this sample, however we are only interested in measuring one qubit, leaving others unmeasured.

So after getting a full sample we test for the bit being measured. It could be present in the sample, in which case it will collapse to true, or not present, collapsing to false. Once we get its value we propagate it to the state vector:

private void Collapse(int pos, bool b)
    for (int i = 0; i < v.Length; i++)
        if (BinaryUtility.HasBit(i, pos) != b)
            v[i] = Complex.Zero;

    var sum = Math.Sqrt(
        v.Sum(x => x.Magnitude * x.Magnitude)
    for (int i = 0; i < v.Length; i++)
        v[i] /= sum;

This method will zero out all positions in the state vector in which this quibit has a different value than what was measured. Then it will normalize the vector, making sure the sum of the complex numbers magnitudes squared adds to one.

From a logical point of view this is what “quantum collapse” means, we measured one qubit and propagated its value to the quantum state vector, “zeroing” the probability of all outcomes inconsistent with this measurement.

Beating Classical Computing

Saving the best for last, I will now introduce the Deutsch Oracle algorithm, one of the first examples of a quantum algorithm that is exponentially faster than its classical counterpart on the size of the input.

Its quantum circuit diagram is presented below:

Deutsch circuit

The purpose of this algorithm is to determine if an unknown function f(x) is either balanced or constant. Considering the one quibit case (n = 1) there are two balanced functions, the identity and the negation functions:

Input Identity Negation
0 0 1
1 1 0

And two constant functions, the “set to zero” and the “set to one” functions:

Input Set to zero Set to one
0 0 1
1 0 1

The algorithm circuit will become more clear after we translate it to code, considering an unknown one qubit input function f(x) implemented by a binary quantum gate:

public class DeutschAlgorithm
    public bool IsBalanced(BinaryOperation gate)
        var q1 = new Qubit(false);
        var q2 = new Qubit(true);

        new HGate().Apply(q1);
        new HGate().Apply(q2);

        gate.Apply(q2, q1);

        new HGate().Apply(q1);

        return q1.Measure();

Classically we would require two measurements to find out the answer. For instance, if we query the {0 → 0} input/output pair, we are not sure yet if this is an identity or “set to zero” function.

In quantum computing however, we can find out the answer with a single query! I will not get into the details of the circuit inner workings, only say that it takes advantage of superposition to reduce the number of queries required to get a conclusive answer.

To see the math behind it open the link provided at the start of this section. You can also debug the algorithm executing one of my project unit test classes: DeutschAlgorithmTests.cs

It was really fun learning more about quantum computing and implementing a basic quantum simulator. This was a brief introduction into the subject, which I tried to keep simple, and haven’t covered quantum operations involving imaginary numbers or gone too deep into the math.

If you find any problem in my quantum simulator code feel free to open an issue on github.