Markov Chain Monte Carlo Simulation For Airport Queuing Network
Markov Chain Monte Carlo Simulation For Airport Queuing Network
Today we’ll introduce a new set of algorithms called Markov Chain Monte Carlo which won't fall in supervised learning algorithms. In this blog post we will walk through, what Markov Chains are and where we can use it.
We will introduce the main family of algorithms, known collectively as Markov Chain Monte Carlo (MCMC), that allows us to approximate the posterior distribution as calculated by Bayes' Theorem.
In particular, we consider the Metropolis Algorithm, which is easily stated and relatively straightforward to understand. It serves as a useful starting point when learning about MCMC before delving into more sophisticated algorithms such as Metropolis-Hastings, Gibbs Samplers, and Hamiltonian Monte Carlo.
Once we have described how MCMC works, we will carry it out using the open-source PyMC3 library, which takes care of many of the underlying implementation details, allowing us to concentrate on Bayesian modeling.
Markov Chain Monte Carlo Simulation For Airport Queuing Network
Before we drive further let’s look at what you are going to learn by the end of the article.
Markov chain Monte Carlo analogy
Before getting started we’ll try to understand the analogy behind Markov Chains. When we are getting into a learning curve in the field of analytics we have various divisions like first we’ll start with forecasting and then linear regression after we’ll get into classification algorithms which are non-parametric models.
After this curve, we’ll get into Neural Networks like CNN, R-CNN, Auto-encoders so on, and so forth.
Once these are done now we will get into the stage of Markov Chains(MC) and Hidden Markov Chains (HMC) which are purely stochastic models in order to make a statement on random predictions.
Let’s understand it by example,
Decision tree algorithms can say whether they are going to buy it or not. Random forest says what are the various conditions need to be satisfied in order to make a statement. The logistic algorithm can say yes/no statements using sigmoid equations. When getting into CNN they can recognize the images and by using RNN they can do sequential tasking.
Where we use Markov chains
All the above set of algorithms categories are intended to predict the feature using the historical data, But if we want to predict like if you’re sitting in a restaurant and you’re waiting for the waiter in order to take up the order right.
So which algorithm be used in order to make the statement?.
Whether Virat Kohli going to hit a six or not,
which algorithm we need to use? In these scenarios we can’t go into deep learning algorithms, we can’t do forecasting right. So we want to make a statement on the above statements based on the current event we want to make a prediction about the next ball.
So for all these instant based predictions, the only route we have is Markov Chains and Hidden Markov Chains. Markov Chains are one of the powerful algorithms where we’re able to extract or able to make a statement on random events.
For these kinds of events, we prefer Markov Chains. Before we learn about markov chains, we need to learn about bayes's rule.
Let's spend some time now.
Bayes’s Rule
If we recall Bayes’s Rule:
Where:
- is the prior. This is the strength in our belief of without considering the evidence D.
- is the posterior. This is the refined strength of our belief of once the evidence D has been taken into account.
- is the likelihood. This is the probability of seeing the data D as generated by a model with a parameter .
We can see that we need to calculate the evidence P(D). In order to achieve this we need to evaluate the following integral, which integrates over all possible values of, the parameters:
The fundamental problem is that we are often unable to evaluate this integral analytically and so we must turn to a numerical approximation method instead.
An additional problem is that our models might require a large number of parameters. This means that our prior distributions could potentially have a large number of dimensions.
This in turn means that our posterior distributions will also be high dimensional. Hence, we are in a situation where we have to numerically evaluate an integral in a potentially very large dimensional space.
This means we are in a situation often described as the Curse of Dimensionality. Informally, this means that the volume of a high-dimensional space is so vast that any available data becomes extremely sparse within that space and hence leads to problems of statistical significance.
Practically, in order to gain any statistical significance, the volume of data needed must grow exponentially with the number of dimensions.
Such problems are often extremely difficult to tackle unless they are approached in an intelligent manner. The motivation behind Markov Chain Monte Carlo’s methods is that they perform an intelligent search within a high dimensional space and thus Bayesian Models in high dimensions become easy to control.
The basic idea is to sample from the posterior distribution by combining a “random search” with a mechanism for intelligently “jumping” around, but in a manner that ultimately doesn’t depend on where we started from.
Hence Markov Chain Monte Carlo methods are memoryless searches performed with intelligent jumps.
The Metropolis Algorithm
There is a large family of Algorithms that perform MCMC. Most of these algorithms can be expressed at a high level as follows:
- Begin the algorithm at the current position in parameter space.
- Propose a “jump” to a new position in parameter space.
- Accept or reject the jump probabilistically using the prior information and available data.
- If the jump is accepted, move to the new position and return to step 1.
- If the jumps are rejected, stay where you are and return to step 1.
- After a set of a number of jumps has occurred, return all accepted positions.
The main difference between MCMC algorithms occurs in how you jump as well as how you decide whether to jump.
The Metropolis algorithm uses a normal distribution to propose a jump. This normal distribution has a mean value μ which is equal to the current position and takes a "proposal width" for its standard deviation σ.
A normal distribution is a good choice for such a proposal distribution (for continuous parameters), it is more likely to select points nearer to the current position than further away. However, it will occasionally choose points further away, allowing the space to be explored.
Once the jump has been proposed, we need to decide (in a probabilistic manner) whether it is a good move to jump to the new position. How do we do this? We calculate the ratio of the proposal distribution of the new position and the proposal distribution at the current position to determine the probability of moving, p:
About PyMC3
PyMC3 is a Python package for Bayesian statistical modeling and probabilistic machine learning which focuses on advanced Markov chain Monte Carlo and variational fitting algorithms. It is a rewrite from scratch of the previous version of the PyMC software.
Simulation using PyMC3
The example we want to model and simulate is based on this scenario: a daily flight from London to Rome has a scheduled departure time at 12:00 am, and a standard flight time of two hours.
We need to organize the operations at the destination airport, but we don't want to allocate resources when the plane hasn't landed yet. Therefore, we want to model the process using a Bayesian network and considering some common factors that can influence the arrival time.
In particular, we know that the onboarding process can be longer than expected, as well as the refueling one, even if they are carried out in parallel. London air traffic control can also impose a delay, and the same can happen when the plane is approaching Rome. We also know that the presence of rough weather can cause another delay due to a change of route.
We can summarise this analysis with the following plot
Bayesian network representing the air traffic control problem
Considering our experience, we decide to model the random variables using the following distributions:
- Passenger onboarding ~ Wald(µ = 0.5, λ = 0.2)
- Refueling ~ Wald( µ = 0.25, λ = 0.5)
- Departure traffic delay ~ Wald(µ = 0.1, λ = 0.2)
- Arrival traffic delay ~ Wald(µ = 0.1, λ = 0.2)
- Departure time = 12 + Departure traffic delay + max(Passenger onboarding, Refueling)
- Rough weather ~ Bernoulli(p =0.35)
- Flight time ~ Exponential(λ = 0.5 - (0.1 . Rough weather))(The output of a Bernoulli distribution is 0 or 1 corresponding to False and True)
- Arrival time = Departure time + Flight time + Arrival traffic delay
Departure time and Arrival time are functions of random variables, and the parameter λ of Flight time is also a function of Rough Weather.
Markov Chain Monte Carlo Simulation with PyMC3
Even if the model is not very complex, the direct inference is rather inefficient, and therefore we want to simulate the process using PyMC3.
The first step is to create a model instance:
From now on, all operations must be performed using the context manager provided by the model variable. We can now set up all the random variables of our Bayesian network
We have imported two namespaces, pymc3.distributions.continuous and pymc3.distributions.discrete because we are using both kinds of variables.
Wald and exponential are continuous distributions, while Bernoulli is discrete. In the first three rows, we declare the variables passenger_onboarding, refueling, and departure_traffic_delay.
The structure is always the same: we need to specify the class corresponding to the desired distribution, passing the name of the variable and all the required parameters.
The departure_time variable is declared as pm. Deterministic. In PyMC3, this means that, once all the random elements have been set, its value becomes completely determined.
Indeed, if we sample from departure_traffic_delay, passenger_onboarding, and refueling, we get a determined value for departure_time. In this declaration, we've also used the utility function pmm.switch, which operates a binary choice based on its first parameter (for example, if A > B, return A, else return B).
The other variables are very similar, except for flight_time, which is an exponential variable with a parameter λ, which is a function of another variable (rough_weather). As a Bernoulli variable outputs 1 with probability p and 0 with probability 1 - p, λ = 0.4 if there's rough weather, and 0.5 otherwise.
Once the model has been set up, it's possible to simulate it through a sampling process. PyMC3 picks the best sampler automatically, according to the type of variables. As the model is not very complex, we can limit the process to 500 samples:
The output can be analyzed using the built-in pm.traceplot() function, which generates the plots for each of the sample’s variables. The following graph shows the detail of one of them:
Distribution and samples for the arrival time random variable
PyMC3 provides a statistical summary that can help us in making the right decisions using pm.summary(). In the following snippet, the output containing the summary of a single variable is shown:
For each variable, it contains mean, standard deviation, Monte Carlo error, 95% highest posterior density interval, and the posterior quantiles. In our case, we know that the plane will land at about 15:10 (15.174).
This is only a very simple example to show the power of Bayesian networks.
List of parametric and non-parametric Algorithms
Machine learning algorithms can be classified as two distinct groups: parametric and non-parametric
We can classify algorithms as non-parametric when models become more complex if the number of samples in the training set increases. Vice versa, a model would be parametric if the model becomes stable when the number of examples in the training set increases.
In simple terms, we can say parametric has a functional form while non-parametric has no functional form.
Functional form comprises a simple formula like y = f(x). So if you input a value, you are to get a fixed output value. It means if the data set is changed or being changed there is not much variation in the results. But in non-parametric algorithms, a small change in data sets can result in a large change in results.
- Non-parametric models
- Parametric models
List of methods that can perform MCMC
- The metropolis algorithm
- The Metropolis-Hasting algorithm
- The Gibbs sampler
- Hamiltonian Monte Carlo
- No U-turn sampler (and several variants)
Complete Code
Below is the complete code we have explained in the article, you clone the code in our GitHub repo too.
Conclusion
In this article, we learned the basics of markov chain monte carlo, one specific method known as the Metropolis algorithm, how to implement them using PyMC3.
Next Steps
In the coming articles, we’ll further discuss sampling techniques such as Metropolis-Hastings, Gibbs Sampling, and Hamiltonian Monte Carlo.
Recommended Courses
Machine Learning Interview Preparation
Rating: 4/5
Markov Chain Simulation in Python
Rating: 4.5/5
Machine Learing A to Z in Python course
Rating: 5/5
Good intro of MCMC, can you please show us MCMC with real world example. that will help lot rather than standard random variable.
Hi JD,
Thanks for your compliment. In the article, we have introduced how we can apply the MCMC for the airport queuing network but will try to write an article about the practical usage of MCMC with another real-world example.
Thanks and happy learning.