Paper Example: “Mixture Distribution”


Mixtures of normal distributions have a long history in statistics and have been studied by many mathematicians. The first person to use the concept of a mixture of normal distributions was Newcomb, who used the idea to handle large files.
Owing to the growing importance of the normal distribution and of mixtures of normal distributions, a good deal of research has explored this domain. A mixture of normal distributions can accommodate departures from normality in the variables being modelled. Several tools have been devised to fit such mixtures, among them the mixtools package for R. The package identifies the peaks of the plotted density and then adjusts the standard deviations and mixing percentages to match the width and height of each peak. This project focuses on fitting the logarithm of returns (log-returns) with a mixture of normal distributions using the mixtools package. The application of mixtures of normal distributions is also part of the analysis. Together, these will help to evaluate and observe the process of fitting log-returns with a mixture of normal distributions.
This document develops the proposal for research on mixtures of normal distributions applied to log-returns. To that end, it describes the areas of research, states the problem, and sets out the aims and objectives of the study. It also draws on the academic literature and sets out the methodology to be used to achieve the study objectives. Log-returns are widely used in place of prices or raw returns. Their main benefit is that they express all variables in a comparable metric, which helps in evaluating analytic relationships between variables even when they come from price series of unequal values. The hypothesis about the log-return distribution considered here describes the distribution of price changes as a mixture of several normal distributions with the same mean but different variances. Departures from normality can then be explained by such mixtures of distributions.

Geyser data refers to data collected and analyzed on geyser eruptions in Yellowstone National Park, Wyoming. The data comprise 272 pairs of measurements: the time interval between successive eruptions and the time taken by the subsequent eruption. This research project will highlight the concept of log-returns modelled with mixtures of normal distributions, and will discuss the process of fitting log-returns with the mixtools package. The data sets used for this purpose are daily, weekly and monthly closing prices from January 2003 to July 2012 for Australian stock market indices. Mixtures of normal distributions will also be used to analyze the geyser data: the eruption time series and the waiting time series will each be fitted with a mixture of normal distributions using mixtools in R.

Nature of Problem
The importance of log-returns has gained increased attention. Mixtures of normal distributions allow more of the variation in data to be captured, such as increased skewness and kurtosis. These features can be fitted with the traditional expectation-maximization (EM) algorithm and, more recently, with mixtools. The research problem is therefore that of fitting log-returns with a mixture of normal distributions using contemporary techniques.
Geyser data, on the other hand, have been collected and analyzed continuously since 1985 by several researchers and geologists. Some of the eruption durations have been classified as long, short or medium. A puzzling situation arises when dealing with discontinuous data that has time series features. The analyses of the collected data have produced notably similar results. This research will therefore analyze the data with the mixtools package.

Aims and Objectives
The aim of the project is to develop a model that fits log-returns with a mixture of normal distributions. The research aims to answer the following questions:
• What is the concept of log-return with a mixture of normal distributions?
• What is the process of fitting log-returns with a mixture of normal distributions?
• How does the mixtools package explain the geyser data?
• How does the mixtools package make it possible to apply mixtures of normal distributions?
• How can the geyser data be fitted with a mixture of normal distributions?

Literature Review
The normal distribution is one of the most commonly used models for analyzing daily changes in market variables. Research studies have shown, however, that returns in equity, foreign exchange and commodity markets consistently exhibit fat tails. The assumption of normality is therefore often inappropriate, which leads to flaws in the findings. A mixture of normals is flexible enough to model daily changes in market data, and it captures the kurtosis and skewness that dominate such data. The normal distribution itself is a special case of a mixture of normal distributions.
Why do we use log-returns? When investing, it is natural to think about the future worth of an investment, and a bank account paying compound interest provides a model for exploring the concept of log-return. The benefit of using log-returns rather than prices is normalization: the variables are measured in a comparable metric. The main point, however, is that we are interested in maximizing long-term growth. The long-term log-return per unit time implicitly takes into account the risk of capital depletion over time.
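The comparability argument can be made concrete with a minimal Python sketch. The prices below are hypothetical values chosen for illustration, and the function name log_returns is an assumption, not part of any package named in this proposal. The sketch shows that log-returns add up over time, which mirrors the compound interest model.

```python
import math


def log_returns(prices):
    """Return the log-returns ln(P_t / P_{t-1}) of a price series."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


# Hypothetical closing prices, for illustration only.
prices = [100.0, 102.0, 99.0, 105.0]
r = log_returns(prices)

# Log-returns are additive: their sum equals the log-return
# over the whole period, ln(last price / first price).
total = sum(r)
whole_period = math.log(prices[-1] / prices[0])
```

Because each log-return depends only on a price ratio, two series with very different price levels yield values on the same scale.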

A normal distribution is symmetric and forms a bell-shaped curve when plotted. Kurtosis measures how peaked or flat a distribution is, while skewness measures its asymmetry. A mixture of normal distributions is created by combining several normal components, and the resulting mixture can have a kurtosis and skewness that differ from those of a single normal distribution. It has therefore been found that a mixture of normal distributions can accommodate the characteristics of the data, including its non-normality.
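As a rough illustration, the sketch below computes the sample skewness and excess kurtosis from central moments and applies them to a simulated mixture of two normals with the same mean but different variances. The mixing weight (0.9), the two standard deviations (1 and 3), the sample size and the seed are all assumptions chosen for the example, not values from the study.

```python
import random


def skewness_and_excess_kurtosis(xs):
    """Moment-based sample skewness and excess kurtosis (0 for a normal)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Simulated mixture: 90% of draws from N(0, 1), 10% from N(0, 3^2).
rng = random.Random(1)
sample = [
    rng.gauss(0.0, 1.0) if rng.random() < 0.9 else rng.gauss(0.0, 3.0)
    for _ in range(20000)
]
skew, excess_kurt = skewness_and_excess_kurtosis(sample)
```

With equal means the mixture stays roughly symmetric, while the wider component fattens the tails, so the excess kurtosis comes out clearly positive.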
The mixture of normal distributions is a flexible method for analyzing and modelling a wide variety of random phenomena, which is why it has remained of interest to many users. It also plays an essential role in marketing, economics and finance.
Finite mixture models examine a sample of measurements in order to evaluate the subgroups of individuals within the sample. The focus is on the subgroups themselves rather than on assigning individuals to them. One essential package for fitting these models is mixtools for R. The package estimates the centers of the peaks in the density curve, which are taken as the means of the component distributions. The standard deviations and mixing percentages are then adjusted to match the width and height of each peak. The algorithm used throughout the package is the expectation-maximization (EM) algorithm. The model used in mixtools is a finite mixture, in which the density is a weighted sum of normal component densities, with the mixing percentages as weights.
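In symbols, a k-component univariate normal mixture can be written as below. The notation is an assumption made for this sketch: λ_j denotes the mixing percentage, µ_j the mean and σ_j² the variance of the j-th component, and φ is the normal density.

```latex
f(x) = \sum_{j=1}^{k} \lambda_j \, \phi\!\left(x;\, \mu_j, \sigma_j^2\right),
\qquad
\phi\!\left(x;\, \mu_j, \sigma_j^2\right)
  = \frac{1}{\sqrt{2\pi\sigma_j^2}}
    \exp\!\left(-\frac{(x-\mu_j)^2}{2\sigma_j^2}\right),
\qquad
\lambda_j \ge 0, \quad \sum_{j=1}^{k} \lambda_j = 1 .
```

Setting k = 1 recovers the ordinary normal distribution, which is why it is a special case of the mixture.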

With the help of the formula, the values of the first normal component can be compared with those of the second. The same data can yield different values, which may be either positive or negative.
Kamaruzzaman, Isa and Ismail (2012) tested the normal mixture concept on financial data from the FTSE Bursa Malaysia Composite Index (FBM KLCI). The test explored non-normality and asymmetry as essential characteristics of financial data using the mixture of normal distributions approach. The results showed that a mixture of normal distributions can explain the behavior of leptokurtic data and can also handle skewed data. Kamaruzzaman, Isa and Ismail (2012) used maximum likelihood estimation (MLE) to fit a two-component mixture of normal distributions to the log stock returns used in their research.
Views are mixed on the predictability of the geyser data. Härdle (1991) believes that geyser predictions are no longer as important as they once were. Despite the debate about its usefulness, Venables and Ripley (2002) have developed software that predicts eruption times automatically. This allows the data recorded each day to be analyzed, as well as the median intervals for data recorded over one week. The median is more typical than the arithmetic average because the mean of the recorded values shifts across eruption times. These data help in developing models and trends of the eruptions over time. The predictable nature of the geyser is the basis of the name “Old Faithful.” In recent times, however, other researchers have reported variance from the predictions of prior investigations as a result of changes in the eruptions over time.
The log-return is also essential when working with mixtures of normal distributions. Using log-returns makes it possible to observe changes in a variable that can be compared directly with other variables that have different base values.

Quantitative data from Australian stock market indices will be used, and different companies on the stock exchange will form part of the study, for instance NAT. BANK FPO (NAB.AX), ASX Limited (ASX.AX) and ASX. FPO. The mean and the weight of each component, together with the associated standard deviation, will be considered in the analysis. If the standard deviation decreases by a particular margin, this will indicate a low season; an increase will indicate the reverse.

The study of the geyser will use a quantitative research methodology. Electronic monitoring with data loggers will be used to capture the data. The data loggers record the temperature of the runoff at a point approximately 20 meters west of the vent. The sensor will capture preplay as soon as the eruptions begin, allowing for a delay of between 1 and 10 minutes relative to the visual times recorded in the OFVC logbook. The sensor will be positioned far enough from the geyser that the temperature trace cannot influence the determination of the eruptions. This methodology has been used repeatedly in similar work and has proved effective. After the durations of the eruptions are timed, a regression analysis will be used to find the expected range of time before the next eruption. A better analysis, however, will be carried out with mixtools, especially since it is impossible to extract data for regression from the logger.
A simple plot of time versus interval will be used to show rough changes. More detail on the long-term variation in the behavior of the geyser will be obtained from a moving median graph. This is essential because it removes the very short and very long intervals and so gives a better picture of the behavior; the median interval also tends to remain fairly steady, with only small fluctuations. Another way to study the long-term behavior is to plot the distribution of the intervals over the different years.
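The moving median can be sketched in a few lines of Python. The interval values and the window length of 3 are hypothetical and chosen only to show how isolated very long or very short intervals drop out; the function name moving_median is likewise an assumption.

```python
from statistics import median


def moving_median(values, window):
    """Median of every run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        median(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


# Hypothetical intervals in minutes, with one very long (95)
# and one very short (30) interval.
intervals = [60, 95, 62, 61, 30, 63, 64]
smoothed = moving_median(intervals, 3)
```

In this toy series the smoothed values stay between 61 and 63, although the raw intervals range from 30 to 95.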

The geyser data concern eruptions in Yellowstone National Park, Wyoming, and comprise 272 pairs of measurements: the interval of time between successive eruptions and the time taken by the subsequent eruption. Once the data are collected, the mixtools package will be used to fit mixture models. For the eruption time, an increase in the weight will indicate the high regime. The mean can also be used, since an increase in the mean indicates a higher eruption regime, and this will be accompanied by increased standard deviations. For the waiting time, the low regime will be indicated by a decrease in the weights, in line with the changes recorded in the mean values.
Only a univariate normal mixture analysis in mixtools will be considered, that is, a mixture from the univariate Gaussian family, run with 75 iterations to obtain consistent, representative, and internally and externally valid results. A standard deviation of 4 and a mean of 55 will be used.
The j-th component density has mean µ_j and variance σ_j². The steps for fitting mixtures of univariate normals are straightforward and can easily be applied in this case; they are implemented in the normalmixEM function of mixtools.
Number of iterations = 75
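To make the steps concrete, the following is a minimal pure-Python sketch of the EM iteration for a univariate normal mixture. It is not the mixtools implementation and does not reproduce the normalmixEM interface. The function names, the synthetic data, the seed and the starting value of 70 for the second mean are assumptions made for illustration; the iteration limit of 75, the starting mean of 55 and the starting standard deviation of 4 follow the text.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_normal_mixture(xs, lam, mu, sigma, max_iter=75, tol=1e-8):
    """EM for a k-component univariate normal mixture.

    Returns (lam, mu, sigma, loglik), where loglik is the log-likelihood
    evaluated in the E-step of the last iteration that was run.
    """
    lam, mu, sigma = list(lam), list(mu), list(sigma)
    k, n = len(lam), len(xs)
    prev = -math.inf
    loglik = prev
    for _ in range(max_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        loglik = 0.0
        for x in xs:
            w = [lam[j] * normal_pdf(x, mu[j], sigma[j]) for j in range(k)]
            total = sum(w)
            loglik += math.log(total)
            resp.append([wj / total for wj in w])
        # M-step: weighted updates of mixing proportions, means and sds.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            lam[j] = nj / n
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigma[j] = math.sqrt(var)
        # Stop early once the log-likelihood no longer improves.
        if loglik - prev < tol:
            break
        prev = loglik
    return lam, mu, sigma, loglik


# Synthetic two-regime data (NOT the geyser data): 150 draws around 55
# and 250 draws around 80.
rng = random.Random(0)
data = [rng.gauss(55.0, 5.0) for _ in range(150)]
data += [rng.gauss(80.0, 6.0) for _ in range(250)]

lam_hat, mu_hat, sigma_hat, ll = em_normal_mixture(
    data, lam=[0.5, 0.5], mu=[55.0, 70.0], sigma=[4.0, 4.0], max_iter=75
)
```

The estimated weights, means and standard deviations play the same roles as the quantities described above: a larger weight or mean for one component indicates that the corresponding regime carries more of the data.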

Expected Results
A significant drop in closing prices on the Australian stock exchange is expected, reflecting the low regime in the stock values captured by the first normal component. The second normal component is expected to have a high mean and a low variance, indicating balance in the distribution of the low regime. For the geyser, the eruption time and the waiting time are expected to behave similarly. The two series are expected to indicate different regimes, with a general increase, and the data in the tables will be highly relevant in showing these results.