In week 4, we learned the precise definitions of the observational unit (level 1) and the grouping unit (level 2), both of which must be identified clearly in a real experiment. We also learned how to identify fixed effects, random effects, and mixed effects in a real case so that we can choose the best-fitting model.

Through this week’s reading, I gained a deeper understanding of maximum likelihood estimation (MLE), which is an important method for parameter estimation. MLE estimates the parameters of a model by maximizing the likelihood of the data that were actually observed. First, I want to clarify what parameters are: they are the key quantities that determine the shape or position of a distribution, for example the mean, the variance, or a slope. A parameter of a model is usually an unknown but fixed value, so we need a method such as maximum likelihood estimation to estimate it. The logic behind MLE is therefore to find the parameter values under which the observed data are most probable.
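To make this logic concrete for myself, here is a minimal sketch in Python. The coin-flip data and the names `flips`, `likelihood`, and `candidates` are my own illustrative assumptions, not something from the reading. It simply tries many candidate values of the parameter p (the chance of heads) and keeps the one under which the observed flips are most probable.

```python
# Hypothetical data: 10 coin flips, 1 = heads, 0 = tails (7 heads in total).
flips = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]

def likelihood(p, data):
    """Probability of observing exactly these flips if P(heads) = p."""
    result = 1.0
    for x in data:
        result *= p if x == 1 else 1.0 - p
    return result

# Compare the candidate parameter values 0.01, 0.02, ..., 0.99.
candidates = [k / 100 for k in range(1, 100)]
best_p = max(candidates, key=lambda p: likelihood(p, flips))

print(f"most plausible p on the grid: {best_p:.2f}")
```

On this made-up data the search settles on p = 0.70, which is the sample proportion of heads (7 out of 10). A grid search is only a teaching device here; it is not how MLE is usually computed in practice.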

To calculate the MLE, we first need to construct the likelihood function, which is the joint probability (or joint density) of observing all of the data, viewed as a function of the parameters. If we assume that the data points are generated independently of each other, this joint probability is just the product of the probabilities of observing each data point. Next, we take the logarithm to simplify the function, which turns the product into a sum. The location of the maximum stays unchanged, because the log function is monotonically increasing. Finally, we differentiate the log-likelihood with respect to each parameter and set the derivatives equal to zero. Solving these equations gives the maximum point, which is the set of parameter estimates we aimed to find.
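The steps above can be sketched in Python for one specific case. I assume here that the data are independent draws from a normal distribution with unknown mean and variance; the data values and all function names are hypothetical. For this model, setting the derivatives of the log-likelihood to zero gives closed-form estimates (the sample mean, and the average squared deviation divided by n), and the code checks numerically that the slope really is about zero there.

```python
import math

# Hypothetical measurements, assumed independent and normally distributed.
data = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4]

def normal_pdf(x, mu, sigma2):
    """Density of a normal distribution with mean mu and variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def likelihood(mu, sigma2):
    # Independence: the joint density is the product of the individual densities.
    return math.prod(normal_pdf(x, mu, sigma2) for x in data)

def log_likelihood(mu, sigma2):
    # Taking the log turns the product into a sum.
    return sum(math.log(normal_pdf(x, mu, sigma2)) for x in data)

# Closed-form estimates obtained by setting the derivatives to zero.
n = len(data)
mu_hat = sum(data) / n
sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n  # divides by n, not n - 1

# Numerical check: central finite differences of the log-likelihood at the estimates.
h = 1e-5
slope_mu = (log_likelihood(mu_hat + h, sigma2_hat)
            - log_likelihood(mu_hat - h, sigma2_hat)) / (2 * h)
slope_sigma2 = (log_likelihood(mu_hat, sigma2_hat + h)
                - log_likelihood(mu_hat, sigma2_hat - h)) / (2 * h)

print(f"mu_hat = {mu_hat:.4f}, sigma2_hat = {sigma2_hat:.4f}")
print(f"slope in mu = {slope_mu:.2e}, slope in sigma2 = {slope_sigma2:.2e}")
```

Both slopes should come out very close to zero, and moving the mean away from `mu_hat` in either direction lowers the log-likelihood. For models without a closed-form solution, the same log-likelihood would instead be handed to a numerical optimizer.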

A small remark on the word “likelihood”: in everyday language it is often used interchangeably with “probability”; however, in statistics there is a difference between them. Likelihood refers to parameter values, with the observed data held fixed, whereas probability refers to the data, with the parameter values held fixed.
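One way I found to see this difference is the small Python sketch below. It uses a binomial model with hypothetical numbers (10 trials, 7 successes) and function names of my own choosing. The same formula is read in two ways: as a probability over the possible outcomes with the parameter fixed, and as a likelihood over the parameter with the outcome fixed.

```python
import math

def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials with success chance p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n = 10

# Probability: the parameter is fixed (p = 0.7) and the data k varies.
# Summed over every possible outcome, the probabilities add up to 1.
prob_total = sum(binom_pmf(k, n, 0.7) for k in range(n + 1))

# Likelihood: the data are fixed (k = 7) and the parameter p varies.
# The area under this curve (midpoint rule) is about 1/11, not 1,
# so the likelihood is not a probability distribution over p.
steps = 10_000
lik_area = sum(binom_pmf(7, n, (i + 0.5) / steps) for i in range(steps)) / steps

print(f"sum over data, parameter fixed:  {prob_total:.4f}")
print(f"area over parameter, data fixed: {lik_area:.4f}")
```

The first number is 1 because probabilities over all outcomes must sum to 1. The second is not, which is why the likelihood is used to compare parameter values rather than to assign them probabilities.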

In summary, the core idea of MLE is to estimate the parameters by choosing the values that maximize the probability of observing the whole data set.