## 2017/2018 Intermediate Microeconomics (Business course at UTS)

Proposed solutions for Tutorials 1-2

Proposed solutions for Tutorial 3

Proposed solutions for Tutorial 4

Proposed solutions for Tutorial 5

Proposed solutions for Tutorial 6

Proposed solutions for Tutorial 7

## Some simple probability formulas with examples

A known relationship that is usually given axiomatically:

$P(B|A) = \frac{{P(AB)}}{{P(A)}}$

Upon rearrangement gives the multiplication rule of probability:

$P(AB) = P(A)P(B|A) = P(B)P(A|B)$

Now observe a cool set up that is handy to keep in mind for proving the law of total probability and Bayes’ theorem.

Imagine that $B$ happens with one and only one of $n$ mutually exclusive events $A_1, A_2,..., A_n$, i.e.:

$B = \sum\limits_{i = 1}^n {B{A_i}}$

Since the events $BA_i$ are mutually exclusive, taking probabilities gives

$P(B) = \sum\limits_{i = 1}^n {P(B{A_i})}$.

Now by multiplication rule:

$P(B) = \sum\limits_{i = 1}^n {P({A_i})P(B|{A_i})}$.

This is the law of total probability.

From the same set-up, imagine that we want to find the probability of the event $A_i$ given that $B$ is known to have happened. By the multiplication rule:

$P(A_i B) = P(B)P(A_i|B) = P(A_i)P(B|A_i)$

Dropping $P(A_i B)$ and dividing the remaining equality through by $P(B)$ we get:

$P\left( {{A_i}|B} \right){\rm{ = }}\frac{{P({A_i})P(B|{A_i})}}{{P(B)}}$

Applying the law of total probability to the denominator, we obtain Bayes' theorem:

$P\left( {{A_i}|B} \right){\rm{ = }}\frac{{P({A_i})P(B|{A_i})}}{{\sum\limits_{j = 1}^n {P({A_j})P(B|{A_j})} }}$
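As a quick numeric sanity check, here is a minimal Python sketch of both formulas. The set-up (three machines and their defect rates) is a hypothetical example of my own, not from the text above:

```python
# Hypothetical set-up: events A1, A2, A3 partition the sample space
# (say, three machines producing 50%, 30%, 20% of output), and B is
# "item is defective" with P(B|A_i) = 1%, 2%, 3%.
priors = [0.5, 0.3, 0.2]          # P(A_i)
likelihoods = [0.01, 0.02, 0.03]  # P(B|A_i)

# Law of total probability: P(B) = sum_i P(A_i) P(B|A_i)
p_b = sum(p * l for p, l in zip(priors, likelihoods))

# Bayes' theorem: P(A_i|B) = P(A_i) P(B|A_i) / P(B)
posteriors = [p * l / p_b for p, l in zip(priors, likelihoods)]

print(p_b)         # 0.017
print(posteriors)  # the posteriors sum to 1
```

Note how the denominator of Bayes' theorem is exactly the total-probability sum, so the posteriors are guaranteed to be a proper distribution.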

A bunch of examples:

Problem: $P_t (k)$ is the known probability of receiving $k$ phone calls during a time interval of length $t$, where $k=0,1,2,...$. Assuming that the numbers of calls received during two adjacent time intervals are independent, find the probability of receiving $s$ calls during an interval of length $2t$.

Solution: Let $A_{b,b + t}^k$ be the event that $k$ calls arrive in the interval from $b$ to $b+t$. Then clearly

$A_{0,2t}^s = A_{0,t}^0A_{t,2t}^s + ... + A_{0,t}^sA_{t,2t}^0$

which means that the event $A_{0,2t}^s$ can be seen as the sum of $s+1$ mutually exclusive events, such that $i$ calls are received in the first interval of duration $t$ and $s-i$ calls in the second interval of the same duration ($i=0,1,2,...,s$). By the addition rule

$P(A_{0,2t}^s) = \sum\limits_{i = 0}^s {P(A_{0,t}^iA_{t,2t}^{s - i})}$.

By the rule of multiplication

$P(A_{0,t}^iA_{t,2t}^{s - i}) = P(A_{0,t}^i)P(A_{t,2t}^{s - i})$

If we change the notation so that

${P_{2t}}(s) = P(A_{0,2t}^s)$

then

${P_{2t}}(s) = \sum\limits_{i = 0}^s {{P_t}(i) \cdot {P_t}(s - i)}$.

It is known that under quite general conditions

${P_t}(k) = \frac{{{{(at)}^k}}}{{k!}}\exp \{ - at\} {\rm{ }}(k = 0,1,2...)$

(Recall that the Poisson distribution is an appropriate model if the following assumptions hold. (a) $k$ is the number of times an event occurs in an interval and $k$ can take values $0,1,2,...$. (b) The occurrence of one event does not affect the probability that a second event will occur; that is, events occur independently. (c) The rate at which events occur is constant: it cannot be higher in some intervals and lower in others (that is quite a lot to take on faith, really). (d) Two events cannot occur at exactly the same instant; in each very small sub-interval either exactly one event occurs or none does. (e) The probability of an event in a small sub-interval is proportional to the length of the sub-interval. Alternatively, if the actual distribution is binomial and the number of trials is sufficiently larger than the number of successes one is asking about, the binomial distribution approaches the Poisson.)

Parametrisation then gives

${P_{2t}}(s) = \sum\limits_{i = 0}^s {\frac{{{{(at)}^s}}}{{i!(s - i)!}}\exp \{ - 2at\} } {\rm{ = }}{(at)^s}\exp \{ - 2at\} \sum\limits_{i = 0}^s {\frac{1}{{i!(s - i)!}}}$

Note that

$\sum\limits_{i = 0}^s {\frac{1}{{i!(s - i)!}} = \frac{1}{{s!}}\sum\limits_{i = 0}^s {\frac{{s!}}{{i!(s - i)!}}} = \frac{1}{{s!}}{{(1 + 1)}^s} = \frac{{{2^s}}}{{s!}}}$

Then

${P_{2t}}(s) = \frac{{{{(2at)}^s}\exp \{ - 2at\} }}{{s!}}{\rm{ }}(s = 0,1,2,...)$

The key point is that if the parametrized formula holds for an interval of length $t$, then for $2t$ it holds with parameter $2at$, as shown above. The same argument works for any multiple of $t$.
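The derivation can also be checked numerically; here is a minimal Python sketch (the values of $a$, $t$ and $s$ are arbitrary choices of mine):

```python
import math

def poisson_pmf(k, rate):
    """P_t(k) with rate = a*t: (a*t)^k / k! * exp(-a*t)."""
    return rate**k * math.exp(-rate) / math.factorial(k)

a, t, s = 1.5, 2.0, 4

# Sum over the ways to split s calls between two adjacent intervals of length t
conv = sum(poisson_pmf(i, a * t) * poisson_pmf(s - i, a * t) for i in range(s + 1))

# Direct Poisson formula for the interval of length 2t
direct = poisson_pmf(s, 2 * a * t)

print(conv, direct)  # the two values coincide
```

The convolution of two Poisson pmfs with parameter $at$ lands exactly on the Poisson pmf with parameter $2at$, which is the content of the derivation above.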

## A simple fact about sets

Out of $n$ elementary events one can get

$\sum_{m=1}^{n} C_{n}^{m} = 2^n - 1$

possible outcomes, where $C_{n}^{m}$ counts the events that contain exactly $m$ elementary events. Take the set

$\{ a,b,c\}$

with size $n=3$ as its only characteristic. Then its power set

$\{ \{ a\} ,\{ b\} ,\{ c\} ,\{ a,b\} ,\{ a,c\} ,\{ b,c\} ,\{ a,b,c\} ,\emptyset \}$

contains ${2^3} = 8$ elements: $3$ events with one element each, $C_{3}^{1}$; then $3$ events with two elements, $C_{3}^{2}$; finally, $1$ event with all elements, $C_{3}^{3}$. The empty set is the impossible event.
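A quick way to see the count is to enumerate the non-empty subsets directly; a small Python sketch:

```python
from itertools import combinations

elements = ['a', 'b', 'c']
n = len(elements)

# All non-empty subsets: C(n,1) + C(n,2) + ... + C(n,n) = 2^n - 1
events = [set(c) for m in range(1, n + 1) for c in combinations(elements, m)]

print(len(events))  # 7 = 2^3 - 1
```

Adding the impossible (empty) event back gives the full power set of $2^n$ elements.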

I personally think that this simple fact is amazing, but some would say it is kinda boring. Here is an interesting question for them.

A pack of $36$ cards is randomly split into two equal halves. What is the probability that the halves have equal numbers of black and red cards?

This is just another set with $36$ elements of two types.

$p = \frac{{C_{18}^9 \times C_{18}^9}}{{C_{36}^{18}}} = \frac{{{{(18!)}^4}}}{{36!{{(9!)}^4}}}$

The denominator indicates all possible equally likely ways the pack can be split.

Instead of computing that by hand one can use Stirling's asymptotic formula

$n!\ \approx \sqrt {2\pi n} \cdot {n^n}{e^{ - n}}$

Thus

$18!\ \approx {18^{18}}{e^{ - 18}}\sqrt {2\pi \cdot 18}$

$9!\ \approx {9^9}{e^{ - 9}}\sqrt {2\pi \cdot 9}$

$36!\ \approx {36^{36}} \cdot {e^{ - 36}}\sqrt {2\pi \cdot 36}$

Which means

$p \approx \frac{{{{(\sqrt {2\pi \cdot 18} \cdot {{18}^{18}} \cdot {e^{ - 18}})}^4}}}{{\sqrt {2\pi \cdot 36} \cdot {{36}^{36}} \cdot {e^{ - 36}}{{(\sqrt {2\pi \cdot 9} \cdot {9^9} \cdot {e^{ - 9}})}^4}}}$

Simple algebra yields

$p \approx \frac{2}{{\sqrt {18\pi } }} \approx \frac{4}{{15}} \approx 0.26$
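Both the exact value and the Stirling approximation are easy to check in Python (`math.comb` gives the binomial coefficient $C_n^m$):

```python
import math

# Exact: choose 9 red of 18 and 9 black of 18 for one half,
# over all equally likely ways to choose 18 cards out of 36
p_exact = math.comb(18, 9) ** 2 / math.comb(36, 18)

# The Stirling-based approximation derived above
p_approx = 2 / math.sqrt(18 * math.pi)

print(p_exact, p_approx)  # both are about 0.26
```

The two values agree to about two decimal places, which is all the asymptotic formula promises at $n$ this small.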

The result fascinates me. The graph visualizes data from a real experiment in which a pack is split equally $100$ times and $\mu$ is the cumulative count of splits in which exactly $9$ red cards are observed in one of the halves. What is crazy is that we were able to see the results of this experiment without doing any experiments, by simply reasoning mathematically about things.

More on this topic: Gnedenko (1988).

## Distribution of an ordered pile of rubble

Imagine a pile of rubble ($X$) where the separated elements of the pile are stones ($x_i$). By picking $n$ stones we form a sample that we can sort by weight. A sequence $x_1,x_2,...,x_n$ becomes $x_{(1)},x_{(2)},...,x_{(m)},...x_{(n)}$, where $(m)$ is called “rank”.

Pretend that we do the following. Upon picking a sample and sorting it, we put the stones into $n$ drawers and mark each drawer by rank. Now repeat the procedure again and again (picking a sample, sorting, and putting stones into drawers). After several repetitions, we find that drawer #$1$ contains the lightest stones, whereas drawer #$n$ contains the heaviest. An interesting observation is that by repeating the procedure indefinitely we would put the whole parenting set (the whole pile, or the whole range of the parenting distribution) into drawers, and later we could do the opposite: take all stones from all drawers and mix them to get back the parenting set. (The fact that the distributions (and moments) of stones of a particular rank and the parenting distribution are related is probably the most thought-provoking part.)

Now let us consider the drawers. Obviously, the weights of stones in a given drawer (in a rank) are not all the same. Furthermore, they are random and governed by some distribution. In other words, they are, in turn, a random variable, called an order statistic. Let us label this random variable $X_{(m)}$, where $m$ is the rank. Thus a sorted sample looks like this

$X_{(1)},X_{(2)},...,X_{(m)},...,X_{(n)}$

Its elements $X_{(m)}$ (the set of elements (stones) $x$ from the general set $X$ (pile) with rank $m$ (drawer)) are called the $m$-th order statistics.

//////////////

Elements $X_{(1)}$ and $X_{(n)}$ are called “extreme”. If $n$ is odd, the value with number $m=\frac{(n+1)}{2}$ is central. If $m$ is of order $\frac{n}{2}$, this statistic is called “$m$-central”. A curious question is how to define “extreme” elements if $n \to \infty$: if $n$ increases, then $m$ increases as well.

//////////////

Let us derive the density function of the $m$-th order statistic with sample size $n$. Assume that the parenting distribution $F(x)$ and density $f(x)$ are continuous everywhere. We will be dealing with a random variable $X_{(m)}$ which shares the same range as the parenting distribution (if a stone comes from the pile, it won’t be bigger than the biggest stone in that pile).

The figure shows $F(x)$ and $f(x)$ and the function of interest $\varphi_n (\cdot)$. The index $n$ indicates the size of the sample. The $x$ axis has values $x_{(1)},...,x_{(m)},...,x_{(n)}$ that belong to a particular realization of $X_{(1)},X_{(2)},...,X_{(m)},...,X_{(n)}$.

The probability that the $m$-th order statistic $X_{(m)}$ is in the neighborhood of $x_{(m)}$ is by definition (recall the identity $dF = F(x + dx) - F(x) = \frac{{F(x + dx) - F(x)}}{{dx}} \cdot dx = f(x) \cdot dx$):

$dF_{n}(x_{(m)})=P[x_{(m)} < X_{(m)} \le x_{(m)}+dx]=\varphi_n (x_{(m)})\,dx$

We can express this probability in term of parenting distribution $F(x)$, thus relating $\varphi_n (x_{(m)})$ and $F(x)$.

(This bit was a little tricky for me; read it twice with a nap in between.) Consider that a realization $x_1,...,x_i,...,x_n$ is a sequence of trials (generated by the parenting distribution rather than the order statistics; remember that the range is common), where a “success” is observing a value $X < x_{(m)}$ and a “failure” is observing $X>x_{(m)}$ (if still necessary, return to the pile-and-stones metaphor). Obviously, the probability of a success is $F(x_{(m)})$ and of a failure is $1-F(x_{(m)})$. The number of successes is $m-1$ and the number of failures is $n-m$, because the $m$-th value $x_{(m)}$ in a sample of size $n$ is such that $m-1$ values are less than it and $n-m$ values are greater.

Clearly, the count of successes has a binomial distribution. (Recall that the probability of getting exactly $k$ successes in $n$ trials is given by the pmf: $p(k;n,p) = P(X = k) = \binom{n}{k}{p^k}{(1 - p)^{n - k}}$. In words, $k$ successes occur with probability $p^k$ and $n-k$ failures occur with probability $(1-p)^{n-k}$. However, the $k$ successes can occur anywhere among the $n$ trials, and there are $\binom{n}{k}$ different ways of distributing $k$ successes in a sequence of $n$ trials. A little more about it.)

The probability for the parenting distribution to take the value close to $x_{(m)}$ is an element of $dF(x_{(m)})=f(x_{(m)})dx$.

The probability that the sample falls around $x_{(m)}$ in such a way that $m-1$ elements are to the left of it and $n-m$ to the right, and that the random variable $X$ is in the neighborhood of it, is (counting the $n$ ways to choose which sample element falls near $x_{(m)}$ and the $C_{n-1}^{m-1}$ ways to distribute the successes among the remaining $n-1$):

$nC_{n - 1}^{m - 1}{[F({x_{(m)}})]^{m - 1}}{[1 - F({x_{(m)}})]^{n - m}}f({x_{(m)}})dx$

Note that this is exactly $P[x_{(m)} < X_{(m)} \le x_{(m)}+dx]$, thus:

$\varphi_n (x_{(m)})dx=nC_{n - 1}^{m - 1}{[F({x_{(m)}})]^{m - 1}}{[1 - F({x_{(m)}})]^{n - m}}f({x_{(m)}})dx$

Furthermore, if in switching from $f(x)$ to $\varphi_n (x_{(m)})$ we maintain the scale of the $x$ axis, then

$\varphi_n (x_{(m)})=nC_{n - 1}^{m - 1}{[F({x_{(m)}})]^{m - 1}}{[1 - F({x_{(m)}})]^{n - m}}f({x_{(m)}})$

The expression shows that the density of an order statistic depends on the parenting distribution, the rank, and the sample size. Note the distributions of the extreme values, when $m=1$ and $m=n$.

The rightmost (maximum) element has the distribution $F^{n}(x)$ and the minimum $1-[1-F(x)]^n$. As an example, observe the order statistics for ranks $m=1,2,3$ with sample size $n=3$ for the uniform distribution on the interval $[0,1]$. Applying the last formula with $f(x)=1$ (and thus $F(x)=x$) we get the density of the smallest element

$\varphi_3 (x_{(1)})=3(1-2x+x^2)$;

the middle element

$\varphi_3 (x_{(2)})=6(x-x^2)$

and the maximal

$\varphi_3 (x_{(3)})=3x^2$.
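These three densities follow from the general formula with the combinatorial factor $\frac{n!}{(m-1)!(n-m)!} = n\,C_{n-1}^{m-1}$; a small Python sketch checking them at an arbitrary point:

```python
import math

def phi(x, m, n, F, f):
    """Density of the m-th order statistic in a sample of size n."""
    return (n * math.comb(n - 1, m - 1)
            * F(x) ** (m - 1) * (1 - F(x)) ** (n - m) * f(x))

F = lambda x: x    # Uniform(0,1) CDF
f = lambda x: 1.0  # Uniform(0,1) density

x = 0.3
print(phi(x, 1, 3, F, f), 3 * (1 - x) ** 2)  # smallest element
print(phi(x, 2, 3, F, f), 6 * (x - x ** 2))  # middle element
print(phi(x, 3, 3, F, f), 3 * x ** 2)        # largest element
```

Each pair of printed values coincides, confirming the three densities listed above.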

In full accordance with intuition, the density of the middle value is symmetric with respect to the parenting density, whereas the densities of the extreme values are bounded by the range of the parenting distribution and pile up toward the corresponding bound.

Note another interesting property of order statistics. Sum the densities $\varphi_3 (x_{(1)}), \varphi_3 (x_{(2)}), \varphi_3 (x_{(3)})$ and divide the result by their number:

$\frac{1}{3}\sum\limits_{m = 1}^3 {{\varphi _3}({x_{(m)}}) = \frac{1}{3}(3 - 6x + 3{x^2} + 6x - 6{x^2} + 3{x^2}) = 1 = f(x)}$

on the interval $[0,1]$

The normalized sum of the order-statistic densities turned out to equal the parenting density $f(x)$. It means that the parenting distribution is a mixture of the order statistics $X_{(m)}$, just as mentioned above: after sorting the general set by ranks, we could mix the drawers back together to get the general set.
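A small simulation illustrates the drawer metaphor, assuming a Uniform(0,1) pile: sorting each sample assigns ranks, the mean in each drawer matches the known order-statistic means $m/(n+1)$, and pooling all drawers recovers the parent distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 3, 200_000

# Each row is one sample of 3 "stones"; sorting assigns ranks (drawers)
samples = np.sort(rng.uniform(0, 1, size=(reps, n)), axis=1)

# Mean weight in each drawer: for Uniform(0,1), E[X_(m)] = m/(n+1)
print(samples.mean(axis=0))  # close to [0.25, 0.5, 0.75]

# Pooling all drawers back together recovers the parent Uniform(0,1)
pooled = samples.ravel()
print(pooled.mean())  # close to 0.5, the parent mean
```

The per-drawer means split the interval exactly as the three densities above suggest, while the pooled data behaves like the original pile.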

## Why so many big lines into terrible restaurants…

It must be a good restaurant since the line is so long. Hm… you likely just failed to update your beliefs in a rational way.

Imagine you are in a classroom with an urn containing three balls in front of everyone. You don’t see the colours of the balls, but you do know that, with equal probability, the urn is either majority blue ($2$ blue, $1$ red) or majority red ($1$ blue, $2$ red). Since you don’t know which urn is there (the true state of the world), you need some evidence before making a guess. Now every person in the class, one by one, comes and picks one ball from the urn and, without showing it, announces his choice. Believe it or not, this is your restaurant-choice situation.

The two possibilities for the urn are an analogue of whether the restaurant is good or bad. A person who comes to make a choice has several pieces of information to combine. Taking one ball from the urn is the same as if you had read some reviews of the restaurant beforehand. The information is not perfect; the reviews could be biased or unrepresentative of your taste. However, you have also observed the choices of the people before you. You do not know their private signals (what ball they picked from the urn, i.e. what their conclusion was after studying the reviews), but you do know their choices.

Claiming that the restaurant must be good because the line is long would be valid only if all the people who came sequentially had followed only their private signals. Then, when your time came to make a choice, the line would indicate independent draws of balls from the urn. If the true state of the world were that the urn is majority blue, many more people would say so.

The thing is that those draws are clearly not independent. At some point, a person whose private signal says the urn is majority blue might see too many people choosing majority red, abandon his private signal, and follow the crowd. So when it is your turn to make a choice and you observe a line (i.e. heaps of people announcing their choice), it does not necessarily mean that the restaurant is good. Put differently, you do not account for the correlation between public beliefs (beliefs based on the observed choices, before seeing your private signal) and private signals.
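Here is a minimal simulation sketch of this herding logic (my own simplification, not from any particular source): with symmetric priors, Bayesian updating reduces to counting announcements, and once one side leads by two, the public belief outweighs any single private signal:

```python
import random

def run_line(n_agents, true_majority, p_correct=2/3, seed=1):
    """Each agent draws a private signal that matches the true urn with
    probability p_correct (one ball from a 2:1 urn), sees all earlier
    announcements, and announces a guess about the urn."""
    rng = random.Random(seed)
    other = 'blue' if true_majority == 'red' else 'red'
    announced = []
    for _ in range(n_agents):
        signal = true_majority if rng.random() < p_correct else other
        lead = announced.count('blue') - announced.count('red')
        if lead >= 2:
            choice = 'blue'   # cascade: the private signal is ignored
        elif lead <= -2:
            choice = 'red'
        else:
            choice = signal   # otherwise follow the private signal
        announced.append(choice)
    return announced

line = run_line(30, true_majority='red')
print(line.count('red'), 'red vs', line.count('blue'), 'blue announcements')
```

Once a cascade starts, every later announcement just copies the crowd, so the length of the line carries no extra information; rerunning with different seeds shows that a long line can even form on the wrong urn.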

Well that is herding. And here is a presentation about it….

If that stuff sounded crazy awesome then read this and in the very very end this

It is obviously not about restaurants at all; it could be the choice of a major for a college degree. Is being a doctor a good choice or not? There is no way to know for sure; you just have to combine your private signal with the public belief. If you don’t have a strong private signal, it will be overwhelmed by the public belief and you will just follow the crowd. It could also explain why in Russia or Germany during those times almost all people would put out Nazi flags or hang Stalin’s portrait on the wall at home and in the office. Or pretty much anything that involves guessing the state of the world by combining information from your own guess and the choices of others.

## Practical advice on non-parametric density estimation

Always start from the histogram; any non-parametric density estimation method is essentially a fancier version of a histogram.

Compare the problem of choosing an optimal bin size in a histogram with the choice of the bandwidth $h$ in a kernel estimator.

The point of the exercise is to reveal all the features of the data; that is what is important to keep in mind.
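The bin-width/bandwidth analogy is easy to see in a few lines of NumPy. A sketch with a hypothetical two-mode data set of my own, assuming a Gaussian kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: a two-mode mixture, a feature a bad h (or bin size) can hide
data = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(1, 1.0, 700)])

def gaussian_kde(grid, data, h):
    """Kernel density estimate: a smoothed histogram in which the
    bandwidth h plays the role of the bin width."""
    u = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

grid = np.linspace(-6, 6, 500)
for h in (0.05, 0.3, 1.5):  # undersmoothed, about right, oversmoothed
    dens = gaussian_kde(grid, data, h)
    print(h, dens.sum() * (grid[1] - grid[0]))  # each estimate integrates to about 1
```

All three estimates are valid densities, but only the middle bandwidth shows both modes without spurious wiggles: exactly the trade-off faced when choosing a histogram's bin size.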

And now take a look at a perfect application of the idea in

Nissanov, Zoya, and Maria Grazia Pittau. “Measuring changes in the Russian middle class between 1992 and 2008: a nonparametric distributional analysis.” Empirical Economics 50.2 (2016): 503-530.

Going back to the advice: keep in mind that you are doing it to reveal features of the data, and it has to be strictly more informative than a histogram; otherwise the computational costs are not justified.

## Spatial competition… and what science is really about.

Check my presentation on an empirical model of firm entry with endogenous product-type choices. (here)

A normal reaction to the presentation’s topic would be “whaat? why would anyone want to do this stuff for a living?”. It is a great question; I don’t have an answer to it. It is indeed viciously technical and deadly boring.

But I do have something really cool to share. Back home I was driving my 15-year-old niece to a museum and failed to find a humanly understandable combination of words to explain what science is. So now check this combination of words; I think it is a really good fit…

A human eye is able to capture quite a limited portion of the light spectrum (the visible spectrum). We are unable to travel in time or reach most of the planets in the galaxy. Yet there is no need to be able to physically see the whole light spectrum to actually “see” it. And you do not need to be able to travel in time to “see” the past, just like you do not need to travel to another planet to “see” that planet. Here is a cool angle on it: the information integration theory of consciousness, an exceptionally creative idea that, if appreciated properly, will blow your mind.

Human bodies have an enormous number of systems like no other living being. We feel temperature and objects, we see and hear, we feel emotions like fear, shame, happiness, etc. Our brain integrates all of this information from all the systems into a sense of reality. Put differently, reality as seen by a person is but aggregated sensations from a set of systems which continuously register information. Think about the feeling of pain. Pain is your body’s language. If your body needs attention from you, it sends a signal. However, the signal has only one dimension; it is kind of like a baby’s cry. A baby can only change the intensity of a cry, but it is your job to give this cry an interpretation. Your brain does the same. (To be more precise, you do it yourself but unconsciously; it is one of those automatic processes, kind of like intuition.) Consciousness, the capacity to separate yourself from other things, is just another trick of your brain. Instead of giving you the raw information from the systems that systematically aggregate information, it gives you an interpretation. Instead of overwhelming you with tonnes of sensations, the brain gives you their meaning. Reality is the brain’s interpretation of the aggregation of information from a number of systems that supply raw data.

Holy bologna!! But is that not what science is? Yes, indeed. Science is nothing but a natural extension of a process that your body does almost automatically: aggregating information from systems that continuously register information and assigning meaning to it. (There is also the thesis that mathematics is nothing but common sense, quite dense at times. I’ll see if I can make this post compact and readable enough; if I do, I’ll give you that idea as well.)

It is also interesting to look at people’s temperaments. The system integrator (our brain, our consciousness) assigns different weights to the different systems from which it gets information. That’s why we sometimes observe people who are always scared or calm, sympathetic or cold. Of course, there are other things that define character, or a predilection for specific kinds of decisions, such as upbringing and genetics, yet the system integrator has the last word.

The point is that our brain is capable of aggregating information from many more systems than our physiological limits dictate.

In some sense, our brain is a prisoner of our physiological systems. So one way to put it is that science is setting your brain free. Seeing and thinking are the same thing when your eyes are closed. Put differently, the things that we physically see or feel are just a little fraction of what we could potentially see if we allowed our brain to aggregate information and assign meanings from a much wider set of systems that continuously register information. The sense of reality, consciousness, is a computational shortcut, because otherwise your brain would be overwhelmed with information.

In fact, any meaning is a computational shortcut that only your brain requires. Objective reality exists as an enormous, mostly meaningless set of data. Life exists only because it can; asking for the meaning of life is the most idiotic question of all. Meaning itself is senseless; it is nothing but a trick of your brain to aggregate information more easily. (It sounds really weird… hm… I should probably wrap up with this one; better to do another post.)

P.S. To survive, people developed a capacity to form groups very quickly (morality) and to make decisions under uncertainty very quickly. A sense of reality, or consciousness, is a sort of “sufficient statistic”: for the decision at hand (to survive) we can form one parameter, a meaning, that contains all the useful information from the data that surround us. It economizes on computational requirements and minimizes the risk of a mistake (sometimes the cost of a mistake is your life).