Wednesday, July 9, 2014

Probability in business problems

FREE SERVICE (unless I get overwhelmed by requests).
Please email me at millertrader@gmail.com with your problems of the type below:
1) You are having breakdown problems at a rate of 12 every 20 working days. You want to know the probability of more than 2 breakdowns on any particular day. You also want me to calculate the probability that, in 240 working days, there will be more than 5 days with more than 3 breakdowns each.
2) Forecasting: I have also developed a forecasting computer program for spreadsheets. Using a "significant values of the correlation coefficient" table, it checks that there is sufficient correlation for a forecast, and it also gives a confidence interval for the forecast value. This method is standard statistics, but appears to be lacking in some spreadsheets.
3) Your machine breaks down randomly at an average rate of 3 times a year. What is the probability that it will not break down in the next 6 months? (The answer is 0.2231, so there is only about a 22% chance that it will not break down in the next 6 months.)
4) My spreadsheet gives me a correlation coefficient of 0.445. I have 20 entries in each column. Is there significant correlation so that I can make a good forecast? Answer: The correlation is significant at the 5% level of significance, but it is not highly significant, because it is not significant at the stricter 1% level. I use tables or my computer programs to give me these results.
5) On average your business gets 40 phone calls a week. You put an advert in the paper and receive 52 phone calls the next week. Is this due to chance (and not the advert)? Rough estimation: If your average is 40 per week, the chance of getting more than 50 in a week (if nothing has changed) is 0.05263 (i.e. 5.263% probability). It is unlikely you will get more than 50 if circumstances have not changed.
6) You want to start a business in some area. You randomly choose some people in area A and ask them if they would buy from your business. You do the same in area B. Now you can test whether there is a significant difference in support between area A and area B. You can do this with a comparison of two proportions test (I will do the calculations for you).
7) You want to know what percentage of your bond you still have to pay after 3 years of paying into it if the interest charged is 5% and you are paying it off over 20 years.
8) You want to know what interest you must earn every year to double your money over 7 years.
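Problems 7 and 8 come down to two standard formulas. Here is a minimal Python sketch. It assumes monthly repayments with the 5% annual rate compounded monthly for the bond (the problem does not state the compounding convention) and annual compounding for the doubling problem. The function names are illustrative only.

```python
def remaining_balance_fraction(annual_rate, years_total, years_paid,
                               periods_per_year=12):
    """Fraction of a level-payment loan still owed after years_paid.

    Assumption: equal payments every period, interest compounded each period.
    """
    i = annual_rate / periods_per_year      # interest rate per period
    n = years_total * periods_per_year      # total number of payments
    k = years_paid * periods_per_year       # payments already made
    growth_n = (1 + i) ** n
    growth_k = (1 + i) ** k
    # Standard amortisation result: balance after k payments / original loan
    return (growth_n - growth_k) / (growth_n - 1)


def doubling_rate(years):
    """Annual rate (compounded yearly) that doubles money in `years` years."""
    return 2 ** (1 / years) - 1


if __name__ == "__main__":
    owed = remaining_balance_fraction(0.05, 20, 3)
    print(f"Problem 7: {owed:.1%} of the bond is still owed after 3 years")
    print(f"Problem 8: {doubling_rate(7):.2%} per year doubles money in 7 years")
```

Under these assumptions roughly 90% of the bond is still outstanding after 3 years, and the rate needed to double money in 7 years is about 10.4% a year.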
FOR THE MORE MATHEMATICALLY MINDED: One can use the comparison of two proportions test with the null hypothesis "The proportions are the same." There are also other tests. Choosing a 5% level of significance with this null hypothesis can be illustrated as follows.

Say it is election time and someone takes a survey of the whole of city A (an expensive survey) and finds that the proportion of people who will vote for candidate Bill is p1. He goes to city B and finds, after surveying the whole city, that the proportion in city B who will vote for Bill is also p1. Someone else, who does not want to spend so much money on a survey, takes a small sample in city A and a small sample in city B and finds that the proportions are p2 and p3. We now test whether, at the 5% level of significance, the proportions p2 and p3 are significantly different.

If the proportions are actually the same (this is the null hypothesis; we could only confirm it by surveying the whole of both cities) and repeated small samples are taken, then the mathematics done on our samples will tell us to reject the null hypothesis 5% of the time. At the 1% level of significance, if the null hypothesis is true, the samples we take will tell us to reject it 1% of the time. In other words, the probability of rejecting the null hypothesis, if it is indeed true, is 5% at the 5% level of significance and 1% at the 1% level of significance, based on our small samples and the mathematics done using them.
Expressed another way: The probability we will reject the null hypothesis, if it is true, is the probability that we will get small samples that, after doing the mathematics, tell us to reject it. I SAY SMALL SAMPLES IN THE SENSE THAT the sample size will almost always be far smaller than the population size because huge surveys are expensive.
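For readers who want to try the comparison of two proportions test themselves, here is a minimal Python sketch. It uses the usual large-sample z-test with a pooled proportion, which is an assumption on my part about the exact variant. The function name and the survey counts in the example are hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two_sided_p) for H0: the two population proportions are equal.

    x1 of n1 people said yes in sample 1, x2 of n2 in sample 2.
    Assumes samples large enough for the normal approximation.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # best estimate of p if H0 is true
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical survey: 60 of 100 in area A, 45 of 100 in area B
    z, p = two_proportion_z_test(60, 100, 45, 100)
    print(f"z = {z:.3f}, p-value = {p:.4f}")
    print("Reject H0 at the 5% level" if p < 0.05 else "Do not reject H0 at 5%")
    print("Reject H0 at the 1% level" if p < 0.01 else "Do not reject H0 at 1%")
```

With these made-up counts the difference is significant at the 5% level but not at the 1% level, the same "significant but not highly significant" situation described in problem 4.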
Example for the mathematically minded and not so mathematically minded: 12 breakdowns every 20 working days.
a) What is the probability of more than 2 breakdowns in a day?
b) What is the probability of more than 5 days in 240 working days with more than 3 breakdowns on each of those days?
Answer:
a) P(X>2) on any day = 0.02312.
b) The probability of more than 5 days with X>3, in a total of 240 working days, is 0.0001835. I used distributions and my programs to calculate. You might want to see if you agree.
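If you want to check these figures, here is a minimal Python sketch. It assumes the number of breakdowns in a day follows a Poisson distribution with mean 12/20 = 0.6, and that the number of "bad" days out of 240 follows a binomial distribution. The post only says "distributions", so these choices are my assumption, though they come out close to the quoted 0.02312 and 0.0001835. The helper names are illustrative.

```python
import math


def poisson_pmf(k, mean):
    """P(X = k) for a Poisson count with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def poisson_tail(k, mean):
    """P(X > k) for a Poisson count with the given mean."""
    return 1.0 - sum(poisson_pmf(j, mean) for j in range(k + 1))


def binomial_tail(k, n, p):
    """P(Y > k) for Y ~ Binomial(n, p)."""
    at_most_k = sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j)
                    for j in range(k + 1))
    return 1.0 - at_most_k


rate_per_day = 12 / 20                      # 0.6 breakdowns per working day

# a) more than 2 breakdowns on a given day
p_more_than_2 = poisson_tail(2, rate_per_day)

# b) more than 5 days, out of 240, each with more than 3 breakdowns
p_more_than_3 = poisson_tail(3, rate_per_day)
p_more_than_5_days = binomial_tail(5, 240, p_more_than_3)

if __name__ == "__main__":
    print(f"a) P(X > 2) on any day        = {p_more_than_2:.5f}")
    print(f"b) P(more than 5 such days)   = {p_more_than_5_days:.7f}")
```

The same helper covers problem 3 as well: a rate of 3 a year is a mean of 1.5 in 6 months, and `poisson_pmf(0, 1.5)` gives the 0.2231 quoted there.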
FORECASTING. Are you getting a good forecast?

SPREADSHEET EXAMPLE: I have the following figures in column A: 1, 2, 3, 4. In column B: 1, 4, 3, 3. Then I ask the spreadsheet what the correlation coefficient is. It says 0.51. I ask it to find the forecast value for x = 8 and it tells me 5.26. (This figure corresponds to treating column B as x and column A as y; with column A as x, the fitted line gives 5.5.) Now if you look at a table of significant values of the correlation coefficient, you see that the correlation coefficient should be as high as about 0.9 for the data to be even considered reasonably correlated (in the table df = 2, because n = 4 (4 numbers in each column) and df = n - 2 = 4 - 2 = 2). I have written a computer program that will tell you that r is far too small for a reasonable forecast and also gives a confidence interval for the forecast. If one makes a bad forecast it could cost millions.

If you are in big business you will often want a forecast (for example: when I have 30 lines I get twice as much business as I do when I have 12 lines. What if I have 45 lines?). This forecast depends on the correlation coefficient r, but when I looked for a spreadsheet function to determine whether there was sufficient correlation to do a forecast in the first place, I could not find one. So I wrote a free program (freeware) that determines whether sufficient correlation is there and also gives a confidence interval for the forecast value. I called the program CorrForecastInterval. It may be downloaded free at https://groups.yahoo.com/neo/groups/schoolfreeware/info and requires that you put in six values from your spreadsheet; it then tells you about the extent of correlation and gives a confidence interval.

For those interested, a forecast is done using y = a + bx with b = r(sy/sx), showing that the forecast depends on r. The confidence interval depends on the residual standard deviation (RSD), which is also affected by the value of r.
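To make the spreadsheet example concrete, here is a minimal Python sketch of the same kind of check. It is not the CorrForecastInterval program. It is my own illustration using textbook least-squares formulas, with the slope written as Sxy/Sxx, which equals r(sy/sx). The two t-values are copied from a standard two-tailed 5% table for the cases in this post (df = 2 and df = 18), and the function names are hypothetical. Note that the 5.26 quoted above is reproduced when column B is used as x and column A as y; with column A as x the line gives 5.5.

```python
import math

# Two-tailed 5% critical values of Student's t from a standard table,
# only for the two cases used in this post (df = n - 2).
T_CRIT_5PCT = {2: 4.303, 18: 2.101}


def critical_r(df):
    """Smallest |r| that is significant at the 5% level for this df."""
    t = T_CRIT_5PCT[df]
    return t / math.sqrt(t * t + df)


def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b, r)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    b = sxy / sxx                      # same as r * (sy / sx)
    a = mean_y - b * mean_x
    return a, b, r


def forecast_interval(xs, ys, x_new):
    """Return (low, forecast, high): 95% prediction interval at x_new."""
    n = len(xs)
    a, b, _ = fit_line(xs, ys)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    rsd = math.sqrt(sse / (n - 2))     # residual standard deviation
    half = T_CRIT_5PCT[n - 2] * rsd * math.sqrt(
        1 + 1 / n + (x_new - mean_x) ** 2 / sxx)
    y_hat = a + b * x_new
    return y_hat - half, y_hat, y_hat + half


if __name__ == "__main__":
    col_a, col_b = [1, 2, 3, 4], [1, 4, 3, 3]
    a, b, r = fit_line(col_a, col_b)
    print(f"r = {r:.2f}, needed at 5% level: {critical_r(len(col_a) - 2):.3f}")
    low, y_hat, high = forecast_interval(col_a, col_b, 8)
    print(f"forecast at x = 8: {y_hat:.2f}, 95% interval {low:.1f} to {high:.1f}")
    # Problem 4: r = 0.445 with 20 pairs (df = 18)
    print("0.445 significant at 5%:", 0.445 > critical_r(18))
```

For the four-point example the interval around the forecast runs from well below zero to about 20, which is another way of seeing that r = 0.51 with only four points is far too weak to support a forecast.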