A Statistical Study of Walgreen’s

Statistics is the study of how to collect, organize, analyze, and interpret numerical information from data. The prerequisite for statistical decision making is the gathering of data. First, we need to identify the individuals or objects to be included in the study; the measurements taken on them become the raw data of the study. The data must then be organized. We can organize it by drawing charts, finding the measures of central tendency and variation, and finding its linear regression and correlation. We can then begin to analyze the data. Finding probabilities and probability distributions, z scores, areas, and p values, and using the central limit theorem, all help us analyze the data. The data is then ready to be interpreted, and conclusions can be drawn from the data and from the results of its organization and analysis.

Collection of Data

Data was collected from 27 Walgreen franchises. These franchises are the individuals of the study. Each store has six quantitative variables associated with it: annual net sales, number of square feet, inventory, amount spent on advertising, size of sales district, and number of competing stores in its district. All of the variables are measured at the ratio level. It is assumed that the data is a simple random sample of Walgreen franchises.

Organization of Data

Description

Raw Data: All Greens Franchise

The data (X1, X2, X3, X4, X5, X6) are for each franchise store.

X1 = annual net sales/$1000
X2 = number sq. ft./1000
X3 = inventory/$1000
X4 = amount spent on advertising/$1000
X5 = size of sales district/1000 families
X6 = number of competing stores in district
X7 = store number

X1     X2     X3     X4     X5     X6     X7
231    3      294    8.2    8.2    11     1
156    2.2    232    6.9    4.1    12     2
10     0.5    149    3      4.3    15     3
519    5.5    600    12     16.1   1      4
437    4.4    567    10.6   14.1   5      5
487    4.8    571    11.8   12.7   4      6
299    3.1    512    8.1    10.1   10     7
195    2.5    347    7.7    8.4    12     8
20     1.2    212    3.3    2.1    15     9
68     0.6    102    4.9    4.7    8      10
570    5.4    788    17.4   12.3   1      11
428    4.2    577    10.5   14     7      12
464    4.7    535    11.3   15     3      13
15     0.6    163    2.5    2.5    14     14
65     1.2    168    4.7    3.3    11     15
98     1.6    151    4.6    2.7    10     16
398    4.3    342    5.5    16     4      17
161    2.6    196    7.2    6.3    13     18
397    3.8    453    10.4   13.9   7      19
497    5.3    518    11.5   16.3   1      20
528    5.6    615    12.3   16     0      21
99     0.8    278    2.8    6.5    14     22
0.5    1.1    142    3.1    1.6    12     23
347    3.6    461    9.6    11.3   6      24
341    3.5    382    9.8    11.5   5      25
507    5.1    590    12     15.7   0      26
400    8.6    517    7      12     8      27

Raw data charts

Measures of Central Tendency and Variation

                    Annual sales  Sq ft     Inventory  Ad spending  District size         Competing
                    ($1000's)     (1000's)  ($1000's)  ($1000's)    (1000's of families)  stores
Mean                286.6         3.3       387.4      8.09         9.7                   7.7
Median              341           3.5       382        8.1          11.3                  8
Mode                #N/A          1.2       #N/A       12           16                    12
Standard Deviation  192           2.01      191.16     3.77         5.14                  4.89
Range               569.5         8.1       686        14.89        14.69                 15
Minimum             0.5           0.5       102        2.5          1.6                   0
Maximum             570           8.6       788        17.39        16.29                 15
Sum                 7737.5        89.8      10462      218.69       261.69                209
Count               27            27        27         27           27                    27

Mean is the average of all the numbers.
Median is the central value of the distribution.
Mode is the value that occurs most frequently (#N/A means that no value occurs more than once).
Standard Deviation measures how far the data are spread out from the mean.
Range is the difference between the largest and smallest values of the distribution.
Minimum is the lowest number in the distribution.
Maximum is the highest number in the distribution.
Sum is all the numbers in the distribution added together.
Count is the number of values in the distribution.
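As a rough check on the summary table, the annual sales column can be recomputed with Python's standard `statistics` module. This is only a sketch: the variable names are made up for illustration, and it assumes the table used the sample (n - 1) form of the standard deviation, which is what reproduces the value of about 192.

```python
import statistics

# Annual net sales (X1, in $1000s) for the 27 stores, copied from the raw data table.
annual_sales = [231, 156, 10, 519, 437, 487, 299, 195, 20, 68, 570, 428, 464, 15,
                65, 98, 398, 161, 397, 497, 528, 99, 0.5, 347, 341, 507, 400]

summary = {
    "count": len(annual_sales),
    "sum": sum(annual_sales),
    "mean": statistics.mean(annual_sales),
    "median": statistics.median(annual_sales),
    # ASSUMPTION: sample standard deviation (divides by n - 1).
    "stdev": statistics.stdev(annual_sales),
    "minimum": min(annual_sales),
    "maximum": max(annual_sales),
    "range": max(annual_sales) - min(annual_sales),
}

# multimode returns every most-frequent value. If no value repeats it returns
# all of the values, which corresponds to the #N/A entry in the table.
modes = statistics.multimode(annual_sales)
has_mode = len(modes) < len(annual_sales)

for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
print("has a mode:", has_mode)
```

The same dictionary could be built for any of the other five columns by swapping in that column's values.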

Regression and Correlation

Paired data and scatter diagrams
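A least-squares trend line and its correlation coefficient can be sketched in a few lines of Python. The pairing used here, number of competing stores (X6) against annual sales (X1), is an assumption chosen for illustration, since the report does not list which pairs were plotted; the function name `least_squares` is likewise hypothetical.

```python
import math

# X6 (competing stores) and X1 (annual sales, $1000s), copied from the raw data table.
competitors = [11, 12, 15, 1, 5, 4, 10, 12, 15, 8, 1, 7, 3, 14,
               11, 10, 4, 13, 7, 1, 0, 14, 12, 6, 5, 0, 8]
annual_sales = [231, 156, 10, 519, 437, 487, 299, 195, 20, 68, 570, 428, 464, 15,
                65, 98, 398, 161, 397, 497, 528, 99, 0.5, 347, 341, 507, 400]


def least_squares(xs, ys):
    """Return (slope, intercept, r) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    r = sxy / math.sqrt(sxx * syy)        # Pearson correlation coefficient
    return slope, intercept, r


slope, intercept, r = least_squares(competitors, annual_sales)
print(f"sales = {intercept:.1f} + ({slope:.1f}) * competitors, r = {r:.2f}")
```

A negative slope and a negative r here would say that sales tend to fall as the number of competing stores rises.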

All of the scatter plots show a linear correlation. The “linear trend line” on each plot is the least-squares line.

Analysis of Data

Probability and Probability distributions

Normal distributions vary from one another in two ways: the mean may be located anywhere on the x axis, and the bell shape may be more or less spread out according to the size of the standard deviation. Because of this we use a formula, z = (x - mean) / (standard deviation), to convert the “x values” into “z values”. The z value tells us the number of standard deviations the original measurement is from the mean. In z scores, the mean of the original distribution is always zero standard deviations from itself (i.e. the mean becomes z = 0).
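The z formula can be applied to the annual sales column as in the sketch below. The helper name `z_score` is made up for illustration, and the code assumes the sample (n - 1) standard deviation.

```python
import statistics

# Annual net sales (X1, in $1000s), copied from the raw data table.
annual_sales = [231, 156, 10, 519, 437, 487, 299, 195, 20, 68, 570, 428, 464, 15,
                65, 98, 398, 161, 397, 497, 528, 99, 0.5, 347, 341, 507, 400]

mean = statistics.mean(annual_sales)
stdev = statistics.stdev(annual_sales)  # ASSUMPTION: sample standard deviation (n - 1)


def z_score(x, mean, stdev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / stdev


z_values = [z_score(x, mean, stdev) for x in annual_sales]

for x, z in zip(annual_sales, z_values):
    print(f"{x:>6} {z:>9.5f}")
```

A store whose sales equal the mean gets z = 0, stores below the mean get negative z values, and stores above it get positive ones.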

Z scores and raw scores

Z Values of Annual Sales

The first two columns are in store order. The last three columns are sorted by ascending z value, so they do not line up with the x values in the first two columns.

x value   z value     z value ascending  Area left   Area right
231       -0.28936    -1.489490295       0.0694      0.9306
156       -0.67985    -1.44002703        0.0749      0.9251
10        -1.44003    -1.413993733       0.0793      0.9207
519       1.21016     -1.387960435       0.0838      0.9162
437       0.78322     -1.153660758       0.1251      0.8749
487       1.04355     -1.138040779       0.1292      0.8708
299       0.0647      -0.981840994       0.1635      0.8365
195       -0.4768     -0.976634335       0.166       0.834
20        -1.38796    -0.679854743       0.2514      0.7486
68        -1.13804    -0.653821446       0.2578      0.7422
570       1.4757      -0.476795023       0.3192      0.6808
428       0.73636     -0.289355281       0.3897      0.6103
464       0.9238      0.064697565        0.5239      0.4761
15        -1.41399    0.283377264        0.6103      0.3897
65        -1.15366    0.314617221        0.6217      0.3783
98        -0.98184    0.574950196        0.7157      0.2843
398       0.58016     0.580156856        0.719       0.281
161       -0.65382    0.590570175        0.7224      0.2776
397       0.57495     0.736356641        0.7673      0.2327
497       1.09562     0.783216576        0.7823      0.2177
528       1.25702     0.923796383        0.8212      0.1788
99        -0.97663    1.043549551        0.8508      0.1492
0.5       -1.48949    1.095616146        0.8621      0.1379
347       0.31462     1.147682741        0.8729      0.1271
341       0.28338     1.210162655        0.8869      0.1131
507       1.14768     1.257022591        0.8944      0.1056
400       0.59057     1.47570229         0.9292      0.0708

Above, the z values and the areas to the left and right have been found using tables of areas under the standard normal distribution. The central limit theorem states that if x has any distribution with mean “m” and standard deviation “s”, then the sample mean (“x bar”) based on a random sample of size “n” will have a distribution that approaches that of a normal random variable with mean “m” and standard deviation “s”/(square root of “n”) as n increases without limit. As a rule of thumb, if “n” is 30 or larger, the distribution of the sample mean will be approximately normal and the central limit theorem can be applied. Because the z values are now known, the data can be compared and contrasted with other data sets, and probability statements can be made.
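The table lookups can also be done in code. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper names are hypothetical. Printed tables round z to two decimals first, so exact areas may differ slightly from table values in the third or fourth decimal.

```python
import math
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def area_left(z):
    """Area under the standard normal curve to the left of z."""
    return standard_normal.cdf(z)


def area_right(z):
    """Area under the standard normal curve to the right of z."""
    return 1.0 - area_left(z)


def standard_error(s, n):
    """Standard deviation of the sample mean, s / sqrt(n), as in the central limit theorem."""
    return s / math.sqrt(n)


z = -1.49
print(f"z = {z}: left = {area_left(z):.4f}, right = {area_right(z):.4f}")
print(f"standard error for s = 192, n = 27: {standard_error(192, 27):.2f}")
```

The left and right areas for any z always add up to 1, which is a quick way to check a row of the table.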

We can also find some confidence intervals from the data. For instance, we can state with 95% confidence that the population mean of annual sales for Walgreen is between 210.60 and 362.54 (in thousands of dollars), using the Student’s t distribution for estimating from small samples. In addition, we can state with 95% confidence that the population mean number of competing stores is between 5.81 and 9.67.

We can also do some hypothesis testing. Say Walgreen’s wishes to sell off its worst performing stores. The company states that average annual sales should be 286,574.07 dollars per store. This gives our null hypothesis, H0: mean = 286.57. Our alternate hypothesis is H1: mean < 286.57. Because this decision involves the lives of people, we will test at a low level of significance (α = 0.01), which sets the critical region at z < -2.33. For the six stores with the smallest annual sales we calculate z = (sample mean - mean under H0) / (standard deviation / square root of sample size). We find that they are well into the critical region, with z = -3.27. It must be stated that the sample size was very small; nevertheless, the z value is so low that it can be assumed that any testing method would give the same result.
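Both calculations can be sketched in Python as follows. The t critical value 2.056 (two-sided 95%, 26 degrees of freedom) is an assumption taken from a printed Student's t table, because the standard library has no t distribution. The code uses the unrounded sample standard deviation, so its results land close to, but not exactly on, the figures quoted in the text.

```python
import math
import statistics
from statistics import NormalDist

# Annual net sales (X1, in $1000s), copied from the raw data table.
annual_sales = [231, 156, 10, 519, 437, 487, 299, 195, 20, 68, 570, 428, 464, 15,
                65, 98, 398, 161, 397, 497, 528, 99, 0.5, 347, 341, 507, 400]

n = len(annual_sales)
mean = statistics.mean(annual_sales)
s = statistics.stdev(annual_sales)  # sample standard deviation (n - 1)

# --- 95% confidence interval for the population mean ---
# ASSUMPTION: critical value read from a t table for n - 1 = 26 degrees of freedom.
T_CRITICAL_95_DF26 = 2.056
margin = T_CRITICAL_95_DF26 * s / math.sqrt(n)
ci_low, ci_high = mean - margin, mean + margin

# --- One-sided z test on the six stores with the smallest sales ---
# H0: mean = overall mean, H1: mean < overall mean, alpha = 0.01.
worst_six = sorted(annual_sales)[:6]
sample_mean = statistics.mean(worst_six)
z = (sample_mean - mean) / (s / math.sqrt(len(worst_six)))
z_critical = NormalDist().inv_cdf(0.01)  # left-tail critical value, about -2.33
reject_h0 = z < z_critical

print(f"95% CI for mean sales: {ci_low:.2f} to {ci_high:.2f}")
print(f"z = {z:.2f}, critical z = {z_critical:.2f}, reject H0: {reject_h0}")
```

Following the text, this uses a z test even though only six stores are tested; with so few stores a t test would normally be preferred, although the text argues the conclusion would not change.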

Interpretation of Data

Let’s start our interpretation of the data with the raw data. At first glance it seems to be “just a bunch of numbers” (Chart 1). The first thing I did was assign each row of data a number, called the store number, so that I could sort the data into a more meaningful order. I sorted each column to find out which stores had the highest and lowest sales, square feet, inventory, etc. After doing this, a clear pattern began to appear that can be seen in Chart 2. In this chart we begin to see that in districts with heavy competition, the stores’ annual sales are at their lowest.
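The sorting step can be sketched in Python by attaching a store number to each row and sorting on annual sales. The dictionary keys and variable names are made up for illustration; the data are copied from the raw data table.

```python
import statistics

# X1 (annual sales, $1000s) and X6 (competing stores), in store-number order.
annual_sales = [231, 156, 10, 519, 437, 487, 299, 195, 20, 68, 570, 428, 464, 15,
                65, 98, 398, 161, 397, 497, 528, 99, 0.5, 347, 341, 507, 400]
competitors = [11, 12, 15, 1, 5, 4, 10, 12, 15, 8, 1, 7, 3, 14,
               11, 10, 4, 13, 7, 1, 0, 14, 12, 6, 5, 0, 8]

# Attach a store number (starting at 1) to each row so it survives sorting.
stores = [
    {"store": number, "sales": sales, "competitors": comp}
    for number, (sales, comp) in enumerate(zip(annual_sales, competitors), start=1)
]

by_sales = sorted(stores, key=lambda row: row["sales"])
lowest_six = by_sales[:6]
highest_six = by_sales[-6:]

mean_comp_lowest = statistics.mean(row["competitors"] for row in lowest_six)
mean_comp_highest = statistics.mean(row["competitors"] for row in highest_six)

for row in lowest_six:
    print(row)
print("mean competitors, six lowest-sales stores: ", mean_comp_lowest)
print("mean competitors, six highest-sales stores:", mean_comp_highest)
```

Comparing the two averages gives a quick numerical version of the pattern described for the charts.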

When we compare the stores with the lowest sales to the amount of competition by looking at Chart 3, it can be clearly seen that these stores are in districts with heavy competition. The hypothesis test on the stores with the lowest annual sales suggests that these stores are not making enough money and should be sold off if sales do not improve soon. In fact, when we compare the advertising dollars being spent in these areas to annual sales, it would seem that the advertising budget is being spent haphazardly. As an example, store #24’s advertising spending is six times its annual sales (Chart 6), and at store #1 it is four times its annual sales.

In conclusion, I recommend spending little or no money on advertising for the six smallest stores that have the heaviest competition, if not selling them off completely before more losses occur. The money saved could then be used to increase the budgets in large districts with few competitors, in an attempt to drive those competitors out of town before more come in and do the same thing they have done to stores #24 and #1.

