1. Go to Data.gov and find a data set with data that will have a meaningful mean and median. (I had to look for one with an Excel file that I could use.)

NO repeats.

2. Provide information from the data set:

  • Dataset_Name: Arsenic_nov2001.xls
  • 20,000 arsenic samples from potable ground water, retrieved from the USGS National Water Information System in 2001.
  • This dataset is a product of the U.S. Geological Survey’s National Water-Quality Assessment (NAWQA) program.
  • Data format: Excel (.xls)
  • Worldwide Web URL:
  • Citation: Updated from: Focazio, M.J., Welch, A.H., Watkins, S.A., Helsel, D.R., and Horn, M.A.

3. Pick a row/column.

  • I used the column for arsenic concentration:
  • AS_CONC: Concentration of arsenic in the sample, in micrograms per liter (µg/L) as arsenic

4. Use technology to calculate the mean.

  • Mine = 7.3789 micrograms/L
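As one way to "use technology" for this step, here is a minimal Python sketch. It assumes the spreadsheet has been exported to CSV with an AS_CONC header row; the handful of values below are made up for illustration and are not the real dataset, so the printed mean will not match 7.3789.

```python
import csv
import io
from statistics import mean

# Hypothetical stand-in for the AS_CONC column after exporting the
# spreadsheet to CSV; these few values are invented for illustration.
SAMPLE_CSV = """AS_CONC
1
1
0.9
3
45
"""

def read_concentrations(text, column="AS_CONC"):
    """Return the numeric values of one column from CSV text, skipping blanks."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row[column]) for row in reader if row[column].strip()]

values = read_concentrations(SAMPLE_CSV)
print(round(mean(values), 4))  # arithmetic mean of the sample column
```

With the real export, you would pass the file's contents instead of SAMPLE_CSV; a spreadsheet's AVERAGE function gives the same result.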

5. Use technology to calculate the median.

  • Mine = 1 microgram/L
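The median is the middle value once the data are sorted, which is why it can sit at 1 even when the mean is much larger. The sketch below uses Python's statistics.median on a small hypothetical list (not the real data) that mimics the pattern described, plus a helper, named here for illustration only, that reports which sorted position(s) the median comes from.

```python
from statistics import median

def median_position(n):
    """1-based position(s) of the median in a sorted list of n values."""
    if n % 2 == 1:
        return ((n + 1) // 2,)      # odd n: a single middle value
    return (n // 2, n // 2 + 1)     # even n: average of the two middle values

# Hypothetical values: one 0.9, several 1s, and a few large concentrations.
values = [0.9, 1, 1, 1, 1, 1, 5, 40, 2600]
print(median(values))            # → 1
print(median_position(20_043))   # → (10022,)
```

For 20,043 samples, the median is therefore the 10,022nd value in sorted order.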

6. Discuss what this implies about your data values in terms of distribution.

  • My minimum was 0.9 micrograms/L, and only one sample had that value; the next-smallest value was 1. With 20,043 samples, the median is the 10,022nd value in sorted order, so for the median to be 1, every sample from the second-smallest through the 10,022nd must equal 1 microgram/L. In other words, essentially the entire lower half of the data is 1 microgram/L. For the mean to be so much higher than the median, the values in the upper half must be considerably larger. The maximum was 2,600 micrograms/L, and the next highest was 2,200 micrograms/L. Extreme values like these pull the mean upward, so we would expect a strongly right-skewed distribution. (You would need to reference the text or a website for this as well.)
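To check the "mean pulled above the median" reasoning numerically, one rough, commonly taught measure is Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation, where a positive value suggests right skew. This is a swapped-in illustration, not something the assignment asks for, and the data below are hypothetical values shaped like the description above (mostly 1s with a long upper tail), not the real samples.

```python
from statistics import mean, median, stdev

def pearson_median_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / stdev.

    A positive result suggests a right (positive) skew.
    """
    return 3 * (mean(values) - median(values)) / stdev(values)

# Hypothetical data: mostly 1s with a few very large concentrations.
values = [0.9] + [1] * 20 + [2, 5, 40, 2200, 2600]
print(mean(values) > median(values))               # mean pulled above median
print(round(pearson_median_skewness(values), 3))   # positive => right skew
```

Because the median ignores how far the large values are from the center while the mean does not, a few extreme concentrations are enough to make this coefficient positive.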
