|
|
|
|
|
|
|
|
|
|

Introduction to Spreadsheets
Using Microsoft Excel 97

By Dan Browning
Star Tribune/The McClatchy Company
612-673-4493
dbrowning@startribune.com

Introduction

You'll hear a lot of hype about how computers are changing the nature of reporting. Much of it is self-aggrandizement by the reporters who took the time to learn a few digital stunts, then took it all a little too seriously. But one fundamental fact remains true: The job you do will be limited or enhanced to the degree that you understand public records. These days, that means you must understand computerized records. Why? Because your sources in government, industry and philanthropic organizations do. And more and more often, so does our competition.

One of the most useful tools you can have on your computer is a spreadsheet. Excel, Lotus 1-2-3, QuatroPro, they're all good. Each has its strengths. The Star Tribune has decided Excel will be its preferred spreadsheet program.

In this class, you will learn to use Excel 97 to import data. You also will enter data by hand, edit data and format the display of data. You will learn to use spreadsheets for basic math, averages and percentages. And by the end of the class you should have some idea of how to sort and rank data, and how to use the chart wizard to build basic graphs. We'll also show you how to navigate the Help file. Each of you will get copies of the data we're using today. Use this written guide to go back over what you do in class and reinforce the lessons.

The Interface

First, let's take a look at the interface. When you open Excel by double-clicking on the icon, you get a document that says Microsoft Excel — Book 1. That tells you you're working in a blank "workbook," which has several blank pages, or "worksheets". At the bottom of the screen you'll see the worksheets indicated by tabs labeled "Sheet 1", etc. At the top of the screen you'll see a menu bar with these command words in it: File, Edit, View, Insert, Format, Tools, Data, Window, Help. Beneath the menu bar is a "toolbar" with a bunch of buttons. Those buttons perform certain "macro" functions that are also found in the drop-down menu bar at the top of the screen. Don't worry about the terminology for the interface; it takes time to sink in.

Columns, Rows and Cells

The main part of the screen is called the worksheet. That’s where you'll do your work. Each worksheet is cut into a grid by vertical lines, which divide the thing into columns, and horizontal lines, which divide it into rows. The columns are labeled with letters. The rows are labeled with numbers. At the intersection of each row and column are rectangular cells. Each cell can be identified by its address: the intersection of the column and row. In other words, A1 is in the upper left corner; B1 is immediately to its right; A2 is beneath A1, and so on.

Excel 97 can handle 65,536 rows by 256 columns, or 16.77 million cells. That seems like a lot, but for many projects that's way too limited. For instance, the drivers license database contains several million records, or rows. So before you work with any data, it's a good idea to have some idea how many records are in it. Otherwise, you might lop off the bottom of the file and never know it!

Importing Data

Normally when you ask for computerized data you will specify a storage format. If you ask for it as a spreadsheet, you should say you prefer it in an *.XLS format, the file extension used by Excel. The official may say you can't have it that way, but you can have it in Lotus 1-2-3, or QuatroPro, or some other spreadsheet form. In most cases, that's fine. If you're unsure, though, ask the official to save the data as an ASCII text file, or "comma-separated values" file (CSV), or a tab-delimited file. Nearly all programs allow the operator to do this.

A CSV file is an alphanumeric text document in which all of the values are separated by commas. Alpha text is generally encapsulated by quotation marks. Numeric text is not. The end of each line is denoted by two "control characters" that tell the computer to return to the left side of the screen, then move to the next line. If you were to open such a file with Xywrite or another word processor, the first two lines of the data might look something like this:

LINE 1: "City","State","Year","Crime Index","Modified Crime Index","Murder","Forcible Rape","Robbery","Aggravated Assault","Larceny","Motor Vehicle Theft"

LINE 2: "Abilene","Texas",1993,5474,5507,8,75,134,673,1089,3323

When you open the file with Excel, everything should magically line up. In the real world, though, there's often some weird, embedded control characters in the data, and you'll have to use Xywrite, Professional File Editor (PFE) or some other word processor to cull them. But for now, let's assume everything works as it should.

Let's open a file called stclass.csv. You'll do that by clicking on File, Open, choosing File Type CSV, going to the STCLASS directory on the hard drive, and double-clicking on a file called stclass.csv. Viola! It opens and everything lines up just as it should. Now, let's close the file by choosing File, Close. It will ask you if you want to save any changes. If you wanted to save this document as an Excel file, you would click on "yes", then fill in the name of the file. But I've already done that, so for now, just say "no" and close the file

Entering labels and data; sizing columns; renaming the workshee

I've already modified the .csv file and stored it as an Excel workbook. It's on a file called stclass.xls. Please open that file. (You will need a copy of Excel on your computer to work with this file.) To simplify things, I've cut out the 1993 data; we'll just work with the 1994 data.

1. Entering Labels

If the data you're using has labels at the top of each column, great. If it doesn't, you'll want to add them so that you don't get confused. You may also want to label the spreadsheet itself. Our spreadsheet has most of the column labels we need (more on that later), so all we're going to do now is add a spreadsheet label.

Click on cell G1. Type: FBI UCR DATA, 1994. (Notice that as you type, the text appears in two places. In the cell itself and in the line just under the toolbar area.) When you're done typing, hit ENTER or TAB. If you hit ENTER, the cursor moves down one cell, to G2. If you hit TAB, it moves to the right, to H1.

2. Entering Data

By now you've probably figured out that there's nothing tricky about entering data. You go to the cell you want and type it in. Let's go to cell number C48. That should be the population column for Dallas. You'll note that cells C48-58 are empty. Oops! Someone forgot to enter the 1993 population figures. We got them from a book called Crime in the U.S. — 1993.

Enter the following figures (don't use any commas) for the appropriate cities:

Dallas 1042619
Dayton 184352
Denver 498402
Des Moines 195485
Detroit2 1020062
Durham 143172
Elizabeth 109131
El Monte 108028
El Paso 554515
Erie 109749
Escondido 114318

3. Sizing Columns

You may note that the entire number does not display for Dallas and Detroit. The column isn't wide enough. Click on Cell C48. Although the number does not display fully on the spreadsheet, you can see all of it in the line right below the toolbar. Obviously, it can be irritating to have to do this every time you want to see the full number. So let's resize the column, making it wide enough to show the figure.

Place your mouse pointer on the vertical line that separates the letters C and D at the top of the spreadsheet. The cursor will turn into a double-headed arrow. Click-and-hold the left mouse button, then drag your mouse to the right. You can widen or narrow the column this way. Now, let's make it really skinny, so that it just displays one digit. Now, place your cursor between the C and D again. This time, double-click. Excel automatically adjusts the column width for the best fit. You might ask why you wouldn't just do this every time. The answer can be found by placing your cursor between columns G and H and double-clicking. Excel widens the column far more than is necessary so that it can accommodate the "data" in cell G1, your column label. Let's manually resize the column to make it narrower.

4. Renaming the Worksheet. You might say, who cares what the worksheet is named? But if you're working with a complex project or one that goes through several different steps, you will want to name your worksheets so that you don't get lost. Double click on the tab that says Sheet1. Rename it UCR94.

Basic math functions

We're working with raw data from preliminary FBI Uniform Crime Reports. It would be helpful to combine the totals from the violent crimes into a Violent Crime Index, then adjust that number to the populations of each city so that we can rank them based on a common denominator, such as crimes per 100,000 people. We could do the same thing for property crimes but in the interest of time, I'll leave that for you to do some other time on your own.

First, go to cell N6 and enter a column label: Violent Index. Hit enter and you'll be move to cell N7. Now you'll want to add the figures from each of the violent crime columns: Murder, rape, robbery and aggravated assault. You could do that by typing in the numbers from those columns. For instance, for Abilene, Texas you would add 5+90+119+506. But you don't want to do that for every city. It would take forever. That's why you use cell addresses.

Every formula begins with an equal sign (=). In cell N7, enter the following formula: =F7+G7+H7+I7. Now all you have to do is copy the formula down to the bottom of the spreadsheet. Excel automatically changes the cell address as you do so. Here's how you copy the formula: Make sure your cursor is on cell N7. Note the tiny box at the bottom-right corner of the cell. Put your cursor arrow on that point, click and hold the left mouse button, and drag straight down. When you get there, let go of the button.

Now, let's figure the violent crimes per 100,000 residents. That would be Violent Index/(Pop93/100000). Go to cell O6 and enter the label, Violent/100k. Now, in cell O7, enter the following formula: =N7/(C7/100000). Now copy the formula to the bottom.

WARNING: Your numbers may not be correct for some figures in the Violent Index column. Did you notice that some cities lacked data for the various categories of violent crimes? Well, you can't compare the Violent Index of those cities to the Violent Index of the cities that reported data for all of the violent crimes. Of course, this means that the Violent/100k column is also off for some of those cities. This is an example of what can go wrong in computer-assisted reporting. If you fail to note these idiosyncrasies as you go, the error can get magnified.

Let's clean up by deleting the figures in the Violent Index and Violent/100k columns for the following cells. Put your cursor on the following cells and hit the delete key:

N16,O16
N21,O21
N38,O38
N52,O52
N61,O61
N97,097
N103,O103
N149,O149
N176,O176

We're going to come back to the UCR data in a moment. But for now let's take a look at a different worksheet. Click on the tab at the bottom of your screen that says Ramstad.

Sums and Averages

Let's say your working on a profile of a politician named Jim Ramstad and he's made a big deal out of the line that he's the candidate for the little guy. He claims he's beholden to no special interest groups. You decide to check his PAC contributions. First, it would be nice to know how much money he got in a particular election cycle. So we'll want to add up all of the contributions. Put your cursor in the cell at the bottom of the Amount column. (That's cell D148.) Now, you could sum everything in either of the following ways:

1. By entering this formula:
= SUM(Beginning of Range:End of Range). In this case, that would be: = SUM(D2:D147)

2. By hitting the button that looks like a sideways capital M (it's right under the "t" in the word Format in the menu bar). It’s actually the Greek letter, S (sigma). It will automatically plop in the formula. Look at the range to see if it is correct. If so, enter it.

Now you need to calculate the average amount. The formula is the same as the one for summing, except that you put the word "average" in place of the word "sum". In this case, it would look like this: = AVERAGE(D2:D147)

So you find that the average PAC contribution was about 606. What kind of 606? Dollars, of course. Click on the column heading for Amount, which is the letter D at the top of the worksheet. Now go to your tool bar and click on the $ symbol. Nice, eh? You can increase or decrease the decimals by clicking on the two buttons to the right of the % symbol.

Is $606 a lot? A little? Or about average? Let's click on the spreadsheet tab that says PACS and find out. This is a list of all 1,363 PAC contributions to all candidates in the same election cycle.

Please calculate the average contribution PACs gave to the candidates and put it at the bottom of the Amount column, in cell D1364. You'll see that I've already entered the average that Ramstad got just below it, in cell D1365.

Now I ask you again: Is Ramstad's contribution from PACs small, large or about average? How much different is the average contribution to Ramstad from the average contribution to all candidates? Let's find out. In cell D1366, calculate the percent difference from the average. To do that, you use the following formula: =(Starting Value-Ending Value)/StartValue.

Don't worry about multiplying the end result by 100. We'll take care of that with a formatting command. So, in cell D1366, you should have entered the following formula: =(D1364-D1365)/D1364.

The end result? 13 percent. To make it look like 13 percent click on the percent symbol in the toolbar. It's under the H in Help.

Let's say you're losing interest in this story. Instead, you want to look at the high rollers, if there are any. Which PACs made impressively large contributions?

Sorting Data

You could hunt through every line of the data, or you could sort it in descending order, based on the amount of the contribution, to quickly scan the biggest contributions.

BE CAREFUL! Sorting data can get you in big trouble. If you just sort Amount column, for instance, all the rest of the data will stay in the same spot and the amounts will no longer correspond to either the contributor or the candidate. Here's how to do it properly:

You highlight an entire row of data by clicking on the corresponding row number at the left edge of the screen. If you click-hold-and-drag the mouse downward, you can highlight as many rows as you like. We want to highlight rows 1-1363. (Note: Don't highlight rows 1364-6, which contain the averages that we just calculated! That would be mixing apples with sugarplums.)

Once you've highlighted rows 1-1363, release the left mouse button. Go to the menu bar at the top of the screen and click on the word Data. A drop-down menu will reveal itself. Choose the option, Sort.

If you included column headings in the first row, you will be able to choose your sort based on those headings. In our case, choose the word, Amount. (If you don't have column headings, you'll have to choose the sort criteria based on the letter at the top of the column; in this case, that would be D.) Choose ascending or descending order (we want descending) and click on OK.

Presto! At the top you should have the biggest contributions. Now, let's sort them based on the candidate's name in ascending order, and the amount in descending order.

Ranking

Please go back to the FBI crime data. Many people confuse ranking with sorting. After all, if you were to sort all the data in this spreadsheet in descending order of violent crimes per 100,000, you would have a pretty good idea of which cities were the most violent.

But sorting does not address what rank you're in if the preceding six cities were tied for say, 50th place. Are you in 51st place? Or 57th place? Or some other place in between. Statisticians have a formula for redistributing ties like this, which we won't delve into because Excel does it for us. Microsoft included a "macro", or a "function call", that adjusts the rankings for you automatically.

Here's the formula:
= RANK(Number, Reference, order)

Number is the cell value that you want to attach a rank to. It can either be a number or a cell address. Reference is the range of values, indicated by cell address. Order can be either blank, zero or any value other than zero. If it's blank or zero, Excel calculates the rank in descending order. If it's any value besides zero, Excel calculates the rank in ascending order.

It's harder to describe than it is to do, so let's just do it. Open the spreadsheet labeled UCR94. Start by entering a column label in cell P6 that says, Violent Rank.

Now, in cell P7, enter the following formula exactly as it's written here: = RANK(O7,$O$7:$O$198, )

Note: The $ sign in this formula has nothing to do with money. It's an Excel convention that is used to lock the position of cell range. $O locks the column. $7 and $198 lock in the beginning and ending points of the range. If you don't lock in the cell range, it will change incrementally as the formula is copied down the sheet. You want the first cell address to change as you copy the formula, so you leave it unlocked.

Let's dissect the formula so that it doesn't seem so weird. The formula assigns a rank for the number in cell O7 out of a reference defined by the values in cells O7 through O198 (inclusive).

Right now, the data are in alphabetical order by city, making it easy for you to scroll through it to find out the rank of your city. You could also sort the data in ascending order of Violent Rank to find the most violent cities at a glance. Let's do that.

Highlight rows 7 through 198. Click on Data-Sort. Choose column P. Make sure it's in ascending order.

Charting

Now, let's chart the top 10 cities by the number of violent crimes per 100,000 residents. To save some time, I've already broken them out into a worksheet called CHART. To create a chart, highlight the stuff you want to chart and click on the "chart wizard" button in the toolbar. It's the one that looks like a bar graph.

Ok, highlight cells A2 through A12. Hold down the control key and highlight cells I2 through I12. Let go of the mouse button and the control key. The two columns of data — CITIES and Violent/100k — should be highlighted. Now click on the chart wizard.

You’ll be given a choice of charts. Pick the one that suits you. To see a preview, click on the bar on the bottom right portion of the window. Shizam! You can step through the Wizard and alter the chart as you wish.

Summary

Congratulations (and thanks) for sticking with it this far. You've learned to import, enter and edit data. You've learned to do basic math, percentages, averages, sums, sorting and ranking. And you've seen the charting function.

You may forget some of what we did. If so, you can find help on just about any topic by clicking on the word Help in the menu bar. Select the Search option and type in a word. You'll be launched into a good description of what you want to know. If you have any questions, please call me: 612/673-4493.

World Press Institute
3415 University Avenue • St. Paul, Minnesota 55114 • Phone: 651-208-9378
Contact us at
: info@worldpressinstitute.org