Comparing Databases and Spreadsheets for Storing Data, Natural Resources Information Management Program, ODFW

Comparing Databases and Spreadsheets for Storing Data

--by Shannon Hurn

Many information managers and biologists are still using spreadsheets to store their data, even though the data could be better managed in a database. There is a lot of confusion concerning the differences between spreadsheets and databases. The confusion stems from the basic similarity that the key function of both spreadsheets and databases is to store and manipulate data. However, there are some distinct advantages and disadvantages to both spreadsheets and databases that define their usefulness as a data management tool.

Spreadsheets are designed to analyze data and sort list items, not for long-term storage of raw data. A spreadsheet should be used for ‘crunching’ numbers and storage of single list items. Spreadsheet programs provide the means for keeping inventory, statistical data modeling, and computing data. They also include graphing functions that allow for quick reporting and analysis of data. Spreadsheet programs are relatively easy to use, require little training to get started, and have the advantage that most data mangers are somewhat familiar with them.

The disadvantages of spreadsheets include having to re-copy data over and over again to maintain it in separate data files, the inability to efficiently identify data errors, the lack of detailed sorting and querying abilities, and sharing violations among users wishing to view or change data at the same time. Additionally, spreadsheets are restricted to a finite number of records, and can require a large amount of hard-drive space for data storage.

To store large amounts of raw data, it is best to use a database. This is especially true in circumstances where two or more users share the information. Databases require little or no duplication of data between information tables, and changes made to the data do not corrupt the programming (like at the cell level of a spreadsheet where calculations are running). Databases offer better security to restrict users from accessing privileged information, and from changing coded information in the programming. Furthermore, the two most important benefits gained by using a database are the capacity to increase data integrity, and the ease of reporting and sharing data.

The disadvantages of databases include requiring the user to learn a new system, and a greater investment in training and software. In addition, there is the initial time and cost of migrating all of the data into a new database system.

For optimal effectiveness, spreadsheets and databases should be used in conjunction with one another. The database should be used to store and protect all of the data. This includes when the data is initially entered, checked for errors, and verified. Since almost all databases easily import and export data to and from spreadsheet programs, the data can then be shared with a spreadsheet program to analyze the data, run calculations, and create tables and graphs.

There are some common signs that indicate when an individual or a group needs to update its data collection and storage system from a spreadsheet to a database. These signs include:

1. You are creating numerous spreadsheets containing similar data.

2. Your hard drive is overrun with spreadsheet files.

3. Spreadsheets are not allowing you the functionality extent that you require.

4. You are constantly modifying the data to export it to another user.

5. Changes made in one spreadsheet require you to make changes in one or more additional spreadsheets.

6. Your current system needs to be accessed by two or more people at the same time, and sharing violations are occurring.

The scenario below contains more information on deciding when a database is needed, and the benefits that can be gained from shifting your data management system from a spreadsheet to a database.

Scenario: A biologist at a district office has conducted spawning surveys for three species of anadromous salmon for the past 30 years. This survey project involves collecting data on the number of redds, number of fish observed, occupied/unoccupied redds, water temperature, observer’s name, as well as other species observed during the survey. The data is collected and stored in Excel spreadsheets, which requires the biologist to add new files each year to contain the information collected. It also requires the biologist to make individual files to separate the information by data type, species, habitat information, etc., which had added up to numerous files on the biologist’s hard-drive.

The biologist is finding that in order to meet her yearly reporting requirements, she is opening and closing numerous spreadsheets, mixing data for analysis while trying to keep it intact elsewhere, and correcting data inconsistencies due to seasonal staff entering data in different formats. She is also finding that file management is becoming difficult, and that each time she needs to report to a different entity, she is spending a lot of time searching for data and manipulating it into a new format. Furthermore, the spreadsheet program is not capturing the comprehensive field notes taken by the seasonal staff.

The biologist decides that it is time to upgrade to a database system. This requires her appropriate funding in the budget for software and to attend a weeklong training class. The training class is required for her to learn some basic database programming skills, and to learn to design a system to meet her needs. Data from all of the previous years is then imported from the Excel spreadsheets into the database, and tables are created to hold it in a consistent format. The biologist decides to store all of the survey records in a table that contains a date field, where she can query out any year’s data that she wants to view. After spending time transferring the data and formatting the tables, she creates data entry forms for her seasonal staff for easier data entry. The forms contain defaults, codes for commonly used terms, drop-down lists to limit choices, and a separate field for notes. She will then be able to run a query on any word in the notes or incidental species fields, and quickly find the information she is looking for. In addition, she creates standard queries she uses routinely for meetings and annual reporting procedures. For her annual report, she can query the records by species and survey date, and then export the data to Excel where she can create tables for publication.

Once the need for a database is established, the next action is to choose the most effective database system. To do this, you will need to decide whether a stand-alone database or a client-server database will be needed.

A stand-alone database is located entirely on a user’s hard-drive, and involves a much more simple system. The front-end of the database (containing the user interface) and the back-end (containing the data) can either be contained within the same database file, or split into two, separate files. If the computer on which the database resides has a shared hard-drive and is connected to a network, the database can be opened and used by two or more users at any time. If the computer does not have a shared hard-drive, replicas of the database can be created for each additional user, and the replicas can be periodically synchronized with the master database.

A client-server database is a slightly more complicated system, where the back-end of the database is located on a network server, and a copy of the front-end of the database is located on each user’s hard drive. The advantages of this system include the ability to customize each user interface, and a larger data storage capacity. There is also no need for periodic synchronizations, since all of the users are working with the same data file. This type of system may be needed if a stand-alone database is inefficient for the number of users needing access to it, or when a considerable volume of data storage is required.

Once a database system is chosen, there are some questions you can ask when determining which database software to use. These questions include:

1. What types of data are you working with (numbers, text, and/or graphics) and how will the data be used (reports, searches, data storage)?

2. What type of information or analysis are you seeking from your data?

3. Will you be importing or exporting data?

4. How do you want users to access the data?

5. How does data flow in your current collection and storage process?

The answers to these questions can be taken to a software store or consultant, or you can do some research on the web to determine which database software will best meet your needs and expectations.

Hopefully, the information contained in this article has provided some clarity of the use of spreadsheets and databases in data management. For more information, you can go to the following sources:

· http://www.smartcomputing.com/editorial/article.asp?article=articles%2F1998%2Faug98%2F0815%2F0815%2Easp

· http://searchwin2000.techtarget.com/sDefinition/0,,sid1_gci532933,00.html

· http://phoenix.bcentral.com/phoenix/stories/1998/10/19/smallb4.html

OSU Web Disclaimer

Funding provided by:

Comparing Databases and Spreadsheets for Storing Data

Webmaster: HerberW@for.orst.edu Copyright © 2000-2003 Oregon Department of Fish and Wildlife Page last updated on 12/01/03

Webmaster: HerberW@for.orst.edu
Copyright © 2000-2003 Oregon Department of Fish and Wildlife
Page last updated on 12/01/03