|
|
Comparing Databases and Spreadsheets for Storing Data
--by Shannon Hurn Many
information managers and biologists are still using spreadsheets to store their
data, even though the data could be better managed in a database. There is a lot
of confusion concerning the differences between spreadsheets and databases. The
confusion stems from the basic similarity that the key function of both
spreadsheets and databases is to store and manipulate data. However, there are
some distinct advantages and disadvantages to both spreadsheets and databases
that define their usefulness as a data management tool. Spreadsheets
are designed to analyze data and sort list items, not for long-term storage of
raw data. A spreadsheet should be used for ‘crunching’ numbers and storage
of single list items. Spreadsheet programs provide the means for keeping
inventory, statistical data modeling, and computing data.
They also include graphing functions that allow for quick reporting and
analysis of data. Spreadsheet programs are relatively easy to use, require
little training to get started, and have the advantage that most data mangers
are somewhat familiar with them. The
disadvantages of spreadsheets include having to re-copy data over and over again
to maintain it in separate data files, the inability to efficiently identify
data errors, the lack of detailed sorting and querying abilities, and sharing
violations among users wishing to view or change data at the same time.
Additionally, spreadsheets are restricted to a finite number of records, and can
require a large amount of hard-drive space for data storage. To
store large amounts of raw data, it is best to use a database. This is especially true in circumstances where two or more
users share the information. Databases
require little or no duplication of data between information tables, and changes
made to the data do not corrupt the programming (like at the cell level of a
spreadsheet where calculations are running). Databases offer better security to
restrict users from accessing privileged information, and from changing coded
information in the programming. Furthermore, the two most important benefits
gained by using a database are the capacity to increase data integrity, and the
ease of reporting and sharing data. The
disadvantages of databases include requiring the user to learn a new system, and
a greater investment in training and software. In addition, there is the initial
time and cost of migrating all of the data into a new database system. For
optimal effectiveness, spreadsheets and databases should be used in conjunction
with one another. The database should be used to store and protect all of the
data. This includes when the data is initially entered, checked for errors, and
verified. Since almost all
databases easily import and export data to and from spreadsheet programs, the
data can then be shared with a spreadsheet program to analyze the data, run
calculations, and create tables and graphs. There
are some common signs that indicate when an individual or a group needs to
update its data collection and storage system from a spreadsheet to a database.
These signs include: 1.
You are creating numerous spreadsheets containing similar data. 2.
Your hard drive is overrun with spreadsheet files. 3.
Spreadsheets are not allowing you the functionality extent that you
require. 4.
You are constantly modifying the data to export it to another user. 5.
Changes made in one spreadsheet require you to make changes in one or
more additional spreadsheets. 6.
Your current system needs to be accessed by two or more people at the
same time, and sharing violations are occurring. The
scenario below contains more information on deciding when a database is needed,
and the benefits that can be gained from shifting your data management system
from a spreadsheet to a database. Scenario:
A biologist at a district office has conducted spawning surveys for three
species of anadromous salmon for the past 30 years. This survey project involves
collecting data on the number of redds, number of fish observed,
occupied/unoccupied redds, water temperature, observer’s name, as well as
other species observed during the survey. The data is collected and stored in
Excel spreadsheets, which requires the biologist to add new files each year to
contain the information collected. It also requires the biologist to make
individual files to separate the information by data type, species, habitat
information, etc., which had added up to numerous files on the biologist’s
hard-drive. The
biologist is finding that in order to meet her yearly reporting requirements,
she is opening and closing numerous spreadsheets, mixing data for analysis while
trying to keep it intact elsewhere, and correcting data inconsistencies due to
seasonal staff entering data in different formats. She is also finding that file
management is becoming difficult, and that each time she needs to report to a
different entity, she is spending a lot of time searching for data and
manipulating it into a new format. Furthermore,
the spreadsheet program is not capturing the comprehensive field notes taken by
the seasonal staff. The
biologist decides that it is time to upgrade to a database system. This requires
her appropriate funding in the budget for software and to attend a weeklong
training class. The training class
is required for her to learn some basic database programming skills, and to
learn to design a system to meet her needs. Data from all of the previous years
is then imported from the Excel spreadsheets into the database, and tables are
created to hold it in a consistent format. The biologist decides to store all of
the survey records in a table that contains a date field, where she can query
out any year’s data that she wants to view. After spending time transferring
the data and formatting the tables, she creates data entry forms for her
seasonal staff for easier data entry. The forms contain defaults, codes for
commonly used terms, drop-down lists to limit choices, and a separate field for
notes. She will then be able to run a query on any word in the notes or
incidental species fields, and quickly find the information she is looking for.
In addition, she creates standard queries she uses routinely for meetings and
annual reporting procedures. For her annual report, she can query the records by
species and survey date, and then export the data to Excel where she can create
tables for publication. Once
the need for a database is established, the next action is to choose the most
effective database system. To do
this, you will need to decide whether a stand-alone database or a client-server
database will be needed. A
stand-alone database is located entirely on a user’s hard-drive, and involves
a much more simple system. The front-end of the database (containing the user
interface) and the back-end (containing the data) can either be contained within
the same database file, or split into two, separate files. If the computer on which the database resides has a shared
hard-drive and is connected to a network, the database can be opened and used by
two or more users at any time. If the computer does not have a shared
hard-drive, replicas of the database can be created for each additional user,
and the replicas can be periodically synchronized with the master database. A
client-server database is a slightly more complicated system, where the back-end
of the database is located on a network server, and a copy of the front-end of
the database is located on each user’s hard drive.
The advantages of this system include the ability to customize each user
interface, and a larger data storage capacity.
There is also no need for periodic synchronizations, since all of the
users are working with the same data file. This type of system may be needed if
a stand-alone database is inefficient for the number of users needing access to
it, or when a considerable volume of data storage is required. Once a
database system is chosen, there are some questions you can ask when determining
which database software to use. These questions include: 1.
What types of data are you working with (numbers, text, and/or graphics)
and how will the data be used (reports, searches, data storage)? 2.
What type of information or analysis are you seeking from your data? 3.
Will you be importing or exporting data? 4.
How do you want users to access the data? 5.
How does data flow in your current collection and storage process? The
answers to these questions can be taken to a software store or consultant, or
you can do some research on the web to determine which database software will
best meet your needs and expectations. Hopefully,
the information contained in this article has provided some clarity of the use
of spreadsheets and databases in data management. For more information, you can
go to the following sources: ·
http://searchwin2000.techtarget.com/sDefinition/0,,sid1_gci532933,00.html ·
http://phoenix.bcentral.com/phoenix/stories/1998/10/19/smallb4.html |
|