Welcome to the first module of Thing 1! This module starts at the very beginning, defining data, providing specific examples, and challenging the different ways you might think about data. Before we dive right into what data is, consider this introductory video from the University of Houston Library.
Watch this video as an introduction to data, making note of what you already knew, and what information is new to you.
What is data?
Data is all around us. Right now, you are swimming in your very own ocean of data! Data can be almost anything.
If data can be anything, then what separates it from information? How do we even define and keep track of data if there are so many options?
Context is key! Data is information that has been put into a specific context. It is information used to answer a question that you might have, whether it’s as simple as ‘What is the average height in my family?’ or as complex as ‘How do repeated hurricanes affect the Nova Scotia lobster population over a long-term period?’
Let’s look at some examples of data being all around you.
Digital Data Examples:
There are likely a few digital examples that come to mind when you think about data. Social media is one: there you can find data about your views, likes and other interactions with the content you post.
You can even get data just from your photo gallery or your videos. Who is in them? How long are the videos? Where and when were they filmed?
With digital data, you can have a huge wealth of it at your fingertips in mere moments!
Analog Data Examples:
It’s likely that all of you taking this course had a digital example of data that you could think of. But data isn’t merely digital; in fact, digital data is still relatively new in comparison! Even things that you wouldn’t expect at first glance can be considered data.
No matter where you are, whether you’re working from home, working in an office, or even taking this course in a library or coffee shop, there is data that can be seen and understood around you. Look around you for things that could be data.
If you are working at home in either a home office or converted room, most likely you have several books or textbooks for school, work or leisure that you own. What data points can you get from those books?
If you’re in an office, there are likely other cubicles, computers or people around you, and they can be just as much a source of data as books. Finally, even in a coffee shop there is data everywhere, such as the prices of various drinks and snacks, stock levels, the cook time for each pastry, or the number of people in the store each hour.
Consider your surroundings. Is there another collection of objects that could provide data points, beyond the examples mentioned here? Why do you think so? What data do they provide?
Our Answer: Consider a purse, backpack or bag. The items inside can tell you a lot about a person, and some items might be datasets in their own right. A store receipt, for example, records which items were bought, what each one cost, often the date and time of the transaction, the name of the cashier, and any coupons or deals. The receipt itself would be just one data point among everything that could be in the bag, which together could answer questions such as: where does this person shop? What do they normally carry with them (laptop, book, snacks)? Do they carry physical money, or only cards? And these are just some of the possible answers!
Let’s take it even further, going back to our original example of books in one’s room. The following is an example of the data points that could be gathered from a collection of objects, in this case the books that a person might have in their room.
There are quite a number of different data points that could be gathered from books. There is the physical size of the books, such as A5, pocket size, or letter size, to name a few. There are also the type of book (textbook, hardback, paperback, trade paperback), the number of pages, the cost of each book, and the subject. There are even more data points beyond the examples listed here that can be found from just a few books! What do you think those data points are? Come up with a few of your own that aren’t listed here.
Using this book as an example, here are just some of the answers that you could come up with:
This book has a title, ‘Automate the Boring Stuff With Python: Practical Programming for Total Beginners’, and an edition number, ‘2nd edition’. It is a textbook of a unique size, contains 547 pages, is on the subject of programming and computer science, and costs $53.95 Canadian. Other data points to consider: the author, ‘Al Sweigart’; the publisher, ‘No Starch Press’; and the original publishing location, which is San Francisco. There is even further data which might be available with research, such as the word count of the book.
Using these examples (or your own examples that you’ve come up with), think of some questions you could answer from the data. Once you’ve thought of a question, consider: how simple or complex is it? Does it need only one kind of data to answer the question, or will you need to use more than one?
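To make the idea of data points concrete, here is a minimal sketch of how the details listed for this book could be written down as a single record in Python. The field names (such as `price_cad` and `book_type`) are our own labels chosen for illustration, not part of any standard; only the values come from the example.

```python
from dataclasses import dataclass, asdict


@dataclass
class Book:
    """One book, described by the data points named in the lesson.

    The field names are illustrative assumptions, not a standard schema.
    """
    title: str
    edition: str
    author: str
    publisher: str
    publishing_location: str
    subject: str
    book_type: str
    pages: int
    price_cad: float


book = Book(
    title="Automate the Boring Stuff With Python: "
          "Practical Programming for Total Beginners",
    edition="2nd edition",
    author="Al Sweigart",
    publisher="No Starch Press",
    publishing_location="San Francisco",
    subject="programming and computer science",
    book_type="textbook",
    pages=547,
    price_cad=53.95,
)

# Print each data point as "name: value".
for name, value in asdict(book).items():
    print(f"{name}: {value}")
```

Each field is one data point. A data point that needs further research, such as the word count, could be added as another field once it is known.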
Challenge 2 Answers
Here are some possible answers, using the earlier example of the books in your room:
With just one book we could answer simple questions such as: who is the author? How long is the book? What kind of book is it?
Things get truly interesting when we add another kind of data! This could be another book to compare with the first, or something more abstract, such as time or reviews and opinions of the book.
With more data points, more complicated questions can be answered, such as:
What is the average price point per book subject? How many pages are there on average in each book subject?
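As a rough sketch of how questions like these could be answered, the Python below groups a small list of books by subject and averages one field at a time. Only the first row uses values from the example book; every other row is an invented placeholder so that there is something to average, and the helper name `average_by_subject` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Only the first row comes from the lesson's example book.
# The remaining rows are INVENTED placeholder values for illustration.
books = [
    {"subject": "programming", "pages": 547, "price_cad": 53.95},
    {"subject": "programming", "pages": 300, "price_cad": 40.00},  # placeholder
    {"subject": "history", "pages": 400, "price_cad": 30.00},      # placeholder
    {"subject": "history", "pages": 200, "price_cad": 20.00},      # placeholder
]


def average_by_subject(rows, field):
    """Return {subject: average of `field`} for a list of book records."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["subject"]].append(row[field])
    return {subject: mean(values) for subject, values in grouped.items()}


avg_price = average_by_subject(books, "price_cad")
avg_pages = average_by_subject(books, "pages")

print("Average price per subject:", avg_price)
print("Average pages per subject:", avg_pages)
```

Notice that neither question can be answered from one book alone: both need several books and two kinds of data (the subject, plus the price or page count).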
Combining multiple variables and kinds of data makes a dataset, and then things become even more interesting! Suddenly there are even more questions to ask and answer. Questions such as: do students find books of a certain subject and length to be more or less worth their money for the learning outcomes they achieved? Consider more complex questions that could be answered with larger datasets.
Now that we’ve thought about which questions we can ask from the data we have, let’s approach data from another angle and think about what data we will need for a particular question.
We’ll start with a fairly easy research question and discuss the various kinds of data we would need to answer that question.
We’ll be focusing on the ocean in this course, so naturally the question will be about ocean research.
Our question is: How does the temperature of the ocean affect the weather of Nova Scotia over a period of 1 year?
What kind of data would we need to answer this question?
Here are just some possible answers:
- Older data to compare the new data against
- Temperature of the ocean: daily, weekly or monthly?
- Weather of Nova Scotia near the ocean: daily, weekly or monthly?
- Specific kinds of weather: wind, rain, snow, etc.
- A control area: an inland part of Nova Scotia that is less affected by the ocean, to compare weather data against
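One way to picture how the items in this list fit together is as a single record per observation day, summarized by month. The sketch below is only an illustration: the field names, the dates and every temperature value are invented placeholders, not real measurements, and a real project would draw on actual ocean and weather records.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class Observation:
    """One day's readings. Field names are illustrative assumptions."""
    day: date
    ocean_temp_c: float        # temperature of the ocean
    coastal_air_temp_c: float  # weather of Nova Scotia near the ocean
    inland_air_temp_c: float   # control area, less affected by the ocean


# INVENTED placeholder values, not real measurements.
observations = [
    Observation(date(2023, 1, 10), 3.0, -2.0, -6.0),
    Observation(date(2023, 1, 20), 2.0, -4.0, -8.0),
    Observation(date(2023, 7, 10), 15.0, 20.0, 25.0),
    Observation(date(2023, 7, 20), 17.0, 22.0, 27.0),
]


def monthly_means(rows):
    """Group daily observations by (year, month) and average each field."""
    by_month = defaultdict(list)
    for row in rows:
        by_month[(row.day.year, row.day.month)].append(row)
    return {
        month: {
            "ocean_temp_c": mean(r.ocean_temp_c for r in group),
            "coastal_air_temp_c": mean(r.coastal_air_temp_c for r in group),
            "inland_air_temp_c": mean(r.inland_air_temp_c for r in group),
        }
        for month, group in by_month.items()
    }


summary = monthly_means(observations)
for month, values in sorted(summary.items()):
    print(month, values)
```

Choosing between daily, weekly or monthly data amounts to choosing how the observations are grouped. Here the grouping key is the month, and the inland column plays the role of the control area.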
And this is just a taste of some of the data you could use to answer such a question! The actual research project would probably turn out to be much more complicated, but this lesson gives a good overview of the basic steps involved.
When data is combined with other data it becomes a dataset, which can be used across time and space to answer questions. Most research begins with a question, and from there researchers slowly refine what kind of data they will need to answer it. This module has attempted to get you thinking about data in a variety of ways. When stumped, it can be useful to turn one’s thinking around and approach research from another angle, such as the earlier example of thinking about what questions can be asked of data that has already been gathered.
In our next module we will introduce managing data.
Before You Go! Please consider for the next module:
Consider a research question that is interesting to you. What kind of data would you need to gather to answer that question?
Then, turn your thinking around. What other kinds of questions could be answered from this data? Are there any? Would it be easy to combine it with another study and answer new questions? And why is it important to think about data like this? (See the next module for the answers!)