How can we tap the wisdom of the crowd? “Crowd truthing,” also known as “crowd sourcing,” is one approach, in which hundreds or thousands of individuals each analyze small segments of very large data sets. The collective findings are then sifted to answer questions no single researcher could hope to tackle alone. This approach is being used with great success to help scientists do everything from classifying galaxies to mapping craters on the moon.
Researchers at the University of Victoria wondered if crowd sourcing might be harnessed to identify and classify fish swimming through Ocean Networks Canada’s many hundreds of hours of underwater video archives. Two biology instructors, Thomas Reimchen and Roswitha Marx, along with graduate assistant Steve Leaver, conducted a unique experiment this spring, engaging an ichthyology (fish zoology) class in fish identification exercises via SeaTube, our underwater video viewer.
The objective of this experiment was to evaluate the reliability and consistency of fish identifications made by 3rd-year biology students: 30 two-student teams used SeaTube to view and annotate 6 videos recorded at deep-, medium-, and shallow-water locations. Together, the students recorded over 1,000 unique annotations capturing the following information (a rough record structure is sketched after the list):
- fish identifications (IDs: order, family, genus and/or species when possible)
- fish counts
- seabed classifications (e.g., mud, silt, small pebbles, large rocks)
- fish activity levels (low, medium, high)
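SeaTube’s internal data model isn’t described here, but conceptually each annotation bundles these fields together. A minimal sketch in Python, with all names and field choices hypothetical:

```python
from dataclasses import dataclass

@dataclass
class FishAnnotation:
    """One student annotation on a SeaTube video segment (hypothetical schema)."""
    video_id: str        # which of the six videos was annotated
    team_id: int         # 1-30: the two-student team making the call
    timestamp_s: float   # position in the video, in seconds
    taxon: str           # lowest confident taxonomic name, e.g. "Sebastes"
    taxon_rank: str      # "order", "family", "genus", or "species"
    fish_count: int      # number of individuals observed
    seabed: str          # e.g. "mud", "silt", "small pebbles", "large rocks"
    activity: str        # "low", "medium", or "high"
```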
Following this initial ID exercise, student teams were asked to analyze the resulting data, comparing annotations from different groups to see how consistent they were. The teams then presented their analyses, along with conclusions about the effectiveness of this crowd sourcing methodology and suggestions for improving both their own methods and Ocean Networks Canada’s software tools.
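The article doesn’t describe how the teams quantified consistency; one simple measure would be the average overlap between teams’ species lists for the same video. A minimal sketch, assuming the hypothetical FishAnnotation records above:

```python
from itertools import combinations

def team_species(annotations, video_id, team_id):
    """Species-level IDs one team reported for one video."""
    return {a.taxon for a in annotations
            if a.video_id == video_id
            and a.team_id == team_id
            and a.taxon_rank == "species"}

def mean_pairwise_agreement(annotations, video_id, team_ids):
    """Average Jaccard overlap of species sets across all team pairs:
    1.0 means every pair reported identical species lists; values near
    0 mean teams rarely agreed on any species at all."""
    scores = []
    for t1, t2 in combinations(team_ids, 2):
        s1 = team_species(annotations, video_id, t1)
        s2 = team_species(annotations, video_id, t2)
        union = s1 | s2
        scores.append(len(s1 & s2) / len(union) if union else 1.0)
    return sum(scores) / len(scores)
```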
The crowd sourcing experiment uncovered interesting findings related to both methodology and interface design. Seabed and fish activity level classifications were highly consistent across groups, but species IDs were not: their repeatability was far more variable than expected.
Several reasons were suggested for this result. First, identifying fish to species from video can be difficult, especially when fish move quickly in and out of view. Second, when multiple fish appeared during a video segment, students were sometimes unsure which fish belonged to which ID. Finally, there were logistical obstacles: time segments were inconsistent, the website suffered occasional outages, high-resolution video was not always available, and the videos were too long for students to maintain focused attention throughout.
Despite the technical glitches and methodological inconsistencies, it was clear to all involved that, with improvements, crowd sourcing has tremendous potential both as a research tool and as a teaching method.
Students suggested improvements to both SeaTube functionality and crowd sourcing methodology. These suggestions will help shape future SeaTube releases as well as another crowd sourcing tool, Digital Fishers, currently under development. We hope to improve video reliability and provide more video in HD. We’re also working on improving the annotation interface so students will find it easier to create, review, and discover annotations. Methodology suggestions, such as expanded ID support materials, more up-front training, and standardized time segments, will help guide future classroom-based experiments in crowd sourcing, planned for this coming July and September.