Data Quality

Ocean Networks Canada aims for the highest standards of data Quality Assurance and Quality Control (QAQC) testing and reporting. One of the challenges facing real-time ocean observatories is providing a fast and accurate assessment of data quality. Ocean Networks Canada is in the process of implementing real-time quality control on incoming data. For scalar data, the aim is to meet the guidelines of the Quality Assurance of Real Time Oceanographic Data (QARTOD) group. QARTOD is a US organization tasked with identifying issues involved with incoming real-time data from the U.S. Integrated Ocean Observing System (IOOS). A large part of its mandate is to create guidelines for how the quality of real-time data is determined and reported to the scientific community. Ocean Networks Canada strives to adhere to QARTOD’s ‘Seven Laws of Data Management’ in order to provide trusted data.

QARTOD’s Seven Laws of Data Management:

  • Every real-time observation distributed to the ocean community must be accompanied by a quality descriptor.
  • All observations should be subject to some level of automated real-time quality test.
  • Quality flags and quality test descriptions must be sufficiently described in the accompanying metadata.
  • Observers should independently verify or calibrate a sensor before deployment.
  • Observers should describe their method / calibration in the real-time metadata.
  • Observers should quantify the level of calibration accuracy and the associated expected error bounds.
  • Manual checks on the automated procedures, the real-time data collected and the status of the observing system must be provided by the observer on a timescale appropriate to ensure the integrity of the observing system.

Real-time data quality testing at Ocean Networks Canada includes tests designed to catch instrument failures and major spikes or data dropouts before the data are made available to the user. Real-time quality tests check data against instrument manufacturers’ specifications and against overall observatory/site ranges determined from previous data. Because some instrument platforms are positioned in highly productive areas, we have also designed dual-sensor tests, e.g. for some conductivity sensors.

Quality control testing is split into three categories. The first is real-time testing, applied before the data are parsed into the database. The second is delayed-mode testing, in which archived data are tested after a certain period of time. The third is manual quality control by an Ocean Networks Canada data expert.

Quality Control Flags:

Ocean Networks Canada has adopted the ARGO quality control flags. These flags and their descriptions are as follows:

ARGO Data Quality Flag   Description
0                        No quality control on data.
1                        Data passed all tests.
2                        Data probably good.
3                        Data probably bad. Failed minor tests.
4                        Data bad. Failed major tests.
7                        Averaged value.
8                        Interpolated value.
9                        Missing data.
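
For downstream processing, these flags are naturally represented as an enumeration. The following is an illustrative Python sketch, not ONC’s implementation; the class and member names are our own.

    from enum import IntEnum

    class ArgoFlag(IntEnum):
        """ARGO quality control flags as adopted by Ocean Networks Canada."""
        NO_QC = 0          # no quality control on data
        PASSED = 1         # data passed all tests
        PROBABLY_GOOD = 2
        PROBABLY_BAD = 3   # failed minor tests
        BAD = 4            # failed major tests
        AVERAGED = 7
        INTERPOLATED = 8
        MISSING = 9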


Testing Terminology

Major test

A test that sets gross limits on incoming data, such as the instrument manufacturer’s specifications or climatological values. If a major test fails, we recommend that Ocean Networks Canada users not use the flagged data.

Minor test

A test that sets local limits on incoming data, such as site-level values based on statistics, as well as dual-sensor testing to catch conductivity cell plugs. If a minor test fails, the data are considered suspect and require further attention from the user, who must decide whether or not to include these data in their analysis.

Real-time quality control tests

Instrument Level: Tests at this level determine whether the data meet the manufacturer’s range specifications for each sensor. Failure at this level is considered major and is likely due to sensor failure or a loss of calibration. Calibration and configuration information is kept in the database so that it can be returned in the metadata.
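
Conceptually, this level is a simple min/max comparison against the manufacturer’s specified range. The following Python sketch illustrates the idea; the function name and example limits are hypothetical, not ONC’s actual code.

    def instrument_range_flag(value, spec_min, spec_max):
        """Return an ARGO-style flag for one reading: 4 (bad, failed
        major test) if outside the manufacturer's specified range,
        1 (passed) otherwise."""
        return 4 if value < spec_min or value > spec_max else 1

    # Example: a temperature sensor rated for -5 to 35 degrees Celsius
    flag = instrument_range_flag(41.2, spec_min=-5.0, spec_max=35.0)  # -> 4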

Regional Level: Tests at this level eliminate extreme values in water properties that are not climatologically associated with the overall region. Failure of this test is considered major and could be due to sensor drift, biofouling, etc. The defined regions are collections of stations with similar values, where minimum/maximum test values are chosen based on years of available high-quality data at the collection of stations. Examples of regions are:

  • Northeast Pacific Shallow - encompasses all stations in the Salish Sea and off the West Coast of Vancouver Island between 50 and 300 m depth.

  • Northeast Pacific Deep - encompasses all stations below 300 m depth.

  • Northeast Pacific Near-Surface - encompasses all stationary stations between 0 and 50 m depth.

  • Northeast Pacific Mobile Platforms - encompasses all stations with shipboard flow-through systems, for example BC Ferries and various research vessels. This test excludes ROVs that go to depth.

  • Arctic - encompasses all stations in the Canadian Arctic Archipelago.

Station Level: Testing at this level puts the data through more stringent limits based on previous data, for each station with an adequate record to support these tests:

Single-Sensor Testing:

Range: Minimum/maximum values for this level of testing are derived from statistics of previous years of data. The limits are set at ±3 standard deviations about the mean, without considering seasonal effects. Failure of this test is considered minor, as it could stem from a rarely occurring but real water mass, or from short-term biofouling such as a plugged conductivity cell. Further testing determines whether a failure is due to a plugged conductivity cell.
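
As an illustration, the station limits can be derived directly from the historical record. This is a minimal Python sketch assuming the history is available as a NumPy array; the function names are our own.

    import numpy as np

    def station_range_limits(history, n_std=3.0):
        """Derive station-level limits as mean +/- n_std standard
        deviations of the historical record, ignoring NaN gaps."""
        mean, std = np.nanmean(history), np.nanstd(history)
        return mean - n_std * std, mean + n_std * std

    def station_range_flag(value, lo, hi):
        """ARGO-style flag: 3 (probably bad, failed minor test) if
        outside the station limits, 1 (passed) otherwise."""
        return 1 if lo <= value <= hi else 3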

Dual-Sensor Testing:

Temperature-Conductivity Testing: This test is designed to catch dropouts in conductivity that are not necessarily outside the site-level range given by the single-sensor testing. This is a dual-sensor test that uses both the temperature and conductivity sensors of a single device to determine whether the conductivity data are good. Dropouts in conductivity are very apparent in a temperature-conductivity plot and can be flagged relatively easily using a simple equation. Failure of this test is minor. All derived sensors (salinity, density and sigma-T) inherit the quality flag from this test.
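
ONC’s exact equation is not reproduced here; the sketch below shows one plausible approach, flagging conductivity values that fall well below the value predicted from temperature by a linear fit to the data. The linear-fit choice and all names are assumptions made for illustration only.

    import numpy as np

    def tc_dropout_flags(temp, cond, n_std=3.0):
        """Flag conductivity dropouts using the temperature-conductivity
        relationship: fit C ~ a*T + b, then flag points falling more
        than n_std residual standard deviations below the fit
        (3 = probably bad, failed minor test; 1 = passed)."""
        a, b = np.polyfit(temp, cond, 1)     # linear T-C relationship
        residuals = cond - (a * temp + b)
        return np.where(residuals < -n_std * np.std(residuals), 3, 1)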

Delayed-Mode Testing


Spike Tests: This test is designed to identify implausible singular spikes in scalar data. It requires three consecutive values, where the central value is compared to the surrounding values. If the test value exceeds a threshold, the data point is flagged as bad. The Spike Test formula is:

Test Value = |V2 - (V3 + V1)/2| - |(V3 - V1)/2|

  • If the Test Value >= Threshold value, test fails.
  • If the Test Value < Threshold value, test passes.
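
In code, this test might look like the following sketch (the function name is illustrative):

    def spike_test_fails(v1, v2, v3, threshold):
        """Spike test on three consecutive values; v2 is the central
        value. Returns True when the test fails (spike detected)."""
        test_value = abs(v2 - (v3 + v1) / 2.0) - abs((v3 - v1) / 2.0)
        return test_value >= threshold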

Gradient Tests: This test is designed to identify implausible gradients in scalar data. It requires three consecutive values, where the central value is compared to the surrounding values. If the test value exceeds a threshold, the data point is flagged as bad. The Gradient Test formula is:

Test Value = |V2 - (V3 + V1)/2|

  • If the Test Value >= Threshold value, test fails.
  • If the Test Value < Threshold value, test passes.
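
The corresponding sketch differs from the spike test only in dropping the second term:

    def gradient_test_fails(v1, v2, v3, threshold):
        """Gradient test on three consecutive values; v2 is the central
        value. Returns True when the test fails."""
        return abs(v2 - (v3 + v1) / 2.0) >= threshold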

Stuck Value Tests: If the value of a scalar sensor stream has not changed within a given time period, the data are flagged as bad.
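
A stuck-value check can be sketched as follows, assuming the readings for the time period of interest are collected in a sequence; the tolerance parameter is our own addition, for sensors with limited resolution.

    def stuck_value_test_fails(values, tolerance=0.0):
        """Return True (test fails) when every value in the window is
        identical to within the given tolerance."""
        return max(values) - min(values) <= tolerance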

Moving Average Spike Test: Coming soon in Spring 2016!


Automatic QAQC Test Inheritance

For data that are automatically derived from sensor data, the QAQC flags are inherited from the sensors used in the derivation formula. For example, density is derived from temperature, salinity and pressure; if any of these sensors fails a test, the density will also inherit the failed QAQC value.
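
One simple way to express this inheritance is to propagate the worst flag among the inputs, ordering 1 < 2 < 3 < 4 by severity. This is a sketch of the idea only, not ONC’s code; the special flags (0, 7, 8, 9) are outside its scope.

    def inherit_flag(input_flags):
        """Propagate the worst QAQC flag from the input sensors to the
        derived value, using the severity order 1 < 2 < 3 < 4."""
        severity = {1: 0, 2: 1, 3: 2, 4: 3}
        return max(input_flags, key=lambda f: severity.get(f, 0))

    # Example: density from temperature (1), salinity (3), pressure (1)
    density_flag = inherit_flag([1, 3, 1])  # -> 3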


Manual Quality Control

All ONC data undergo some form of manual quality control to assure the user that an expert is regularly checking the data. If real-time or delayed-mode tests do not pick up an entire episode of ‘bad’ data, manual quality control will. This is a major test.

Time gaps

When clean data are requested in the data access tools, QAQC test failures result in data gaps; these gaps are filled with ‘NaN’ (Not a Number) values on demand, if applicable.
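
For example, masking bad data in a clean product amounts to the following (a sketch with hypothetical values):

    import numpy as np

    values = np.array([7.1, 7.2, 42.0, 7.3])      # hypothetical readings
    flags = np.array([1, 1, 4, 1])                # QAQC flags; 4 = bad
    clean = np.where(flags == 4, np.nan, values)  # bad datum becomes NaN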

Frequently Asked Questions

Q1. How does Ocean Networks Canada determine the final quality control flag?

All data are passed through each level of testing to create a quality control vector containing the output for each test. The overall output quality control flag is determined from the set of QC flags for each datum as follows (a code sketch of this rule appears after the list):

  • If passed all tests, the final output flag assigned is 1 (Data passed all tests).
  • If passed major tests but failed minor tests, the final output flag assigned is 3 (Data probably bad. Failed minor tests.)
  • If failed major tests, the final flag is 4 (Data bad. Failed major tests.)
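
In Python, this combination rule can be sketched as follows (the function name is illustrative):

    def final_flag(test_flags):
        """Combine the per-test QC flags for one datum into the final
        output flag: 4 if any major test failed, 3 if any minor test
        failed, 1 otherwise."""
        if 4 in test_flags:
            return 4  # data bad, failed major tests
        if 3 in test_flags:
            return 3  # data probably bad, failed minor tests
        return 1      # data passed all tests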

Q2. How do I determine which tests have been applied to the data I downloaded?

In the accompanying metadata, there is a section called Data Quality Information that contains all the information regarding quality control for the requested data. Quality control test information is listed per device, if available, along with the valid time period of each test and the values used in its formula. Significant time gaps are also listed in this section.

Q3. How do I know what range or threshold values are applied for each QAQC test?

There are two ways to determine the values used in the automatic QAQC tests, both available in Oceans 2.0.

  • By specific sensor on a device: Each device has a listing page that displays a variety of information particular to that device. Under the sensors tab, each sensor has a similar listing page that details derivation and calibration information as well as the applied automatic QAQC tests.
  • By specific automatic QAQC test: Oceans 2.0 has a QAQC Test Finder that lists the details for all the automatic QAQC tests.