As part of our Data Agenda project, CAHF and 71point4 are working to improve the credibility and replicability of the available housing finance data. To this end, we have developed a Tidy Data Protocol to be applied to data collected in research commissioned and curated by CAHF. This document sets out the elements of the protocol, defines an optimal dataset structure, and discusses strategies for dealing with the most common obstacles to achieving that structure. The protocol is consistent with the industry standard for data warehousing and follows “tidy data” best practices.
The principles of tidy data prescribe a standard way to organize data values within a dataset. A standard makes initial data cleaning easier because you do not need to start from scratch and reinvent the wheel every time. The tidy data standard has been designed to facilitate initial exploration and analysis of the data, and to simplify the development of data analysis tools that work well together.
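To make the idea of a standard layout concrete, the following is a minimal sketch rather than an excerpt from the protocol. It assumes the widely used tidy data convention that each variable forms a column and each observation forms a row. The helper `to_tidy`, the column names, and all values are hypothetical placeholders chosen for illustration only.

```python
# Hypothetical "wide" table: one row per country, one column per year.
# Country names, column names, and values are placeholders, not real data.
wide_rows = [
    {"country": "Country A", "2019": 10, "2020": 12},
    {"country": "Country B", "2019": 7, "2020": 9},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Reshape wide rows so that each output row holds one observation."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every tidy row
            tidy.append({
                id_column: row[id_column],
                variable_name: column,  # the old column header becomes a value
                value_name: value,
            })
    return tidy


tidy_rows = to_tidy(wide_rows, "country", "year", "loans_issued")
for record in tidy_rows:
    print(record)
```

In the reshaped table, the year is a value in its own column rather than a column header, so adding another year adds rows instead of changing the table's structure.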
The purpose of this document is therefore to provide guidelines to CAHF consultants for use in their data collection and warehousing, in order to ensure databases submitted to CAHF as part of CAHF research projects adhere to tidy data protocols.