Access Data

WPDx Global Data Repository

The WPDx Global Data Repository includes all data uploaded to WPDx in a cleaned format.

  • Explore the full WPDx dataset in the WPDx Global Data Repository
    • Sort and filter data
    • Create your own sub-dataset based on location or other parameter of interest
    • Visualize data using charts, graphs and simple maps
    • Download data
  • What is included in the WPDx Global Data Repository?
    • All data shared with WPDx is included in the WPDx Global Data Repository. Four data-cleaning steps occur during the ingestion process:
      • De-duplication check to ensure that the same dataset is not inadvertently uploaded multiple times. Records found to be duplicates of existing records will not be added. Users can check to see how many of their records were accepted or identified as duplicates in the upload report portion of an individual data page. The de-duplication algorithm may evolve and be updated over time. For details on the current de-duplication algorithm, please click here.
      • Assignment of a wpdx_id. The wpdx_id allows for records from the same water point to be linked, regardless of the organization which may have provided the information. The wpdx_id assignment algorithm may evolve and be updated over time. For details on the current wpdx_id assignment algorithm, please click here.
      • Creation of “water_source_clean” and “water_tech_clean” columns. These new columns are created using fuzzy matching to organize entries into consistent categories. For more information on the cleaning process, please see here.
      • Formatting of entries for consistency. For example, for the Presence of Water When Assessed (#status_id) parameter, the repository will show “Yes” or “No”.
  • Explore the full WPDx repository in the WPDx-Basic online data playground.
    • Sort and filter data
    • Create your own sub-dataset based on location or other parameter of interest
    • Visualize data using charts, graphs and simple maps
    • Download data
  • What is included in WPDx-Basic?
    • All data shared with WPDx is included in WPDx-Basic. Four data-cleaning steps occur during the ingestion process:
      • De-duplication check to ensure that the same dataset is not inadvertently uploaded multiple times. Records found to be duplicates of existing records will not be added. Users can check to see how many of their records were accepted or identified as duplicates in the error report portion of an individual data page. The algorithm may evolve and be updated over time. For details on the current de-duplication algorithm, please click here.
      • Assignment of a wpdx_id. The wpdx_id allows for records from the same water point to be linked, regardless of the organization which may have provided the information. The algorithm may evolve and be updated over time. For details on the current wpdx_id assignment algorithm, please click here.
      • Creation of “water_source_clean” and “water_tech_clean” columns. These new columns are created using fuzzy matching to organize entries into consistent categories. For more information on the cleaning process, please see here.
      • Formatting of entries for consistency. For example, for any parameter which requires a binary response, the repository will show the answer as “TRUE/FALSE.”
  • Explore an enhanced subset of data on the WPDx+ data playground
  • The WPDx+ dataset is an enhanced version of the WPDx-Basic dataset, available for the subset of countries where WPDx has enough data to activate the decision support tools. These enhancements can be completed for any country for which a representative dataset is shared with WPDx. For more information on how to add a new country to WPDx+, please email info@waterpointdata.org with “New Country Interest” in the subject line.
  • Enhancements include:
    • Additional data cleaning and addition of relevant parameters:
      • Addition of #adm1_clean, #adm2_clean, #adm3_clean, #country_name_clean, #likely_users to ensure consistent spellings, correct administrative division identification and gap filling. These parameters are added based on the record’s GPS coordinates. For more information on each of the additional parameters, please click on the parameter name.
      • Management_clean: a clean version of the management parameter similar to water_source_clean and water_tech_clean. For more details on the cleaning process, please go here.
    • Tabular access to results from advanced decision support tools:
      • Prioritize Locations for Rehabilitation – Which non-functional water points should be prioritized for repair, based on the number of people currently unserved?
      • Which districts should be prioritized in terms of funding in order to reach the highest proportion of unserved people?
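
The fuzzy-matching step described in the data-cleaning list above, which produces columns such as “water_source_clean”, can be illustrated with a minimal sketch. This is not the WPDx algorithm. It uses Python's standard-library difflib, and the category list, the function name clean_category, and the similarity cutoff are all assumptions made for illustration.

```python
from difflib import get_close_matches

# Hypothetical category list for illustration only; the categories
# actually used by WPDx may differ.
CATEGORIES = [
    "Borehole",
    "Protected Spring",
    "Protected Well",
    "Rainwater Harvesting",
    "Piped Water",
]


def clean_category(raw, categories=CATEGORIES, cutoff=0.6):
    """Map a free-text entry to the closest known category.

    Returns None when the entry is empty or nothing is similar enough.
    The cutoff of 0.6 is an assumed value, not one taken from WPDx.
    """
    if raw is None:
        return None
    normalized = raw.strip().lower()
    if not normalized:
        return None
    # Compare in lowercase, then map back to the canonical spelling.
    lookup = {c.lower(): c for c in categories}
    matches = get_close_matches(normalized, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


if __name__ == "__main__":
    for entry in ["bore hole", "Protected sprng", "xyz"]:
        print(repr(entry), "->", clean_category(entry))
```

In this sketch, an entry such as "bore hole" maps to "Borehole", while an entry with no close match is left empty rather than forced into a category.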
