By: Biplav Srivastava, IBM Research, India
The COMAD Data Challenge Contest, “DataView 2016” (the “Contest”), was a first-of-its-kind contest that asked participants for insights from data the government has put in the public domain on everyday governance issues affecting society. Traditionally, data mining and visualization contests have targeted marketing problems. The aim of the contest was to: (a) bring open data to the attention of the machine learning community, (b) show the application of state-of-the-art techniques to societal and governance issues, and (c) promote new, useful apps.
The contestants were asked 4 sets of questions relating to the diseases affecting public health, how much money was being spent over the years, which regions (states) were doing better than others and on which diseases, and whether water pollution in rivers was correlated with the prevalence of water-borne diseases. These questions should matter to stakeholders tackling public health and to citizens looking for relief. Public datasets on diseases, government spending and water pollution covering many years were identified for use by the contestants, who could also use any other public data as long as they brought it to the attention of the organizers, who would then inform all other teams.
The participants were required to analyze the identified datasets and answer the questions. Many data-related issues made the task challenging; prominent among them were differing scales of data, inconsistent nomenclature and semantics, and missing data. Furthermore, participants were required to develop and release an application through which the public could explore these questions further and go beyond them. The questions and datasets are summarized in the appendix; full details are on the contest site.
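To make the three data issues above concrete, here is a minimal Python sketch of the kind of clean-up a team might apply before any analysis. It is not drawn from any submission: the function names, the alias table and the per-100,000 rescaling are illustrative assumptions, and the population figure would have to come from a separate source that is not among the contest datasets.

```python
import re

# Hypothetical alias table for older or variant state names.
# The entries are illustrative; a real table would be built from the data.
STATE_ALIASES = {
    "orissa": "odisha",
    "pondicherry": "puducherry",
    "uttaranchal": "uttarakhand",
}


def normalize_state(name):
    """Inconsistent nomenclature: map a raw state label to one canonical key."""
    key = name.lower().replace("&", "and")
    key = re.sub(r"[^a-z ]", "", key)         # drop punctuation and digits
    key = re.sub(r"\s+", " ", key).strip()    # collapse repeated spaces
    return STATE_ALIASES.get(key, key)


def parse_count(value):
    """Missing data: return an int, or None for blanks and placeholders."""
    text = str(value).replace(",", "").strip()
    return int(text) if text.isdigit() else None


def cases_per_lakh(cases, population):
    """Differing scales: rescale a raw count to cases per 100,000 people."""
    if cases is None or not population:
        return None
    return cases * 100_000 / population
```

Keeping missing values as `None`, rather than silently treating them as zero, lets later steps decide whether to skip or impute them.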
The first round of the contest attracted 6 serious submissions, of which 1 was designated the winner and 2 received honorable mentions. The second round had 3 working submissions. In arriving at the final results, the organizers considered the submissions of both rounds. The decision was based on the quality of the analysis, the extent to which the submissions answered the questions and offered any other insights, their use of the identified datasets and any others, the explanation of their approach, and the quality of the app. The results were further discussed with data.gov.in, which found the work of all the finalists to be of exceptional value.
The winning teams and their submissions are:
1. iFuse: A Visual Data Fusion Approach, by Gunjan Sehgal, Kaushal Paneri, Aditeya Pandey, and Garima Gupta, TCS Research
(Username: Comad/ Password: Comad/ Dataset : Comad)
2a. Aniya Aggarwal, Mayur Saxena, Varun Parashar, Nishtha Madaan, IBM Research
The prize included a cash reward, a citation, and an opportunity to give a 10-minute presentation and demonstration at COMAD. Further, the teams were encouraged to create a blog explaining their insights, approach and tools, and to submit it for publication on data.gov.in.
Let us congratulate these teams and all the participants, and see their work!
One may also observe that although the contest was open to all, including students, none of the completed submissions came from a team with a student. We think that college-level students are as capable as professionals of tackling these problems, and hope they will be especially motivated to pick the social sector for their data analysis explorations.
DataView 2016 was organized by Biplav Srivastava of IBM Research and Debtanu Dutta and Hemant Mittal of LatentView, with the active help of the COMAD 2016 conference organizers and D P Mishra at data.gov.in. Biplav is a researcher with over two decades of experience. The above is a personal opinion.
2. Contest Facebook page: https://www.facebook.com/dataview2016
B. Appendix: Contest Scope
1. What diseases are most prevalent in a given area (e.g., state, district, city, by keyword)?
2. Which diseases have been better controlled than others in India? What states have done better than others? Are there approaches which have worked for controlling / reducing instances of diseases better than others?
3. How much money has been allocated to tackle specific diseases compared to others? Which regions do better than others in controlling diseases relative to money spent?
4. Is there a relationship between water-borne diseases and water pollution?
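As one illustration of how question 4 might be approached, the sketch below computes a Pearson correlation between a per-state water pollution indicator and a per-state disease measure. It is an assumption-laden example, not the method of any team: the function names and the dictionary-per-state layout are hypothetical, and the contest did not prescribe any particular statistic.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # one series is constant, so the correlation is undefined
    return sxy / math.sqrt(sxx * syy)


def correlate_by_state(pollution, disease):
    """Correlate two {state: value} mappings over the states present in both.

    States with a missing (None) value in either mapping are skipped.
    """
    states = sorted(
        s for s in pollution.keys() & disease.keys()
        if pollution[s] is not None and disease[s] is not None
    )
    return pearson([pollution[s] for s in states],
                   [disease[s] for s in states])
```

A coefficient near 1 or -1 would only indicate a linear association across states; it would not by itself show that pollution causes the disease burden, since factors such as population or reporting quality could drive both.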
• H-DS-1: http://data.gov.in/catalog/number-cases-and-deaths-due-diseases, All-India (from 2000 to 2011) and State-wise (2010 and 2011) number of cases and deaths due to specified diseases (Acute Diarrhoeal Diseases, Malaria, Acute Respiratory Infection, Japanese Encephalitis, Viral Hepatitis).
• H-DS-2: http://data.gov.in/catalog/cases-and-deaths-due-kala-azar , Cases and Deaths due to the illness Kala-Azar in Bihar, West Bengal and Country during the years 1996 till 2000.
• H-DS-3: https://data.gov.in/catalog/cases-and-deaths-due-japanese-encephalitis-and-dengue-dhf-during-tenth-plan, cases and deaths due to Japanese Encephalitis and Dengue / DHF during Tenth Plan.
• H-DS-4: https://data.gov.in/catalog/water-quality-affected-habitations, Water Quality Affected Habitations
• H-DS-5: Hospital Directory with Geo Code as on September 2015, https://data.gov.in/catalog/hospital-directory-national-health-portal
• F-DS-1: https://data.gov.in/catalog/outlays-and-expenditure-aids-control-programme-during-ninth-plan, outlays and expenditure of AIDS Control Programme during Ninth Plan.
• F-DS-2: https://data.gov.in/catalog/public-sector-outlaysexpenditure-during-eleventh-five-year-plan, public sector outlays and expenditures during Eleventh Five Year Plan (2007-12) under various Heads of Development (Rs. Crore).
• F-DS-3: http://data.gov.in/catalog/outlays-department-health-agreed-planning-commission-during-tenth-plan , data related to 9th Plan Allocation, 9th Plan Anticipated Expenditure, 10th Plan Allocation as Agreed by Planning Commission.
• F-DS-4: https://data.gov.in/catalog/percentage-share-household-expenditure-health-and-drugs-various-states-during-eleventh-five, data related to percentage share of household expenditure on health and drugs in various states during Eleventh Five Year Plan.
• F-DS-5: https://data.gov.in/catalog/state-wise-plan-outlays-and-expenditure, table provides state-wise plan outlays and expenditure during 2011-2012.
• F-DS-6: https://data.gov.in/catalog/outlay-tenth-plan-tenth-plan-sum-annual-outlay-and-tenth-plan-actual-expenditure-department, data related to Outlay Tenth Plan, Tenth Plan (2002-07) sum of Annual Outlay and Tenth Plan (2002-07) Actual Expenditure for Department of Health and Family Welfare.
• W-DS-1: https://data.gov.in/catalog/status-water-quality-india-2012, http://data.gov.in/catalog/number-cases-and-deaths-due-diseases , status of Water Quality in India in 2012
• W-DS-2: https://data.gov.in/catalog/status-water-quality-india-2008-and-2011, status of Water Quality in India – 2008 and 2011
1. APIs are available on data.gov.in to pull the data programmatically rather than downloading it.
2. Contestants can use map data from any provider (e.g., Google, Microsoft, Open Street Map)
3. Contestants can use any other open dataset by informing the organizers ahead of time and obtaining approval. The same information will then be made available to all other contestants (we encourage participants to post all this information on the DataView Facebook page). Data in itself will not give a competitive advantage to any team. Using any paid data source is not permitted.