The Big News This Week!

A dataset with over 80 million cases and 80 thousand judges was released to the public

Unless you’ve been living under a rock, you’ve probably heard of the massive dataset that was unleashed onto the open web just a few days ago.

This structured dataset, released by DevDataLab, covers 80 million cases from 2010–2018 and 80 thousand judges: the near universe of Indian lower court cases. It is now open for all of us to see, build upon and use to improve the justice system! We can explore charges; filing, hearing and decision dates; trial outcomes; and case type. What is equally incredible is that they have applied a neural network to identify the gender and religion of the accused, advocate, and judge for each case! No small feat!

When we connected with Paul Novosad (Co-Founder of DevDataLab) about 3 weeks ago to discuss their paper on in-group bias among judges, he mentioned that they were sitting on this incredible volume and quality of data (a key by-product of their paper). What’s more, he emphatically stated that he was not going to wait until the paper was published to release it, even though waiting is standard practice in academic circles. This was the first time we had heard anything like this, and we were very excited!

“You should have it with you before the alpha launch of Justice Hub”, said Paul.

And they sure have lived up to their promise! Here’s the 15-minute clip of my interview this afternoon with Aditi Bhowmick, the India Director of DevDataLab.

TL;DR Version/Our key takeaways:

  1. We can hoard data, churn out paper after paper and contribute to the discourse. But we will always be limited by our own imagination and confined to small circles. The possible benefits of opening up data, its public externalities, are more exciting.

  2. Opening up datasets allows us to reduce upfront costs for others who don’t have our resources: students, researchers, journalists, think tanks, civil society, faculty. It allows them to apply contextual knowledge and go at it! 

  3. It does take proactive effort (funding, commitment, and human resources) to release easily usable data. Unfortunately, academic incentives are not aligned with opening up data; they mostly reward the publication of papers. Funders and donors of research should start thinking about the positive externalities and encourage openness institutionally.

  4. It is not a thankless job to invest in opening data! The appetite for transparency and data-driven content has grown substantially, even in just the last 2 years. For example, they are seeing demand for data on in-group bias around caste and on bail.

  5. What can this structured dataset unlock? Possibilities include: 

    1. Introducing a data-oriented curriculum / program in law schools to equip students to work with data to better understand the justice system and see what they can create.

    2. Young researchers from across disciplines in India could analyse the data from very different perspectives.

    3. Solutions that help policymakers gain feedback and improve governance. Organizations can use this data to build smart dashboards as needed for government agencies. Government can also generate district-level report cards on how pendency has evolved, to better allocate resources.

  6. What's next? As data for 2019 and 2020 becomes available, DevDataLab will look to add it to the dataset. Beyond that, they are working to create a state-of-the-art gender platform! We can't wait!

Supriya Sankaran
