5 free tools to supercharge your data workflow!
Do you spend hours manually copying data from PDFs or images into Excel? Does your computer crash while you work on large Excel files?
Here are 5 powerful free tools I recommend adding to your repertoire to supercharge your data practice:
Tabula: If you have tried to look for datasets in an Indian context, you have very likely found yourself manually copying data from PDFs into an Excel sheet. That can cost you hours before you even begin your analysis. Tabula is an open-source tool with an easy-to-use interface for extracting tables from PDFs, and it is available for free on Windows, Mac, and Linux. Note that it works only on text-based PDFs, not scanned documents. Though the project is volunteer-run, it is quite powerful and is used by leading news organisations, including ProPublica and The New York Times. Liberate the data from the shackles of PDFs!
Tesseract: What’s worse than data captured in PDFs? Data captured in an image. Worry not! Optical Character Recognition (OCR) is a technology that extracts text from images. Tesseract is an OCR engine originally developed at Hewlett-Packard. It was open-sourced in 2005, its development was sponsored by Google from 2006, and since version 4 it has included a neural-network (LSTM) based recognition engine. Tesseract helps extract text from images and from scanned (non-computer-generated) PDFs. The open-source community has built a plethora of tools on top of Tesseract for you to choose from. Choose a package that suits you from this link. NormCap, a screen-capture OCR tool, is one such tool built on top of Tesseract.
Tad: Dealing with large tabular data is a pain if you are not equipped with the right tools. Tad helps you explore large tabular data files easily on your desktop or laptop without needing sophisticated infrastructure. It lets you filter, pivot, and sort data through a clean, lightweight interface while using your own machine's computing resources to do the heavy lifting. Download Tad using this link.
Tableau Public: Unlike many proprietary data visualisation tools, Tableau's free version, Tableau Public, offers much of the functionality of its paid counterpart. The main caveats are that it connects to fewer data sources and that whatever you build with it is saved to the Tableau Public website, where it is publicly visible. Tableau Public makes this requirement more palatable by giving you a personal profile page to host your data visualisation stories. Download this feature-rich free visualisation tool through this link.
JASP: Started as a free and open-source alternative to proprietary statistical packages such as SPSS, JASP is designed to make statistical analysis easy and intuitive. It is supported by the University of Amsterdam and does not collect any user data whatsoever. JASP offers standard analysis procedures in both classical and Bayesian forms. Download JASP and get your stats right by clicking here.
Now that you have the tools to extract, explore, visualise and analyse data, try your hand at some messy data sources. Here is a dataset from Justice Hub for you to play with. Let us know your favourite tool in the comments below.
And finally, if you haven’t already subscribed to our Substack, click the subscribe button below for more updates on Justice Hub and interesting stories from the world of justice data.