December 5, 2020

Text Data Augmentation using NLP Cloud APIs | by Prakhar Mishra | Nov, 2020

Using APIs like spaCy, WordNet, SyntaxNet, etc.

Data augmentation techniques increase the amount of training data by adding slightly modified copies of existing examples or by creating new synthetic examples from them. You will almost never find enough organic data to train deep learning models, and hiring people to annotate more is costly. One very good alternative is to generate synthetic data by analyzing patterns in the data you already have.

The document “Text Data Augmentation Made Simple By Leveraging NLP Cloud APIs” by Claude Coulombe discusses many techniques for augmenting textual data using various NLP tools and cloud APIs such as spaCy, Google Neural Machine Translation, WordNet, NLTK, etc.
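To make the idea concrete, here is a minimal sketch of one such technique, synonym replacement. It is not the code from the document: the `TOY_SYNONYMS` dictionary is a hypothetical stand-in for a real thesaurus lookup such as WordNet, and the function names, the replacement probability `p`, and the sample sentences are all assumptions made for illustration.

```python
import random

# Hypothetical stand-in for a thesaurus lookup (e.g. WordNet).
# A real pipeline would query a lexical resource instead of a fixed dict.
TOY_SYNONYMS = {
    "good": ["great", "fine"],
    "movie": ["film"],
    "boring": ["dull", "tedious"],
}


def augment(sentence, synonyms, rng, p=0.5):
    """Return a copy of `sentence` where each known word is swapped
    for one of its synonyms with probability `p`."""
    out = []
    for word in sentence.split():
        candidates = synonyms.get(word.lower())
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))
        else:
            out.append(word)
    return " ".join(out)


def augment_dataset(samples, synonyms, n_copies=2, seed=0):
    """Keep the original (text, label) pairs and append `n_copies`
    augmented variants of each one, reusing the original label."""
    rng = random.Random(seed)
    augmented = list(samples)
    for text, label in samples:
        for _ in range(n_copies):
            augmented.append((augment(text, synonyms, rng, p=0.5), label))
    return augmented


data = [("a good movie", "pos"), ("a boring movie", "neg")]
bigger = augment_dataset(data, TOY_SYNONYMS)
for text, label in bigger:
    print(label, "|", text)
```

The label is copied unchanged, which only makes sense if the swap preserves the meaning that the label depends on. Some variants may come out identical to the original when no word happens to be swapped, so a real pipeline would likely deduplicate them.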

The reported results show a significant improvement in classification accuracy after text augmentation on the IMDB review dataset for sentiment analysis.

I recently made a walk-through of this document, which can be found below.

