
Data Augmentation is a key element in Computer Vision. We create new images and add noise to the input data by rotating, zooming or flipping images. It's really helpful when we have a limited amount of data available. But can we achieve something similar with text? We'll introduce "Easy Data Augmentation (EDA)", a state-of-the-art paper whose techniques are both easy to understand and highly effective.

In this article, we'll go through the different data augmentation techniques and how to implement them by hand. The code is mostly from the EDA library, but extracting it and breaking it down is a good way to get used to those techniques. I'll also introduce the EDA package, which wraps all this code into a single library.

When should we use Data Augmentation?

Data Augmentation techniques in NLP show substantial improvements on datasets with fewer than 500 observations, as illustrated by the original paper. Classification accuracy can increase by as much as 3% if we create 16 augmented sentences per input sentence.

The simple data augmentation techniques are the following: Synonym Replacement (SR), Random Insertion (RI), Random Swap (RS) and Random Deletion (RD).

Synonym Replacement (SR)

Synonym replacement is a technique in which we replace a word by one of its synonyms. We use WordNet, a large linguistic database, to identify relevant synonyms. We randomly select n words that are not stop words, and replace them by their synonyms.

```python
import random

# Requires nltk.download("wordnet") and nltk.download("stopwords") beforehand.
from nltk.corpus import stopwords, wordnet

# Assumption: NLTK's English stop word list (the EDA library ships its own list).
stop_words = set(stopwords.words("english"))


def get_synonyms(word):
    # Assumed helper: collect WordNet lemma names for the word, excluding the word itself
    synonyms = set()
    for syn in wordnet.synsets(word):
        for lemma in syn.lemmas():
            synonyms.add(lemma.name().replace("_", " ").lower())
    synonyms.discard(word)
    return list(synonyms)


def synonym_replacement(words, n):
    words = words.split()
    new_words = words.copy()
    # Candidates: unique non-stop words, visited in random order
    random_word_list = list(set([word for word in words if word not in stop_words]))
    random.shuffle(random_word_list)
    num_replaced = 0
    for random_word in random_word_list:
        synonyms = get_synonyms(random_word)
        if len(synonyms) >= 1:
            synonym = random.choice(list(synonyms))
            # Replace every occurrence of the chosen word by the same synonym
            new_words = [synonym if word == random_word else word for word in new_words]
            num_replaced += 1
        if num_replaced >= n:  # only replace up to n words
            break
    sentence = ' '.join(new_words)
    return sentence
```

This function can then be used in an apply function on a data frame, for example. To create a larger diversity of sentences, one could try to replace 1 word, then 2, then 3, and so on…

Random Deletion (RD)

In Random Deletion, each word is deleted if a uniformly generated number between 0 and 1 is smaller than a pre-defined threshold.
