The Best Netflix Movies & Series to Learn English According to Data Science | by Frank Andrade | Nov, 2020 | Towards Data Science

by learn a language journalist

As of 2020, there are about 3,712 movies and 1,845 TV shows available on Netflix. That’s a lot of content to choose from if you’re learning English, and you probably don’t have time to watch all of it. That’s why I used my data science skills to analyze the transcripts of the top 1,500 movies and TV shows on Netflix and find the best content for learning English. The goal is to give you plenty of good options, so you can pick movies or TV shows you actually like that are also good for learning English. You shouldn’t have to force yourself through one show you don’t enjoy just because your teacher and friends insist that ‘it helps everyone learn English.’

To find the best movies and TV shows on Netflix, I compared the vocabulary used in their dialogue. Before revealing the best content on Netflix for English learners, let’s compare the best and worst Netflix originals for people whose native language isn’t English.

The Best and Worst Netflix Originals

The picture below shows the top and bottom 10 Netflix original movies in terms of vocabulary difficulty. As you can see, there’s a big difference in the vocabulary they use. For example, you only need to know the 1,000 most common English words to understand 94.5% of the words spoken in Bird Box, but you need at least 3,000 words to cover 94.5% of the dialogue in Spelling the Dream. Those extra 2,000 words may be why you don’t understand what the characters say, even if you have an advanced English level!

[Figure: the top and bottom 10 Netflix original movies ranked by vocabulary difficulty]
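Here, vocabulary coverage means the share of spoken words (tokens) that fall within the N most frequent English word families. The following is a minimal sketch of how it could be computed. It assumes every token has already been mapped to its 1,000-family frequency band (band 1 = the 1,000 most frequent families), or to None when it isn’t in any list. `coverage_by_band` and `words_needed` are hypothetical names, not the author’s code:

```python
from collections import Counter


def coverage_by_band(token_bands):
    """Cumulative coverage after each 1,000-word-family band.

    token_bands: one entry per spoken token, giving the band its word family
    belongs to (1 = most frequent 1,000 families), or None if unmatched.
    Returns {band: share of all tokens covered by bands 1..band}.
    """
    total = len(token_bands)
    counts = Counter(b for b in token_bands if b is not None)
    coverage, covered = {}, 0
    for band in range(1, max(counts, default=0) + 1):
        covered += counts[band]  # Counter returns 0 for missing bands
        coverage[band] = covered / total
    return coverage


def words_needed(token_bands, target=0.945):
    """Smallest vocabulary size (in word families) that reaches the target coverage."""
    for band, cov in coverage_by_band(token_bands).items():
        if cov >= target:
            return band * 1000
    return None  # target never reached, e.g. too many unmatched tokens


# Hypothetical toy transcript: 18 tokens from band 1, 1 from band 3, 1 unmatched
bands = [1] * 18 + [3] + [None]
print(coverage_by_band(bands))  # {1: 0.9, 2: 0.9, 3: 0.95}
print(words_needed(bands))      # 3000
```

Unmatched tokens count against coverage in this sketch. A transcript full of names or slang therefore needs a larger vocabulary to reach 94.5%.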

So let’s avoid TV shows and movies with difficult vocabulary at first and start with the easiest and coolest content on Netflix instead!

The Best TV Shows in the Netflix Catalog

Naturally, American and British TV shows on Netflix are made for native English speakers. That’s why, if English isn’t your native language, you might have had trouble understanding the dialogue in some scenes. Fortunately, I ranked around 500 first-rate TV shows in the Netflix catalog (223 of them Netflix Originals) by vocabulary difficulty.

Because of its easy vocabulary, Friends is considered one of the best TV shows to learn English with. However, it only ranks 78th in the Netflix catalog. That means there are 77 other TV shows on Netflix that are as good as Friends for learning English, or even better, and just as much fun to watch. For example, according to my findings, TV series like The End Of The F*ing World (rank 13) or 13 Reasons Why (rank 40) have even simpler vocabulary in their episodes.

You can check how easy or hard the vocabulary in your favorite TV show is by searching for it in the box below. You’ll see its rank and vocabulary coverage. TV shows in the top 10 have the easiest vocabulary in the whole Netflix catalog.
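Once each title has a coverage score, ranking amounts to sorting titles from highest to lowest coverage. The sketch below assumes each title is scored by its coverage at one fixed vocabulary size, which the article doesn’t specify, and breaks ties alphabetically. The function name and the placeholder titles are hypothetical:

```python
def rank_titles(coverage_by_title):
    """Rank titles from easiest (rank 1) to hardest.

    coverage_by_title: {title: share of dialogue tokens covered by a fixed
    vocabulary size}. Higher coverage means easier vocabulary.
    Ties are broken alphabetically by title.
    """
    ordered = sorted(coverage_by_title.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, title, cov) for rank, (title, cov) in enumerate(ordered, start=1)]


# Hypothetical coverage values, not results from the article
toy = {"Show A": 0.962, "Show B": 0.981, "Show C": 0.955}
print(rank_titles(toy))
# [(1, 'Show B', 0.981), (2, 'Show A', 0.962), (3, 'Show C', 0.955)]
```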

If you’re looking for the perfect TV show for your English level, I have good news for you! I also ranked the shows for each English level (beginner, intermediate, advanced). The shows furthest to the right have higher vocabulary coverage at each level. The higher the coverage, the easier it is to understand a TV show’s episodes.
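The per-level view can be sketched the same way: measure coverage against a different vocabulary size for each level. The cut-offs in `LEVEL_CUTOFFS` are hypothetical, because the article doesn’t state which vocabulary sizes it used for beginner, intermediate and advanced learners. Tokens are assumed to carry the frequency rank of their word family:

```python
# Hypothetical cut-offs: word families a learner is assumed to know at each level.
LEVEL_CUTOFFS = {"beginner": 1000, "intermediate": 3000, "advanced": 5000}


def coverage_per_level(token_ranks, cutoffs=LEVEL_CUTOFFS):
    """Share of tokens whose word family falls within each level's cut-off.

    token_ranks: frequency rank of each token's word family (1 = most
    frequent), or None for tokens not found in the word-family lists.
    """
    total = len(token_ranks)
    if total == 0:
        return {level: 0.0 for level in cutoffs}
    return {
        level: sum(1 for r in token_ranks if r is not None and r <= cutoff) / total
        for level, cutoff in cutoffs.items()
    }


# Toy example: 8 tokens with made-up family ranks, one unmatched
ranks = [12, 250, 980, 1500, 2900, 4200, 7000, None]
print(coverage_per_level(ranks))
# {'beginner': 0.375, 'intermediate': 0.625, 'advanced': 0.75}
```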

Remember that these are the top TV shows in the Netflix catalog. Some TV shows you love are not available on Netflix, but don’t worry: I have already analyzed some of them in other articles. Moreover, the Netflix catalog may be slightly different in your country. That’s why I also made a list of Netflix originals only, since they are most likely available worldwide.

The following are the top 10 Netflix Original Shows to learn English with:

The Best Movies in the Netflix Catalog

If you’re more into movies, Netflix also has great movies to learn English with. I ranked the 950 most popular movies on Netflix (173 of them Netflix originals) by vocabulary difficulty. Some popular movies that ranked in the top 100 are Bird Box (30), The Pursuit of Happyness (81) and Spider-Man: Into the Spider-Verse (84).

Use the box below to find out which other movies are in the top 100. You can also look up your favorite movie’s ranking and vocabulary coverage!

You can find the perfect movie for your English level in the plot below. The movies furthest to the right have higher vocabulary coverage at the beginner, intermediate and advanced levels. Keep in mind that these are the top movies in the Netflix catalog. You won’t find movies such as Harry Potter, Avatar or Toy Story on Netflix. If you still plan to watch that kind of movie, check out my other article, where I analyzed the 3,000 most popular movies. You can find it here.

I also made a list of Netflix original movies, which are most likely available worldwide, in case the Netflix catalog is different in your country.

The following are the top 10 Netflix Original movies to learn English with:

For this analysis, I used three main datasets: transcripts, the Netflix catalog and a list of Netflix Originals. I searched Google until I found enough transcripts for the analysis. I then used the catalog to match the transcripts with the titles available on Netflix. You can find the Netflix catalog dataset on Kaggle. It contains the titles available on Netflix as of 2019, so some of these movies or TV shows may no longer be available. Finally, a list of Netflix originals released up to 2020, which I found here, came in handy for the analysis.
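Matching transcripts to catalog titles is mostly a string-normalization problem, since the same title is often written differently across sources. The following is a rough sketch using only the standard library. It assumes catalog rows have `title` and `type` fields, as the Kaggle Netflix dataset appears to (the list-of-dicts shape here is illustrative), and that transcripts are keyed by title. `normalize_title` and `match_transcripts` are hypothetical helpers:

```python
import re
import unicodedata


def normalize_title(title):
    """Crude matching key: strip accents, punctuation, case and extra spaces."""
    title = unicodedata.normalize("NFKD", title)
    title = "".join(ch for ch in title if not unicodedata.combining(ch))
    title = re.sub(r"[^\w\s]", " ", title.casefold())
    return " ".join(title.split())


def match_transcripts(transcripts, catalog, originals):
    """Join transcripts to catalog rows by normalized title.

    transcripts: {title: transcript text}
    catalog: list of dicts with at least "title" and "type" keys (assumed)
    originals: iterable of Netflix original titles
    Returns one dict per transcript whose title is found in the catalog.
    """
    by_key = {normalize_title(row["title"]): row for row in catalog}
    original_keys = {normalize_title(t) for t in originals}
    matched = []
    for title, text in transcripts.items():
        key = normalize_title(title)
        row = by_key.get(key)
        if row is None:
            continue  # not in the catalog, or spelled too differently to match
        matched.append({
            "title": row["title"],
            "type": row["type"],
            "original": key in original_keys,
            "transcript": text,
        })
    return matched


catalog = [{"title": "Bird Box", "type": "Movie"},
           {"title": "Friends", "type": "TV Show"}]
transcripts = {"bird box": "...", "Friends": "...", "Avatar": "..."}
for row in match_transcripts(transcripts, catalog, originals=["Bird Box"]):
    print(row["title"], row["type"], row["original"])
# Bird Box Movie True
# Friends TV Show False
```

Exact-key matching like this still misses variants such as "Spiderman" vs "Spider-Man". A real pipeline might therefore need manual fixes or fuzzy matching on top.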

I did the whole analysis in Python, and this is how I prepared the data:

Tokenization: To analyze the vocabulary in the transcripts, I tokenized all the words spoken by the characters. There are many tokenization tools in Python, but I used scikit-learn’s CountVectorizer. It converts the collected transcripts into a matrix of token counts that is easy to turn into a dataframe, which simplifies the analysis. I explained a bit more about how CountVectorizer works in the article where I analyzed 3,000 movies.
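By default, CountVectorizer lowercases the text and keeps tokens matching the pattern `(?u)\b\w\w+\b`, that is, runs of two or more word characters. The sketch below approximates that default behavior with the standard library so it runs without scikit-learn. It is an illustration, not the author’s code, and `count_tokens` is a hypothetical name:

```python
import re
from collections import Counter

# CountVectorizer's default token_pattern; it also lowercases by default.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def count_tokens(transcripts):
    """Per-title token counts, approximating CountVectorizer's defaults.

    transcripts: {title: dialogue text}
    Returns {title: Counter mapping token -> count}.
    """
    return {
        title: Counter(TOKEN_PATTERN.findall(text.lower()))
        for title, text in transcripts.items()
    }


counts = count_tokens({"Toy": "I don't know. You know, I KNOW!"})
print(counts["Toy"])  # Counter({'know': 3, 'don': 1, 'you': 1})
```

These defaults have a side effect: single-letter words such as ‘I’ and ‘a’ are dropped, and ‘don’t’ becomes ‘don’. With scikit-learn installed, `CountVectorizer().fit_transform(texts)` returns a sparse document-term matrix. That matrix can be wrapped in a pandas DataFrame, using `get_feature_names_out()` (scikit-learn 1.0+) for the column names.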

Lemmatization: After tokenizing, I had to find the base form of each token. You can do this with lemmatization, which is available in the NLTK library. However, I used word-family lists, which do a similar job but also give you the difficulty level of each word based on its frequency. As of 2020, there are 29 word-family lists, and you can find some of them here. These lists have been evaluated in research papers on linguistics and on learning English as a second language.
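A word-family list groups a headword with its inflected and derived forms. Once the lists are inverted, finding a token’s family and difficulty band is a single dictionary lookup. The sketch below assumes the lists have already been parsed into `{band: {headword: [forms]}}`. That structure, the helper name and the toy families are illustrative, not the published lists’ actual file format:

```python
def build_family_index(family_lists):
    """Invert word-family lists into a lookup: word form -> (headword, band).

    family_lists: {band: {headword: [member forms]}}, where band 1 holds the
    1,000 most frequent families, band 2 the next 1,000, and so on.
    """
    index = {}
    for band, families in family_lists.items():
        for headword, members in families.items():
            for form in [headword, *members]:
                # If a form appears in several lists, keep the most frequent band.
                if form not in index or band < index[form][1]:
                    index[form] = (headword, band)
    return index


# Hypothetical toy lists, not taken from the published word-family lists
toy_lists = {
    1: {"know": ["knows", "knew", "known", "knowing"], "say": ["says", "said", "saying"]},
    2: {"spell": ["spells", "spelled", "spelling", "spelt"]},
}
index = build_family_index(toy_lists)
print(index["knew"])      # ('know', 1)
print(index["spelling"])  # ('spell', 2)
print("avatar" in index)  # False
```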

Data Cleaning: I removed words that can’t be heard in the movies or episodes, such as scene descriptions and speaker names. I also excluded transcripts in which more than 3.5% of the words didn’t match the word-family lists (they could be outliers or corrupted data).
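The cleaning step and the 3.5% rule could look roughly like the sketch below. Transcript formats vary by source, so the regular expressions assume scene descriptions appear in [brackets] or (parentheses) and speaker names are written as ‘NAME:’ in capitals at the start of a line. `known_forms` would be the set of word forms from the word-family lists. All function names here are hypothetical:

```python
import re

# Assumed transcript conventions; real sources differ and need their own rules.
SCENE_DESCRIPTION = re.compile(r"\[[^\]]*\]|\([^)]*\)")
SPEAKER_NAME = re.compile(r"^\s*[A-Z][A-Z .'-]*:", re.MULTILINE)
TOKEN = re.compile(r"(?u)\b\w\w+\b")


def clean_transcript(text):
    """Drop text that isn't heard on screen: scene descriptions and speaker names."""
    text = SCENE_DESCRIPTION.sub(" ", text)
    return SPEAKER_NAME.sub(" ", text)


def unmatched_share(tokens, known_forms):
    """Fraction of tokens not found in the word-family lists."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in known_forms) / len(tokens)


def keep_transcript(text, known_forms, max_unmatched=0.035):
    """Keep a transcript only if at most 3.5% of its words are unmatched."""
    tokens = TOKEN.findall(clean_transcript(text).lower())
    return unmatched_share(tokens, known_forms) <= max_unmatched


raw = "[Door slams]\nJOHN: Hello there.\nMARY: (whispering) Hi John."
print(TOKEN.findall(clean_transcript(raw).lower()))  # ['hello', 'there', 'hi', 'john']
print(keep_transcript(raw, {"hello", "there", "hi"}))  # False: 1 of 4 words unmatched
```

The speaker-name pattern is deliberately crude. It would also strip a shouted line that happens to end in a colon, so rejected or heavily trimmed transcripts are worth spot-checking by hand.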

All the code is available on GitHub!


This content was originally published here.
