imdb_reviews
- Description:
Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.
Additional Documentation: Explore on Papers With Code
Homepage: http://ai.stanford.edu/~amaas/data/sentiment/
Source code: tfds.datasets.imdb_reviews.Builder
- 1.0.0 (default): New split API (https://tensorflow.org/datasets/splits)
Download size: 80.23 MiB
Auto-cached (documentation): Yes
Supervised keys (see as_supervised doc): ('text', 'label')
Figure (tfds.show_examples): Not supported.
imdb_reviews/plain_text (default config)
Config description: Plain text
Dataset size: 129.83 MiB
Feature structure:
- Feature documentation:
- Examples (tfds.as_dataframe):
imdb_reviews/bytes
Config description: Uses byte-level text encoding with tfds.deprecated.text.ByteTextEncoder
Dataset size: 129.88 MiB
imdb_reviews/subwords8k
Config description: Uses tfds.deprecated.text.SubwordTextEncoder with 8k vocab size
Dataset size: 54.72 MiB
imdb_reviews/subwords32k
Config description: Uses tfds.deprecated.text.SubwordTextEncoder with 32k vocab size
Dataset size: 50.33 MiB
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2022-12-10 UTC.
Movie Review Data
Sentiment polarity datasets.
- polarity dataset v2.0 ( 3.0Mb) (includes README v2.0 ): 1000 positive and 1000 negative processed reviews. Introduced in Pang/Lee ACL 2004. Released June 2004.
- Pool of 27886 unprocessed html files (81.1Mb) from which the polarity dataset v2.0 was derived. (This file is identical to movie.zip from data release v1.0.)
- sentence polarity dataset v1.0 (includes sentence polarity dataset README v1.0): 5331 positive and 5331 negative processed sentences / snippets. Introduced in Pang/Lee ACL 2005. Released July 2005.
- polarity dataset v1.1 (2.2Mb) (includes README.1.1): approximately 700 positive and 700 negative processed reviews. Released November 2002. This alternative version was created by Nathan Treloar, who removed a few non-English/incomplete reviews and changed some of the labels (judging some polarities to be different from the original author's rating). The complete list of changes made to v1.1 can be found in diff.txt.
- polarity dataset v0.9 (2.8Mb) (includes a README): 700 positive and 700 negative processed reviews. Introduced in Pang/Lee/Vaithyanathan EMNLP 2002. Released July 2002. Please read the "Rating Information - WARNING" section of the README.
- movie.zip (81.1Mb) : all html files we collected from the IMDb archive.
Sentiment scale datasets
- Sep 30, 2009: Yanir Seroussi points out that due to some misformatting in the raw html files, six reviews are misattributed to Dennis Schwartz (29411 should be Max Messier, 29412 should be Norm Schrager, 29418 should be Steve Rhodes, 29419 should be Blake French, 29420 should be Pete Croatto, 29422 should be Rachel Gordon) and one (23982) is blank.
Subjectivity datasets
- subjectivity dataset v1.0 (508K) (includes subjectivity README v1.0 ): 5000 subjective and 5000 objective processed sentences. Introduced in Pang/Lee ACL 2004. Released June 2004.
- Pool of unprocessed source documents (9.3Mb) from which the sentences in the subjectivity dataset v1.0 were extracted. Note : On April 2, 2012, we replaced the original gzipped tarball with one in which the subjective files are now in the correct directory (so that the subjectivity directory is no longer empty; the subjective files were mistakenly placed in the wrong directory, although distinguishable by their different naming scheme).
If you have any questions or comments regarding this site, please send email to Bo Pang or Lillian Lee .
Dataset: ajaykarthick/imdb-movie-reviews
IMDb movie reviews
This is a dataset for binary sentiment classification containing a substantially large amount of data: a set of 50,000 highly polar movie reviews for training models on text classification tasks.
The dataset is downloaded from
https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
The data is processed and split into training and test sets (20% test split): the training set contains 40,000 reviews and the test set contains 10,000 reviews.
Labels are equally distributed in both sets: the training set has 20,000 records for each of the positive and negative classes, and the test set has 5,000 records per label.
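The balanced 80/20 split described above can be sketched in plain Python. This is an illustrative sketch, not the dataset's own tooling: the `stratified_split` helper and the toy data are ours.

```python
import random

# Minimal sketch of a balanced (stratified) 80/20 split, assuming a list of
# (text, label) pairs; the toy data below stands in for the 50,000 reviews.
def stratified_split(examples, test_frac=0.2, seed=0):
    rng = random.Random(seed)
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))
    train, test = [], []
    for items in by_label.values():
        rng.shuffle(items)
        n_test = int(len(items) * test_frac)
        test.extend(items[:n_test])   # equal share of each class per split
        train.extend(items[n_test:])
    return train, test

data = [(f"review {i}", i % 2) for i in range(100)]  # 50 per class
train, test = stratified_split(data)
print(len(train), len(test))  # 80 20
```

Splitting each class separately is what guarantees the 20,000/20,000 and 5,000/5,000 per-class counts quoted above.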
Citation Information
IMDB movie review sentiment classification dataset
load_data function
Loads the IMDB dataset.
This is a dataset of 25,000 movies reviews from IMDB, labeled by sentiment (positive/negative). Reviews have been preprocessed, and each review is encoded as a list of word indexes (integers). For convenience, words are indexed by overall frequency in the dataset, so that for instance the integer "3" encodes the 3rd most frequent word in the data. This allows for quick filtering operations such as: "only consider the top 10,000 most common words, but eliminate the top 20 most common words".
As a convention, "0" does not stand for a specific word, but instead is used to encode the pad token.
- path : where to cache the data (relative to ~/.keras/dataset ).
- num_words : integer or None. Words are ranked by how often they occur (in the training set) and only the num_words most frequent words are kept. Any less frequent word will appear as oov_char value in the sequence data. If None, all words are kept. Defaults to None .
- skip_top : skip the top N most frequently occurring words (which may not be informative). These words will appear as oov_char value in the dataset. When 0, no words are skipped. Defaults to 0 .
- maxlen : int or None. Maximum sequence length. Any longer sequence will be truncated. None means no truncation. Defaults to None .
- seed : int. Seed for reproducible data shuffling.
- start_char : int. The start of a sequence will be marked with this character. 0 is usually the padding character. Defaults to 1 .
- oov_char : int. The out-of-vocabulary character. Words that were cut out because of the num_words or skip_top limits will be replaced with this character.
- index_from : int. Index actual words with this index and higher.
- Tuple of Numpy arrays : (x_train, y_train), (x_test, y_test) .
x_train , x_test : lists of sequences, which are lists of indexes (integers). If the num_words argument was specified, the maximum possible index value is num_words - 1 . If the maxlen argument was specified, the largest possible sequence length is maxlen .
y_train , y_test : lists of integer labels (1 or 0).
Note : The 'out of vocabulary' character is only used for words that were present in the training set but were excluded because they did not make the num_words cut. Words that were not seen in the training set but appear in the test set have simply been skipped.
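The indexing convention described above can be sketched in plain Python. The `encode` function below is an illustrative toy, not the Keras implementation; it assumes whitespace-tokenized text and follows the reserved-index convention from the docstring (pad=0, start_char=1, oov_char=2, actual words from index_from upward).

```python
from collections import Counter

# Toy sketch of the encoding convention: rank words by overall frequency,
# then map rank 1 to index_from, rank 2 to index_from + 1, and so on.
def encode(texts, num_words=None, skip_top=0,
           start_char=1, oov_char=2, index_from=3):
    counts = Counter(w for t in texts for w in t.split())
    rank = {w: i + 1 for i, (w, _) in enumerate(counts.most_common())}
    sequences = []
    for t in texts:
        seq = [start_char]                    # each sequence starts with "1"
        for w in t.split():
            idx = rank[w] + index_from - 1    # rank 1 -> index_from
            if rank[w] <= skip_top or (num_words and idx > num_words - 1):
                seq.append(oov_char)          # cut by skip_top / num_words
            else:
                seq.append(idx)
        sequences.append(seq)
    return sequences

print(encode(["a a b"]))  # [[1, 3, 3, 4]]
```

Note how `num_words` and `skip_top` don't shorten the sequence; cut words are kept in place as the `oov_char` value, exactly as the argument descriptions above state.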
get_word_index function
Retrieves a dict mapping words to their index in the IMDB dataset.
The word index dictionary. Keys are word strings, values are their index.
IMDB Dataset of 50K Movie Reviews
nijatmammadov/review-classification-nlp
About Dataset: IMDB dataset of 50K movie reviews for natural language processing or text analytics. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training and 25,000 for testing. The task is to predict whether a review is positive or negative using either classical classification or deep learning algorithms.
13 Best Movie data sets for Machine Learning Projects
July 21, 2021
After the year inside that was 2020, it’s safe to say that just about all of us are film buffs. That’s why we at iMerit have compiled this list of movie data sets for machine learning for the film buffs among us. These data sets are perfect for anyone looking to experiment and master basic machine learning concepts, and are decidedly more interesting than the typical data set one might leverage in such an endeavor.
Build your own proprietary movie dataset. Get a quote for an end-to-end data solution to your specific requirements.
The data that’s most useful for machine learning purposes contained within these data sets include cast and crew member information, script, plot, screen time, reviews, and more. Each of these can be leveraged for different machine learning purposes including natural language processing, sentiment analysis, and more.
Here are iMerit's top 13 movie data sets for machine learning basics.
Movie data sets for Machine Learning
IMDB Reviews : Ideal for sentiment analysis, this movie data set contains 5,000 movie reviews. The data set holds a perfect 10 usability rating from the nearly 7,000 people who've downloaded it, making it a perfect data set to test with.
IMDB Film Reviews data set : Designed for binary sentiment classification, this movie data set contains substantially more data than the previous IMDB entry on this list. The data set contains 25,000 highly polar movie reviews for training with another 25,000 for testing. It also contains some unlabeled data and raw text for those looking to cut their teeth in annotation.
MovieLens 25M data set : Collected from the MovieLens website, this movie data set contains 25 million ratings along with one million tag applications that have been applied to over 62,000 movies.
OMDB API : This web service is a crowdsourced movie database that continuously updates with the most current movies. It contains content and images for various films including over 280,000 posters.
Film data set from UCI : Containing over 10,000 films, this movie data set was donated back in 1997 to the University of California, Irvine. It contains information around casting, roles, actors, writers, producers, cinematographers, remakes, and studios involved.
Cornell Film Review Data : Featuring movie-review data that’s perfect for anyone looking to conduct sentiment-analysis experiments, this body of data contains over 220,000 conversations between 10,000+ pairs of movie characters.
Full MovieLens data set on Kaggle : This movie data set contains metadata for the 45,000 films that are listed on the Full MovieLens Dataset. Information contained within pertains to films released on or before July 2017 that focuses on cast, crew, plot keywords, budget, revenue, posters, release dates, languages, production companies, countries, TMDB vote counts and vote averages. It also contains 26 million ratings from over 270,000 users for every film.
French National Cinema Center data sets : This data set focuses exclusively on French films gathered by the CNC (Centre National du Cinema) and features 33 data sets around movie attendance, television demand, cinematographic practices and establishments, blockbuster films, and more.
Linguistic Data of 32k Film Subtitles with IMDb Meta-Data : Linguistic data from more than 32,000 films with all meta-data matched to word-count categories from subtitle files.
Movie Industry : This data repository includes 6,820 movies (220 movies per year between 1986 and 2016). The following attributes are detailed for each film: budget, company, year, writer, star, votes, score, runtime, reviews, release date, rating, name, gross, genre, director, and country.
Indian Movie Theaters : This data set features detailed information on Indian theaters and their corresponding theatre capacities, screen sizes, average ticket prices, and location coordinates.
Movie Body Counts : This data set contains a tally of the number of on-screen deaths, bodies, kills, and violent actions across a slew of classic Hollywood sci-fi, fantasy, and action films.
Sentiment Classification on the Large Movie Review Dataset
Data Mining Project: BERT Sentiment Classification
- Monticone Pietro
- Moroni Claudio
- Orsenigo Davide
Problem: Sentiment Classification
A sentiment classification problem consists, roughly speaking, of taking a piece of text and predicting whether the author likes or dislikes what he/she is talking about: the input X is a piece of text and the output Y is the sentiment we want to predict, such as the rating of a movie review.
If we can train a model to map X to Y on a labelled dataset, it can then be used to predict the sentiment of a reviewer after watching a movie.
Data: Large Movie Review Dataset v1.0
The dataset contains movie reviews along with their associated binary sentiment polarity labels.
- The core dataset contains 50,000 reviews split evenly into 25k train and 25k test sets.
- The overall distribution of labels is balanced (25k pos and 25k neg).
- 50,000 unlabeled documents for unsupervised learning are included, but they won’t be used.
- The train and test sets contain a disjoint set of movies, so no significant performance can be obtained by memorizing movie-unique terms and their association with the observed labels.
- In the labeled train/test sets, a negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. Thus reviews with more neutral ratings are not included in the train/test sets.
- In the unsupervised set, reviews of any rating are included and there are an even number of reviews > 5 and ≤ 5.
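The labeling rule above is simple enough to state in code; the function name here is ours, a sketch rather than part of the dataset's tooling:

```python
# Labeled-set rule from above: score <= 4 -> negative, score >= 7 -> positive,
# and neutral ratings (5-6) are excluded from the train/test sets.
def polarity_label(score):
    if score <= 4:
        return "neg"
    if score >= 7:
        return "pos"
    return None  # neutral rating: excluded from the labeled sets

print([polarity_label(s) for s in (2, 5, 9)])  # ['neg', None, 'pos']
```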
Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis . The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).
Theoretical introduction
The encoder-decoder sequence.
Roughly speaking, an encoder-decoder sequence is an ordered collection of steps ( coders ) designed to automatically translate sentences from one language to another (e.g. the English “the pen is on the table” into the Italian “la penna è sul tavolo”), which could be useful to visualize as follows: input sentence → ( encoders ) → ( decoders ) → output/translated sentence .
For our practical purpose, encoders and decoders are effectively indistinguishable (that’s why we will call them coders ): both are composed of two layers: an LSTM or GRU neural network and an attention module (AM) . They only differ in the way in which their output is processed.
LSTM or GRU neural network
Both the input and the output of an LSTM/GRU neural network consist of two vectors:
- the hidden state : the representation of what the network has learnt about the sentence it’s reading;
- the prediction : the representation of what the network predicts (e.g. translation).
Each word in the English input sentence is translated into its word embedding vector (WEV) before being processed by the first coder (e.g. with word2vec ). The WEV of the first word of the sentence and a random hidden state are processed by the first coder of the sequence. Regarding the output: the prediction is ignored, while the hidden state and the WEV of the second word are passed as input into the second coder and so on to the last word of the sentence. Therefore in this phase the coders work as encoders .
At the end of the sequence of N encoders (N being the number of words in the input sentence), the decoding phase begins:
- the last hidden state and the WEV of the “START” token are passed to the first decoder ;
- the decoder outputs a hidden state and a prediction;
- the hidden state and the prediction are passed to the second decoder;
- the second decoder outputs a new hidden state and the second word of the translated/output sentence
and so on, up until the whole sentence has been translated, namely when a decoder of the sequence outputs the WEV of the “END” token. An external mechanism then converts prediction vectors into real words, so it’s very important to note that the only purpose of decoders is to predict the next word .
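The hidden-state threading just described can be sketched with stand-in coders. Everything below is illustrative: the vector arithmetic is dummy, and a real model would stop when the “END” token’s WEV is predicted rather than after a fixed number of steps.

```python
# A stand-in "coder": takes (hidden state, WEV) and returns
# (new hidden state, prediction). Real coders are LSTM/GRU cells.
def coder(hidden, wev):
    new_hidden = [h + w for h, w in zip(hidden, wev)]
    return new_hidden, list(new_hidden)

def translate(input_wevs, start_wev, steps=3):
    hidden = [0.0] * len(start_wev)      # initial (random) hidden state
    for wev in input_wevs:               # encoding phase: predictions ignored
        hidden, _ = coder(hidden, wev)
    preds, wev = [], start_wev           # decoding starts from the START WEV
    for _ in range(steps):               # real models stop at the END WEV
        hidden, pred = coder(hidden, wev)
        preds.append(pred)
        wev = pred                       # next decoder input is the prediction
    return preds

out = translate([[1.0, 2.0], [0.5, 0.5]], [1.0, 0.0])
print(len(out))  # 3 predicted WEVs
```

The key structural point survives the toy arithmetic: only the hidden state crosses from the encoding phase to the decoding phase, and each decoder consumes the previous decoder’s prediction.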
Attention module (AM)
The attention module is a further layer, placed before the network, which provides the collection of words of the sentence with a relational structure. Let’s consider the word “table” in the sentence used as an example above. Because of the AM, the encoder will weight the preposition “on” (processed by the previous encoder) more than the article “the” which refers to the subject “pen”.
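A toy dot-product attention computation illustrates this weighting. The two-dimensional vectors below are made up for illustration; they are not real word embeddings.

```python
import math

# Toy dot-product attention: a query attends more strongly to keys it
# aligns with, and the softmax turns scores into weights that sum to 1.
def attention_weights(query, keys):
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]

# "table" (query) attends to "on" and "the" (keys): the aligned key wins.
w = attention_weights([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
print(w[0] > w[1], round(sum(w), 6))  # True 1.0
```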
Bidirectional Encoder Representations from Transformers (BERT)
Transformer.
The transformer is a coder endowed with the AM layer. Transformers have been observed to work much better than the basic encoder-decoder sequences.
BERT is a sequence of encoder-type transformers which was pre-trained to predict a word or sentence (i.e. used as a decoder). The improved performance of Transformers comes at a cost: the loss of bidirectionality , the ability to predict both the next word and the previous one. BERT is the solution to this problem: a Transformer which preserves bidirectionality .
The first token is not “START”. In order to use BERT as a pre-trained language model for sentence-classification, we need to input the BERT prediction of “CLS” into a linear regression because
- the model has been trained to predict the next sentence, not just the next word;
- the semantic information of the sentence is encoded in the prediction output of “CLS” as a document vector of 512 elements.
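The classification head described above amounts to a linear map over the “CLS” document vector. The sketch below uses random stand-ins for both the 512-element vector and the weights, plus a sigmoid (one common choice for the binary case); none of these values come from an actual BERT model.

```python
import math
import random

random.seed(0)
# Random stand-ins for the 512-dim "CLS" document vector and for the weights
# of the linear layer that sits on top of BERT for binary sentiment.
cls_vec = [random.uniform(-0.1, 0.1) for _ in range(512)]
weights = [random.uniform(-0.1, 0.1) for _ in range(512)]
bias = 0.0

logit = sum(w * x for w, x in zip(weights, cls_vec)) + bias
prob_positive = 1 / (1 + math.exp(-logit))  # sigmoid squashes to (0, 1)
print(0.0 < prob_positive < 1.0)  # True
```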
- bert_final_data
- https://www.kaggle.com/dataset/5f1193b4685a6e3aa8b72fa3fdc427d18c3568c66734d60cf8f79f2607551a38
- https://www.kaggle.com/dataset/9850d2e4b7d095e2b723457263fbef547437b159e3eb7ed6dc2e88c7869fca0b
- Bert-For-Tf2
- Google github repository
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- A Visual Guide to Using BERT for the First Time
- Machine Translation (Encoder-Decoder Model)
- The Illustrated Transformer
- The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)
- BERT Explained: State of the art language model for NLP
- Learning Word Vectors for Sentiment Analysis .
Movie reviews (movie review polarity dataset enriched with "annotator rationales").
This dataset is based on the movie review polarity dataset (v2.0) collected and maintained by Bo Pang and Lillian Lee. Their dataset (we'll call it PL2.0) consists of 1000 positive and 1000 negative movie reviews obtained from the Internet Movie Database (IMDb) review archive.
The main contribution of this release is the enrichment of the documents with "annotator rationales," a concept we describe in our NAACL HLT 2007 paper.
Basically, "rationales" are segments of the text that support an annotator's classification. Let's say we have a movie review that is labeled as positive (i.e. the writer has a favorable opinion of the movie). Then the rationales would be segments of the text that support the claim (by an annotator) that the review is, indeed, positive.
Here are some examples of positive rationales (the segments enclosed by double square brackets):
- [[you will enjoy the hell out of]] American Pie.
- fortunately, they [[managed to do it in an interesting and funny way]].
- he is [[one of the most exciting martial artists on the big screen]], continuing to perform his own stunts and [[dazzling audiences]] with his flashy kicks and punches.
- the romance was [[enchanting]].
And here are some examples of negative rationales:
- A woman in peril. A confrontation. An explosion. The end. [[Yawn. Yawn. Yawn.]]
- when a film makes watching Eddie Murphy [[a tedious experience, you know something is terribly wrong]].
- the movie is [[so badly put together]] that even the most casual viewer may notice the [[miserable pacing and stray plot threads]].
- [[don't go see]] this movie
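Given the bracket format shown in the examples above, rationale spans can be pulled out with a short regular expression (a sketch; the function name is ours, not part of the dataset release):

```python
import re

# Extract "annotator rationales": the segments enclosed in double square
# brackets within an annotated review.
def rationales(text):
    return re.findall(r"\[\[(.*?)\]\]", text)

print(rationales("the romance was [[enchanting]]."))  # ['enchanting']
```

The non-greedy `(.*?)` is what keeps two rationales in the same sentence from being merged into one span.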
Computer Science > Artificial Intelligence
Title: Detecting Spoilers in Movie Reviews with External Movie Knowledge and User Networks
Abstract: Online movie review platforms are providing crowdsourced feedback for the film industry and the general public, while spoiler reviews greatly compromise user experience. Although preliminary research efforts were made to automatically identify spoilers, they merely focus on the review content itself, while robust spoiler detection requires putting the review into the context of facts and knowledge regarding movies, user behavior on film review platforms, and more. In light of these challenges, we first curate a large-scale network-based spoiler detection dataset LCS and a comprehensive and up-to-date movie knowledge base UKM. We then propose MVSD, a novel Multi-View Spoiler Detection framework that takes into account the external knowledge about movies and user activities on movie review platforms. Specifically, MVSD constructs three interconnecting heterogeneous information networks to model diverse data sources and their multi-view attributes, while we design and employ a novel heterogeneous graph neural network architecture for spoiler detection as node-level classification. Extensive experiments demonstrate that MVSD advances the state-of-the-art on two spoiler detection datasets, while the introduction of external knowledge and user interactions help ground robust spoiler detection. Our data and code are available at this https URL
IMDb Charts
IMDb Top 250 Movies
1. The Shawshank Redemption
2. The Godfather
3. The Dark Knight
4. The Godfather Part II
5. 12 Angry Men
6. Schindler's List
7. The Lord of the Rings: The Return of the King
8. Pulp Fiction
9. The Lord of the Rings: The Fellowship of the Ring
10. The Good, the Bad and the Ugly
11. Forrest Gump
12. The Lord of the Rings: The Two Towers
13. Fight Club
14. Inception
15. Star Wars: Episode V - The Empire Strikes Back
16. The Matrix
17. Goodfellas
18. One Flew Over the Cuckoo's Nest
20. Interstellar
21. It's a Wonderful Life
22. Seven Samurai
23. The Silence of the Lambs
24. Saving Private Ryan
25. Dune: Part Two
26. City of God
27. Life Is Beautiful
28. The Green Mile
29. Terminator 2: Judgment Day
30. Star Wars: Episode IV - A New Hope
31. Back to the Future
32. Spirited Away
33. The Pianist
34. Parasite
36. Spider-Man: Across the Spider-Verse
37. Gladiator
38. The Lion King
39. The Departed
40. Léon: The Professional
41. American History X
42. Whiplash
43. The Prestige
44. Grave of the Fireflies
45. Harakiri
46. The Usual Suspects
47. Casablanca
48. The Intouchables
49. Cinema Paradiso
50. Modern Times
51. Rear Window
52. Once Upon a Time in the West
54. City Lights
55. Django Unchained
56. Apocalypse Now
57. Memento
58. 12th Fail
60. Raiders of the Lost Ark
61. The Lives of Others
62. Sunset Boulevard
63. Avengers: Infinity War
64. Paths of Glory
65. Spider-Man: Into the Spider-Verse
66. Witness for the Prosecution
67. The Shining
68. The Great Dictator
70. Inglourious Basterds
71. The Dark Knight Rises
72. Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb
73. American Beauty
76. Amadeus
77. Toy Story
78. Das Boot
79. Avengers: Endgame
80. Braveheart
81. Good Will Hunting
82. Princess Mononoke
84. Your Name.
85. High and Low
86. Once Upon a Time in America
87. 3 Idiots
88. Singin' in the Rain
89. Capernaum
90. Come and See
91. Requiem for a Dream
92. Toy Story 3
93. Star Wars: Episode VI - Return of the Jedi
94. Eternal Sunshine of the Spotless Mind
95. The Hunt
96. 2001: A Space Odyssey
97. Oppenheimer
98. Reservoir Dogs
100. Lawrence of Arabia
101. The Apartment
102. North by Northwest
103. Incendies
104. Citizen Kane
106. Vertigo
107. Double Indemnity
108. Scarface
109. Full Metal Jacket
110. Amélie
112. A Clockwork Orange
114. To Kill a Mockingbird
115. A Separation
116. The Sting
117. Indiana Jones and the Last Crusade
118. Die Hard
119. Metropolis
120. Like Stars on Earth
121. Snatch
122. Hamilton
123. L.A. Confidential
125. Bicycle Thieves
126. Taxi Driver
127. Downfall
128. Dangal
129. For a Few Dollars More
130. Batman Begins
131. The Wolf of Wall Street
132. Some Like It Hot
133. Green Book
134. The Kid
135. The Father
136. Judgment at Nuremberg
137. The Truman Show
138. All About Eve
139. Top Gun: Maverick
140. Shutter Island
141. There Will Be Blood
142. Casino
143. Jurassic Park
145. The Sixth Sense
146. Pan's Labyrinth
147. Unforgiven
148. No Country for Old Men
149. A Beautiful Mind
150. The Thing
151. Kill Bill: Vol. 1
152. The Treasure of the Sierra Madre
153. Yojimbo
154. Monty Python and the Holy Grail
155. The Great Escape
156. Finding Nemo
157. Prisoners
158. Rashomon
159. Howl's Moving Castle
160. The Elephant Man
161. Chinatown
162. Dial M for Murder
163. Gone with the Wind
164. V for Vendetta
165. Lock, Stock and Two Smoking Barrels
166. The Secret in Their Eyes
167. Inside Out
168. Raging Bull
169. Three Billboards Outside Ebbing, Missouri
170. Trainspotting
171. The Bridge on the River Kwai
174. Spider-Man: No Way Home
175. Catch Me If You Can
176. Warrior
177. Gran Torino
178. My Neighbor Totoro
179. Million Dollar Baby
180. Harry Potter and the Deathly Hallows: Part 2
181. Children of Heaven
182. 12 Years a Slave
183. Blade Runner
184. Before Sunrise
185. Ben-Hur
186. Barry Lyndon
187. The Grand Budapest Hotel
188. Gone Girl
189. Hacksaw Ridge
190. The Gold Rush
191. Memories of Murder
192. In the Name of the Father
193. Dead Poets Society
194. Mad Max: Fury Road
195. Wild Tales
196. The Deer Hunter
197. The General
198. On the Waterfront
199. Monsters, Inc.
200. Sherlock Jr.
202. How to Train Your Dragon
203. The Third Man
204. The Wages of Fear
205. Wild Strawberries
206. Mary and Max
207. Mr. Smith Goes to Washington
208. Ratatouille
209. Ford v Ferrari
210. Tokyo Story
211. The Big Lebowski
213. The Seventh Seal
216. Spotlight
217. Hotel Rwanda
218. The Terminator
219. Platoon
220. The Passion of Joan of Arc
221. La haine
222. Before Sunset
223. The Best Years of Our Lives
224. Pirates of the Caribbean: The Curse of the Black Pearl
225. The Exorcist
227. Network
228. Jai Bhim
229. Stand by Me
230. The Wizard of Oz
231. The Incredibles
232. Hachi: A Dog's Tale
233. The Handmaiden
234. Into the Wild
235. My Father and My Son
236. The Sound of Music
237. The Battle of Algiers
238. To Be or Not to Be
239. The Grapes of Wrath
240. Groundhog Day
241. Amores Perros
242. Rebecca
243. The Iron Giant
244. Cool Hand Luke
245. The Help
246. It Happened One Night
247. Aladdin
248. Drishyam
249. Dances with Wolves
250. Gangs of Wasseypur
The Top Rated Movie list only includes feature films.
- Shorts, TV movies, and documentaries are not included
- The list is ranked by a formula which includes the number of ratings each movie received from users and the value of ratings received from regular users
- To be included on the list, a movie must receive ratings from at least 25,000 users
The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. The dataset contains an even number of positive and negative reviews. Only highly polarizing reviews are considered. A negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10.
IMDB dataset having 50K movie reviews for natural language processing or Text analytics. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training and 25,000 for testing. So, predict the number of positive and ...
Sentiment Analysis. Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.
imdb_reviews. Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.
Movie Review Data This page is a distribution site for movie-review data for use in sentiment-analysis experiments. Available are collections of movie-review documents labeled with respect to their overall sentiment polarity (positive or negative) or subjective rating (e.g., "two and a half stars") and sentences labeled with respect to their subjectivity status (subjective or objective) or ...
Movie Review Dataset. This is a dataset of containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. This data was first used in Bo Pang and Lillian Lee, ``Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales.'', Proceedings of the ACL, 2005. ...
If the issue persists, it's likely a problem on our side. Unexpected token < in JSON at position 4. keyboard_arrow_up. content_copy. SyntaxError: Unexpected token < in JSON at position 4. Refresh. Movie Review Dataset.
The training dataset contains 40,000 reviews and the test dataset contains 10,000 reviews, with an equal distribution of labels in both: the training set has 20,000 records for each of the positive and negative classes, and the test set has 5,000 records for each label.
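A balanced split like this is easy to check programmatically. The sketch below uses toy stand-in label lists (hypothetical, not the actual dataset files) to show the idea:

```python
from collections import Counter

def split_is_balanced(labels, expected_per_class):
    """Return True if every class in `labels` has exactly `expected_per_class` records."""
    counts = Counter(labels)
    return all(c == expected_per_class for c in counts.values())

# Toy stand-in for the 40,000/10,000 split described above.
train_labels = ["pos"] * 20000 + ["neg"] * 20000
test_labels = ["pos"] * 5000 + ["neg"] * 5000

assert split_is_balanced(train_labels, 20000)
assert split_is_balanced(test_labels, 5000)
```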
This dataset contains nearly 1 million unique movie reviews from 1,150 different IMDb movies spread across 17 IMDb genres: Action, Adventure, Animation, Biography, Comedy, Crime, Drama, Fantasy, History, Horror, Music, Mystery, Romance, Sci-Fi, Sport, Thriller, and War. The dataset also contains movie metadata such as the date of release, run length, IMDb rating, and movie rating (PG-13, R, and so on).
Loads the IMDB dataset. This is a dataset of 25,000 movie reviews from IMDB, labeled by sentiment (positive/negative). Reviews have been preprocessed, and each review is encoded as a list of word indexes (integers). For convenience, words are indexed by overall frequency in the dataset, so that for instance the integer "3" encodes the 3rd most frequent word in the data.
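The frequency-based indexing scheme can be illustrated in plain Python. This is a toy re-implementation of the idea, not the Keras code itself (the real Keras dataset additionally offsets indices to reserve slots for padding and unknown-word markers):

```python
from collections import Counter

def build_word_index(texts):
    """Assign index 1 to the most frequent word, 2 to the next, and so on."""
    counts = Counter(word for text in texts for word in text.split())
    return {word: rank
            for rank, (word, _) in enumerate(counts.most_common(), start=1)}

def encode(text, word_index):
    """Turn a text into a list of word indexes, skipping unknown words."""
    return [word_index[w] for w in text.split() if w in word_index]

corpus = ["the movie was great", "the plot was thin", "the acting was great"]
index = build_word_index(corpus)

assert index["the"] == 1   # most frequent word gets the smallest index
assert index["was"] == 2
assert encode("the movie was great", index) == [1, index["movie"], 2, index["great"]]
```

Because indexes are assigned by frequency rank, capping the vocabulary at the N most common words is as simple as dropping all indexes greater than N.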
The reviews were originally released in 2002, but an updated and cleaned-up version was released in 2004, referred to as "v2.0". The dataset comprises 1,000 positive and 1,000 negative movie reviews drawn from an archive of the rec.arts.movies.reviews newsgroup hosted at IMDB. The authors refer to this dataset as the "polarity dataset".
Learn about 13 movie data sets for machine learning basics, such as IMDB Reviews, MovieLens 25M, and Film data set from UCI. These data sets contain cast, crew, plot, ratings, reviews, and more for various films and genres.
The dataset contains movie reviews along with their associated binary sentiment polarity labels. The core dataset contains 50,000 reviews split evenly into 25k train and 25k test sets, and the overall distribution of labels is balanced (25k positive and 25k negative). A further 50,000 unlabeled documents for unsupervised learning are included, but they are not used here.
The dataset is the Large Movie Review Dataset, often referred to as the IMDB dataset. The IMDB dataset contains 25,000 highly polar movie reviews (good or bad) for training and the same amount again for testing. The problem is to determine whether a given movie review has a positive or negative sentiment.
This dataset is based on the movie review polarity dataset (v2.0) collected and maintained by Bo Pang and Lillian Lee. Their dataset (we'll call it PL2.0) consists of 1,000 positive and 1,000 negative movie reviews obtained from the Internet Movie Database (IMDb) review archive. The main contribution of this release is the enrichment of the documents with "annotator rationales", a concept introduced by the authors.
The IMDB Dataset. The IMDB dataset is a set of 50,000 highly polarized reviews from the Internet Movie Database, split into 25,000 reviews each for training and testing. Each set contains an equal number (50%) of positive and negative reviews. The IMDB dataset comes packaged with Keras.
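Since the packaged reviews arrive as variable-length lists of word indexes, they must be turned into fixed-size vectors before a dense network can consume them. A common approach is multi-hot encoding; the sketch below is a pure-Python illustration of that technique (the hypothetical `multi_hot` helper is not part of Keras):

```python
def multi_hot(sequences, dimension):
    """Encode each list of word indexes as a 0/1 vector of length `dimension`."""
    vectors = []
    for seq in sequences:
        v = [0.0] * dimension
        for idx in seq:
            if idx < dimension:  # ignore indexes beyond the vocabulary cutoff
                v[idx] = 1.0
        vectors.append(v)
    return vectors

encoded = multi_hot([[1, 3, 3], [2]], dimension=5)
assert encoded[0] == [0.0, 1.0, 0.0, 1.0, 0.0]  # repeated index 3 still yields a single 1.0
assert encoded[1] == [0.0, 0.0, 1.0, 0.0, 0.0]
```

Multi-hot encoding discards word order and counts, keeping only presence/absence, which is often sufficient for a baseline sentiment classifier.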
The data was collected by Stanford researchers and was used in their 2011 ACL paper, Maas et al., "Learning Word Vectors for Sentiment Analysis".
Explore different ways to pass in new reviews to generate predictions. Parametrize options such as where to save and load trained models, whether to skip training or train a new model, and so on. This project uses the Large Movie Review Dataset, which is maintained by Andrew Maas. Thanks to Andrew for making this curated dataset widely available.
In light of these challenges, we first curate a large-scale network-based spoiler detection dataset LCS and a comprehensive and up-to-date movie knowledge base UKM. We then propose MVSD, a novel Multi-View Spoiler Detection framework that takes into account the external knowledge about movies and user activities on movie review platforms.