
Data Science in Practice

Project Guide ¶

This is an edited version of the project guidelines used for the course.

If you wish to pursue an independent data science project, this outline may be a useful guide.

Project Overview ¶

The Final Project will give you the chance to explore a topic of your choice and to expand your analytical skills. By working with real data of your choosing you can examine questions of particular interest to you.

The broad objectives for the project are to:

Identify the problems and goals of a real situation and dataset.

Choose an appropriate approach for formalizing and testing the problems and goals, and be able to articulate the reasoning for that selection.

Implement your analysis choices on the dataset.

Interpret the results of the analyses.

Contextualize those results within a greater scientific and social context, acknowledging and addressing any potential issues related to privacy and ethics.

The basic project steps (broken down in more detail below):

Find a real world dataset and problem that you believe can be solved with one or more of the techniques we have learned in class.

After selecting a dataset and identifying the goal, write out a proposed analysis plan using the template provided and submit it through GitHub for review.

Apply the techniques outlined and come up with a result for the dataset that you proposed.

Assemble a Jupyter notebook that communicates your hypothesis, methods, and results. Submit this as your final project.

Submit feedback about your group and individual group members. This is done individually.

Project Components ¶

Project Proposal ¶

The project proposal includes the following sections:

RESEARCH QUESTION : What is your research question? Include the specific question you’re setting out to answer. This question should be specific, answerable with data, and clear. A general question with specific subquestions is permitted. (1-2 sentences)

BACKGROUND & PRIOR WORK : This section will present the background and context of your topic and question in a few paragraphs. Include a general introduction to your topic and then describe what information you currently know about the topic after doing your initial research. Include references to other projects who have asked similar questions or approached similar problems. Explain what others have learned in their projects.

Find some relevant prior work, and reference those sources, summarizing what each did and what they learned. Even if you think you have a totally novel question, find the most similar prior work that you can and discuss how it relates to your project.

References can be research publications, but they need not be. Blogs, GitHub repositories, company websites, etc., are all viable references if they are relevant to your project. It must be clear which information comes from which references. (2-3 paragraphs, including at least 2 references)

HYPOTHESIS : What is your main hypothesis/predictions about what the answer to your question is? Briefly explain your thinking. (2-3 sentences)

DATA : Here, you are to think about and describe the ideal dataset (or datasets) you would need to answer this question:

What variables would you have?

How would they be stored?

How many observations would you have?

What/who would the observations be? Over what time period? etc.

Note: For the project proposal, you do NOT have to find the actual dataset(s) needed for your project. For the first checkpoint and onward, you will.

ETHICS & PRIVACY : Acknowledge and address any ethics & privacy related issues of your question(s), proposed dataset(s), and/or analyses. Use the information provided in lecture to guide your group discussion and thinking. If you need further guidance, check out Deon's Ethics Checklist. In particular:

Are there any biases/privacy/terms of use issues with the data you proposed?

Are there potential biases in your dataset(s), in terms of whom it includes and how it was collected, that may prevent an equitable analysis? (For example, does your data exclude particular populations, or is it likely to reflect particular human biases in a way that could be a problem?)

How will you set out to detect these specific biases before, during, and after/when communicating your analysis?

Are there any other issues related to your topic area, data, and/or analyses that are potentially problematic in terms of data privacy and equitable impact?

How will you handle issues you identified?

(1-2 paragraphs)

Project Proposal - Style Guidelines ¶

The proposal should be written clearly and at a level understandable by a typical undergraduate student.

This is a short but detailed proposal meant to give us time to assess and critique your Final Project idea (further described below), in order to give you time to improve upon it throughout the quarter.

Remember to proofread your Project Proposal. Do not use overly flowery and/or vague language.

Final Project ¶

Time to put it all together! The main products of the final project are 1) a report submitted as single Jupyter Notebook on GitHub and 2) a 3-5 minute video communicating your group project.

Final Report ¶

This single notebook should include all the code you used for all components of the project (cleaning, visualization, analysis). Because we won’t be running the code in your notebook, it is important to make sure your notebook as submitted to GitHub has the code evaluated and outputs present (e.g., plots) so that we can read the project as is.

Report Sections - Instructions ¶

Each of the following sections corresponds to a section in the file FinalProject_groupXXX.ipynb (template is in your group’s GitHub repo).

For sections included in your proposal and previous checkpoints, you can copy and paste into your final project, but be sure to edit these sections with feedback you received on your proposal or additional information you learned throughout the project. This report should read clearly from start to finish, explaining what you did, why you did it, and what you learned. This should be a concise and well-written report.

PERMISSIONS : Specify whether you want your group project to be made publicly available. Place an X in the square brackets where appropriate.

OVERVIEW : Include 3-4 sentences summarizing your group’s project and results.

NAMES : See proposal specifications.

RESEARCH QUESTION : See proposal specifications.

BACKGROUND & PRIOR WORK : See proposal specifications.

HYPOTHESIS : See proposal specifications.

DATASET(S) : Same as Checkpoint #1.

SETUP : See Checkpoint #1.

DATA CLEANING : See Checkpoint #1.

DATA ANALYSIS & RESULTS : This section should include markdown text and code walking us through the following:

EDA (Same as Checkpoint #2, but clean up the visualizations and feel free to remove unnecessary ones)

What distributions do your variables take?

Are there any outliers?

What are the relationships between variables?

Analysis (Note that you will likely have to do some Googling for analytical approaches not discussed in class. This is expected for this project and an important skill for a data scientist to master.)

What approaches did you use? Why?

What were the results?

What was your interpretation of these findings?

Data Visualization - There must be at least three (3) appropriate data visualizations throughout these sections. Each visualization must include an interpretation of what is displayed and what should be learned from that visualization. Be sure that the appropriate type of visualization is generated given the data that you have, that axes are all labeled, and that the visualizations clearly communicate the point you're trying to make.
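To make these expectations concrete, here is a minimal, self-contained sketch of the three EDA checks above: summarizing a distribution, flagging outliers with the 1.5 × IQR rule, and measuring the relationship between two variables. The numbers are made-up toy data, not course data.

```python
import math
import statistics

# Toy data (invented for illustration): hours studied vs. exam score
hours  = [1, 2, 2, 3, 4, 4, 5, 6, 7, 20]   # 20 is a deliberate outlier
scores = [52, 55, 58, 61, 65, 66, 70, 74, 78, 80]

# 1) What distribution does a variable take? Start with center and spread.
mean_hours = statistics.mean(hours)      # 5.4
median_hours = statistics.median(hours)  # 4.0; the gap from the mean hints at skew

# 2) Are there outliers? Flag points beyond 1.5 * IQR from the quartiles.
q1, _, q3 = statistics.quantiles(hours, n=4)
iqr = q3 - q1
outliers = [x for x in hours if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

# 3) Relationships between variables: Pearson correlation, computed directly.
mx, my = mean_hours, statistics.mean(scores)
cross = sum((x - mx) * (y - my) for x, y in zip(hours, scores))
r = cross / math.sqrt(sum((x - mx) ** 2 for x in hours)
                      * sum((y - my) ** 2 for y in scores))

print(outliers)      # [20]
print(round(r, 2))   # a strong positive association
```

In a real notebook these summaries would sit alongside the labeled plots (histograms, box plots, scatter plots) that the rubric asks for; the numeric checks are what the plots should visually confirm.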

ETHICS & PRIVACY : See proposal specifications. (be sure to update with what you actually did to take the ethical considerations into account for the analysis you did!)

CONCLUSION & DISCUSSION : Discuss your project. Summarize your data and question. Briefly describe your analysis. Summarize your results and conclusions. Be sure to mention any limitations of your project. Discuss the impact of this work on society. (2-3 paragraphs)

Previous Final Projects ¶

See Prof. Voytek's write-up of excellent class projects from the Spring 2017 instance of COGS 108 here, all of which received perfect scores.

Additionally, previous projects can be viewed from when this course ran in Spring 2017, Winter 2018, Spring 2019, Fall 2019, Winter 2020, Spring 2020, Fall 2020, or Winter 2021. Note, first, that these projects are of variable quality and, second, that if you get inspiration or code from previous projects, this must be noted in your project, giving attribution to the former groups' work.

How to Find Datasets ¶

The purpose of this project is to find a real-world problem and dataset (or, likely, datasets!) that can be analyzed with the techniques learned in class and those you learn on your own. It is imperative that you believe extra information can be gained by doing so: that you believe you can discover something new!

You must use at least one dataset containing at least approximately 1000 observations (if your data are smaller but you feel they are sufficient, make the case for why in your proposal). You are welcome (and in fact recommended) to find multiple datasets!

The best datasets are the ones that can help you answer your question of interest.

Your question could be just for fun: using text mining of song-lyric websites to identify the most commonly used phrases and sentiments by decade.

Your question could be scientific: scraping data from animal taxonomies and Wikipedia to figure out whether larger animals are more likely to be carnivores.

Or, ideally, your question could be aimed at civic or social good: for example, using mapping, transit, and car accident data to identify which parts of San Diego are most in need of dedicated bike lanes.

To help you find datasets, we have collected a list of websites that have a considerable number of open source data sets and included them at the end of this document.

Dataset Resource List ¶

Here is a list of potential locations to find datasets and problems to investigate. If you have another dataset or search location, that is great!

Awesome Public Datasets

Data Is Plural

UCSD Datasets

Datasets | Deep Learning

Stanford | Social Science Data Collection

Eviction Lab (email required)

San Diego Data

Open Climate Data

Data and Story Library

UCSD behavioral mobile data

FiveThirtyEight

Free Datasets - R and Data Mining

Data Sources for Cool Data Science Projects

Natural Language Processing


Data Science Projects for Python Practice


Looking to start a data science career? Just as in any new field, you’ll need a lot of practice. Let’s explore where you can find data science projects to practice your newly acquired Python skills.

Organizations large and small all over the world use Python in their software development and data science projects. But even if you are very excited about a career in data science, it can seem very challenging to learn a new programming language. So you may wonder whether Python is worth learning and how difficult it is to learn a programming language like Python.

In fact, Python is very beginner-friendly; you can learn it pretty fast, especially with enough practice. In this article, I’ll guide you through several resources for practicing Python coding skills with real-world projects. But first, let’s start with some basic definitions.

What Is Data Science?

Data science combines programming, math, statistics, and business expertise to extract meaningful insights from data. Basically, data scientists are given business problems to be solved. They apply their understanding of industry and business processes, statistical and machine learning tools, and Python to solve the problems.

Data scientists work along with data engineers and data analysts to assist businesses with data-driven decisions. However, their roles are different:

  • Data engineers focus on preparing the infrastructure for the data. This data will later be used by data analysts and data scientists.
  • Data analysts usually work with structured data to spot trends and patterns that can be translated into actionable insights.
  • Data scientists are generally considered a more advanced version of a data analyst. They can work with both structured and unstructured data. They usually use more advanced data techniques to spot the current trends as well as make predictions about the future. Most data scientists are expected to be comfortable using advanced machine learning and Artificial Intelligence models.

Data science is a career of the future and Python is one of its key tools . Big tech companies, small startups, research organizations, and even academia choose Python because of its simplicity, rich ecosystem, large and supportive community, efficiency, and scalability.

Practice Python

If you are new to programming but excited to learn coding with Python, I recommend trying our Python Basics mini-track. Its three interactive courses have 200+ coding challenges.

Once you are familiar with the basics, you can continue your learning journey with your first data science project.

How to Start Your First Data Science Project

For your first project, it’s a good idea to choose a topic that you’re interested in – it’s a great source of motivation. So think about what you’d find fun to work on: football statistics, climate change visualization, forecasting cryptocurrency prices, etc. You can find more data science project ideas here.

For example, let’s say you want to explore crime statistics in your city so you can choose the safest neighborhood to buy a house. You can consider lots of different factors, including the number of murders, robberies, car thefts, and other crimes per 1,000 people; the number of policemen per 1,000 people; average household income, etc. Here are just a few examples of what you can do using the data science toolkit:

  • Predict the number of different crimes based on the historical data (i.e. time series analysis).
  • Analyze which factors have the largest impact on the number of crimes.
  • Build a machine learning model to predict the number of crimes next year based on crime dynamics and other factors
  • Visualize the intensity of crimes on the city map.

Python can assist with all these tasks, including time series forecasting, exploratory data analysis, building machine learning models, visualizing data, and more. Data science and Python are really powerful together. However, you need to practice Python a lot to become an effective data scientist. Writing code for different scenarios and testing your skills with various projects and challenges is the shortest path to getting expertise in data science. So, let’s see where you can find real-world data science projects.
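To give a flavor of the first task above, here is a minimal sketch (with invented yearly counts, purely for illustration) that fits a linear trend to historical crime counts with ordinary least squares and extrapolates it one year ahead:

```python
# Invented yearly robbery counts for one hypothetical neighborhood
years  = [2016, 2017, 2018, 2019, 2020, 2021]
crimes = [310, 295, 280, 270, 255, 240]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(crimes) / n

# Ordinary least squares fit of a straight-line trend:
# crimes is modeled as intercept + slope * year
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, crimes)) \
        / sum((x - mean_x) ** 2 for x in years)
intercept = mean_y - slope * mean_x

def predict(year):
    """Extrapolate the fitted trend to a future year."""
    return intercept + slope * year

print(round(predict(2022)))  # 227: the downward trend continues
```

Real time series work would add seasonality, uncertainty intervals, and held-out validation, but a plain trend line like this is a sensible first baseline.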

Where to Find Datasets and Sample Data Projects

There are numerous resources that offer real-world datasets to practice newly acquired Python and data science skills. Here are a few options:

  • LearnPython.com is a learning platform with many interactive Python courses, including Python Basics: Practice, which offers 15 coding exercises to practice basic programming skills. These exercises offer some problems that you are likely to encounter in real-world job assignments. However, this is not like your independent data science project, but rather a set of coding challenges. So, it is best for total newbies.
  • Kaggle is arguably the largest data science community. The platform has 50,000 public datasets, allowing you to practice all kinds of data science and Python skills. Some examples include a dataset to predict credit card defaults, sales information from the largest US retailers, World Bank data by region and nation, and data on all episodes of the TV show House. You can also grow your data science skills by participating in their regular competitions, which have difficulty levels from beginner to expert.
  • Data.gov provides access to the US government’s open data. This includes agriculture and climate data, resources on key energy topics, datasets for marine transportation, and more.
  • NASA Open Data Portal is a catalog of publicly available NASA datasets. It includes tens of thousands of datasets that cover a very wide range of topics, including national aeronautics and space data, physical oceanography, ocean biology data, earth resources observations, social-economic data, and more.
  • Earthdata can be a very useful source if you are interested in topics like atmosphere, land, ocean, cryosphere, and similar. Here, you’ll find NASA Earth observation data that was made available to a broad base of users.
  • DrivenData is a small-scale data competition website focusing on datasets and use cases from non-profit organizations.
  • Registry of Open Data on AWS includes over 300 datasets covering healthcare, space, climate change, and other topics.
  • UCI Machine Learning Repository is one of the oldest data sources on the Web. Even though many of the datasets on this platform are very old, they can still be good for practicing basic Python skills.
  • NASDAQ Data Link is a premier source of data for financial and economic projects. If you are interested in analyzing stock prices, trading activity, or interest rate dynamics, this should be your primary source of data.

It’s Time to Practice Python!

Hopefully, you’ll find your perfect dataset for your next data science project somewhere on the above list. However, if you feel you need to refresh and/or consolidate your Python skills, or if you’re like me and prefer to learn Python with fun, easy-to-follow interactive online courses, you might want to start with one of the following learning tracks:

  • Python Basics is a mini-track perfect for people who just want to see if programming is for them. The track includes 229 coding challenges covering the basics of Python syntax, variables, and their purposes, if statements, loops, functions, and basic data structures (including lists, dictionaries, and sets). No prior programming or IT knowledge is required.
  • Python for Data Science is a 5-course learning track covering the essentials needed to start working in the field of data science. It includes hundreds of coding challenges covering basic calculations, simple data analyses, data visualizations, working with tabular and text data, and processing data from CSV, Excel, and JSON files. You can read more about this learning track here.
  • Learning Programming with Python is aimed at newcomers who want to understand foundational Python and then go beyond the basics and learn more advanced programming concepts. In addition to the Python basics described above, it covers data structures and built-in algorithms.

The constant (and long-term) demand for data scientists shows how popular this field is. Today’s companies and organizations prefer to make data-driven decisions, and they need data scientists for this. So, do your best to learn and practice Python for data science. Very soon, you'll have a successful and well-paid career as a data scientist.

Thanks for reading, and happy learning!


10 Real World Data Science Case Studies Projects with Example

Top 10 Data Science Case Studies Projects with Examples and Solutions in Python to inspire your data science learning in 2023.


Data science has been a trending buzzword in recent times. With wide applications in various sectors like healthcare, education, retail, transportation, media, and banking, data science applications are at the core of pretty much every industry out there. The possibilities are endless: analysis of fraud in the finance sector or the personalization of recommendations for eCommerce businesses. We have developed ten exciting data science case studies to explain how data science is leveraged across various industries to make smarter decisions and develop innovative personalized products tailored to specific customers.


Table of Contents

  • Data Science Case Studies in Retail
  • Data Science Case Study Examples in the Entertainment Industry
  • Data Analytics Case Study Examples in the Travel Industry
  • Case Studies for Data Analytics in Social Media
  • Real-World Data Science Projects in Healthcare
  • Data Analytics Case Studies in Oil and Gas
  • What Is a Case Study in Data Science?
  • How Do You Prepare a Data Science Case Study?
  • 10 Most Interesting Data Science Case Studies with Examples


So, without further ado, let's get started with data science business case studies!

With humble beginnings as a simple discount retailer, today Walmart operates 10,500 stores and clubs in 24 countries, plus eCommerce websites, employing around 2.2 million people around the globe. For the fiscal year ended January 31, 2021, Walmart's total revenue was $559 billion, showing growth of $35 billion with the expansion of the eCommerce sector. Walmart is a data-driven company that works on the principle of 'Everyday Low Cost' for its consumers. To achieve this goal, it heavily depends on the advances of its data science and analytics department for research and development, also known as Walmart Labs. Walmart is home to the world's largest private cloud, which can manage 2.5 petabytes of data every hour! To analyze this humongous amount of data, Walmart has created 'Data Café,' a state-of-the-art analytics hub located within its Bentonville, Arkansas headquarters. The Walmart Labs team heavily invests in building and managing technologies like cloud, data, DevOps, infrastructure, and security.


Walmart is experiencing massive digital growth as the world's largest retailer. Walmart has been leveraging big data and advances in data science to build solutions that enhance, optimize, and customize the shopping experience and serve its customers in a better way. At Walmart Labs, data scientists are focused on creating data-driven solutions that power the efficiency and effectiveness of complex supply chain management processes. Here are some of the applications of data science at Walmart:

i) Personalized Customer Shopping Experience

Walmart analyzes customer preferences and shopping patterns to optimize the stocking and displaying of merchandise in its stores. Analysis of big data also helps it understand new item sales, make decisions about discontinuing products, and evaluate the performance of brands.

ii) Order Sourcing and On-Time Delivery Promise

Millions of customers view items on Walmart.com, and Walmart provides each customer a real-time estimated delivery date for the items purchased. Walmart runs a backend algorithm that estimates this based on the distance between the customer and the fulfillment center, inventory levels, and shipping methods available. The supply chain management system determines the optimum fulfillment center based on distance and inventory levels for every order. It also has to decide on the shipping method to minimize transportation costs while meeting the promised delivery date.
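As a toy illustration of the sourcing decision described above (the real Walmart system is far more sophisticated; the center names, distances, and costs below are invented), one might pick the cheapest in-stock fulfillment center that can still meet the promised delivery date:

```python
# Invented fulfillment centers: (name, distance_km, has_stock, ship_cost_usd)
centers = [
    ("dallas",  300, True,  4.0),
    ("phoenix", 120, False, 3.0),
    ("denver",  500, True,  2.5),
]

def pick_center(centers, max_km_per_day=400, promised_days=2):
    """Pick the cheapest in-stock center whose transit time still meets
    the promised delivery date (a toy stand-in for the real logic)."""
    feasible = [c for c in centers
                if c[2] and c[1] / max_km_per_day <= promised_days]
    return min(feasible, key=lambda c: c[3])[0] if feasible else None

print(pick_center(centers))  # denver: in stock, on time, and cheapest
```

The design point is the two-stage filter: feasibility first (inventory and delivery date), then cost minimization over what remains.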


iii) Packing Optimization 

Packing optimization, also known as box recommendation, is a daily occurrence in the shipping of items in the retail and eCommerce business. Whenever the items of an order (or of multiple orders placed by the same customer) are picked from the shelf and are ready for packing, a recommender system developed by Walmart picks the best-sized box that holds all the ordered items with the least in-box space wastage, within a fixed amount of time. This is the Bin Packing Problem, a classic NP-hard problem familiar to data scientists.
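Because exact bin packing is NP-hard, practical systems rely on fast heuristics. Here is a sketch of first-fit decreasing, one classic approximation (the item volumes and box capacity are invented, and this is not Walmart's actual system):

```python
def first_fit_decreasing(item_volumes, box_capacity):
    """First-fit decreasing heuristic for bin packing: sort items
    largest-first, then put each into the first box that still fits it."""
    remaining = []  # leftover capacity of each open box
    packed = []     # item volumes assigned to each box
    for vol in sorted(item_volumes, reverse=True):
        for i, space in enumerate(remaining):
            if vol <= space:
                remaining[i] -= vol
                packed[i].append(vol)
                break
        else:  # no open box fits this item: open a new one
            remaining.append(box_capacity - vol)
            packed.append([vol])
    return packed

# Invented item volumes (liters) packed into 10-liter boxes
boxes = first_fit_decreasing([4, 8, 1, 4, 2, 1], 10)
print(len(boxes))  # 2 boxes suffice for these items
```

Sorting largest-first is what gives the heuristic its strength: big items claim boxes early, and small items fill the gaps left behind.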

Here is a link to a sales prediction data science case study to help you understand the applications of data science in the real world. The Walmart Sales Forecasting Project uses historical sales data for 45 Walmart stores located in different regions. Each store contains many departments, and you must build a model to project the sales for each department in each store. This data science case study aims to create a predictive model to forecast the sales of each product. You can also try your hand at the Inventory Demand Forecasting Data Science Project to develop a machine learning model that accurately forecasts inventory demand based on historical sales data.
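A deliberately naive baseline for this kind of per-store, per-department forecasting task is to aggregate history by (store, department) and predict next week as the historical mean; any real model should beat it. The sales rows below are invented:

```python
from collections import defaultdict

# Invented weekly sales rows: (store, department, weekly_sales)
rows = [
    (1, "produce", 100.0), (1, "produce", 110.0), (1, "produce", 120.0),
    (1, "toys", 40.0), (1, "toys", 44.0),
    (2, "produce", 200.0), (2, "produce", 190.0),
]

# Group the sales history by (store, department)
history = defaultdict(list)
for store, dept, sales in rows:
    history[(store, dept)].append(sales)

# Naive baseline: forecast next week as the mean of past weeks
forecast = {key: sum(vals) / len(vals) for key, vals in history.items()}

print(forecast[(1, "produce")])  # 110.0
```

Comparing a trained model's error against this mean baseline is a standard sanity check before trusting any forecast.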


Amazon is an American multinational technology company based in Seattle, USA. It started as an online bookseller, but today it focuses on eCommerce, cloud computing, digital streaming, and artificial intelligence. It hosts an estimated 1,000,000,000 gigabytes of data across more than 1,400,000 servers. Through its constant innovation in data science and big data, Amazon is always ahead in understanding its customers. Here are a few data analytics case study examples at Amazon:

i) Recommendation Systems

Data science models help Amazon understand customers' needs and recommend products before the customer even searches for them; this model uses collaborative filtering. Amazon uses data from 152 million customer purchases to help users decide on products to buy. The company generates 35% of its annual sales using the recommendation-based systems (RBS) method.

Here is a Recommender System Project to help you build a recommendation system using collaborative filtering. 
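The core idea of collaborative filtering can be sketched in a few lines: score items a user has not seen by the ratings of similar users, with similarity measured as the cosine between rating vectors. The users, items, and ratings below are invented toy data, not Amazon's actual method:

```python
import math

# Invented user -> {item: rating} data, purely for illustration
ratings = {
    "ana":  {"laptop": 5, "mouse": 4, "desk": 1},
    "ben":  {"laptop": 4, "mouse": 5},
    "cara": {"desk": 5, "lamp": 4},
}

def cosine(u, v):
    """Dot product over co-rated items, normalized by full rating norms
    (a common simplification for sparse rating vectors)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    return dot / (math.hypot(*u.values()) * math.hypot(*v.values()))

def recommend(user):
    """Rank items the user has not rated by similarity-weighted ratings."""
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, r in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("ben"))  # desk ranks first, since ana is most similar to ben
```

Production systems replace this brute-force loop with precomputed similarity matrices or learned embeddings, but the weighting logic is the same.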

ii) Retail Price Optimization

Amazon product prices are optimized based on a predictive model that determines the best price so that users are not put off from buying by the price. The model carefully determines optimal prices by considering the customers' likelihood of purchasing the product and how the price will affect their future buying patterns. The price for a product is determined according to your activity on the website, competitors' pricing, product availability, item preferences, order history, expected profit margin, and other factors.

Check Out this Retail Price Optimization Project to build a Dynamic Pricing Model.
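One simple version of price optimization, far simpler than what Amazon actually runs, fits a linear demand curve to observed price/sales pairs and picks the price that maximizes revenue. The observations below are invented:

```python
# Invented price / units-sold observations for a single product
prices = [8.0, 9.0, 10.0, 11.0, 12.0]
units  = [120, 110, 100, 90, 80]

# Fit a linear demand curve q = a + b * p with ordinary least squares
n = len(prices)
mean_p = sum(prices) / n
mean_q = sum(units) / n
b = sum((p - mean_p) * (q - mean_q) for p, q in zip(prices, units)) \
    / sum((p - mean_p) ** 2 for p in prices)
a = mean_q - b * mean_p

# Revenue R(p) = p * (a + b * p) is a downward parabola when b < 0,
# so it peaks at p* = -a / (2 * b)
best_price = -a / (2 * b)
print(best_price)  # 10.0 for this toy demand curve
```

The closed-form optimum follows from setting dR/dp = a + 2bp to zero; dynamic pricing systems re-fit curves like this continuously as new sales data arrives.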

iii) Fraud Detection

Being a significant eCommerce business, Amazon remains at high risk of retail fraud. As a preemptive measure, the company collects historical and real-time data for every order. It uses Machine learning algorithms to find transactions with a higher probability of being fraudulent. This proactive measure has helped the company restrict clients with an excessive number of returns of products.

You can look at this Credit Card Fraud Detection Project to implement a fraud detection model to classify fraudulent credit card transactions.
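As a minimal stand-in for such a model (the features, labels, and thresholds below are invented, and real systems use far richer signals), here is a tiny logistic regression trained by gradient descent to score transactions:

```python
import math

# Invented training data: (order-amount z-score, recent returns) -> fraud label
X = [(0.1, 0), (0.3, 1), (0.2, 0), (2.5, 6), (3.0, 5), (2.8, 7)]
y = [0, 0, 0, 1, 1, 1]

# Tiny logistic regression trained with stochastic gradient descent
w = [0.0, 0.0]
bias = 0.0
lr = 0.1
for _ in range(2000):
    for (x1, x2), label in zip(X, y):
        p = 1 / (1 + math.exp(-(w[0] * x1 + w[1] * x2 + bias)))
        err = p - label  # gradient of the log-loss w.r.t. the logit
        w[0] -= lr * err * x1
        w[1] -= lr * err * x2
        bias -= lr * err

def fraud_probability(amount_z, returns):
    """Score a new transaction; higher means more likely fraudulent."""
    return 1 / (1 + math.exp(-(w[0] * amount_z + w[1] * returns + bias)))

print(fraud_probability(2.7, 6) > 0.5)   # flagged for review
print(fraud_probability(0.2, 0) > 0.5)   # looks ordinary
```

In practice you would use a library implementation and evaluate with precision/recall rather than raw accuracy, since fraud labels are heavily imbalanced.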


Let us explore data analytics case study examples in the entertainment industry.


Netflix started as a DVD rental service in 1997 and has since expanded into the streaming business. Headquartered in Los Gatos, California, Netflix is the largest content streaming company in the world. Currently, Netflix has over 208 million paid subscribers worldwide, and with streaming now supported on thousands of smart devices, around 3 billion hours of Netflix are watched every month. The secret to this massive growth and popularity of Netflix is its advanced use of data analytics and recommendation systems to provide personalized and relevant content recommendations to its users. Data is collected from over 100 billion events every day. Here are a few examples of data analysis case studies applied at Netflix:

i) Personalized Recommendation System

Netflix uses over 1,300 recommendation clusters based on consumer viewing preferences to provide a personalized experience. The data that Netflix collects from its users includes viewing time, platform searches for keywords, and metadata related to content abandonment, such as content pause time, rewinds, and rewatches. Using this data, Netflix can predict what a viewer is likely to watch and give a personalized watchlist to a user. Some of the algorithms used by the Netflix recommendation system are Personalized Video Ranking, the Trending Now ranker, and the Continue Watching ranker.

ii) Content Development using Data Analytics

Netflix uses data science to analyze the behavior and patterns of its users to recognize themes and categories that the masses prefer to watch. This data is used to produce shows like The Umbrella Academy, Orange Is the New Black, and The Queen's Gambit. These shows might seem like huge risks, but they were significantly grounded in data analytics, whose parameters assured Netflix that they would succeed with its audience. Data analytics is helping Netflix come up with content that its viewers want to watch even before they know they want to watch it.

iii) Marketing Analytics for Campaigns

Netflix uses data analytics to find the right time to launch shows and ad campaigns to have maximum impact on the target audience. Marketing analytics helps come up with different trailers and thumbnails for other groups of viewers. For example, the House of Cards Season 5 trailer with a giant American flag was launched during the American presidential elections, as it would resonate well with the audience.

Here is a Customer Segmentation Project using association rule mining to understand the primary grouping of customers based on various parameters.


In a world where purchasing music is a thing of the past and streaming music is the current trend, Spotify has emerged as one of the most popular streaming platforms. With 320 million monthly users, around 4 billion playlists, and approximately 2 million podcasts, Spotify leads the pack among well-known streaming platforms like Apple Music, Wynk, Songza, Amazon Music, etc. The success of Spotify has largely depended on data analytics. By analyzing massive volumes of listener data, Spotify provides real-time and personalized services to its listeners. Most of Spotify's revenue comes from paid premium subscriptions. Here are some examples of data analytics case studies used by Spotify to provide enhanced services to its listeners:

i) Personalization of Content using Recommendation Systems

Spotify uses BART (Bayesian Additive Regression Trees) to generate music recommendations for its listeners in real time. BART ignores any song a user listens to for less than 30 seconds. The model is retrained every day to provide updated recommendations. A new patent granted to Spotify for an AI application is used to identify a user's musical tastes based on audio signals and attributes such as gender, age, and accent, to make better music recommendations.

Spotify creates daily playlists for its listeners based on taste profiles, called 'Daily Mixes,' which contain songs the user has added to their playlists or songs by artists the user has included in their playlists. They also include new artists and songs that the user might be unfamiliar with but that might improve the playlist. Similar are the weekly 'Release Radar' playlists, which contain newly released songs by artists the listener follows or has liked before.

ii) Targeted Marketing through Customer Segmentation

Beyond personalizing song recommendations, Spotify uses this massive dataset for targeted ad campaigns and personalized service recommendations. Spotify uses ML models to analyze listener behavior and group listeners based on music preferences, age, gender, ethnicity, etc. These insights help create ad campaigns for specific target audiences. One of its well-known ad campaigns was the meme-inspired ads for potential target customers, which were a huge global success.

iii) CNNs for Classification of Songs and Audio Tracks

Spotify builds audio models to evaluate songs and tracks, which helps develop better playlists and recommendations for its users. These allow Spotify to filter new tracks based on their lyrics and rhythms and recommend them to users who like similar tracks (collaborative filtering). Spotify also uses natural language processing (NLP) to scan articles and blogs and analyze the words used to describe songs and artists. These analytical insights help group and identify similar artists and songs, which can be leveraged to build playlists.

Here is a Music Recommender System Project for you to start learning. We have also listed another music recommendation dataset for your projects: Dataset1 . You can use this dataset of Spotify metadata to classify songs by artist, mood, and liveliness. Plot histograms and heatmaps to better understand the dataset. Use classification algorithms like logistic regression and SVM, together with dimensionality reduction via principal component analysis, to generate valuable insights from the dataset.
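The classification exercise suggested above can be sketched with scikit-learn. The features here are synthetic stand-ins for Spotify audio metadata (two made-up clusters of "danceability" and "energy" values), not the real dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins for Spotify audio features: [danceability, energy]
upbeat = rng.normal([0.8, 0.7], 0.1, size=(50, 2))
mellow = rng.normal([0.3, 0.2], 0.1, size=(50, 2))
X = np.vstack([upbeat, mellow])
y = np.array([1] * 50 + [0] * 50)  # 1 = "upbeat" mood, 0 = "mellow"

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.85, 0.75]]))  # a high-danceability track -> class 1
```

Swapping in an SVM, or projecting the features through PCA before fitting, follows the same `fit`/`predict` pattern.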


Below you will find case studies for data analytics in the travel and tourism industry.

Airbnb was born in 2007 in San Francisco and has since grown to 4 million hosts and 5.6 million listings worldwide, welcoming more than 1 billion guest arrivals. Airbnb is active in every country on the planet except Iran, Sudan, Syria, and North Korea: around 97.95% of the world. Treating data as the voice of its customers, Airbnb uses the large volume of customer reviews and host inputs to understand trends across communities, rate user experiences, and make informed decisions to build a better business model. Airbnb's data servers serve approximately 10 million requests a day and process around one million search queries. The data scientists at Airbnb develop solutions to boost the business and offer personalized services by creating the best possible match between guests and hosts for a supreme customer experience.

i) Recommendation Systems and Search Ranking Algorithms

Airbnb helps people find 'local experiences' in a place with the help of search algorithms that make searches and listings precise. Airbnb uses a 'listing quality score' to find homes based on the proximity to the searched location and uses previous guest reviews. Airbnb uses deep neural networks to build models that take the guest's earlier stays into account and area information to find a perfect match. The search algorithms are optimized based on guest and host preferences, rankings, pricing, and availability to understand users’ needs and provide the best match possible.

ii) Natural Language Processing for Review Analysis

Airbnb characterizes data as the voice of its customers. Customer and host reviews give direct insight into the experience, and star ratings alone cannot capture it quantitatively. Hence Airbnb uses natural language processing to understand reviews and the sentiments behind them. The NLP models are built using convolutional neural networks.

Practice this Sentiment Analysis Project for analyzing product reviews to understand the basic concepts of natural language processing.
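Airbnb's production models use CNNs, as noted above. As a deliberately crude first step toward the same goal, here is a lexicon-based sentiment baseline; the word lists and reviews are invented for illustration:

```python
# Tiny made-up sentiment lexicons (a real project would use a full lexicon
# such as VADER, or a trained model as described above)
POSITIVE = {"great", "clean", "friendly", "beautiful", "perfect"}
NEGATIVE = {"dirty", "rude", "broken", "noisy", "awful"}

def review_sentiment(text):
    """Crude lexicon score: positive minus negative word counts."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(review_sentiment("Great host, clean and beautiful apartment"))  # positive
print(review_sentiment("The room was dirty and the street noisy"))    # negative
```

Baselines like this are useful as a sanity check before training the neural models the text describes; they fail on negation ("not clean"), which is exactly where learned models earn their keep.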

iii) Smart Pricing using Predictive Analytics

Many Airbnb hosts use the service as a supplementary income. Vacation homes and guest houses rented to customers raise local community earnings, as Airbnb guests stay 2.4 times longer and spend approximately 2.3 times more money than hotel guests; these profits have a significant positive impact on the local neighborhood. Airbnb uses predictive analytics to predict listing prices and help hosts set a competitive, optimal price. The overall profitability of an Airbnb host depends on factors like the time invested by the host and responsiveness to changing demand across seasons. The factors that impact real-time smart pricing are the location of the listing, proximity to transport options, season, and amenities available in the neighborhood.

Here is a Price Prediction Project to help you understand the concept of predictive analysis, which is common in data analytics case studies.
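A minimal price-prediction sketch in the spirit of the linked project, using synthetic listing features (bedrooms and distance to city center, with an assumed price relationship) rather than real Airbnb data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Synthetic listings: [bedrooms, distance_to_center_km] (illustrative only)
X = rng.uniform([1, 0], [5, 20], size=(200, 2))
# Assumed: price rises with bedrooms, falls with distance, plus noise
y = 40 * X[:, 0] - 3 * X[:, 1] + 100 + rng.normal(0, 10, 200)

model = LinearRegression().fit(X, y)
print(model.predict([[3, 2]]))  # estimated nightly price for a central 3-bedroom
```

With real data the interesting work is in the features (seasonality, amenities, neighborhood) and in moving beyond a linear model, as the smart-pricing discussion above suggests.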

Uber is the biggest global taxi service provider. As of December 2018, Uber had 91 million monthly active consumers and 3.8 million drivers, completing 14 million trips each day. Uber uses data analytics and big-data-driven technologies to optimize its business processes and provide enhanced customer service. The data science team at Uber constantly explores new technologies to provide better service. Machine learning and data analytics help Uber make data-driven decisions that enable benefits like ride-sharing, dynamic price surges, better customer support, and demand forecasting. Here are some real-world data science projects used by Uber:

i) Dynamic Pricing for Price Surges and Demand Forecasting

Uber prices change at peak hours based on demand. Uber uses surge pricing to encourage more cab drivers to sign up with the company and meet passenger demand. When prices increase, both the driver and the passenger are informed of the surge. Uber uses a patented predictive model for price surging called 'Geosurge,' based on ride demand and location.

ii) One-Click Chat

Uber has developed a machine learning and natural language processing solution called One-Click Chat (OCC) for coordination between drivers and riders. This feature anticipates responses to commonly asked questions, making it easy for drivers to respond to customer messages with the click of just one button. One-Click Chat is built on Uber's machine learning platform, Michelangelo, which performs NLP on rider chat messages and generates appropriate responses.

iii) Customer Retention

Failure to meet customer demand for cabs could lead users to opt for other services. Uber uses machine learning models to bridge this demand-supply gap: by predicting demand in any location, Uber retains its customers. Uber also uses a tier-based reward system, which segments customers into levels based on usage; the higher the level a user achieves, the better the perks. Uber also provides personalized destination suggestions based on the user's history and frequently traveled destinations.

You can take a look at this Python Chatbot Project and build a simple chatbot application to better understand the techniques used for natural language processing. You can also practice building a demand forecasting model with this project using time series analysis, or look at this project, which uses time series forecasting and clustering on a dataset containing geospatial data to forecast customer demand for Ola rides.
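A seasonal-naive forecast is the standard baseline for demand series like ride requests: repeat the most recent daily cycle. The hourly counts below are invented for illustration:

```python
def seasonal_naive_forecast(hourly_demand, period=24, horizon=3):
    """Forecast future demand by repeating the most recent seasonal cycle."""
    last_cycle = hourly_demand[-period:]
    return [last_cycle[h % period] for h in range(horizon)]

# Two days of made-up hourly ride requests with a repeating daily pattern
day = [5, 3, 2, 2, 4, 8, 15, 25, 30, 22, 18, 16,
       17, 18, 17, 16, 18, 24, 28, 26, 20, 14, 10, 7]
history = day + [d + 1 for d in day]  # second day slightly busier

print(seasonal_naive_forecast(history))  # [6, 4, 3]: echoes the last day's start
```

Any time-series model you build for the projects above should at least beat this baseline before it is worth deploying.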


7) LinkedIn 

LinkedIn is the largest professional social networking site with nearly 800 million members in more than 200 countries worldwide. Almost 40% of the users access LinkedIn daily, clocking around 1 billion interactions per month. The data science team at LinkedIn works with this massive pool of data to generate insights to build strategies, apply algorithms and statistical inferences to optimize engineering solutions, and help the company achieve its goals. Here are some of the real world data science projects at LinkedIn:

i) LinkedIn Recruiter Implements Search Algorithms and Recommendation Systems

LinkedIn Recruiter helps recruiters build and manage a talent pool to optimize the chances of hiring candidates successfully. This sophisticated product works on search and recommendation engines. LinkedIn Recruiter handles complex queries and filters on a constantly growing large dataset, and the results delivered have to be relevant and specific. The initial search model was based on linear regression but was eventually upgraded to gradient boosted decision trees to capture non-linear correlations in the dataset. In addition to these models, LinkedIn Recruiter also uses a Generalized Linear Mixed model to improve prediction results and deliver personalized results.

ii) Recommendation Systems Personalized for News Feed

The LinkedIn news feed is the heart and soul of the professional community. A member's newsfeed is a place to discover conversations among connections, career news, posts, suggestions, photos, and videos. Every time a member visits LinkedIn, machine learning algorithms identify the best exchanges to be displayed on the feed by sorting through posts and ranking the most relevant results on top. The algorithms help LinkedIn understand member preferences and help provide personalized news feeds. The algorithms used include logistic regression, gradient boosted decision trees and neural networks for recommendation systems.

iii) CNNs to Detect Inappropriate Content

Providing a professional space where people can trust and express themselves safely has been a critical goal at LinkedIn. LinkedIn has invested heavily in building solutions to detect fake accounts and abusive behavior on its platform. Any form of spam, harassment, or inappropriate content, ranging from profanity to advertisements for illegal services, is immediately flagged and taken down. LinkedIn uses a convolutional neural network-based machine learning model. This classifier trains on a dataset containing accounts labeled as either "inappropriate" or "appropriate." The inappropriate list consists of accounts containing "blocklisted" phrases or words plus a small portion of manually reviewed accounts reported by the user community.

Here is a Text Classification Project to help you understand NLP basics for text classification. You can find a news recommendation system dataset to help you build a personalized news recommender system. You can also use this dataset to build a classifier using logistic regression, Naive Bayes, or Neural networks to classify toxic comments.
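A minimal version of the appropriate/inappropriate classifier described above, using TF-IDF with Naive Bayes as the linked exercise suggests. The training comments are a tiny invented set; a real project would use a labeled comments corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up training set (a real project needs thousands of labeled examples)
comments = [
    "great insight, thanks for sharing",
    "congratulations on the new role",
    "really helpful post, well written",
    "you are an idiot and a fraud",
    "this is spam, buy cheap pills now",
    "shut up, nobody wants your garbage",
]
labels = ["appropriate"] * 3 + ["inappropriate"] * 3

clf = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(comments, labels)
print(clf.predict(["thanks for the helpful post"]))
```

The same pipeline shape works with logistic regression or a neural network swapped in as the final estimator, which is the comparison the exercise asks for.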


8) Pfizer

Pfizer is a multinational pharmaceutical company headquartered in New York, USA, and one of the largest pharmaceutical companies globally, known for developing a wide range of medicines and vaccines in disciplines like immunology, oncology, cardiology, and neurology. Pfizer became a household name in 2020 when its COVID-19 vaccine was the first to receive FDA authorization. In early November 2021, the CDC approved the Pfizer vaccine for kids aged 5 to 11. Pfizer has been using machine learning and artificial intelligence to develop drugs and streamline trials, which played a massive role in developing and deploying the COVID-19 vaccine. Here are a few data analytics case studies by Pfizer:

i) Identifying Patients for Clinical Trials

Artificial intelligence and machine learning are used to streamline and optimize clinical trials and increase their efficiency. Natural language processing and exploratory data analysis of patient records can help identify suitable patients for clinical trials — for example, patients with distinct symptoms — and can help examine potential trial members' specific biomarkers and predict drug interactions and side effects, avoiding complications. Pfizer's AI implementation helped rapidly identify signals within the noise of millions of data points across its 44,000-candidate COVID-19 clinical trial.
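The patient-screening idea can be illustrated with a deliberately naive keyword filter over invented clinical notes. Real systems use full NLP pipelines that handle negation and synonyms; the criteria and records here are entirely made up:

```python
# Hypothetical trial criteria (invented for illustration)
REQUIRED = {"fever", "cough"}
EXCLUDED = {"pregnant", "immunocompromised"}

def eligible(note):
    """Flag a note whose words contain all required symptoms and no exclusions."""
    words = set(note.lower().replace(",", " ").split())
    return REQUIRED <= words and not (EXCLUDED & words)

records = [
    "Patient reports persistent cough and mild fever",
    "Cough present, patient is immunocompromised",
    "Headache only, patient otherwise asymptomatic",
]
candidates = [r for r in records if eligible(r)]
print(len(candidates))  # 1
```

Even this toy version shows why real screening needs more than keywords: "no fever" would wrongly match, which is where negation-aware NLP comes in.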

ii) Supply Chain and Manufacturing

Data science and machine learning techniques help pharmaceutical companies better forecast demand for vaccines and drugs and distribute them efficiently. Machine learning models can help identify efficient supply systems by automating and optimizing production steps, which will help supply drugs customized to small pools of patients with specific genetic profiles. Pfizer uses machine learning to predict the maintenance costs of the equipment it uses. Predictive maintenance using AI is the next big step for pharmaceutical companies to reduce costs.

iii) Drug Development

Computer simulations of proteins, tests of their interactions, and yield analysis help researchers develop and test drugs more efficiently. In 2016, Watson Health and Pfizer announced a collaboration to utilize IBM Watson for Drug Discovery to help accelerate Pfizer's research in immuno-oncology, an approach to cancer treatment that uses the body's immune system to help fight cancer. Deep learning models have recently been used for bioactivity and synthesis prediction for drugs and vaccines, in addition to molecular design. Deep learning has been a revolutionary technique for drug discovery, as it factors in everything from new applications of medications to possible toxic reactions, which can save millions in drug trials.

You can create a machine learning model to predict molecular activity to help design medicine using this dataset . You may build a CNN or a deep neural network for this data science case study project.


9) Shell Data Analyst Case Study Project

Shell is a global group of energy and petrochemical companies with over 80,000 employees in around 70 countries. Shell uses advanced technologies and innovations to help build a sustainable energy future, and is going through a significant transition, aiming to become a clean energy company by 2050 as the world needs more and cleaner energy solutions. This requires substantial changes in the way energy is used. Digital technologies, including AI and machine learning, play an essential role in this transformation, enabling efficient exploration and energy production, more reliable manufacturing, more nimble trading, and a personalized customer experience. Using AI across the organization will help achieve this goal and stay competitive in the market. Here are a few data analytics case studies in the petrochemical industry:

i) Precision Drilling

Shell is involved across the oil and gas supply chain, from mining hydrocarbons to refining fuel to retailing it to customers. Recently, Shell has adopted reinforcement learning to control the drilling equipment used in mining. Reinforcement learning works on a reward system based on the outcome of the AI model. The algorithm is designed to guide the drills as they move through the subsurface, based on historical data from drilling records, including information such as the size of drill bits, temperatures, pressures, and knowledge of seismic activity. This model helps the human operator understand the environment better, leading to better and faster results with less damage to the machinery used.

ii) Efficient Charging Terminals

Due to climate changes, governments have encouraged people to switch to electric vehicles to reduce carbon dioxide emissions. However, the lack of public charging terminals has deterred people from switching to electric cars. Shell uses AI to monitor and predict the demand for terminals to provide efficient supply. Multiple vehicles charging from a single terminal may create a considerable grid load, and predictions on demand can help make this process more efficient.

iii) Monitoring Service and Charging Stations

Another Shell initiative, trialed in Thailand and Singapore, is the use of computer vision cameras that watch for potentially hazardous activities, such as lighting cigarettes in the vicinity of the pumps while refueling. The model processes the captured images and labels and classifies their content. The algorithm can then alert the staff, reducing the risk of fires. The model could be further trained to detect rash driving or theft in the future.

Here is a project to help you understand multiclass image classification. You can use the Hourly Energy Consumption Dataset to build an energy consumption prediction model. You can use time series with XGBoost to develop your model.
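The energy-consumption exercise above suggests gradient boosting on time-series features. As a hedged sketch, here is the same idea with scikit-learn's `GradientBoostingRegressor` standing in for XGBoost, on simulated hourly load (a daily sine cycle plus noise) rather than the actual Kaggle dataset:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)

# Simulated hourly load: daily cycle plus noise (stand-in for the real dataset)
hours = np.arange(24 * 60)  # 60 days of hourly readings
load = 50 + 20 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 2, hours.size)

# Feature-engineer the timestamp: hour of day as the lone predictor
X = (hours % 24).reshape(-1, 1)
model = GradientBoostingRegressor(random_state=0).fit(X, load)

print(model.predict([[6]]))  # expected load near the 6 a.m. peak of the cycle
```

With the real dataset you would add day-of-week, month, and holiday features derived from the timestamp, which is where XGBoost-style models shine.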

10) Zomato Case Study on Data Analytics

Zomato was founded in 2010 and is currently one of the most well-known food tech companies. Zomato offers services like restaurant discovery, home delivery, online table reservation, and online payments for dining. Zomato partners with restaurants to provide tools to acquire more customers while also providing delivery services and easy procurement of ingredients and kitchen supplies. Currently, Zomato has over 200,000 restaurant partners and around 100,000 delivery partners, and has completed over 100 million delivery orders to date. Zomato uses ML and AI to boost its business growth, drawing on the massive amount of data collected over the years from food orders and user consumption patterns. Here are a few examples of data analytics projects developed by the data scientists at Zomato:

i) Personalized Recommendation System for Homepage

Zomato uses data analytics to create personalized homepages for its users. Zomato uses data science to provide order personalization, like giving recommendations to the customers for specific cuisines, locations, prices, brands, etc. Restaurant recommendations are made based on a customer's past purchases, browsing history, and what other similar customers in the vicinity are ordering. This personalized recommendation system has led to a 15% improvement in order conversions and click-through rates for Zomato. 

You can use the Restaurant Recommendation Dataset to build a restaurant recommendation system to predict what restaurants customers are most likely to order from, given the customer location, restaurant information, and customer order history.

ii) Analyzing Customer Sentiment

Zomato uses natural language processing and machine learning to understand customer sentiment from social media posts and customer reviews, which helps the company gauge its customer base's inclination towards the brand. Deep learning models analyze the sentiment of brand mentions on social networking sites like Twitter, Instagram, LinkedIn, and Facebook. These analytics give the company insights that help build the brand and understand the target audience.

iii) Predicting Food Preparation Time (FPT)

Food preparation time is an essential variable in the estimated delivery time of an order placed through Zomato. It depends on numerous factors, such as the number of dishes ordered, the time of day, footfall in the restaurant, and the day of the week. Accurate prediction of food preparation time enables a better estimated delivery time, making delivery partners less likely to breach it. Zomato uses a Bidirectional LSTM-based deep learning model that considers all these features and predicts the food preparation time for each order in real time.
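Zomato's production model is a Bidirectional LSTM, per the text; as a much simpler stand-in, a random forest on hand-made order features illustrates the prediction task and how its accuracy might be measured. The features and the prep-time relationship below are invented:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)

# Synthetic orders: [number_of_dishes, kitchen_footfall, is_weekend]
X = np.column_stack([
    rng.integers(1, 9, 400),
    rng.integers(0, 50, 400),
    rng.integers(0, 2, 400),
])
# Assumed: prep time grows with dishes and kitchen load; weekends add minutes
y = 5 + 3 * X[:, 0] + 0.4 * X[:, 1] + 4 * X[:, 2] + rng.normal(0, 2, 400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
mae = mean_absolute_error(y_te, model.predict(X_te))
print(mae)  # mean absolute error in minutes on held-out orders
```

Reporting the error in minutes, as here, is the kind of meaningful unit that makes a prep-time model's accuracy interpretable to delivery partners.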

Data scientists are companies' secret weapons for analyzing customer sentiment and behavior and leveraging them to drive conversions, loyalty, and profits. These 10 data science case studies with examples and solutions show how various organizations use data science technologies to succeed and stay at the top of their field. To summarize, data science has not only accelerated the performance of companies but has also made it possible to manage and sustain that performance with ease.

FAQs on Data Analysis Case Studies

What is a case study in data science?

A case study in data science is an in-depth analysis of a real-world problem using data-driven approaches. It involves collecting, cleaning, and analyzing data to extract insights and solve challenges, offering practical insight into how data science techniques can address complex issues across various industries.

How do you create a data science case study?

To create a data science case study, identify a relevant problem, define objectives, and gather suitable data. Clean and preprocess the data, perform exploratory data analysis, and apply appropriate algorithms. Summarize findings, visualize results, and provide actionable recommendations, showcasing the problem-solving potential of data science techniques.



Final Project: A Complete Data Science Process

The goal of this project is to practice a complete data science workflow: question, data, exploration, modeling, and communication. Results will be presented as a report, supporting code, and a brief presentation to the class. Successful outcomes should include visual, analytical, and perspectival components. The report should be at the level of polish and formality of a blog post (more than a class homework assignment, less than an academic paper). The overview and primary visualizations should be intelligible to a non-technical audience; the methods should be described in precise technical language as appropriate.

An important component of this project is a comparative critique of some related prior work. That is, you are not simply demonstrating that you can perform an analysis, but also that you can evaluate the strengths and weaknesses of others’ analyses. This may be the most important take-away from this class for your future career.

Teams : Both individual and team projects are permitted, with a mild encouragement towards teams.

Presentations : The final course meeting (during the designated final exam period) will be devoted to final project presentations. Feedback on others’ projects will be part of your final project grade, so attendance is mandatory.

There are two sections of this class, thus two sessions of final project presentations. Refer to the Calvin exam schedule for the official dates; for 2023, they are Saturday, Dec. 9 at 1:30 p.m. and Monday, Dec. 11 at 1:30 p.m. Your team must choose which to present at. You are welcome to attend both sessions to see everyone's work.

Presentation Logistics

Presentations are the “movie trailer” for your report—they should be short, engaging, and give the audience a sense of what you did and why it matters.

Length: Max of 10 minutes per team (ideally under 5 minutes). Practice at least once beforehand so your presentation is efficient.

Make sure you include:

  • What  problem  you’re trying to solve (and why it matters)
  • What the  data  looks like (granularity, an example row)
  • What your  model  looks like (features, target, maybe an example prediction that it makes)
  • Results  (headlines only)
  • Conclusion  (have you solved the problem?) and any  limitations

Don’t  include:

  • Every detail (save that for the report)
  • Code screenshots (unless it’s particularly clever); explain what you’re doing in English
  • Data screenshots (export a table instead)
  • Console output screenshots (make a table instead, or better, a chart)

Scrolling through a rendered report is only okay if you have practiced, so you can highlight what's important and not get lost in the details.

Each student should give at least one constructive suggestion or insightful question for other presentations.

Milestone 1: Project proposal. Your proposal should include:

  • A tentative topic (kind of data you want to work with, question you’d like to ask). Include a brief description of what drives your interest in it.
  • An example or two of some data science work you’ve found on the topic already. See below for some guidance on this. Include the URLs of resources you found, and some brief commentary about each one (does it seem good?)
  • A proposal of what you’d like to do . A first step will usually be to do an analysis similar to one of the examples you found, and you may not have much idea beyond that at this time. But if you do have some ideas for what you might want to do differently, this is a good place to include them. Note that your overall task should include some modeling (predictive or otherwise).
  • Tentative Teams : If you have already found people who may be interested in working with you, note that here.
  • Milestone 2: EDA — you should have the EDA section of your report complete, and a good start on everything before that.
  • Milestone 3: Initial modeling and visualization
  • Milestone 4: Final report

Choosing a project and examples

You can find examples of data science work on sites like Kaggle, TowardsDataScience, Reddit, YouTube, and GitHub. Also check out the TidyTuesday project and r-bloggers.

Your project should include some modeling, so it should involve more than making visualizations. Predictive modeling is closest to the emphasis of this class, but other kinds of modeling (such as clustering) are often ok.

Your data should be rich enough to support the kind of modeling you want to do, but not too complex to work with. For example, avoid any dataset larger than 500 MB, since it probably won’t fit comfortably in memory. Also avoid image and sound data, for similar reasons. (Although if you really want to work with images, there are ways to do this.)

Some types of projects that have worked well in the past:

  • Participating in a Kaggle competition . It should be a competition that is still open. You don’t need to win, but you should be able to do better than the baseline. You should also do some EDA and/or visualization beyond what’s already been done by others.
  • Reproducing a published analysis . Find a published analysis that you can reproduce, and then extend it in some way. For example, you might find a published analysis that uses a dataset you can’t access, and then find a similar dataset that you can access. Or you might find a published analysis that uses a dataset you can access, and then extend it in some way (e.g., by adding a new feature, or by using a different model).

Some projects that could work great, but should be discussed with the instructor first:

  • Redoing some class activities using a different toolkit . For example, what if this class were to be taught in JavaScript using ObservableHQ ? (Talk with Prof Arnold for detailed ideas.)
  • Exploring interpretable models . There are some new types of models coming out that are more interpretable, but we have not gotten to discuss them much in class. Perhaps you could explore one of these models and compare it to a more traditional model.
  • Working with a large language model . For example, could you give ChatGPT some example rows from your dataset, then ask it to write a decision tree for you, and evaluate the result? There are lots of possibilities here.

Detailed Expectations

Your report should explain:

  • What decisions you made
  • Why you made them
  • What might have been alternative choices.
  • Don’t give the play-by-play of everything you tried, every idea you had, etc., but…
  • Do include things you tried that led to an important observation later on.
  • If you use any code from the Internet, you should acknowledge its source and provide a link.
  • You should submit all of the code needed to replicate your results, but your report should be understandable without looking at the code .
  • You’ll submit both slides and a report. See the Midterm project for submission instructions.

Your report should include the following general elements (though treat this specific outline as a suggestion only; certain reports will need to deviate from this structure in small or large part):

  • A succinct but descriptive title
  • This question should be stated in language that is understandable to someone who hasn’t studied data science and doesn’t know the details of your dataset.
  • The best questions include motivation from prior literature that gives, for example, some pattern or relationship that you’d expect to find and why.
  • A brief (2-4 sentences) high-level description of the dataset: what is the dataset about? Where did it come from? What sort of data does it contain?
  • A summary of what you have found that others have done with your data or question. Include URLs and author names.
  • What did they do well that is inspiring?
  • What could they improve on or explore further?
  • Do you trust their results? Why or why not?
  • In what ways do you intend for this project to extend or enhance that prior work? (Save the details of how for the Approach section.)
  • This question should be stated in more specific technical terms than the real-world question.
  • It should reference the particular features of your dataset.
  • This question ideally helps answer the real-world question, but it’s okay if it doesn’t.
  • The approach that you’ll take to answer that question, probably using some sort of predictive or statistical modeling.
  • where did the data come from originally?
  • Where did you download it from? As much as you can tell or speculate, how did it end up available there?
  • Give an example of some part of the data in your dataset.
  • Consider writing a simple sentence that conveys the information in the first row, as an example.
  • A list of the features in the dataset and their types.
  • An analysis of the appropriateness of your dataset for your approach. (What’s good about it? What do you wish were better?)
  • This section should also discuss the overall approach of any basic data wrangling needed to get the data into an overall usable form. More specific wrangling may be needed for constructing plots or models later.
  • Show plots or tables illustrating the distribution of at least two variables in your dataset. Comment on anything interesting you observe.
  • Show plots illustrating bivariate relationships for at least 2 pairs of variables. Comment on anything interesting you observe (e.g., strength of relationship, dependence on other factors).
  • Summarize your EDA findings: how do your observations inform the modeling?
  • This section is written for predictive modeling; if you’re doing inferential modeling or clustering, adapt this section as needed.
  • what is the target variable you are trying to predict
  • which variables (features) you are using to predict it, and why you chose those features
  • how you will measure accuracy (can you give meaningful units?)
  • what validation method did you choose and why
  • Describe why you chose that model (and its features and any hyperparameters)
  • Describe what kind of performance you expect from it
  • Report the results of your basic predictive model via cross-validation.
  • Make one or more changes to the predictive model to (attempt to) improve the accuracy. Discuss what changes you made, why you made them, and what the results were.
  • The strongest reports will include insightful visualizations of the model, its predictions, and/or its mistakes, and a discussion of what those plots tell us.
  • Report on the final accuracy of your best model on the test set, if applicable.
  • Alternative : instead of a supervised prediction task, you can define an unsupervised learning task and use clustering. In this case, clearly state what you want to understand through the clustering, and report your observations.
  • Findings : Summarize the analyses you performed and what the results told you. What do your findings say about the real-world and prediction (or clustering) questions you posed?
  • Limitations : What are some limitations of your analyses? Did you notice any potential biases in the data you used or analysis you did? Any other ethical questions raised during this project?
  • Future Directions : What new questions came up following your exploration of this data? Identify at least one question that would require new data or a new analysis approach, and specify what steps would be required.
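The univariate and bivariate EDA steps described above can be sketched as follows. This is a toy example with a synthetic dataset; the column names (`sqft`, `price`) are hypothetical placeholders, not from any particular dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset: 200 homes with a size and a noisy, size-driven price.
rng = np.random.default_rng(0)
df = pd.DataFrame({"sqft": rng.uniform(500, 3500, 200)})
df["price"] = 150 * df["sqft"] + rng.normal(0, 20_000, 200)

# Univariate: summary statistics stand in for a histogram of one variable.
print(df["sqft"].describe()[["mean", "min", "max"]])

# Bivariate: a correlation quantifies the strength of the relationship
# you would see in a scatter plot of price vs. sqft.
corr = df["sqft"].corr(df["price"])
print(f"correlation(sqft, price) = {corr:.2f}")

# For actual plots, you might use df["sqft"].hist() for the distribution
# and df.plot.scatter(x="sqft", y="price") for the bivariate relationship.
```

The point of the commentary bullets above is to interpret output like this: a strong positive correlation would suggest `sqft` is a promising feature for a model of `price`.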

You might additionally walk through what the model predicts, and how it does it, for one or two specific examples, ideally ones that aren’t even part of your dataset.
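To make the cross-validation reporting concrete, here is a minimal sketch of k-fold cross-validation for a baseline predictive model, written with NumPy only. The data and the straight-line model are toy placeholders; in a real project you would substitute your own dataset, model, and error metric.

```python
import numpy as np

# Toy regression data: a linear signal plus noise.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + rng.normal(0, 1, 100)

def cross_val_mae(X, y, k=5):
    """Mean absolute error of a least-squares line, averaged over k folds."""
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    maes = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit y = a*x + b on the training fold only.
        a, b = np.polyfit(X[train, 0], y[train], deg=1)
        pred = a * X[test, 0] + b
        maes.append(np.mean(np.abs(y[test] - pred)))
    return float(np.mean(maes))

print(f"5-fold CV MAE: {cross_val_mae(X, y):.2f}")
```

Because the noise here has a standard deviation of 1, the reported MAE has meaningful units (the same units as `y`), which is exactly the kind of interpretation the accuracy bullet asks for.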

Report Template

This template includes a suggested outline for your report. You may choose to organize your report differently.

(Project descriptions originally thanks to Ofra Amir.)


Final Project

Once you have solved each of the course’s problem sets, it’s time to implement your final project, a Python program of your very own! The design and implementation of your project is entirely up to you, albeit subject to these requirements:

  • Your project must be implemented in Python.
  • Your main function must be in a file called project.py, which should be in the “root” (i.e., top-level folder) of your project.
  • Your 3 required custom functions other than main must also be in project.py and defined at the same indentation level as main (i.e., not nested under any classes or functions).
  • Your test functions must be in a file called test_project.py, which should also be in the “root” of your project. Be sure they have the same name as your custom functions, prepended with test_ (test_custom_function, for example, where custom_function is a function you’ve implemented in project.py).
  • You are welcome to implement additional classes and functions as you see fit beyond the minimum requirement.
  • Implementing your project should entail more time and effort than is required by each of the course’s problem sets.
  • Any pip-installable libraries that your project requires must be listed, one per line, in a file called requirements.txt in the root of your project.
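A minimal skeleton satisfying these requirements might look like the following. The three custom functions here are purely illustrative; your project would define its own.

```python
# project.py — minimal skeleton; the three custom functions are placeholders.

def slugify(title):
    """Lowercase a title and replace spaces with hyphens."""
    return title.strip().lower().replace(" ", "-")

def word_count(text):
    """Count whitespace-separated words."""
    return len(text.split())

def initials(name):
    """Return the uppercase initials of a full name."""
    return "".join(part[0].upper() for part in name.split())

def main():
    print(slugify("My Final Project"))   # my-final-project
    print(word_count("hello from cs50")) # 3
    print(initials("ada lovelace"))      # AL

if __name__ == "__main__":
    main()
```

A matching test_project.py would then define test_slugify, test_word_count, and test_initials at top level, each asserting on the corresponding function's output so that pytest can discover and run them.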


You are welcome, but not required, to collaborate with one or two classmates on your project. (You might want to collaborate with Live Share!) But a two- or three-person project should entail twice or thrice the time and effort required by a one-person project.

Note that CS50’s staff audits submissions to CS50P including this final project. Students found to be in violation of the Academic Honesty policy will be removed from the course and deemed ineligible for a certificate. Students who have already completed CS50P, if found to be in violation, will have their CS50 Certificate (and edX Certificate, if applicable) revoked.

When to Do It

By 2024-12-31T23:59:00-05:00.

Getting Started

Creating an entire project may seem daunting. Here are some questions that you should think about as you start:

  • What will your software do? What features will it have? How will it be executed?
  • What new skills will you need to acquire? What topics will you need to research?
  • If working with one or two classmates, who will do what?
  • In the world of software, most everything takes longer to implement than you expect. And so it’s not uncommon to accomplish less in a fixed amount of time than you hope. What might you consider to be a good outcome for your project? A better outcome? The best outcome?

Consider making goal milestones to keep you on track.

How to Submit

You must complete all three steps!

Step 1 of 3

Create a short video (that’s no more than 3 minutes in length) in which you present your project to the world. Your video must begin with an opening section that displays:

  • your project’s title;
  • your GitHub and edX usernames;
  • your city and country;
  • and, the date you have recorded this video.

It should then go on to demonstrate your project in action, as with slides, screenshots, voiceover, and/or live action. See howtogeek.com/205742/how-to-record-your-windows-mac-linux-android-or-ios-screen for tips on how to make a “screencast,” though you’re welcome to use an actual camera. Upload your video to YouTube (or, if blocked in your country, a similar site) and take note of its URL; it’s fine to flag it as “unlisted,” but don’t flag it as “private.”

Submit this form!

Step 2 of 3

Create a README.md text file (named exactly that!) in your ~/project folder that explains your project. This file should include your project's title, the URL of your video (created in step 1 above), and a description of your project. You may use the below as a template.

If unfamiliar with Markdown syntax, you might find GitHub’s Basic Writing and Formatting Syntax helpful. If you are using the CS50 Codespace and are prompted to “Open in CS50 Lab”, you can simply press cancel to open in the Editor. You can also preview your .md file by clicking the ‘preview’ icon as explained here: Markdown Preview in VS Code. Standard software project READMEs can often run into the thousands or tens of thousands of words in length; yours need not be that long, but should at least be several hundred words that describe things in detail!

Your README.md file should be minimally multiple paragraphs in length, and should explain what your project is, what each of the files you wrote for the project contains and does, and if you debated certain design choices, explaining why you made them. Ensure you allocate sufficient time and energy to writing a README.md that documents your project thoroughly. Be proud of it! A README.md in the neighborhood of 500 words is likely to be sufficient for describing your project and all aspects of its functionality. If unable to reach that threshold, that probably means your project is insufficiently complex.

Execute the submit50 command below from within your ~/project directory (or from whichever directory contains your README.md file and your project’s code, which must also be submitted). If your project does not meet all the requirements above, it may be rejected, so be sure you have satisfied all of the bullet points atop this specification and written a thorough README:

If you encounter issues because your project is too large, try to ZIP all of the contents of that directory (except for README.md ) and then submit that instead. If still too large, try removing certain configuration files, reducing the size of your submission below 100MB, or try to upload directly using GitHub’s web interface by visiting github.com/me50/USERNAME (where USERNAME is your own GitHub username) and manually dragging and dropping folders, ensuring that when uploading you are doing so to your cs50/problems/2022/python/project branch, otherwise the system will not be able to check it!

Step 3 of 3

That’s it! Your project should be graded within a few minutes. Be sure to visit your gradebook at cs50.me/cs50p a few minutes after you submit. It’s only by loading your Gradebook that the system can check to see whether you have completed the course, and that is also what triggers the (instant) generation of your free CS50 Certificate and the (within 30 days) generation of the Verified Certificate from edX, if you’ve completed all of the other assignments.

This was CS50P!
