Chapman University Digital Commons


Computational and Data Sciences (PhD) Dissertations

Below is a selection of dissertations from the Doctor of Philosophy in Computational and Data Sciences program in Schmid College that have been included in Chapman University Digital Commons. Additional dissertations from years prior to 2019 are available through the Leatherby Libraries' print collection or in ProQuest's Dissertations and Theses database.

Dissertations from 2024

A Novel Correction for the Multivariate Ljung-Box Test , Minhao Huang

Machine Learning and Geostatistical Approaches for Discovery of Weather and Climate Events Related to El Niño Phenomena , Sachi Perera

Global to Glocal: A Confluence of Data Science and Earth Observations in the Advancement of the SDGs , Rejoice Thomas

Dissertations from 2023

Computational Analysis of Antibody Binding Mechanisms to the Omicron RBD of SARS-CoV-2 Spike Protein: Identification of Epitopes and Hotspots for Developing Effective Therapeutic Strategies , Mohammed Alshahrani

Integration of Computer Algebra Systems and Machine Learning in the Authoring of the SANYMS Intelligent Tutoring System , Sam Ford

Voluntary Action and Conscious Intention , Jake Gavenas

Random Variable Spaces: Mathematical Properties and an Extension to Programming Computable Functions , Mohammed Kurd-Misto

Computational Modeling of Superconductivity from the Set of Time-Dependent Ginzburg-Landau Equations for Advancements in Theory and Applications , Iris Mowgood

Application of Machine Learning Algorithms for Elucidation of Biological Networks from Time Series Gene Expression Data , Krupa Nagori

Stochastic Processes and Multi-Resolution Analysis: A Trigonometric Moment Problem Approach and an Analysis of the Expenditure Trends for Diabetic Patients , Isaac Nwi-Mozu

Applications of Causal Inference Methods for the Estimation of Effects of Bone Marrow Transplant and Prescription Drugs on Survival of Aplastic Anemia Patients , Yesha M. Patel

Causal Inference and Machine Learning Methods in Parkinson's Disease Data Analysis , Albert Pierce

Causal Inference Methods for Estimation of Survival and General Health Status Measures of Alzheimer’s Disease Patients , Ehsan Yaghmaei

Dissertations from 2022

Computational Approaches to Facilitate Automated Interchange between Music and Art , Rao Hamza Ali

Causal Inference in Psychology and Neuroscience: From Association to Causation , Dehua Liang

Advances in NLP Algorithms on Unstructured Medical Notes Data and Approaches to Handling Class Imbalance Issues , Hanna Lu

Novel Techniques for Quantifying Secondhand Smoke Diffusion into Children's Bedroom , Sunil Ramchandani

Probing the Boundaries of Human Agency , Sook Mun Wong

Dissertations from 2021

Predicting Eye Movement and Fixation Patterns on Scenic Images Using Machine Learning for Children with Autism Spectrum Disorder , Raymond Anden

Forecasting the Prices of Cryptocurrencies using a Novel Parameter Optimization of VARIMA Models , Alexander Barrett

Applications of Machine Learning to Facilitate Software Engineering and Scientific Computing , Natalie Best

Exploring Behaviors of Software Developers and Their Code Through Computational and Statistical Methods , Elia Eiroa Lledo

Assessing the Re-Identification Risk in ECG Datasets and an Application of Privacy Preserving Techniques in ECG Analysis , Arin Ghazarian

Multi-Modal Data Fusion, Image Segmentation, and Object Identification using Unsupervised Machine Learning: Conception, Validation, Applications, and a Basis for Multi-Modal Object Detection and Tracking , Nicholas LaHaye

Machine-Learning-Based Approach to Decoding Physiological and Neural Signals , Elnaz Lashgari

Learning-Based Modeling of Weather and Climate Events Related To El Niño Phenomenon via Differentiable Programming and Empirical Decompositions , Justin Le

Quantum State Estimation and Tracking for Superconducting Processors Using Machine Learning , Shiva Lotfallahzadeh Barzili

Novel Applications of Statistical and Machine Learning Methods to Analyze Trial-Level Data from Cognitive Measures , Chelsea Parlett

Optimal Analytical Methods for High Accuracy Cardiac Disease Classification and Treatment Based on ECG Data , Jianwei Zheng

Dissertations from 2020

Development of Integrated Machine Learning and Data Science Approaches for the Prediction of Cancer Mutation and Autonomous Drug Discovery of Anti-Cancer Therapeutic Agents , Steven Agajanian

Allocation of Public Resources: Bringing Order to Chaos , Lance Clifner

A Novel Correction for the Adjusted Box-Pierce Test — New Risk Factors for Emergency Department Return Visits within 72 hours for Children with Respiratory Conditions — General Pediatric Model for Understanding and Predicting Prolonged Length of Stay , Sidy Danioko

A Computational and Experimental Examination of the FCC Incentive Auction , Logan Gantner

Exploring the Employment Landscape for Individuals with Autism Spectrum Disorders using Supervised and Unsupervised Machine Learning , Kayleigh Hyde

Integrated Machine Learning and Bioinformatics Approaches for Prediction of Cancer-Driving Gene Mutations , Oluyemi Odeyemi

On Quantum Effects of Vector Potentials and Generalizations of Functional Analysis , Ismael L. Paiva

Long Term Ground Based Precipitation Data Analysis: Spatial and Temporal Variability , Luciano Rodriguez

Gaining Computational Insight into Psychological Data: Applications of Machine Learning with Eating Disorders and Autism Spectrum Disorder , Natalia Rosenfield

Connecting the Dots for People with Autism: A Data-driven Approach to Designing and Evaluating a Global Filter , Viseth Sean

Novel Statistical and Machine Learning Methods for the Forecasting and Analysis of Major League Baseball Player Performance , Christopher Watkins

Dissertations from 2019

Contributions to Variable Selection in Complexly Sampled Case-control Models, Epidemiology of 72-hour Emergency Department Readmission, and Out-of-site Migration Rate Estimation Using Pseudo-tagged Longitudinal Data , Kyle Anderson

Bias Reduction in Machine Learning Classifiers for Spatiotemporal Analysis of Coral Reefs using Remote Sensing Images , Justin J. Gapper

Estimating Auction Equilibria using Individual Evolutionary Learning , Kevin James

Employing Earth Observations and Artificial Intelligence to Address Key Global Environmental Challenges in Service of the SDGs , Wenzhao Li

Image Restoration using Automatic Damaged Regions Detection and Machine Learning-Based Inpainting Technique , Chloe Martin-King

Theses from 2017

Optimized Forecasting of Dominant U.S. Stock Market Equities Using Univariate and Multivariate Time Series Analysis Methods , Michael Schwartz


ISSN 2572-1496


Grad Coach

Research Topics & Ideas: Data Science

50 Topic Ideas To Kickstart Your Research Project

Research topics and ideas about data science and big data analytics

If you’re just starting out exploring data science-related topics for your dissertation, thesis or research project, you’ve come to the right place. In this post, we’ll help kickstart your research by providing a hearty list of data science and analytics-related research ideas, including examples from recent studies.

PS – This is just the start…

We know it’s exciting to run through a list of research topics, but please keep in mind that this list is just a starting point. The topic ideas provided here are intentionally broad and generic, so you will need to develop them further. Nevertheless, they should inspire some ideas for your project.

To develop a suitable research topic, you’ll need to identify a clear and convincing research gap, and a viable plan to fill that gap. If this sounds foreign to you, check out our free research topic webinar that explores how to find and refine a high-quality research topic, from scratch. Alternatively, consider our 1-on-1 coaching service.


Data Science-Related Research Topics

  • Developing machine learning models for real-time fraud detection in online transactions.
  • The use of big data analytics in predicting and managing urban traffic flow.
  • Investigating the effectiveness of data mining techniques in identifying early signs of mental health issues from social media usage.
  • The application of predictive analytics in personalizing cancer treatment plans.
  • Analyzing consumer behavior through big data to enhance retail marketing strategies.
  • The role of data science in optimizing renewable energy generation from wind farms.
  • Developing natural language processing algorithms for real-time news aggregation and summarization.
  • The application of big data in monitoring and predicting epidemic outbreaks.
  • Investigating the use of machine learning in automating credit scoring for microfinance.
  • The role of data analytics in improving patient care in telemedicine.
  • Developing AI-driven models for predictive maintenance in the manufacturing industry.
  • The use of big data analytics in enhancing cybersecurity threat intelligence.
  • Investigating the impact of sentiment analysis on brand reputation management.
  • The application of data science in optimizing logistics and supply chain operations.
  • Developing deep learning techniques for image recognition in medical diagnostics.
  • The role of big data in analyzing climate change impacts on agricultural productivity.
  • Investigating the use of data analytics in optimizing energy consumption in smart buildings.
  • The application of machine learning in detecting plagiarism in academic works.
  • Analyzing social media data for trends in political opinion and electoral predictions.
  • The role of big data in enhancing sports performance analytics.
  • Developing data-driven strategies for effective water resource management.
  • The use of big data in improving customer experience in the banking sector.
  • Investigating the application of data science in fraud detection in insurance claims.
  • The role of predictive analytics in financial market risk assessment.
  • Developing AI models for early detection of network vulnerabilities.


Data Science Research Ideas (Continued)

  • The application of big data in public transportation systems for route optimization.
  • Investigating the impact of big data analytics on e-commerce recommendation systems.
  • The use of data mining techniques in understanding consumer preferences in the entertainment industry.
  • Developing predictive models for real estate pricing and market trends.
  • The role of big data in tracking and managing environmental pollution.
  • Investigating the use of data analytics in improving airline operational efficiency.
  • The application of machine learning in optimizing pharmaceutical drug discovery.
  • Analyzing online customer reviews to inform product development in the tech industry.
  • The role of data science in crime prediction and prevention strategies.
  • Developing models for analyzing financial time series data for investment strategies.
  • The use of big data in assessing the impact of educational policies on student performance.
  • Investigating the effectiveness of data visualization techniques in business reporting.
  • The application of data analytics in human resource management and talent acquisition.
  • Developing algorithms for anomaly detection in network traffic data.
  • The role of machine learning in enhancing personalized online learning experiences.
  • Investigating the use of big data in urban planning and smart city development.
  • The application of predictive analytics in weather forecasting and disaster management.
  • Analyzing consumer data to drive innovations in the automotive industry.
  • The role of data science in optimizing content delivery networks for streaming services.
  • Developing machine learning models for automated text classification in legal documents.
  • The use of big data in tracking global supply chain disruptions.
  • Investigating the application of data analytics in personalized nutrition and fitness.
  • The role of big data in enhancing the accuracy of geological surveying for natural resource exploration.
  • Developing predictive models for customer churn in the telecommunications industry.
  • The application of data science in optimizing advertisement placement and reach.

Recent Data Science-Related Studies

While the ideas we’ve presented above are a decent starting point for finding a research topic, they are fairly generic and non-specific. So, it helps to look at actual studies in the data science and analytics space to see how this all comes together in practice.

Below, we’ve included a selection of recent studies to help refine your thinking. These are actual studies, so they can provide some useful insight as to what a research topic looks like in practice.

  • Data Science in Healthcare: COVID-19 and Beyond (Hulsen, 2022)
  • Auto-ML Web-application for Automated Machine Learning Algorithm Training and evaluation (Mukherjee & Rao, 2022)
  • Survey on Statistics and ML in Data Science and Effect in Businesses (Reddy et al., 2022)
  • Visualization in Data Science VDS @ KDD 2022 (Plant et al., 2022)
  • An Essay on How Data Science Can Strengthen Business (Santos, 2023)
  • A Deep study of Data science related problems, application and machine learning algorithms utilized in Data science (Ranjani et al., 2022)
  • You Teach WHAT in Your Data Science Course?!? (Posner & Kerby-Helm, 2022)
  • Statistical Analysis for the Traffic Police Activity: Nashville, Tennessee, USA (Tufail & Gul, 2022)
  • Data Management and Visual Information Processing in Financial Organization using Machine Learning (Balamurugan et al., 2022)
  • A Proposal of an Interactive Web Application Tool QuickViz: To Automate Exploratory Data Analysis (Pitroda, 2022)
  • Applications of Data Science in Respective Engineering Domains (Rasool & Chaudhary, 2022)
  • Jupyter Notebooks for Introducing Data Science to Novice Users (Fruchart et al., 2022)
  • Towards a Systematic Review of Data Science Programs: Themes, Courses, and Ethics (Nellore & Zimmer, 2022)
  • Application of data science and bioinformatics in healthcare technologies (Veeranki & Varshney, 2022)
  • TAPS Responsibility Matrix: A tool for responsible data science by design (Urovi et al., 2023)
  • Data Detectives: A Data Science Program for Middle Grade Learners (Thompson & Irgens, 2022)
  • MACHINE LEARNING FOR NON-MAJORS: A WHITE BOX APPROACH (Mike & Hazzan, 2022)
  • COMPONENTS OF DATA SCIENCE AND ITS APPLICATIONS (Paul et al., 2022)
  • Analysis on the Application of Data Science in Business Analytics (Wang, 2022)

As you can see, these research topics are a lot more focused than the generic topic ideas we presented earlier. So, to develop a high-quality research topic, you’ll need to get laser-focused on a specific context with specific variables of interest.

Get 1-On-1 Help

If you’re still unsure about how to find a quality research topic, check out our Research Topic Kickstarter service, which is the perfect starting point for developing a unique, well-justified research topic.


DigitalCommons@Kennesaw State University


Doctor of Data Science and Analytics Dissertations


The Ph.D. in Data Science and Analytics is an advanced degree with a dual focus of application and research - where students will engage in real world business problems, which will inform and guide their research interests.

We launched the first formal PhD program in Data Science in 2015. Our program sits at the intersection of computer science, statistics, mathematics, and business. Our students engage in relevant research with faculty from across our eleven colleges. As one of the institutions on the forefront of the development of data science as an academic discipline, we are committed to developing the next generation of Data Science leaders, researchers, and educators. Culturally, we are committed to the discipline of Data Science, through ethical practices, attention to fairness, to a diverse student body, to academic excellence, and research which makes positive contributions to our local, regional, and global community. -Sherry Ni, Director, Ph.D. in Data Science and Analytics

This degree will train individuals to facilitate innovative research and to translate structured and unstructured complex data into information that improves decision making. The curriculum places heavy emphasis on programming, data mining, statistical modeling, and the mathematical foundations that support these concepts. Importantly, the program also emphasizes communication skills, both oral and written, as well as application and tying results to business and research problems.


Dissertations from 2024

A Holistic and Collaborative Behavioral Health Detection Framework Using Sensitive Police Narratives , Martin Keagan Wynne Brown

Multi-Modality Transformer for E-Commerce: Inferring User Purchase Intention to Bridge the Query-Product Gap , Srivatsa Mallapragada

Dissertations from 2023

Quantification of Various Types of Biases in Large Language Models , Sudhashree Sayenju

Dissertations from 2022

Appley: Approximate Shapley Values for Model Explainability in Linear Time , Md Shafiul Alam

Ethical Analytics: A Framework for a Practically-Oriented Sub-Discipline of AI Ethics , Jonathan Boardman

Novel Instance-Level Weighted Loss Function for Imbalanced Learning , Trent Geisler

Debiasing Cyber Incidents – Correcting for Reporting Delays and Under-reporting , Seema Sangari

Dissertations from 2021

Integrated Machine Learning Approaches to Improve Classification performance and Feature Extraction Process for EEG Dataset , Mohammad Masum

A Distance-Based Clustering Framework for Categorical Time Series: A Case Study in Episodes of Care Healthcare Delivery System , Lauren Staples

Dissertations from 2020

A CREDIT ANALYSIS OF THE UNBANKED AND UNDERBANKED: AN ARGUMENT FOR ALTERNATIVE DATA , Edwin Baidoo

Quantitatively Motivated Model Development Framework: Downstream Analysis Effects of Normalization Strategies , Jessica M. Rudd

Data-driven Investment Decisions in P2P Lending: Strategies of Integrating Credit Scoring and Profit Scoring , Yan Wang

A Novel Penalized Log-likelihood Function for Class Imbalance Problem , Lili Zhang

ATTACK AND DEFENSE IN SECURITY ANALYTICS , Yiyun Zhou

Dissertations from 2019

One and Two-Step Estimation of Time Variant Parameters and Nonparametric Quantiles , Bogdan Gadidov

Biologically Interpretable, Integrative Deep Learning for Cancer Survival Analysis , Jie Hao

Deep Embedding Kernel , Linh Le

Ordinal HyperPlane Loss , Bob Vanderheyden


DigitalCommons@Kennesaw State University ISSN: 2576-6805


Thesis/Capstone for Master's in Data Science | Northwestern SPS - Northwestern School of Professional Studies


Data Science

Capstone and Thesis Overview

Capstone and thesis are similar in that they both represent a culminating, scholarly effort of high quality. Both should clearly state a problem or issue to be addressed. Both will allow students to complete a larger project and produce a product or publication that can be highlighted on their resumes. Students should consider the factors below when deciding whether a capstone or thesis may be more appropriate to pursue.

A capstone is a practical or real-world project that can emphasize preparation for professional practice. A capstone is more appropriate if:

  • you don't necessarily need or want the experience of the research process or writing a big publication
  • you want more input on your project, from fellow students and instructors
  • you want more structure to your project, including assignment deadlines and due dates
  • you want to complete the project or graduate in a timely manner

A student can enroll in MSDS 498 Capstone in any term. However, capstone specialization courses can provide a unique student experience and may be offered only twice a year. 

A thesis is an academic-focused research project with broader applicability. A thesis is more appropriate if:

  • you want to get a PhD or other advanced degree and want the experience of the research process and writing for publication
  • you want to work individually with a specific faculty member who serves as your thesis adviser
  • you are more self-directed, are good at managing your own projects with very little supervision, and have a clear direction for your work
  • you have a project that requires more time to pursue

Students can enroll in MSDS 590 Thesis as long as there is an approved thesis project proposal, identified thesis adviser, and all other required documentation at least two weeks before the start of any term.

From Faculty Director, Thomas W. Miller, PhD

“Capstone projects and thesis research give students a chance to study topics of special interest to them. Students can highlight analytical skills developed in the program. Work on capstone and thesis research projects often leads to publications that students can highlight on their resumes.”

A thesis is an individual research project that usually takes two to four terms to complete. Capstone course sections, on the other hand, represent a one-term commitment.

Students need to evaluate their options prior to choosing a capstone course section because capstones vary widely from one instructor to the next. There are both general and specialization-focused capstone sections. Some capstone sections offer individual research projects, others offer team research projects, and a few give students a choice of individual or team projects.

Students should refer to the SPS Graduate Student Handbook for more information regarding registration for either MSDS 590 Thesis or MSDS 498 Capstone.

Capstone Experience

If students wish to engage with an outside organization to work on a project for capstone, they can refer to this checklist and lessons learned for some helpful tips.

Capstone Checklist

  • Start early — set aside a minimum of one to two months prior to the capstone quarter to determine the industry and modeling interests.
  • Networking — pitch your idea to potential organizations for projects and focus on the business benefits you can provide.
  • Permission request — make sure your final project can be shared with others in the course and the information can be made public.
  • Engagement — engage with the capstone professor prior to and immediately after getting the dataset to ensure appropriate scope for the 10 weeks.
  • Teambuilding — recruit team members who have similar interests for the type of project during the first week of the course.

Capstone Lessons Learned

  • Access to company data can take longer than expected; without access before or at the start of the term, progress can be severely delayed
  • The project timeline should align with the coursework timeline as closely as possible
  • Designate one business-facing point of contact (POC) to keep messages streamlined and manage time with the organization more effectively
  • Manage expectations on both sides: for the business, the work is pro bono; for students, it does not guarantee internship or job opportunities
  • Data security or masking that is not completed in time can put the entire opportunity at risk

Publication of Work

Northwestern University Libraries offers an option for students to publish their master’s thesis or capstone in Arch, Northwestern’s open access research and data repository.

Benefits for publishing your thesis:

  • Your work will be indexed by search engines and discoverable by researchers around the world, extending your work’s impact beyond Northwestern
  • Your work will be assigned a Digital Object Identifier (DOI) to ensure perpetual online access and to facilitate scholarly citation
  • Your work will help accelerate discovery and increase knowledge in your subject domain by adding to the global corpus of public scholarly information

Get started:

  • Visit Arch online
  • Log in with your NetID
  • Describe your thesis: title, author, date, keywords, rights, license, subject, etc.
  • Upload your thesis or capstone PDF and any related supplemental files (data, code, images, presentations, documentation, etc.)
  • Select a visibility: Public, Northwestern-only, Embargo (i.e. delayed release)
  • Save your work to the repository

Your thesis manuscript or capstone report will then be published on the MSDS page. You can view other published work here.

For questions or support in publishing your thesis or capstone, please contact [email protected] .

DiscoverDataScience.org

PhD in Data Science – Your Guide to Choosing a Doctorate Degree Program


Created by aasif.faizal

Professional opportunities in data science are growing incredibly fast. That’s great news for students looking to pursue a career as a data scientist. But it also means that there are a lot more options out there to investigate and understand before developing the best educational path for you.

A PhD is the most advanced data science degree you can get, reflecting a depth of knowledge and technical expertise that will put you at the top of your field.


PhD programs are also the most time-intensive degree option, typically requiring students to complete a dissertation involving rigorous research. As a result, PhDs are not for everyone. Indeed, many who work in the world of big data hold master’s degrees rather than PhDs; master’s programs tend to involve the same coursework as PhD programs without a dissertation component. However, for the right candidate, a PhD program is the perfect choice for becoming a true expert in your area of focus.

If you’ve concluded that a data science PhD is the right path for you, this guide is intended to help you choose the best program to suit your needs. It will walk through some of the key considerations while picking graduate data science programs and some of the nuts and bolts (like course load and tuition costs) that are part of the data science PhD decision-making process.

Data Science PhD vs. Masters: Choosing the right option for you

If you’re considering pursuing a data science PhD, it’s worth knowing that such an advanced degree isn’t strictly necessary in order to get good work opportunities. Many who work in the field of big data hold only master’s degrees, which is the level of education expected of a competitive candidate for data science positions.

So why pursue a data science PhD?

Simply put, a PhD in data science will leave you qualified to enter the big data industry at a high level from the outset.

You’ll be eligible for advanced positions within companies, holding greater responsibilities, keeping more direct communication with leadership, and having more influence on important data-driven decisions. You’re also likely to receive greater compensation to match your rank.

However, PhDs are not for everyone. Dissertations require a great deal of time and an interest in intensive research. If you are eager to jumpstart a career quickly, a master’s program will give you the preparation you need to hit the ground running. PhDs are appropriate for those who want to commit their time and effort to schooling as a long-term investment in their professional trajectory.

For more information on the difference between data science PhDs and master’s programs, take a look at our guide.

Topics include:

  • Can I get an Online Ph.D. in Data Science?
  • Overview of Ph.D. Coursework
  • Preparing for a Doctorate Program
  • Building a solid track record of professional experience
  • Things to consider when choosing a school
  • What Does it Cost to Get a Ph.D. in Data Science?
  • School Listings


Data Science PhD Programs, Historically

Historically, data science PhD programs were one of the main avenues to a good data-related position in academia or industry. But PhD programs are heavily research oriented and require a fairly long-term investment of time, money, and energy. The issue that some data science PhD holders report, especially in industry settings, is that the state of the art is moving so quickly, and the data science industry is evolving so rapidly, that an abundance of research-oriented expertise is not always what is most sought after.

Instead, many companies are looking for candidates who are up to date with the latest data science techniques and technologies, and are willing to pivot to match emerging trends and practices.

One recent development that makes data science graduate school decisions more complex is the introduction of specialty master’s degrees that focus on rigorous but compact professional training. Both students and companies are realizing the value of an intensive, more industry-focused degree that provides enough training to manage complex projects and that is client oriented rather than research oriented.

However, not all prospective data science PhD students are looking for jobs in industry. There are some pretty amazing research opportunities opening up across a variety of academic fields that are making use of new data collection and analysis tools. Experts that understand how to leverage data systems including statistics and computer science to analyze trends and build models will be in high demand.

Can You Get a PhD in Data Science Online?

While it is not common to get a data science Ph.D. online, there are currently two options for those looking to take advantage of the flexibility of an online program.

Indiana University Bloomington and Northcentral University both offer online Ph.D. programs with either a minor or specialization in data science.

Given the trend for schools to continue increasing online offerings, expect to see additional schools adding this option in the near future.


Overview of PhD Coursework

A PhD requires a lot of academic work and generally takes between four and five years (sometimes longer) to complete.

Here are some of the high level factors to consider and evaluate when comparing data science graduate programs.

How many credits are required for a PhD in data science?

On average, it takes 71 credits to graduate with a PhD in data science, far more (almost double) than traditional master’s degree programs require. In addition to coursework, most PhD students also have research and teaching responsibilities that can be demanding and, at the same time, great career preparation.

What’s the core curriculum like?

In a data science doctoral program, you’ll be expected to learn many skills and also how to apply them across domains and disciplines. Core curriculums will vary from program to program, but almost all will have a core foundation of statistics.

All PhD candidates will have to take a qualifying exam. This can vary from university to university, but to give you some insight, it is broken up into three phases at Yale. They have a practical exam, a theory exam and an oral exam. The goal is to make sure doctoral students are developing the appropriate level of expertise.

Dissertation

One of the final steps of a PhD program involves presenting original research findings in a formal document called a dissertation. A dissertation provides background and context, as well as findings and analysis, and can contribute to the understanding and evolution of data science. A dissertation idea most often provides the framework for how a PhD candidate’s graduate school experience will unfold, so it’s important to be thoughtful and deliberate while considering research opportunities.

Since data science is such a rapidly evolving field and because choosing the right PhD program is such an important factor in developing a successful career path, there are some steps that prospective doctoral students can take in advance to find the best-fitting opportunity.

Join professional associations

Even before being fully credentialed, joining professional associations and organizations such as the Data Science Association and the American Association of Big Data Professionals is a good way to get exposure to the field. Many professional societies welcome new members and even encourage student participation with discounted membership fees and with award and contest categories for student researchers. One of the biggest advantages of joining is that these professional associations bring data scientists together for conferences, research sharing, networking, and continuing education.

Leverage your social network

Be on the lookout to make professional connections with professors, peers, and members of industry. There are a number of LinkedIn groups dedicated to data science. A well-maintained professional network is always useful to have when looking for advice or letters of recommendation while applying to graduate school and then later while applying for jobs and other career-related opportunities.

Kaggle competitions

Kaggle competitions provide the opportunity to solve real-world data science problems and win prizes. A list of data science problems can be found at Kaggle.com. Winning one of these competitions is a good way to demonstrate professional interest and experience.

Internships

Internships are a great way to get real-world experience in data science while also getting to work for top names in the world of business. For example, IBM offers a data science internship, which would help you stand out when applying for PhD programs as well as when seeking employment in the future.

Demonstrating professional experience is not only important when looking for jobs, but it can also help while applying for graduate school. There are a number of ways for prospective students to gain exposure to the field and explore different facets of data science careers.

Get certified

There are a number of data-related certificate programs that are open to people with a variety of academic and professional experience. DeZyre has an excellent guide to different certifications, some of which might help provide good background for graduate school applications.

Conferences

Conferences are a great place to meet people presenting new and exciting research in the data science field and to bounce ideas off newfound connections. As with professional societies and organizations, conferences often offer discounted student rates to encourage student participation. In addition, some conferences will waive fees if you are presenting a poster or research at the conference, which is an extra incentive to present.

It can be hard to quantify what makes a good fit when it comes to data science graduate school programs. There are easy-to-evaluate factors, such as cost and location, and then there are harder-to-evaluate criteria, such as networking opportunities, accessibility of professors, and how current the program’s curriculum is.

Nevertheless, there are some key relevant considerations when applying to almost any data science graduate program.

What most schools will require when applying:

  • All undergraduate and graduate transcripts
  • A statement of intent for the program (reason for applying and future plans)
  • Letters of reference
  • Application fee
  • Online application
  • A curriculum vitae (outlining all of your academic and professional accomplishments)

What Does it Cost to Get a PhD in Data Science?

The great news is that many PhD data science programs are supported by fellowships and stipends. Some are completely funded, meaning the school will pay tuition and basic living expenses. Here are several examples of fully funded programs:

  • University of Southern California
  • University of Nevada, Reno
  • Kennesaw State University
  • Worcester Polytechnic Institute
  • University of Maryland

For all other programs, tuition varies by school and can range anywhere from $1,300 to $2,000 per credit hour. Remember, typical PhD programs in data science are between 60 and 75 credit hours, meaning you could spend up to $150,000 over several years.
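
As a rough check on the figures above, the short Python sketch below multiplies the quoted per-credit rates by the quoted credit-hour range. The function name and its defaults are illustrative assumptions, and real costs will vary with fees, residency status, and funding.

```python
# Rough tuition estimate using only the figures quoted in the text above.
# Names and defaults are illustrative, not taken from any school's data.

def estimate_tuition_range(low_rate=1_300, high_rate=2_000,
                           min_credits=60, max_credits=75):
    """Return (lowest, highest) total tuition in dollars."""
    lowest = low_rate * min_credits     # cheapest rate, fewest credits
    highest = high_rate * max_credits   # priciest rate, most credits
    return lowest, highest


low, high = estimate_tuition_range()
print(f"Estimated total tuition: ${low:,} to ${high:,}")
# prints "Estimated total tuition: $78,000 to $150,000"
```

The upper bound matches the $150,000 figure mentioned above; the lower bound of $78,000 assumes the cheapest rate and the shortest program.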

That’s why the financial aspects are so important to evaluate when assessing PhD programs, because some schools offer full stipends so that you are able to attend without having to find supplemental scholarships or tuition assistance.

Can I become a professor of data science with a PhD? Yes! If you are interested in teaching at the college or graduate level, a PhD is the degree needed to establish the full expertise expected of a professor. Some data scientists who hold PhDs start by entering the field of big data and pivot over to teaching after gaining a significant amount of work experience. If you’re driven to teach others or to pursue advanced research in data science, a PhD is the right degree for you.

Do I need a master’s in order to pursue a PhD? No. Many who pursue PhDs in data science do not already hold advanced degrees, and many PhD programs include all the coursework of a master’s program in the first two years of school. For many students, this is the most time-effective option, allowing you to complete your education in a single pass rather than interrupting your studies after your master’s program.

Can I choose to pursue a PhD after already receiving my master’s? Yes. A master’s program can be an opportunity to get the lay of the land and determine the specific career path you’d like to forge in the world of big data. Some schools may allow you to simply extend your academic timeline after receiving your master’s degree, and it is also possible to return to school to receive a PhD if you have been working in the field for some time.

If a PhD isn’t necessary, is it a waste of time? While not all students are candidates for PhDs, for the right students – who are keen on doing in-depth research, have the time to devote to many years of school, and potentially have an interest in continuing to work in academia – a PhD is a great choice. For more information on this question, take a look at our article Is a Data Science PhD. Worth It?

Complete List of Data Science PhD Programs

Below you will find the most comprehensive list of schools offering a doctorate in data science. Each school listing contains a link to the program specific page, GRE or a master’s degree requirements, and a link to a page with detailed course information.

Note that the listing only contains true data science programs. Other similar programs are often lumped together on other sites, but we have chosen to list programs such as data analytics and business intelligence on a separate section of the website.

Boise State University  – Boise, Idaho PhD in Computing – Data Science Concentration

The Data Science emphasis focuses on the development of mathematical and statistical algorithms, software, and computing systems to extract knowledge or insights from data.  

In 60 credits, students complete an Introduction to Graduate Studies, 12 credits of core courses, 6 credits of data science elective courses, 10 credits of other elective courses, a Doctoral Comprehensive Examination worth 1 credit, and a 30-credit dissertation.

Electives can be taken in focus areas such as Anthropology, Biometry, Ecology/Evolution and Behavior, Econometrics, Electrical Engineering, Earth Dynamics and Informatics, Geoscience, Geostatistics, Hydrology and Hydrogeology, Materials Science, and Transportation Science.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $7,236 total (Resident), $24,573 total (Non-resident)

View Course Offerings

Bowling Green State University  – Bowling Green, Ohio Ph.D. in Data Science

Data Science students at Bowling Green intertwine knowledge of computer science with statistics.

Students learn techniques in analyzing structured, unstructured, and dynamic datasets.

Courses train students to understand the principles of analytic methods and to articulate the strengths and limitations of those methods.

The program requires 60 credit hours in the studies of Computer Science (6 credit hours), Statistics (6 credit hours), Data Science Exploration and Communication, Ethical Issues, Advanced Data Mining, and Applied Data Science Experience.

Students must also complete 21 credit hours of elective courses, a qualifying exam, a preliminary exam, and a dissertation.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $8,418 (Resident), $14,410 (Non-resident)

Brown University  – Providence, Rhode Island PhD in Computer Science – Concentration in Data Science

Brown University’s database group is a world leader in systems-oriented database research; they seek PhD candidates with strong system-building skills who are interested in researching TupleWare, MLbase, MDCC, Crowd DB, or PIQL.

In order to gain entrance, applicants should consider first doing a research internship at Brown with this group. Other ways to boost an application are to take and do well at massive open online courses, do an internship at a large company, and get involved in a large open-source software project.

Coding well in C++ is preferred.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $62,680 total

Chapman University  – Irvine, California Doctorate in Computational and Data Sciences

Candidates for the doctorate in computational and data science at Chapman University begin by completing 13 core credits in basic methodologies and techniques of computational science.

Students complete 45 credits of electives, which are personalized to match the specific interests and research topics of the student.

Finally, students complete up to 12 credits in dissertation research.

Applicants must have completed courses in differential equations, data structures, and probability and statistics, or take specific foundation courses, before beginning coursework toward the PhD.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $37,538 per year

Clemson University / Medical University of South Carolina (MUSC) – Joint Program – Clemson, South Carolina & Charleston, South Carolina Doctor of Philosophy in Biomedical Data Science and Informatics – Clemson

The PhD in biomedical data science and informatics is a joint program co-authored by Clemson University and the Medical University of South Carolina (MUSC).

Students choose one of three tracks to pursue: precision medicine, population health, and clinical and translational informatics. Students complete 65-68 credit hours, and take courses in each of 5 areas: biomedical informatics foundations and applications; computing/math/statistics/engineering; population health, health systems, and policy; biomedical/medical domain; and lab rotations, seminars, and doctoral research.

Applicants must have a bachelor’s in health science, computing, mathematics, statistics, engineering, or a related field, and it is recommended to also have competency in a second of these areas.

Program requirements include a year of calculus and college biology, as well as experience in computer programming.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $10,858 total (South Carolina Resident), $22,566 total (Non-resident)

View Course Offerings – Clemson

George Mason University  – Fairfax, Virginia Doctor of Philosophy in Computational Sciences and Informatics – Emphasis in Data Science

George Mason’s PhD in computational sciences and informatics requires a minimum of 72 credit hours, though this can be reduced if a student has already completed a master’s. 48 credits are toward graduate coursework, and an additional 24 are for dissertation research.

Students choose an area of emphasis (either computer modeling and simulation or data science) and complete 18 credits of coursework in this area. Students are expected to complete the coursework in 4-5 years.

Applicants to this program must have a bachelor’s degree in a natural science, mathematics, engineering, or computer science, and must have knowledge and experience with differential equations and computer programming.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $13,426 total (Virginia Resident), $35,377 total (Non-resident)

Harrisburg University of Science and Technology  – Harrisburg, Pennsylvania Doctor of Philosophy in Data Sciences

Harrisburg University’s PhD in data science is a 4-5 year program, the first 2 of which make up the Harrisburg master’s in analytics.

Beyond this, PhD candidates complete six milestones to obtain the degree, including 18 semester hours in doctoral-level courses such as multivariate data analysis, graph theory, and machine learning.

Following the completion of ANLY 760 Doctoral Research Seminar, students in the program complete their 12 hours of dissertation research bringing the total program hours to 36.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $14,940 total

Icahn School of Medicine at Mount Sinai  – New York, New York Genetics and Data Science, PhD

As part of the Biomedical Science PhD program, the Genetics and Data Science multidisciplinary training offers research opportunities that expand on genetic research and modern genomics. The training also integrates several disciplines of biomedical sciences with machine learning, network modeling, and big data analysis.

Students in the Genetics and Data Science program complete a predetermined course schedule with a total of 64 credits and 3 years of study.

Additional course requirements and electives include laboratory rotations, a thesis proposal exam and thesis defense, Computer Systems, Intro to Algorithms, Machine Learning for Biomedical Data Science, Translational Genomics, and Practical Analysis of a Personal Genome.

Delivery Method: Campus GRE: Not Required 2022-2023 Tuition: $31,303 total

Indiana University-Purdue University Indianapolis  – Indianapolis, Indiana PhD in Data Science PhD Minor in Applied Data Science

Doctoral candidates pursuing the PhD in data science at Indiana University-Purdue must display competency in research, data analytics, and management and infrastructure to earn the degree.

The PhD is comprised of 24 credits of a data science core, 18 credits of methods courses, 18 credits of a specialization, written and oral qualifying exams, and 30 credits of dissertation research. All requirements must be completed within 7 years.

Applicants are generally expected to have a master’s in social science, health, data science, or computer science. 

Currently, a majority of the PhD students at IUPUI are funded by faculty grants, and two are funded by the federal government. None of the students are self-funded.

IUPUI also offers a PhD Minor in Applied Data Science that is 12-18 credits. The minor is open to students enrolled at IUPUI or IU Bloomington in a doctoral program other than Data Science.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $9,228 per year (Indiana Resident), $25,368 per year (Non-resident)

Jackson State University – Jackson, Mississippi PhD Computational and Data-Enabled Science and Engineering

Jackson State University offers a PhD in computational and data-enabled science and engineering with 5 concentration areas: computational biology and bioinformatics, computational science and engineering, computational physical science, computational public health, and computational mathematics and social science.

Students complete 12 credits of common core courses, 12 credits in the specialization, 24 credits of electives, and 24 credits in dissertation research.

Students may complete the doctoral program in as little as 5 years and no more than 8 years.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $8,270 total

Kennesaw State University  – Kennesaw, Georgia PhD in Analytics and Data Science

Students pursuing a PhD in analytics and data science at Kennesaw State University must complete 78 credit hours: 48 course hours and 6 electives (spread over 4 years of study), a minimum 12 credit hours for dissertation research, and a minimum 12 credit-hour internship.

Prior to dissertation research, the comprehensive examination will cover material from the three areas of study: computer science, mathematics, and statistics.

Successful applicants will have a master’s degree in a computational field, calculus I and II, programming experience, modeling experience, and are encouraged to have a base SAS certification.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $5,328 total (Georgia Resident), $19,188 total (Non-resident)

New Jersey Institute of Technology  – Newark, New Jersey PhD in Business Data Science

Students may enter the PhD program in business data science at the New Jersey Institute of Technology with either a relevant bachelor’s or master’s degree. Students with bachelor’s degrees begin with 36 credits of advanced courses, and those with master’s take 18 credits before moving on to credits in dissertation research.

Core courses include business research methods, data mining and analysis, data management system design, statistical computing with SAS and R, and regression analysis.

Students take qualifying examinations at the end of years 1 and 2, and must defend their dissertations successfully by the end of year 6.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $21,932 total (New Jersey Resident), $32,426 total (Non-resident)

New York University  – New York, New York PhD in Data Science

Doctoral candidates in data science at New York University must complete 72 credit hours, pass a comprehensive and qualifying exam, and defend a dissertation within 10 years of entering the program.

Required courses include an introduction to data science, probability and statistics for data science, machine learning and computational statistics, big data, and inference and representation.

Applicants must have an undergraduate or master’s degree in fields such as mathematics, statistics, computer science, engineering, or other scientific disciplines. Experience with calculus, probability, statistics, and computer programming is also required.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $37,332 per year

View Course Offering

Northcentral University  – San Diego, California PhD in Data Science-TIM

Northcentral University offers a PhD in technology and innovation management with a specialization in data science.

The program requires 60 credit hours, including 6-7 core courses, 3 in research, a PhD portfolio, and 4 dissertation courses.

The data science specialization requires 6 courses: data mining, knowledge management, quantitative methods for data analytics and business intelligence, data visualization, predicting the future, and big data integration.

Applicants must have a master’s already.

Delivery Method: Online GRE: Required 2022-2023 Tuition: $16,794 total

Stevens Institute of Technology – Hoboken, New Jersey Ph.D. in Data Science

Stevens Institute of Technology has developed a data science Ph.D. program geared to help graduates become innovators in the space.

The rigorous curriculum emphasizes mathematical and statistical modeling, machine learning, computational systems and data management.

The program is directed by Dr. Ted Stohr, a recognized thought leader in the information systems, operations and business process management arenas.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $39,408 per year

University at Buffalo – Buffalo, New York PhD Computational and Data-Enabled Science and Engineering

The curriculum for the University at Buffalo’s PhD in computational and data-enabled science and engineering centers around three areas: data science, applied mathematics and numerical methods, and high performance and data intensive computing. Students must complete 9 credits of courses in each of these three areas. Altogether, the program consists of 72 credit hours and should be completed in 4-5 years. A master’s degree is required for admission; courses taken during the master’s may be able to count toward some of the core coursework requirements.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $11,310 per year (New York Resident), $23,100 per year (Non-resident)

University of Colorado Denver – Denver, Colorado PhD in Big Data Science and Engineering

The University of Colorado – Denver offers a unique program for those students who have already received admission to the computer science and information systems PhD program.

The Big Data Science and Engineering (BDSE) program is a PhD fellowship program that allows selected students to pursue research in the area of big data science and engineering. This new fellowship program was created to train more computer scientists in data science application fields such as health informatics, geosciences, precision and personalized medicine, business analytics, and smart cities and cybersecurity.

Students in the doctoral program must complete 30 credit hours of computer science classes beyond a master’s level, and 30 credit hours of dissertation research.

The BDSE fellowship requires students to have an advisor both in the core disciplines (either computer science or mathematics and statistics) as well as an advisor in the application discipline (medicine and public health, business, or geosciences).

In addition, the fellowship covers a full stipend, tuition, and fees of up to ~$50k annually for BDSE fellows. Important eligibility requirements can be found here.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $55,260 total

University of Maryland  – College Park, Maryland PhD in Information Studies

Data science is a potential research area for doctoral candidates in information studies at the University of Maryland – College Park. This includes big data, data analytics, and data mining.

Applicants for the PhD must have taken the following courses in undergraduate studies: programming languages, data structures, design and analysis of computer algorithms, calculus I and II, and linear algebra.

Students must complete 6 qualifying courses, 2 elective graduate courses, and at least 12 credit hours of dissertation research.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $16,238 total (Maryland Resident), $35,388 total (Non-resident)

University of Massachusetts Boston  – Boston, Massachusetts PhD in Business Administration – Information Systems for Data Science Track

The University of Massachusetts – Boston offers a PhD in information systems for data science. As this is a business degree, students must complete coursework in their first two years with a focus on data for business; for example, taking courses such as business in context: markets, technologies, and societies.

Students must take and pass qualifying exams at the end of year 1, comprehensive exams at the end of year 2, and defend their theses at the end of year 4.

Those with a degree in statistics, economics, math, computer science, management sciences, information systems, and other related fields are especially encouraged, though a quantitative degree is not necessary.

Students accepted by the program are ordinarily offered full tuition credits and a stipend ($25,000 per year) to cover educational expenses and help defray living costs for up to three years of study.

During the first two years of coursework, students are assigned to a faculty member as a research assistant; in the third year, students are engaged in instructional activities. Funding for the fourth year is merit-based and comes from a limited pool of program funds.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $18,894 total (in-state), $36,879 (out-of-state)

University of Nevada Reno – Reno, Nevada PhD in Statistics and Data Science

The University of Nevada – Reno’s doctoral program in statistics and data science is comprised of 72 credit hours to be completed over the course of 4-5 years. Coursework is all within the scope of statistics, with titles such as statistical theory, probability theory, linear models, multivariate analysis, statistical learning, statistical computing, and time series analysis.

The completion of a Master’s degree in mathematics or statistics prior to enrollment in the doctoral program is strongly recommended, but not required.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $5,814 total (in-state), $22,356 (out-of-state)

University of Southern California – Los Angeles, California PhD in Data Sciences & Operations

USC Marshall School of Business offers a PhD in data sciences and operations to be completed in 5 years.

Students can choose either a track in operations management or in statistics. Both tracks require 4 courses in fall and spring of the first 2 years, as well as a research paper and courses during the summers. Year 3 is devoted to dissertation preparation and year 4 and/or 5 to dissertation defense.

A bachelor’s degree is necessary for application, but no field or further experience is required.

Students should complete 60 units of coursework. If students are admitted with Advanced Standing (e.g., a master’s degree in an appropriate field), this requirement may be reduced to 40 units.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $63,468 total

University of Tennessee-Knoxville  – Knoxville, Tennessee The Data Science and Engineering PhD

The data science and engineering PhD at the University of Tennessee – Knoxville requires 36 hours of coursework and 36 hours of dissertation research. For those entering with an MS degree, only 24 hours of course work is required.

The core curriculum includes work in statistics, machine learning, and scripting languages and is enhanced by 6 hours in courses that focus either on policy issues related to data, or technology entrepreneurship.

Students must also choose a knowledge specialization in one of these fields: health and biological sciences, advanced manufacturing, materials science, environmental and climate science, transportation science, national security, urban systems science, and advanced data science.

Applicants must have a bachelor’s or master’s degree in engineering or a scientific field. 

All students that are admitted will be supported by a research fellowship and tuition will be included.

Many students will perform research with scientists from Oak Ridge National Laboratory, which is about a 30-minute drive from campus.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $11,468 total (Tennessee Resident), $29,656 total (Non-resident)

University of Vermont – Burlington, Vermont Complex Systems and Data Science (CSDS), PhD

Through the College of Engineering and Mathematical Sciences, the Complex Systems and Data Science (CSDS) PhD program is pan-disciplinary and provides computational and theoretical training. Students may customize the program depending on their chosen area of focus.

Students in this program work in research groups across campus.

Core courses include Data Science, Principles of Complex Systems and Modeling Complex Systems. Elective courses include Machine Learning, Complex Networks, Evolutionary Computation, Human/Computer Interaction, and Data Mining.

The program requires at least 75 credits to graduate with approval by the student graduate studies committee.

Delivery Method: Campus GRE: Not Required 2022-2023 Tuition: $12,204 total (Vermont Resident), $30,960 total (Non-resident)

University of Washington Seattle Campus – Seattle, Washington PhD in Big Data and Data Science

The University of Washington’s PhD program in data science has 2 key goals: training of new data scientists and cyberinfrastructure development, i.e., development of open-source tools and services that scientists around the world can use for big data analysis.

Students must take core courses in data management, machine learning, data visualization, and statistics.

Students are also required to complete at least one internship that covers practical work in big data.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $17,004 per year (Washington resident), $30,477 (non-resident)

University of Wisconsin-Madison – Madison, Wisconsin PhD in Biomedical Data Science

The PhD program in Biomedical Data Science offered by the Department of Biostatistics and Medical Informatics at UW-Madison is unique in blending the best of statistics and computer science, biostatistics and biomedical informatics.

Students complete three year-long course sequences in biostatistics theory and methods, computer science/informatics, and a specialized sequence to fit their interests.

Students also complete three research rotations within their first two years in the program, to both expand their breadth of knowledge and assist in identifying a research advisor.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $10,728 total (in-state), $24,054 total (out-of-state)

Vanderbilt University – Nashville, Tennessee Data Science Track of the BMI PhD Program

The PhD in biomedical informatics at Vanderbilt has the option of a data science track.

Students complete courses in the areas of biomedical informatics (3 courses), computer science (4 courses), statistical methods (4 courses), and biomedical science (2 courses). Students are expected to complete core courses and defend their dissertations within 5 years of beginning the program.

Applicants must have a bachelor’s degree in computer science, engineering, biology, biochemistry, nursing, mathematics, statistics, physics, information management, or some other health-related field.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $53,160 per year

Washington University in St. Louis – St. Louis, Missouri Doctorate in Computational & Data Sciences

Washington University now offers an interdisciplinary Ph.D. in Computational & Data Sciences where students can choose from one of four tracks (Computational Methodologies, Political Science, Psychological & Brain Sciences, or Social Work & Public Health).

Students are fully funded and will receive a stipend for at least five years contingent on making sufficient progress in the program.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $59,420 total

Worcester Polytechnic Institute – Worcester, Massachusetts PhD in Data Science

The PhD in data science at Worcester Polytechnic Institute focuses on 5 areas: integrative data science, business intelligence and case studies, data access and management, data analytics and mining, and mathematical analysis.

Students first complete a master’s in data science, and then complete 60 credit hours beyond the master’s, including 30 credit hours of research.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $28,980 per year

Yale University – New Haven, Connecticut
PhD Program – Department of Stats and Data Science

The PhD in statistics and data science at Yale University offers broad training in the areas of statistical theory, probability theory, stochastic processes, asymptotics, information theory, machine learning, data analysis, statistical computing, and graphical methods. Students complete 12 courses in the first year in these topics.

Students are required to teach one course each semester of their third and fourth years.

Most students complete and defend their dissertations in their fifth year.

Applicants should have an educational background in statistics, with an undergraduate major in statistics, mathematics, computer science, or a similar field.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $46,900 total

10 Compelling Machine Learning Ph.D. Dissertations for 2020

Posted by Daniel Gutierrez, ODSC, August 19, 2020, in Machine Learning, Modeling, Research

As a data scientist, an integral part of my work in the field revolves around keeping current with research coming out of academia. I frequently scour arXiv.org for late-breaking papers that show trends and reveal fertile areas of research. Another valuable source of research developments is Ph.D. dissertations, the culmination of a doctoral candidate’s work toward earning the degree. Ph.D. candidates are highly motivated to choose research topics that establish new and creative paths toward discovery in their field of study. Their dissertations are highly focused on a specific problem. If you can find a dissertation that aligns with your areas of interest, consuming the research is an excellent way to do a deep dive into the technology. After reviewing hundreds of recent theses from universities all over the country, I present 10 machine learning dissertations that I found compelling in terms of my own areas of interest.

I hope you’ll find several that match your own fields of inquiry. Each thesis may take a while to consume but will result in hours of satisfying summer reading. Enjoy!

1. Bayesian Modeling and Variable Selection for Complex Data

As we routinely encounter high-throughput data sets in complex biological and environmental research, developing novel models and methods for variable selection has received widespread attention. This dissertation addresses a few key challenges in Bayesian modeling and variable selection for high-dimensional data with complex spatial structures. 

2. Topics in Statistical Learning with a Focus on Large Scale Data

Big data vary in shape and call for different approaches. One type of big data is tall data, i.e., a very large number of samples but not too many features. This dissertation describes a general communication-efficient algorithm for distributed statistical learning on this type of big data. The algorithm distributes the samples uniformly to multiple machines, and uses a common reference data set to improve the performance of local estimates. The algorithm enables potentially much faster analysis, at a small cost to statistical performance.
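To make the idea of distributed estimation on tall data concrete, here is a minimal sketch of the simplest baseline in this family: split the samples across machines, fit locally, and average the local estimates. It is not the dissertation's algorithm (the reference-data step is not reproduced here), and the function names, the one-feature linear model, and the simulated data are all assumptions made for illustration.

```python
import random

def local_slope(xs, ys):
    """Least-squares slope through the origin for one machine's shard."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx

def one_shot_average(xs, ys, n_machines, rng):
    """Assign samples uniformly at random to machines, then average the local slopes."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    # Each machine gets every n_machines-th index of the shuffled order.
    shards = [idx[m::n_machines] for m in range(n_machines)]
    slopes = [
        local_slope([xs[i] for i in shard], [ys[i] for i in shard])
        for shard in shards
    ]
    return sum(slopes) / len(slopes)

# Simulated tall data (hypothetical): many samples, a single feature.
rng = random.Random(0)
true_slope = 2.0
xs = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]
ys = [true_slope * x + rng.gauss(0.0, 0.5) for x in xs]

full = local_slope(xs, ys)                    # estimate using all data on one machine
averaged = one_shot_average(xs, ys, 10, rng)  # each machine sends back one number
print(full, averaged)
```

Each machine communicates a single number, which is where the speedup comes from; the averaged estimate is usually slightly noisier than the full-data fit, which matches the "small cost to statistical performance" described above.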

Another type of big data is wide data, i.e., many features but a limited number of samples. It is also called high-dimensional data, and many classical statistical methods are not applicable to it.

This dissertation discusses a method of dimensionality reduction for high-dimensional classification. The method partitions features into independent communities and splits the original classification problem into separate smaller ones. It enables parallel computing and produces more interpretable results.

3. Sets as Measures: Optimization and Machine Learning

The purpose of this machine learning dissertation is to address the following simple question:

How do we design efficient algorithms to solve optimization or machine learning problems where the decision variable (or target label) is a set of unknown cardinality?

Optimization and machine learning have proved remarkably successful in applications requiring the choice of single vectors. Some tasks, in particular many inverse problems, call for the design, or estimation, of sets of objects. When the size of these sets is a priori unknown, directly applying optimization or machine learning techniques designed for single vectors appears difficult. The work in this dissertation shows that a very old idea for transforming sets into elements of a vector space (namely, a space of measures), a common trick in theoretical analysis, generates effective practical algorithms.

4. A Geometric Perspective on Some Topics in Statistical Learning

Modern science and engineering often generate data sets with a large sample size and a comparably large dimension which puts classic asymptotic theory into question in many ways. Therefore, the main focus of this dissertation is to develop a fundamental understanding of statistical procedures for estimation and hypothesis testing from a non-asymptotic point of view, where both the sample size and problem dimension grow hand in hand. A range of different problems are explored in this thesis, including work on the geometry of hypothesis testing, adaptivity to local structure in estimation, effective methods for shape-constrained problems, and early stopping with boosting algorithms. The treatment of these different problems shares the common theme of emphasizing the underlying geometric structure.

5. Essays on Random Forest Ensembles

A random forest is a popular machine learning ensemble method that has proven successful in solving a wide range of classification problems. While other successful classifiers, such as boosting algorithms or neural networks, admit natural interpretations as maximum likelihood, a suitable statistical interpretation is much more elusive for a random forest. The first part of this dissertation demonstrates that a random forest is a fruitful framework in which to study AdaBoost and deep neural networks. The work explores the concept and utility of interpolation, the ability of a classifier to perfectly fit its training data. The second part of this dissertation places a random forest on more sound statistical footing by framing it as kernel regression with the proximity kernel. The work then analyzes the parameters that control the bandwidth of this kernel and discusses useful generalizations.
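The kernel-regression view can be sketched in a few lines. The proximity of two points is the fraction of trees in which they fall into the same leaf, and a prediction is a proximity-weighted average of the training responses. The sketch below is an illustration under loud assumptions: the "trees" are stand-in random partitions of a one-dimensional feature rather than fitted decision trees, and all names are hypothetical. It is not the dissertation's construction.

```python
import bisect
import random

def make_tree(rng, n_splits):
    """A stand-in 'tree': sorted random thresholds that cut [0, 1] into leaves."""
    return sorted(rng.random() for _ in range(n_splits))

def leaf_of(tree, x):
    """Index of the leaf (interval between thresholds) that x falls into."""
    return bisect.bisect_right(tree, x)

def proximity(forest, x, z):
    """Fraction of trees in which x and z land in the same leaf."""
    same = sum(leaf_of(tree, x) == leaf_of(tree, z) for tree in forest)
    return same / len(forest)

def kernel_predict(forest, train_x, train_y, x):
    """Weighted average of training responses, with proximities as kernel weights."""
    weights = [proximity(forest, x, xi) for xi in train_x]
    total = sum(weights)
    if total == 0.0:
        return sum(train_y) / len(train_y)  # no neighbours: fall back to the mean
    return sum(w * y for w, y in zip(weights, train_y)) / total

rng = random.Random(1)
forest = [make_tree(rng, n_splits=8) for _ in range(200)]
train_x = [i / 99 for i in range(100)]
train_y = [x * x for x in train_x]  # noiseless target y = x^2

print(kernel_predict(forest, train_x, train_y, 0.5))
```

In this toy setup the number of splits per tree plays the role of a bandwidth parameter: more splits mean smaller leaves, so proximity decays faster with distance and the kernel becomes narrower.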

6. Marginally Interpretable Generalized Linear Mixed Models

A popular approach for relating correlated measurements of a non-Gaussian response variable to a set of predictors is to introduce latent random variables and fit a generalized linear mixed model. The conventional strategy for specifying such a model leads to parameter estimates that must be interpreted conditional on the latent variables. In many cases, interest lies not in these conditional parameters, but rather in marginal parameters that summarize the average effect of the predictors across the entire population. Due to the structure of the generalized linear mixed model, the average effect across all individuals in a population is generally not the same as the effect for an average individual. Further complicating matters, obtaining marginal summaries from a generalized linear mixed model often requires evaluation of an analytically intractable integral or use of an approximation. Another popular approach in this setting is to fit a marginal model using generalized estimating equations. This strategy is effective for estimating marginal parameters, but leaves one without a formal model for the data with which to assess quality of fit or make predictions for future observations. Thus, there exists a need for a better approach.

This dissertation defines a class of marginally interpretable generalized linear mixed models that leads to parameter estimates with a marginal interpretation while maintaining the desirable statistical properties of a conditionally specified model. The distinguishing feature of these models is an additive adjustment that accounts for the curvature of the link function and thereby preserves a specific form for the marginal mean after integrating out the latent random variables. 
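A small numerical sketch shows why conditional and marginal parameters differ, and what an additive adjustment does. It assumes a logistic model with a single Gaussian random intercept; the function names, the parameter values, and the bisection search are illustrative choices and not the dissertation's construction. The marginal mean is obtained by numerically integrating out the random intercept, and the adjustment is the offset that makes the marginal mean equal to the inverse link evaluated at the linear predictor.

```python
import math

def expit(z):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-z))

def marginal_mean(eta, sigma, n=2001, width=8.0):
    """E[expit(eta + u)] for u ~ N(0, sigma^2), sigma > 0, via the trapezoid rule."""
    lo, hi = -width * sigma, width * sigma
    h = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        u = lo + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        density = math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += weight * expit(eta + u) * density
    return total * h

def adjustment(eta, sigma, tol=1e-10):
    """Offset a such that E[expit(eta + a + u)] = expit(eta), found by bisection."""
    target = expit(eta)
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The marginal mean is increasing in the offset, so bisection applies.
        if marginal_mean(eta + mid, sigma) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

eta, sigma = 1.0, 2.0                  # hypothetical linear predictor and random-effect SD
conditional = expit(eta)               # probability for an "average individual" (u = 0)
marginal = marginal_mean(eta, sigma)   # probability averaged over the population
a = adjustment(eta, sigma)

print(conditional, marginal)
print(a, marginal_mean(eta + a, sigma))
```

The population-averaged probability is pulled toward 0.5 relative to the probability for an individual with a random effect of zero, which is the gap between the two interpretations. Adding the offset restores a marginal mean of exactly the intended logistic form.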

7. On the Detection of Hate Speech, Hate Speakers and Polarized Groups in Online Social Media

The objective of this dissertation is to explore the use of machine learning algorithms in understanding and detecting hate speech, hate speakers and polarized groups in online social media. Beginning with a unique typology for detecting abusive language, the work outlines the distinctions and similarities of different abusive language subtasks (offensive language, hate speech, cyberbullying and trolling) and how we might benefit from the progress made in each area. Specifically, the work suggests that each subtask can be categorized based on whether or not the abusive language being studied 1) is directed at a specific individual, or targets a generalized “Other” and 2) the extent to which the language is explicit versus implicit. The work then uses knowledge gained from this typology to tackle the “problem of offensive language” in hate speech detection. 

8. Lasso Guarantees for Dependent Data

Serially correlated high dimensional data are prevalent in the big data era. In order to predict and learn the complex relationship among the multiple time series, high dimensional modeling has gained importance in various fields such as control theory, statistics, economics, finance, genetics and neuroscience. This dissertation studies a number of high dimensional statistical problems involving different classes of mixing processes. 

9. Random Forest Robustness, Variable Importance, and Tree Aggregation

Random forest methodology is a nonparametric, machine learning approach capable of strong performance in regression and classification problems involving complex data sets. In addition to making predictions, random forests can be used to assess the relative importance of feature variables. This dissertation explores three topics related to random forests: tree aggregation, variable importance, and robustness. 

10. Climate Data Computing: Optimal Interpolation, Averaging, Visualization and Delivery

This dissertation solves two important problems in the modern analysis of big climate data. The first is the efficient visualization and fast delivery of big climate data, and the second is a computationally extensive principal component analysis (PCA) using spherical harmonics on the Earth’s surface. The second problem creates a way to supply the data for the technology developed in the first. Both problems are computationally difficult; one example is the representation of higher-order spherical harmonics such as Y400, which is critical for upscaling weather data to almost infinitely fine spatial resolution.

I hope you enjoyed learning about these compelling machine learning dissertations.

Daniel Gutierrez, ODSC

Daniel D. Gutierrez is a practicing data scientist who has been working with data since long before the field came into vogue. As a technology journalist, he enjoys keeping a pulse on this fast-paced industry. Daniel is also an educator, having taught data science, machine learning and R classes at the university level. He has authored four computer industry books on database and data science technology, including his most recent title, “Machine Learning and Data Science: An Introduction to Statistical Learning Methods with R.” Daniel holds a BS in Mathematics and Computer Science from UCLA.

Machine Learning - CMU

PhD Dissertations

[all are .pdf files].

Learning Models that Match Jacob Tyo, 2024

Improving Human Integration across the Machine Learning Pipeline Charvi Rastogi, 2024

Reliable and Practical Machine Learning for Dynamic Healthcare Settings Helen Zhou, 2023

Automatic customization of large-scale spiking network models to neuronal population activity (unavailable) Shenghao Wu, 2023

Estimation of BVk functions from scattered data (unavailable) Addison J. Hu, 2023

Rethinking object categorization in computer vision (unavailable) Jayanth Koushik, 2023

Advances in Statistical Gene Networks Jinjin Tian, 2023

Post-hoc calibration without distributional assumptions Chirag Gupta, 2023

The Role of Noise, Proxies, and Dynamics in Algorithmic Fairness Nil-Jana Akpinar, 2023

Collaborative learning by leveraging siloed data Sebastian Caldas, 2023

Modeling Epidemiological Time Series Aaron Rumack, 2023

Human-Centered Machine Learning: A Statistical and Algorithmic Perspective Leqi Liu, 2023

Uncertainty Quantification under Distribution Shifts Aleksandr Podkopaev, 2023

Probabilistic Reinforcement Learning: Using Data to Define Desired Outcomes, and Inferring How to Get There Benjamin Eysenbach, 2023

Comparing Forecasters and Abstaining Classifiers Yo Joong Choe, 2023

Using Task Driven Methods to Uncover Representations of Human Vision and Semantics Aria Yuan Wang, 2023

Data-driven Decisions - An Anomaly Detection Perspective Shubhranshu Shekhar, 2023

Applied Mathematics of the Future Kin G. Olivares, 2023

Methods and Applications of Explainable Machine Learning Joon Sik Kim, 2023

Neural Reasoning for Question Answering Haitian Sun, 2023

Principled Machine Learning for Societally Consequential Decision Making Amanda Coston, 2023

Long term brain dynamics extend cognitive neuroscience to timescales relevant for health and physiology Maxwell B. Wang, 2023

Long term brain dynamics extend cognitive neuroscience to timescales relevant for health and physiology Darby M. Losey, 2023

Calibrated Conditional Density Models and Predictive Inference via Local Diagnostics David Zhao, 2023

Towards an Application-based Pipeline for Explainability Gregory Plumb, 2022

Objective Criteria for Explainable Machine Learning Chih-Kuan Yeh, 2022

Making Scientific Peer Review Scientific Ivan Stelmakh, 2022

Facets of regularization in high-dimensional learning: Cross-validation, risk monotonization, and model complexity Pratik Patil, 2022

Active Robot Perception using Programmable Light Curtains Siddharth Ancha, 2022

Strategies for Black-Box and Multi-Objective Optimization Biswajit Paria, 2022

Unifying State and Policy-Level Explanations for Reinforcement Learning Nicholay Topin, 2022

Sensor Fusion Frameworks for Nowcasting Maria Jahja, 2022

Equilibrium Approaches to Modern Deep Learning Shaojie Bai, 2022

Towards General Natural Language Understanding with Probabilistic Worldbuilding Abulhair Saparov, 2022

Applications of Point Process Modeling to Spiking Neurons (unavailable) Yu Chen, 2021

Neural variability: structure, sources, control, and data augmentation Akash Umakantha, 2021

Structure and time course of neural population activity during learning Jay Hennig, 2021

Cross-view Learning with Limited Supervision Yao-Hung Hubert Tsai, 2021

Meta Reinforcement Learning through Memory Emilio Parisotto, 2021

Learning Embodied Agents with Scalably-Supervised Reinforcement Learning Lisa Lee, 2021

Learning to Predict and Make Decisions under Distribution Shift Yifan Wu, 2021

Statistical Game Theory Arun Sai Suggala, 2021

Towards Knowledge-capable AI: Agents that See, Speak, Act and Know Kenneth Marino, 2021

Learning and Reasoning with Fast Semidefinite Programming and Mixing Methods Po-Wei Wang, 2021

Bridging Language in Machines with Language in the Brain Mariya Toneva, 2021

Curriculum Learning Otilia Stretcu, 2021

Principles of Learning in Multitask Settings: A Probabilistic Perspective Maruan Al-Shedivat, 2021

Towards Robust and Resilient Machine Learning Adarsh Prasad, 2021

Towards Training AI Agents with All Types of Experiences: A Unified ML Formalism Zhiting Hu, 2021

Building Intelligent Autonomous Navigation Agents Devendra Chaplot, 2021

Learning to See by Moving: Self-supervising 3D Scene Representations for Perception, Control, and Visual Reasoning Hsiao-Yu Fish Tung, 2021

Statistical Astrophysics: From Extrasolar Planets to the Large-scale Structure of the Universe Collin Politsch, 2020

Causal Inference with Complex Data Structures and Non-Standard Effects Kwhangho Kim, 2020

Networks, Point Processes, and Networks of Point Processes Neil Spencer, 2020

Dissecting neural variability using population recordings, network models, and neurofeedback (unavailable) Ryan Williamson, 2020

Predicting Health and Safety: Essays in Machine Learning for Decision Support in the Public Sector Dylan Fitzpatrick, 2020

Towards a Unified Framework for Learning and Reasoning Han Zhao, 2020

Learning DAGs with Continuous Optimization Xun Zheng, 2020

Machine Learning and Multiagent Preferences Ritesh Noothigattu, 2020

Learning and Decision Making from Diverse Forms of Information Yichong Xu, 2020

Towards Data-Efficient Machine Learning Qizhe Xie, 2020

Change modeling for understanding our world and the counterfactual one(s) William Herlands, 2020

Machine Learning in High-Stakes Settings: Risks and Opportunities Maria De-Arteaga, 2020

Data Decomposition for Constrained Visual Learning Calvin Murdock, 2020

Structured Sparse Regression Methods for Learning from High-Dimensional Genomic Data Micol Marchetti-Bowick, 2020

Towards Efficient Automated Machine Learning Liam Li, 2020

Learning Collections of Functions Emmanouil Antonios Platanios, 2020

Provable, structured, and efficient methods for robustness of deep networks to adversarial examples Eric Wong, 2020

Reconstructing and Mining Signals: Algorithms and Applications Hyun Ah Song, 2020

Probabilistic Single Cell Lineage Tracing Chieh Lin, 2020

Graphical network modeling of phase coupling in brain activity (unavailable) Josue Orellana, 2019

Strategic Exploration in Reinforcement Learning - New Algorithms and Learning Guarantees Christoph Dann, 2019

Learning Generative Models using Transformations Chun-Liang Li, 2019

Estimating Probability Distributions and their Properties Shashank Singh, 2019

Post-Inference Methods for Scalable Probabilistic Modeling and Sequential Decision Making Willie Neiswanger, 2019

Accelerating Text-as-Data Research in Computational Social Science Dallas Card, 2019

Multi-view Relationships for Analytics and Inference Eric Lei, 2019

Information flow in networks based on nonstationary multivariate neural recordings Natalie Klein, 2019

Competitive Analysis for Machine Learning & Data Science Michael Spece, 2019

The When, Where and Why of Human Memory Retrieval Qiong Zhang, 2019

Towards Effective and Efficient Learning at Scale Adams Wei Yu, 2019

Towards Literate Artificial Intelligence Mrinmaya Sachan, 2019

Learning Gene Networks Underlying Clinical Phenotypes Under SNP Perturbations From Genome-Wide Data Calvin McCarter, 2019

Unified Models for Dynamical Systems Carlton Downey, 2019

Anytime Prediction and Learning for the Balance between Computation and Accuracy Hanzhang Hu, 2019

Statistical and Computational Properties of Some "User-Friendly" Methods for High-Dimensional Estimation Alnur Ali, 2019

Nonparametric Methods with Total Variation Type Regularization Veeranjaneyulu Sadhanala, 2019

New Advances in Sparse Learning, Deep Networks, and Adversarial Learning: Theory and Applications Hongyang Zhang, 2019

Gradient Descent for Non-convex Problems in Modern Machine Learning Simon Shaolei Du, 2019

Selective Data Acquisition in Learning and Decision Making Problems Yining Wang, 2019

Anomaly Detection in Graphs and Time Series: Algorithms and Applications Bryan Hooi, 2019

Neural dynamics and interactions in the human ventral visual pathway Yuanning Li, 2018

Tuning Hyperparameters without Grad Students: Scaling up Bandit Optimisation Kirthevasan Kandasamy, 2018

Teaching Machines to Classify from Natural Language Interactions Shashank Srivastava, 2018

Statistical Inference for Geometric Data Jisu Kim, 2018

Representation Learning @ Scale Manzil Zaheer, 2018

Diversity-promoting and Large-scale Machine Learning for Healthcare Pengtao Xie, 2018

Distribution and Histogram (DIsH) Learning Junier Oliva, 2018

Stress Detection for Keystroke Dynamics Shing-Hon Lau, 2018

Sublinear-Time Learning and Inference for High-Dimensional Models Enxu Yan, 2018

Neural population activity in the visual cortex: Statistical methods and application Benjamin Cowley, 2018

Efficient Methods for Prediction and Control in Partially Observable Environments Ahmed Hefny, 2018

Learning with Staleness Wei Dai, 2018

Statistical Approach for Functionally Validating Transcription Factor Bindings Using Population SNP and Gene Expression Data Jing Xiang, 2017

New Paradigms and Optimality Guarantees in Statistical Learning and Estimation Yu-Xiang Wang, 2017

Dynamic Question Ordering: Obtaining Useful Information While Reducing User Burden Kirstin Early, 2017

New Optimization Methods for Modern Machine Learning Sashank J. Reddi, 2017

Active Search with Complex Actions and Rewards Yifei Ma, 2017

Why Machine Learning Works George D. Montañez, 2017

Source-Space Analyses in MEG/EEG and Applications to Explore Spatio-temporal Neural Dynamics in Human Vision Ying Yang, 2017

Computational Tools for Identification and Analysis of Neuronal Population Activity Pengcheng Zhou, 2016

Expressive Collaborative Music Performance via Machine Learning Gus (Guangyu) Xia, 2016

Supervision Beyond Manual Annotations for Learning Visual Representations Carl Doersch, 2016

Exploring Weakly Labeled Data Across the Noise-Bias Spectrum Robert W. H. Fisher, 2016

Optimizing Optimization: Scalable Convex Programming with Proximal Operators Matt Wytock, 2016

Combining Neural Population Recordings: Theory and Application William Bishop, 2015

Discovering Compact and Informative Structures through Data Partitioning Madalina Fiterau-Brostean, 2015

Machine Learning in Space and Time Seth R. Flaxman, 2015

The Time and Location of Natural Reading Processes in the Brain Leila Wehbe, 2015

Shape-Constrained Estimation in High Dimensions Min Xu, 2015

Spectral Probabilistic Modeling and Applications to Natural Language Processing Ankur Parikh, 2015

Computational and Statistical Advances in Testing and Learning Aaditya Kumar Ramdas, 2015

Corpora and Cognition: The Semantic Composition of Adjectives and Nouns in the Human Brain Alona Fyshe, 2015

Learning Statistical Features of Scene Images Wooyoung Lee, 2014

Towards Scalable Analysis of Images and Videos Bin Zhao, 2014

Statistical Text Analysis for Social Science Brendan T. O'Connor, 2014

Modeling Large Social Networks in Context Qirong Ho, 2014

Semi-Cooperative Learning in Smart Grid Agents Prashant P. Reddy, 2013

On Learning from Collective Data Liang Xiong, 2013

Exploiting Non-sequence Data in Dynamic Model Learning Tzu-Kuo Huang, 2013

Mathematical Theories of Interaction with Oracles Liu Yang, 2013

Short-Sighted Probabilistic Planning Felipe W. Trevizan, 2013

Statistical Models and Algorithms for Studying Hand and Finger Kinematics and their Neural Mechanisms Lucia Castellanos, 2013

Approximation Algorithms and New Models for Clustering and Learning Pranjal Awasthi, 2013

Uncovering Structure in High-Dimensions: Networks and Multi-task Learning Problems Mladen Kolar, 2013

Learning with Sparsity: Structures, Optimization and Applications Xi Chen, 2013

GraphLab: A Distributed Abstraction for Large Scale Machine Learning Yucheng Low, 2013

Graph Structured Normal Means Inference James Sharpnack, 2013 (Joint Statistics & ML PhD)

Probabilistic Models for Collecting, Analyzing, and Modeling Expression Data Hai-Son Phuoc Le, 2013

Learning Large-Scale Conditional Random Fields Joseph K. Bradley, 2013

New Statistical Applications for Differential Privacy Rob Hall, 2013 (Joint Statistics & ML PhD)

Parallel and Distributed Systems for Probabilistic Reasoning Joseph Gonzalez, 2012

Spectral Approaches to Learning Predictive Representations Byron Boots, 2012

Attribute Learning using Joint Human and Machine Computation Edith L. M. Law, 2012

Statistical Methods for Studying Genetic Variation in Populations Suyash Shringarpure, 2012

Data Mining Meets HCI: Making Sense of Large Graphs Duen Horng (Polo) Chau, 2012

Learning with Limited Supervision by Input and Output Coding Yi Zhang, 2012

Target Sequence Clustering Benjamin Shih, 2011

Nonparametric Learning in High Dimensions Han Liu, 2010 (Joint Statistics & ML PhD)

Structural Analysis of Large Networks: Observations and Applications Mary McGlohon, 2010

Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy Brian D. Ziebart, 2010

Tractable Algorithms for Proximity Search on Large Graphs Purnamrita Sarkar, 2010

Rare Category Analysis Jingrui He, 2010

Coupled Semi-Supervised Learning Andrew Carlson, 2010

Fast Algorithms for Querying and Mining Large Graphs Hanghang Tong, 2009

Efficient Matrix Models for Relational Learning Ajit Paul Singh, 2009

Exploiting Domain and Task Regularities for Robust Named Entity Recognition Andrew O. Arnold, 2009

Theoretical Foundations of Active Learning Steve Hanneke, 2009

Generalized Learning Factors Analysis: Improving Cognitive Models with Machine Learning Hao Cen, 2009

Detecting Patterns of Anomalies Kaustav Das, 2009

Dynamics of Large Networks Jurij Leskovec, 2008

Computational Methods for Analyzing and Modeling Gene Regulation Dynamics Jason Ernst, 2008

Stacked Graphical Learning Zhenzhen Kou, 2007

Actively Learning Specific Function Properties with Applications to Statistical Inference Brent Bryan, 2007

Approximate Inference, Structure Learning and Feature Estimation in Markov Random Fields Pradeep Ravikumar, 2007

Scalable Graphical Models for Social Networks Anna Goldenberg, 2007

Measure Concentration of Strongly Mixing Processes with Applications Leonid Kontorovich, 2007

Tools for Graph Mining Deepayan Chakrabarti, 2005

Automatic Discovery of Latent Variable Models Ricardo Silva, 2005

  • Thesis Option

Data Science master’s students can choose to satisfy the research experience requirement by selecting the thesis option. Students will spend the majority of their second year working on a substantial data science project that culminates in the submission and oral defense of a master’s thesis. While all thesis projects must be related to data science, students are given leeway in finding a project in a domain of study that fits with their background and interest.

All students choosing the thesis option must find a research advisor and submit a thesis proposal by mid-April of their first year of study. Thesis proposals will be evaluated by the Data Science faculty committee and only those students whose proposals are accepted will be allowed to continue with the thesis option.  

To account for the time spent on thesis research, students choosing the thesis option may substitute AC 302 for three required courses: the Capstone and two "free" elective courses (as defined in the final bullet point on the degree requirements page).

Recent Dissertation Topics

Kerstin Emily Frailey - “Practical Data Quality for Modern Data & Modern Uses, with Applications to America’s COVID-19 Data”

Dissertation Advisor: Martin Wells

Initial job placement: Co-Founder & CEO

David Kent - “Smoothness-Penalized Deconvolution: Rates of Convergence, Choice of Tuning Parameter, and Inference”

Dissertation Advisor: David Ruppert

Initial job placement: Visiting Assistant Professor - Cornell University

Yuchen Xu - “Dynamic Atomic Column Detection in Transmission Electron Microscopy Videos via Ridge Estimation”

Dissertation Advisor: David Matteson

Initial job placement: Postdoctoral Fellow - UCLA

Siyi Deng - “Optimal and Safe Semi-supervised Estimation and Inference for High-dimensional Linear Regression”

Dissertation Advisor: Yang Ning

Initial job placement: Data Scientist - TikTok

Peter (Haoxuan) Wu - “Advances in adaptive and deep Bayesian state-space models”

Initial job placement: Quantitative Researcher - DRW

Grace Deng - “Generative models and Bayesian spillover graphs for dynamic networks”

Initial job placement: Data Scientist - Research at Google

Samriddha Lahiry - “Some problems of asymptotic quantum statistical inference”

Dissertation Advisor: Michael Nussbaum

Initial job placement: Postdoctoral Fellow - Harvard University

Yaosheng Xu - “WWTA load-balancing for parallel-server systems with heterogeneous servers and multi-scale heavy traffic limits for generalized Jackson networks”

Dissertation Advisor: Jim Dai

Initial job placement: Applied Scientist - Amazon

Seth Strimas-Mackey - “Latent structure in linear prediction and corpora comparison”

Dissertation Advisor: Marten Wegkamp and Florentina Bunea

Initial job placement: Data Scientist at Google

Tao Zhang - “Topics in modern regression modeling”

Dissertation Advisor: David Ruppert and Kengo Kato

Initial job placement: Quantitative Researcher - Point72

Wentian Huang - “Nonparametric and semiparametric approaches to functional data modeling”

Initial job placement: Ernst & Young

Binh Tang - “Deep probabilistic models for sequential prediction”

Initial job placement: Amazon

Yi Su - “Off-policy evaluation and learning for interactive systems”

Dissertation Advisor: Thorsten Joachims

Initial job placement: Berkeley (postdoc)

Ruqi Zhang - “Scalable and reliable inference for probabilistic modeling”

Dissertation Advisor: Christopher De Sa

Jason Sun - “Recent developments on Matrix Completion”

Initial job placement: LinkedIn

Indrayudh Ghosal - “Model combinations and the Infinitesimal Jackknife: how to refine models with boosting and quantify uncertainty”

Dissertation Advisor: Giles Hooker

Benjamin Ryan Baer - “Contributions to fairness and transparency”

Initial job placement: Rochester (postdoc)

Megan Lynne Gelsinger - “Spatial and temporal approaches to analyzing big data”

Dissertation Advisor: David Matteson and Joe Guinness

Initial job placement: Institute for Defense Analyses

Zhengze Zhou - “Statistical inference for machine learning : feature importance, uncertainty quantification and interpretation stability”

Initial job placement: Facebook

Huijie Feng - “Estimation and inference of high-dimensional individualized threshold with binary responses”

Initial job placement: Microsoft

Xiaojie Mao - “Machine learning methods for data-driven decision making : contextual optimization, causal inference, and algorithmic fairness”

Dissertation Advisor: Nathan Kallus and Madeleine Udell

Initial job placement: Tsinghua University, China

Xin Bing - “Structured latent factor models : Identifiability, estimation, inference and prediction”

Initial job placement: Cambridge (postdoc), University of Toronto

Yang Liu - “Nonparametric regression and density estimation on a network”

Dissertation Advisor: David Ruppert and Peter Frazier

Initial job placement: Research Analyst - Cubist Systematic Strategies

Skyler Seto - “Learning from less : improving and understanding model selection in penalized machine learning problems”

Initial job placement: Machine Learning Researcher - Apple

Jiekun Feng - “Markov chain, Markov decision process, and deep reinforcement learning with applications to hospital management and real-time ride-hailing”

Initial job placement:

Wenyu Zhang - “Methods for change point detection in sequential data”

Initial job placement: Research Scientist - Institute for Infocomm Research

Liao Zhu - “The adaptive multi-factor model and the financial market”

Initial job placement: Quantitative Researcher - Two Sigma

Xiaoyun Quan - “Latent Gaussian copula model for high dimensional mixed data, and its applications”

Dissertation Advisor: James Booth and Martin Wells

Praphruetpong (Ben) Athiwaratkun - "Density representations for words and hierarchical data"

Dissertation Advisor: Andrew Wilson

Initial job placement: AI Scientist - AWS AI Labs

Yiming Sun - “High dimensional data analysis with dependency and under limited memory”

Dissertation Advisor: Sumanta Basu and Madeleine Udell

Zi Ye - “Functional single index model and Jensen effect”

Dissertation Advisor: Giles Hooker 

Initial job placement: Data & Applied Scientist - Microsoft

Hui Fen (Sarah) Tan - “Interpretable approaches to opening up black-box models”

Dissertation Advisor: Giles Hooker and Martin Wells

Daniel E. Gilbert - “Luck, fairness and Bayesian tensor completion”

Yichen Zhou - “Asymptotics and interpretability of decision trees and decision tree ensembles”

Initial job placement: Data Scientist - Google

Ze Jin - “Measuring statistical dependence and its applications in machine learning”  

Initial job placement: Research Scientist, Facebook Integrity Ranking & ML - Facebook

Xiaohan Yan - “Statistical learning for structural patterns with trees”

Dissertation Advisor: Jacob Bien

Initial job placement: Senior Data Scientist - Microsoft

Guo Yu - “High-dimensional structured regression using convex optimization”

Dan Kowal - "Bayesian methods for functional and time series data"

Dissertation Advisor: David Matteson and David Ruppert

Initial job placement: Assistant Professor, Department of Statistics, Rice University

Keegan Kang - "Data Dependent Random Projections"

David Sinclair - "Model selection results for high dimensional graphical models on binary and count data with applications to fMRI and genomics"

Liu, Yanning – "Statistical Issues in the Design and Analysis of Clinical Trials"

Dissertation Advisor: Bruce Turnbull

Nicholson, William Bertil – "Tools for Modeling Sparse Vector Autoregressions"

Tupper, Laura Lindley – "Topics in Classification and Clustering of High-Dimensional Data"

Chetelat, Didier – "High-Dimensional Inference by Unbiased Risk Estimation"

Initial Job Placement: Assistant Professor, Université de Montréal, Montreal, Canada

Gaynanova, Irina – "Estimation Of Sparse Low-Dimensional Linear Projections"

Dissertation Advisor: James Booth

Initial Job Placement: Assistant Professor, Texas A&M, College Station, TX

Mentch, Lucas – "Ensemble Trees and CLTs: Statistical Inference in Machine Learning"

Initial Job Placement: Assistant Professor, University of Pittsburgh, Pittsburgh, PA

Risk, Ben – "Topics in Independent Component Analysis, Likelihood Component Analysis, and Spatiotemporal Mixed Modeling"

Dissertation Advisors: David Matteson and David Ruppert

Initial Job Placement: Postdoctoral Fellow, University of North Carolina, Chapel Hill, NC

Zhao, Yue – "Contributions to the Statistical Inference for the Semiparametric Elliptical Copula Model"

Dissertation Advisor: Marten Wegkamp

Initial Job Placement: Postdoctoral Fellow, McGill University, Montreal, Canada

Chen, Maximillian Gene – "Dimension Reduction and Inferential Procedures for Images"

Dissertation Advisor: Martin Wells 

Earls, Cecelia – "Bayesian hierarchical Gaussian process models for functional data analysis"

Dissertation Advisor: Giles Hooker

Initial Job Placement: Lecturer, Cornell University, Ithaca, NY

Li, James Yi-Wei – "Tensor (Multidimensional Array) Decomposition, Regression, and Software for Statistics and Machine Learning"

Initial Job Placement: Research Scientist, Yahoo Labs

Schneider, Matthew John – "Three Papers on Time Series Forecasting and Data Privacy"

Dissertation Advisor: John Abowd

Initial Job Placement: Assistant Professor, Northwestern University, Evanston, IL

Thorbergsson, Leifur – "Experimental design for partially observed Markov decision processes"

Initial Job Placement: Data Scientist, Memorial Sloan Kettering Cancer Center, New York, NY

Wan, Muting – "Model-Based Classification with Applications to High-Dimensional Data in Bioinformatics"

Initial Job Placement: Senior Associate, 1010 Data, New York, NY

Johnson, Lynn Marie – "Topics in Linear Models: Methods for Clustered, Censored Data and Two-Stage Sampling Designs"

Dissertation Advisor: Robert Strawderman

Initial Job Placement: Statistical Consultant, Cornell, Statistical Consulting Unit, Ithaca, NY

Tecuapetla Gomez, Inder Rafael –  "Asymptotic Inference for Locally Stationary Processes"

Initial Job Placement: Postdoctoral Fellow, Georg-August-Universität Göttingen, Göttingen, Germany

Bar, Haim – "Parallel Testing, and Variable Selection -- a Mixture-Model Approach with Applications in Biostatistics" 

Dissertation Advisor: James Booth

Initial Job Placement: Postdoc, Department of Medicine, Weill Medical Center, New York, NY

Cunningham, Caitlin –  "Markov Methods for Identifying ChIP-seq Peaks" 

Initial Job Placement: Assistant Professor, Le Moyne College, Syracuse, NY

Ji, Pengsheng – "Selected Topics in Nonparametric Testing and Variable Selection for High Dimensional Data" 

Dissertation Advisor: Michael Nussbaum 

Initial Job Placement: Assistant Professor, University of Georgia, Athens, GA

Morris, Darcy Steeg – "Methods for Multivariate Longitudinal Count and Duration Models with Applications in Economics" 

Dissertation Advisor: Francesca Molinari 

Initial Job Placement: Research Mathematical Statistician, Center for Statistical Research and Methodology, U.S. Census Bureau, Washington DC

Narayanan, Rajendran – "Shrinkage Estimation for Penalised Regression, Loss Estimation and Topics on Largest Eigenvalue Distributions" 

Initial Job Placement: Visiting Scientist, Indian Statistical Institute, Kolkata, India

Xiao, Luo – "Topics in Bivariate Spline Smoothing" 

Dissertation Advisor: David Ruppert 

Initial Job Placement: Postdoc, Johns Hopkins University, Baltimore, MD

Zeber, David – "Extremal Properties of Markov Chains and the Conditional Extreme Value Model" 

Dissertation Advisor: Sidney Resnick 

Initial Job Placement: Data Analyst, Mozilla, San Francisco, CA

Clement, David – "Estimating equation methods for longitudinal and survival data" 

Dissertation Advisor: Robert Strawderman 

Initial Job Placement: Quantitative Analyst, Smartodds, London UK

Eilertson, Kirsten – "Estimation and inference of random effect models with applications to population genetics and proteomics" 

Dissertation Advisor: Carlos Bustamante 

Initial Job Placement: Biostatistician, The J. David Gladstone Institutes, San Francisco CA

Grabchak, Michael – "Tempered stable distributions: properties and extensions" 

Dissertation Advisor: Gennady Samorodnitsky 

Initial Job Placement: Assistant Professor, UNC Charlotte, Charlotte NC

Li, Yingxing – "Aspects of penalized splines" 

Initial Job Placement: Assistant Professor, The Wang Yanan Institute for Studies in Economics, Xiamen University

Lopez Oliveros, Luis – "Modeling end-user behavior in data networks" 

Dissertation Advisor: Sidney Resnick  

Initial Job Placement: Consultant, Murex North America, New York NY

Ma, Xin – "Statistical Methods for Genome Variant Calling and Population Genetic Inference from Next-Generation Sequencing Data" 

Initial Job Placement: Postdoc, Stanford University, Stanford CA

Kormaksson, Matthias – "Dynamic path analysis and model based clustering of microarray data" 

Dissertation Advisor: James Booth 

Initial Job Placement: Postdoc, Department of Public Health, Weill Cornell Medical College, New York NY

Schifano, Elizabeth – "Topics in penalized estimation" 

Initial Job Placement: Postdoc, Department of Biostatistics, Harvard University, Boston MA

Hanlon, Bret – "High-dimensional data analysis" 

Dissertation Advisor: Anand Vidyashankar 

Shaby, Benjamin – "Tools for hard Bayesian computations"

Initial Job Placement: Postdoc, SAMSI, Durham NC

Zipunnikov, Vadim – "Topics on generalized linear mixed models" 

Initial Job Placement: Postdoc, Department of Biostatistics, Johns Hopkins University, Baltimore MD

Barger, Kathryn Jo-Anne – "Objective Bayesian estimation for the number of classes in a population using Jeffreys and reference priors"

Dissertation Advisor: John Bunge 

Initial Job Placement: Pfizer Incorporated

Chan, Serena Suewei – "Robust and efficient inference for linear mixed models using skew-normal distributions" 

Initial Job Placement: Statistician, Takeda Pharmaceuticals, Deerfield IL

Lin, Haizhi – "Distressed debt prices and recovery rate estimation" 

Dissertation Advisor: Martin Wells  

Initial Job Placement: Associate, Fixed Income Department, Credit Suisse Securities (USA), New York, NY

This collection of MIT Theses in DSpace contains selected theses and dissertations from all MIT departments. Please note that this is NOT a complete collection of MIT theses. To search all MIT theses, use the MIT Libraries' catalog.

MIT's DSpace contains more than 58,000 theses completed at MIT, dating as far back as the mid-1800s. Theses in this collection have been scanned by the MIT Libraries or submitted in electronic format by thesis authors. Since 2004, all new Master's and Ph.D. theses have been scanned and added to this collection after degrees are awarded.

MIT Theses are openly available to all readers. Please share how this access affects or benefits you. Your story matters.

If you have questions about MIT theses in DSpace, contact [email protected]. See also Access & Availability Questions or About MIT Theses in DSpace.

If you are a recent MIT graduate, your thesis will be added to DSpace within 3-6 months after your graduation date. Please email [email protected] with any questions.

Permissions

MIT Theses may be protected by copyright. Please refer to the MIT Libraries Permissions Policy for permission information. Note that the copyright holder for most MIT theses is identified on the title page of the thesis.

Theses by Department

  • Comparative Media Studies
  • Computation for Design and Optimization
  • Computational and Systems Biology
  • Department of Aeronautics and Astronautics
  • Department of Architecture
  • Department of Biological Engineering
  • Department of Biology
  • Department of Brain and Cognitive Sciences
  • Department of Chemical Engineering
  • Department of Chemistry
  • Department of Civil and Environmental Engineering
  • Department of Earth, Atmospheric, and Planetary Sciences
  • Department of Economics
  • Department of Electrical Engineering and Computer Science
  • Department of Humanities
  • Department of Linguistics and Philosophy
  • Department of Materials Science and Engineering
  • Department of Mathematics
  • Department of Mechanical Engineering
  • Department of Nuclear Science and Engineering
  • Department of Ocean Engineering
  • Department of Physics
  • Department of Political Science
  • Department of Urban Studies and Planning
  • Engineering Systems Division
  • Harvard-MIT Program of Health Sciences and Technology
  • Institute for Data, Systems, and Society
  • Media Arts & Sciences
  • Operations Research Center
  • Program in Real Estate Development
  • Program in Writing and Humanistic Studies
  • Science, Technology & Society
  • Science Writing
  • Sloan School of Management
  • Supply Chain Management
  • System Design & Management
  • Technology and Policy Program

Collections in this community

  • Doctoral Theses
  • Graduate Theses
  • Undergraduate Theses

Recent Submissions

Locomotive superheaters and feed water heaters 

A cost accounting system for the F. L. & J. C. Codman Company buffing wheels 

The electrical strength of insulators in high vacua 

UCL Institute of Health Informatics

Dissertation in Health Data Science

The dissertation is an independent research project which is researched and written under the supervision of a member of academic staff. The model for the dissertation is a journal article. The module provides a structure for taught and independent study to enable you to:

  • explore the theoretical principles, approaches and methods of research in data science
  • manage the practicalities of research such as project planning, dealing with ethical committees and so on
  • gain experience in writing a proposal
  • undertake your own research project
  • write up your work in a reflective and scholarly way
  • prepare for publication of your work

What you choose as the focus of your research project and dissertation will depend very much on your own interests, and the opportunities you have available locally. The requirements for this module are that your project is a) relevant and motivating to you; b) small enough for you to be able to focus on developing and demonstrating key research skills (rather than laboriously collecting and analysing large amounts of data); and c) discussed and agreed with the dissertation team.

Module code

UCL credits

Course length

Face-to-face dates

  • Nov: TBC ONLINE
  • Dec: TBC ONLINE

Assessment Dates

Proposal Submission: TBC

Journal Paper Submission: TBC

Module organisers

Dr Paul Taylor

For further information, contact [email protected]

Who can study this course?

MSc Health Data Analytics students

Admission requirements

Open to all UCL MSc Health Data Analytics students who have completed, or are in the process of completing, the eight taught modules.

Independent research project

Project proposal (20%); Journal paper (80%)

Selected reading list

Dependent on project area

  • Princeton University Doctoral Dissertations, 2011-2024

Computer Science

MSC DATA SCIENCE DISSERTATION - 2024/5

Module code: COMM070

Module Overview

The dissertation consists of a substantial written report. This report is based on a major piece of work that involves applying material encountered in the taught component of the degree, and extending that knowledge with the student's contribution, under the guidance of a supervisor. The dissertation usually involves a substantial literature survey on a specific topic, followed by the identification of a problem to tackle, and thereafter the development of a technical solution, and experimental or theoretical evaluation of the achievement.

Module provider

Computer Science and Electronic Eng

Module Leader

MARSHAN Alaa (CS & EE)

Number of Credits: 60

ECTS Credits: 30

Framework: FHEQ Level 7

Module cap (Maximum number of students): N/A

Overall student workload

Independent Learning Hours: 588

Lecture Hours: 2

Tutorial Hours: 10

Module Availability

Crosses academic years

Prerequisites / Co-requisites

Some project titles may require the student to have taken specific modules from the MSc programme.

Module content

The dissertation is the result of an expected 600 hrs of work. Most of this is done individually by the student, in locating and reading relevant sources, working on the technical contribution that is the main part of the dissertation, and writing up the final report. Some time is also spent in regular discussions with the supervisor. Further details are given in the module handbook.

Assessment pattern

Alternative Assessment

Assessment Strategy

The assessment strategy is designed to provide students with the opportunity to demonstrate that they have achieved the module learning outcomes.

Thus, the summative assessment for this module consists of:

  • Grades for the final report against previously published assessment criteria.
  • Final submission is tentatively due at the end of Summer.

Formative assessment and feedback

Project Synopsis: Late in Semester 2 the project synopsis will be submitted for feedback. This document should contain:

Main report

  • Introduction to problem, aims / objectives (half page)
  • Literature review / background (1 page)
  • Technical overview (1 page)
  • Workplan, including risks and timeline (half page)

References (doesn't count towards the page limit): as many as you need

Draft Report: In the middle of the Summer term the current state of the written report, including the planned table of contents, will be submitted for feedback.

Formative feedback is also given by the supervisor during regular meetings.

Module aims

  • Provide an opportunity for students to pursue a single topic in depth and to demonstrate evidence of research ability at a Masters level. The topic is typically a current problem in the broad area of the MSc Data Science. Students are encouraged to either research a new concept or apply existing technology to a new field

Learning outcomes

Attributes Developed

C - Cognitive/analytical

K - Subject knowledge

T - Transferable skills

P - Professional/Practical skills

Methods of Teaching / Learning

The learning and teaching strategy is designed to provide students with the knowledge, skills, and practical experience covering the module aims and learning outcomes.

The learning and teaching methods include:

  • Regular meetings with the allocated supervisor to discuss progress (approximately 9 hours, e.g. 18 meetings of ½ hour each, or 9 meetings of 1 hour each, or some mix in between)
  • Lecture on research methods (2 hours)

Indicated Lecture Hours (which may also include seminars, tutorials, workshops and other contact time) are approximate and may include in-class tests where one or more of these are an assessment on the module. In-class tests are scheduled/organised separately to taught content and will be published on to student personal timetables, where they apply to taken modules, as soon as they are finalised by central administration. This will usually be after the initial publication of the teaching timetable for the relevant semester.

Reading list

https://readinglists.surrey.ac.uk Upon accessing the reading list, please search for the module using the module code: COMM070

Other information

Digital Capabilities

On this module, students learn to take a large-scale technical or research project from conception through to implementation and evaluation. This requires excellent technical skills that bring in aspects from the other modules on the programme to incorporate either security (Information Security programme) or data science and AI (Data Science programme) into their work. Students use their knowledge from previous modules in their topic area to engineer a solution to a complex problem or solve a technical research question.

Employability

This module provides students with technical skills alongside a range of transferable skills by developing their own solution to a complex problem. A crucial key to success in this module is good project management skills and organisation, and the ability to develop a project management plan and follow this through. This large-scale project module provides students with experience of working on a large-scale, complex piece of software or complex research problem. The resulting solution can be used as a portfolio piece to advertise a student's development experience to employers.

Global and Cultural Skills

Computer Science is a global language and the tools and languages used on this module can be used internationally. This module allows students to develop skills that will allow them to develop applications with global reach and collaborate with their peers around the world.

Resourcefulness and Resilience

This module requires that a student take an idea from conception, through to specification and design, implementation and then critical evaluation. This large-scale project requires excellent technical skills, but also excellent project management and planning. Students will inevitably encounter obstacles during the development of their project, requiring them to be resourceful in order to find solutions or alternative approaches to circumvent the obstacle. Students also require resilience to persist in the face of failure, until a viable path forward for the project is developed. The experience gained in the module will be immensely valuable when planning and implementing future large-scale projects.

Sustainability

The project is directed primarily by the student, and there are many staff with suitable expertise to supervise projects related to sustainability. Students therefore have the option of working on a project directly related to sustainability. For example, some projects in data science work on more efficient machine learning models that can be trained with fewer computations and a smaller memory requirement, thereby reducing power consumption. Some students work with optimisation techniques that can directly be applied to problems of reducing energy consumption, or improving the efficiency of a process with a fixed energy budget. Some students work directly on the problem of understanding ecological networks and food chains, or more efficient management of traffic flow.

Programmes this module appears in

Please note that the information detailed within this record is accurate at the time of publishing and may be subject to change. This record contains information for the most up to date version of the programme / module for the 2024/5 academic year.

Data Science (with Dissertation)

Apply your scientific skills to build your expertise in data and business analysis, machine learning, data mining and data resources management.

What type of student are you?

You're considered a domestic student if you're an Australian citizen or permanent resident, a New Zealand citizen or hold a permanent Australian humanitarian visa.

You're an international student if you hold or are applying for a Student, Diplomatic, Bridging, Temporary or Provisional Resident visa, or are a permanent resident of New Zealand.

Data science is used to drive decision-making across every industry, from medical research and the food industry to airlines, customer service and artificial intelligence.

As part of your coursework, you’ll explore big data and machine learning, and how they are used by organisations of every size. You’ll learn new skills and ways of thinking as you explore data acquisition, preparation, transformation and modelling.

For your dissertation, you’ll work with your supervisor to decide on a topic and explore new approaches and the latest developments in data science.

3 reasons to study Science at Murdoch

  • Build your expertise in the expanding field of data science and machine learning.
  • Get mentored by our data science academics who have experience solving real-world problems.
  • Learn in our IT Innovation Hub, a cutting-edge learning, teaching and research facility, fitted out with the latest mixed and augmented reality equipment, an operational data centre and high-performance computing capabilities.

What you’ll learn

  • Learn how to apply data science and machine learning to identify and address challenges faced by an organisation.
  • Develop the skills you need to lead a team, work as part of a team and work on your own.
  • Learn a wide range of data techniques, including how to collect, record, analyse and interpret information.
  • Use creative thinking to define and resolve data science problems.
  • Develop a deep understanding of the ethics of data science and how this relates to a range of professional settings.

Your future career

When you graduate with a postgraduate qualification, you’ll have the skills and expertise you need to manage, protect and leverage big data across any industry including telecommunications, health, education, architecture, engineering, law and government. Your career options could include:

  • Data Scientist
  • Machine Learning Expert
  • Data Analyst
  • Data and Analytics Manager
  • Business Intelligence Architect

For further admission information about this course, please download our Information Pack.

Study areas

  • Information Technology

Apply to start

To help plan the structure of your course, search for our suggested course plans.

Entry requirements

Select which option best describes your path to university:

  • Higher education
  • Recent secondary education
  • Vocational education & training
  • Work & life experience

English requirements

You must meet a minimum standard in English to study at Murdoch, which for most courses can be demonstrated by providing evidence that you have completed Year 11 and 12 in Australia at any level if you are a domestic student, or through either English proficiency tests, university preparation courses, English language courses, previous tertiary study or vocational education.

Advanced Standing

Everyone has a different path to university so if you’ve already completed formal or informal learning, you could receive advanced standing. Also known as recognition of prior learning, advanced standing can reduce the amount of study needed to complete your degree by giving you credit for certain units.

Formal learning can include previous study in higher education, vocational education, or adult and community education. Informal learning can include on-the-job learning and various kinds of work and life experience.

Find out more

This information applies to courses offered at our Australian campuses only. Courses offered at our Dubai and Singapore campuses or delivered by Open Universities Australia may have different requirements.

Fees and scholarships

Scholarships

Make the most of your university experience by reducing the financial costs with a scholarship.

Whether you’re a high achiever or have experienced hardship, we offer a wide range of scholarships and awards to students from all walks of life. Explore our scholarships to find the ones you could be eligible for as a new Murdoch student as well as what's available throughout your degree.

How much will it cost?

Instead of paying an overall course fee, you pay for the individual units you enrol in. The total course cost will vary depending on the units you choose.

Following your successful application, you’ll receive a Letter of Offer which will contain specific course and fee information.

If you are applying for an undergraduate course, you may be eligible for a Commonwealth Supported Place. Postgraduate programs are full fee paying, unless indicated otherwise in your Letter of Offer.

Course fees vary depending on the level of study and the year of commencement. Use our Fee Calculator to estimate the cost of your course.

Our International Welcome Scholarship offers eligible students between $8,000 and $11,000, depending on the course you study.

The Australian Government also offers scholarships to students from developing countries through the Australia Awards Scholarship program.

Explore our scholarships and find out if you’re eligible to apply.

How to apply

Your document checklist.

Ready to apply? Before you start, make sure you have all of the following documentation ready for a quick application.

  • Completed official Academic Transcripts and Certificates of Completion
  • A recent Curriculum Vitae
  • Complete or incomplete official Academic Transcripts and Certificates of Completion – both original and English translated versions
  • A letter showing proof of 2 years of work experience
  • English Language Proficiency Document (if available)

Harvard University Theses, Dissertations, and Prize Papers

The Harvard University Archives’ collection of theses, dissertations, and prize papers documents the wide range of academic research undertaken by Harvard students over the course of the University’s history.

Beyond their value as pieces of original research, these collections document the history of American higher education, chronicling both the growth of Harvard as a major research institution and the development of numerous academic fields. They are also an important source of biographical information, offering insight into the academic careers of the authors.

Printed list of works awarded the Bowdoin prize in 1889-1890.

Spanning from the ‘theses and quaestiones’ of the 17th and 18th centuries to the current yearly output of student research, they include both the first Harvard Ph.D. dissertation (by William Byerly, Ph.D. 1873) and the dissertation of the first woman to earn a doctorate from Harvard (Lorna Myrtle Hodgkinson, Ed.D. 1922).

Other highlights include:

  • The collection of Mathematical theses, 1782-1839
  • The 1895 Ph.D. dissertation of W.E.B. Du Bois, The suppression of the African slave trade in the United States, 1638-1871
  • Ph.D. dissertations of astronomer Cecilia Payne-Gaposchkin (Ph.D. 1925) and physicist John Hasbrouck Van Vleck (Ph.D. 1922)
  • Undergraduate honors theses of novelist John Updike (A.B. 1954), filmmaker Terrence Malick (A.B. 1966), and U.S. poet laureate Tracy Smith (A.B. 1994)
  • Undergraduate prize papers and dissertations of philosophers Ralph Waldo Emerson (A.B. 1821), George Santayana (Ph.D. 1889), and W.V. Quine (Ph.D. 1932)
  • Undergraduate honors theses of U.S. President John F. Kennedy (A.B. 1940) and Chief Justice John Roberts (A.B. 1976)

What does a prize-winning thesis look like?

If you're a Harvard undergraduate writing your own thesis, it can be helpful to review recent prize-winning theses. The Harvard University Archives has made available for digital lending all of the Thomas Hoopes Prize winners from the 2019-2021 academic years.

Accessing These Materials

How to access materials at the Harvard University Archives

How to find and request dissertations, in person or virtually

How to find and request undergraduate honors theses

How to find and request Thomas Temple Hoopes Prize papers

How to find and request Bowdoin Prize papers

  • Phone number: 617-495-2461

Related Collections

Harvard faculty personal and professional archives

Harvard student life collections: arts, sports, politics and social life

Access materials at the Harvard University Archives

CS&E Announces 2024-25 Doctoral Dissertation Fellowship (DDF) Award Winners

Seven Ph.D. students working with CS&E professors have been named Doctoral Dissertation Fellows for the 2024-25 school year. The Doctoral Dissertation Fellowship is a highly competitive fellowship that gives the University’s most accomplished Ph.D. candidates an opportunity to devote full-time effort to an outstanding research project by providing time to finalize and write a dissertation during the fellowship year. The award includes a stipend of $25,000, tuition for up to 14 thesis credits each semester, and subsidized health insurance through the Graduate Assistant Health Plan.

CS&E congratulates the following students on this outstanding accomplishment:

  • Athanasios Bacharis (Advisor: Nikolaos Papanikolopoulos)
  • Karin de Langis (Advisor: Dongyeop Kang)
  • Arshia Zernab Hassan (Advisor: Chad Myers)
  • Xinyue Hu (Advisor: Zhi-Li Zhang)
  • Lucas Kramer (Advisor: Eric Van Wyk)
  • Yijun Lin (Advisor: Yao-Yi Chiang)
  • Mingzhou Yang (Advisor: Shashi Shekhar)

Athanasios Bacharis

Bacharis’ work centers on robot vision, with a focus on making autonomous robots act on visual information. His research includes active vision approaches, namely view planning and next-best-view, to tackle the problem of 3D reconstruction via different optimization frameworks. The acquisition of 3D information is crucial for automating tasks, and active vision methods obtain it via optimal inference. Areas of impact include agriculture and healthcare, where 3D models can lead to reduced use of fertilizers via phenotype analysis of crops and to effective management of cancer treatments. Bacharis has a strong publication record, with two peer-reviewed conference papers and one journal paper already published. He also has one conference paper under review and two journal papers in the submission process. His publications appear in prestigious robotics and automation venues, further demonstrating his expertise and the relevance of his research in the field.

Karin de Langis

Karin's thesis works at the intersection of Natural Language Processing (NLP) and cognitive science. Her work uses eye-tracking and other cognitive signals to improve the performance and cognitive interpretability of NLP systems, and to create NLP systems that process language more similarly to humans. Her human-centric approach to NLP is motivated by the possibility of addressing the shortcomings of current statistics-based NLP systems, which often become stuck on explainability and interpretability, resulting in potential biases. This work was most recently accepted and presented at the SIGNLL Conference on Computational Natural Language Learning (CoNLL), which has a special focus on theoretically, cognitively, and scientifically motivated approaches to computational linguistics.

Arshia Zernab Hassan

Hassan's thesis work develops computational methods for interpreting data from genome-wide CRISPR/Cas9 screens. CRISPR/Cas9 is a new approach for genome editing that enables precise, large-scale editing of genomes and construction of mutants in human cells. These are powerful data for inferring functional relationships among genes essential for cancer growth. Moreover, chemical-genetic CRISPR screens, where populations of mutant cells are grown in the presence of chemical compounds, help us understand the effect the chemicals have on cancer cells and formulate precise drug solutions. Given the novelty of these experimental technologies, computational methods to process and interpret the resulting data and accurately quantify the various genetic interactions are still quite limited, and this is where Hassan’s dissertation is focused. Her research extends to developing deep-learning-based methods that leverage CRISPR chemical-genetic and other genomic datasets to predict cancer sensitivity to candidate drugs. Her methods for improving information content in CRISPR screens were published in the Molecular Systems Biology journal, a highly visible journal in the computational biology field.

Xinyue Hu

Hu's Ph.D. dissertation concentrates on how to effectively leverage the power of artificial intelligence and machine learning (AI/ML), especially deep learning, to tackle challenging and important problems in the design and development of reliable, effective and secure (independent) physical infrastructure networks. More specifically, her research focuses on two critical infrastructures: power grids and communication networks, in particular, emerging 5G networks, both of which not only play a critical role in our daily life but are also vital to the nation’s economic well-being and security. Due to the enormous complexity, diversity, and scale of these two infrastructures, traditional approaches based on (simplified) theoretical models and heuristics-based optimization are no longer sufficient to overcome many technical challenges in the design and operation of these infrastructures: data-driven machine learning approaches have become increasingly essential. The key question now is: how does one leverage the power of AI/ML without abandoning the rich theory and practical expertise that have accumulated over the years? Hu’s research has pioneered a new paradigm, (domain) knowledge-guided machine learning (KGML), in tackling challenging and important problems in power grid and communications (e.g., 5G) network infrastructures.

Lucas Kramer

Kramer is now the driving force in designing tools and techniques for building extensible programming languages, with the Minnesota Extensible Language Tools (MELT) group. These are languages that start with a host language such as C or Java, but can then be extended with new syntax (notations) and new semantics (e.g., error-checking analyses or optimizations) over that new syntax and the original host language syntax. One extension that Kramer created embeds the domain-specific language Halide in MELT's extensible specification of C, called ableC. This extension allows programmers to specify how code working on multi-dimensional matrices is transformed and optimized to make efficient use of hardware. Another embeds the logic-programming language Prolog into ableC; yet another provides a form of nondeterministic parallelism useful in some algorithms that search for a solution in a structured, but very large, search space. The goal of his research is to make building language extensions such as these more practical for non-expert developers. To this end, he has made many significant contributions to the MELT group's Silver meta-language, making it easier for extension developers to correctly specify complex language features with minimal boilerplate. Kramer is the lead author of one journal and four conference papers on his work at the University of Minnesota, winning the distinguished paper award for his 2020 paper at the Software Language Engineering conference, "Strategic Tree Rewriting in Attribute Grammars".

Yijun Lin

Lin’s doctoral dissertation focuses on the timely, important topic of spatiotemporal prediction and forecasting using multimodal and multiscale data. Spatiotemporal prediction and forecasting are important scientific problems applicable to diverse phenomena, such as air quality, ambient noise, traffic conditions, and meteorology. Her work also couples the resulting prediction and forecasting with multimodal (e.g., satellite imagery, street-view photos, census records, and human mobility data) and multiscale geographic information (e.g., census records focusing on small tracts vs. neighborhood surveys) to characterize the natural and built environment, facilitating our understanding of the interactions between and within human social systems and the ecosystem. Her work has a wide-reaching impact across multiple domains such as smart cities, urban planning, policymaking, and public health.

Mingzhou Yang

Yang is developing a thesis in the broad area of spatial data mining for problems in transportation. His thesis has both societal and theoretical significance. Societally, climate change is a grand challenge due to the increasing severity and frequency of climate-related disasters such as wildfires, floods, and droughts. Thus, many nations are aiming at carbon neutrality (also called net zero) by mid-century to avert the worst impacts of global warming. Improving energy efficiency and reducing toxic emissions in transportation is important because transportation accounts for the vast majority of U.S. petroleum consumption as well as over a third of GHG emissions and over a hundred thousand U.S. deaths annually via air pollution. To accurately quantify the expected environmental cost of vehicles during real-world driving, Yang's thesis explores ways to incorporate physics into the neural network architecture, complementing other methods of integration: feature incorporation and regularization. This approach imposes stringent physical constraints on the neural network model, guaranteeing that its outputs are consistently in accordance with established physical laws for vehicles. Extensive experiments, including ablation studies, demonstrated the efficacy of incorporating physics into the model.

Related news releases

  • Brock Shamblin Wins 2024 Riedl TA Award
  • Ph.D. Student Angel Sylvester Mentors High School Student
  • 2024 John T. Riedl Memorial Graduate Teaching Assistant Award
  • CS&E Earns Five Awards at 2023 SIAM SDM
  • CS&E Announces 2023-24 Doctoral Dissertation Fellowship (DDF) Award Winners

COMMENTS

  1. Computational and Data Sciences (PhD) Dissertations

    Optimal Analytical Methods for High Accuracy Cardiac Disease Classification and Treatment Based on ECG Data, Jianwei Zheng. Dissertations from 2020: Development of Integrated Machine Learning and Data Science Approaches for the Prediction of Cancer Mutation and Autonomous Drug Discovery of Anti-Cancer Therapeutic Agents, Steven Agajanian.

  2. Research Topics & Ideas: Data Science

    If you're just starting out exploring data science-related topics for your dissertation, thesis or research project, you've come to the right place. In this post, we'll help kickstart your research by providing a hearty list of data science and analytics-related research ideas, including examples from recent studies. PS - This is just the start…

  3. How to write a great data science thesis

    There are probably more than a thousand manuals on how to write a great thesis (some of my favorites can be found here, here and here). They will stress the importance of structure, substance and style. They will urge you to write down your methodology and results first, then progress to the literature review, introduction and conclusions and to ...

  4. Top 10 Essential Data Science Topics to Real-World Application From the

    1. Introduction. Statistics and data science are more popular than ever in this era of data explosion and technological advances. Decades ago, John Tukey (Brillinger, 2014) said, "The best thing about being a statistician is that you get to play in everyone's backyard." More recently, Xiao-Li Meng (2009) said, "We no longer simply enjoy the privilege of playing in or cleaning up everyone ...

  5. Getting a PhD in Data Science: What You Need to Know

    Typically, the entry-level degree to get a data science position is a bachelor's degree, meaning that even just an undergraduate degree could help you land a job that earns a higher than average salary. Nonetheless, a PhD will likely prepare you for more advanced positions that could offer higher pay than less specialized roles.

  6. 17 Compelling Machine Learning Ph.D. Dissertations

    This dissertation revisits and makes progress on some old but challenging problems concerning least squares estimation, the work-horse of supervised machine learning. Two major problems are addressed: (i) least squares estimation with heavy-tailed errors, and (ii) least squares estimation in non-Donsker classes.

  7. Doctor of Data Science and Analytics Dissertations

    The Ph.D. in Data Science and Analytics is an advanced degree with a dual focus of application and research - where students will engage in real world business problems, which will inform and guide their research interests. We launched the first formal PhD program in Data Science in 2015.

  8. Ten Research Challenge Areas in Data Science

    Abstract. To drive progress in the field of data science, we propose 10 challenge areas for the research community to pursue. Since data science is broad, with methods drawing from computer science, statistics, and other disciplines, and with applications appearing in all sectors, these challenge areas speak to the breadth of issues spanning ...

  9. Thesis/Capstone for Master's in Data Science

    Capstone and thesis are similar in that they both represent a culminating, scholarly effort of high quality. Both should clearly state a problem or issue to be addressed. Both will allow students to complete a larger project and produce a product or publication that can ...

  10. Five Tips For Writing A Great Data Science Thesis

    Although educational programs, conventions and thesis requirements vary wildly, I hope to offer some common guidelines for any student currently working on a Data Science thesis. The article offers five guidance points, but may effectively be summarized in a single line: "Write for your reader, not for yourself."

  11. PhD in Data Science

    PhD in Analytics and Data Science. Students pursuing a PhD in analytics and data science at Kennesaw State University must complete 78 credit hours: 48 course hours and 6 electives (spread over 4 years of study), a minimum 12 credit hours for dissertation research, and a minimum 12 credit-hour internship.

  12. 10 Compelling Machine Learning Ph.D. Dissertations for 2020

    This dissertation explores three topics related to random forests: tree aggregation, variable importance, and robustness. 10. Climate Data Computing: Optimal Interpolation, Averaging, Visualization and Delivery. This dissertation solves two important problems in the modern analysis of big climate data.

  13. PhD Dissertations

    PhD Dissertations [All are .pdf files]. Probabilistic Reinforcement Learning: Using Data to Define Desired Outcomes, and Inferring How to Get There, Benjamin Eysenbach, 2023. Data-driven Decisions - An Anomaly Detection Perspective, Shubhranshu Shekhar, 2023. Methods and Applications of Explainable Machine Learning, Joon Sik Kim, 2023. Applied Mathematics of the Future, Kin G. Olivares, 2023.

  14. Thesis Option

    Data Science master's students can choose to satisfy the research experience requirement by selecting the thesis option. Students will spend the majority of their second year working on a substantial data science project that culminates in the submission and oral defense of a master's thesis. While all thesis projects must be related to data science, students are given leeway in finding a ...

  15. Recent Dissertation Topics

    Many of our Statistical Science PhD students' dissertations are available online. The Cornell Library system offers access to a database of dissertations through ProQuest. Cornell Library System Dissertation Search. Click on the year below to see a list of dissertation topics by student, along with placement information on where students ...

  16. MIT Theses

    MIT's DSpace contains more than 58,000 theses completed at MIT dating as far back as the mid 1800's. Theses in this collection have been scanned by the MIT Libraries or submitted in electronic format by thesis authors. Since 2004 all new Masters and Ph.D. theses are scanned and added to this collection after degrees are awarded.

  17. 37 Research Topics In Data Science To Stay On Top Of » EML

    22.) Cybersecurity. Cybersecurity is a relatively new research topic in data science and in general, but it's already garnering a lot of attention from businesses and organizations. After all, with the increasing number of cyber attacks in recent years, it's clear that we need to find better ways to protect our data.

  18. Dissertation in Health Data Science

    The model for the dissertation is a journal article. The module provides a structure for taught and independent study to enable you to: explore the theoretical principles, approaches and methods of research in data science; manage the practicalities of research such as project planning, dealing with ethical committees and so on

  19. DataSpace: Computer Science

    Princeton University Doctoral Dissertations, 2011-2024; Computer Science. Computer Science items, sorted by submit date in descending order (235 in total), include: Designing Compact Data Structures for Network Measurement and Control (Chen, Xiaoqi); From mind to machine: neural circuits, learning ...

  20. MSC DATA SCIENCE DISSERTATION

    Module aims. Provide an opportunity for students to pursue a single topic in depth and to demonstrate evidence of research ability at a Masters level. The topic is typically a current problem in the broad area of the MSc Data Science. Students are encouraged to either research a new concept or apply existing technology to a new field.

  21. Data Science (with Dissertation)

    You'll learn new skills and ways of thinking as you explore data acquisition, preparation, transformation and modelling. For your dissertation, you'll work with your supervisor to decide on a topic and explore new approaches and the latest developments in data science. 3 reasons to study Science at Murdoch. Build your expertise in the ...

  22. PDF University of Washington

    University of Washington

  23. Harvard University Theses, Dissertations, and Prize Papers

    The Harvard University Archives' collection of theses, dissertations, and prize papers document the wide range of academic research undertaken by Harvard students over the course of the University's history. Beyond their value as pieces of original research, these collections document the history of American higher education, chronicling both the growth of Harvard as a major research ...

  24. What Is a Data Scientist? Salary, Skills, and How to Become One

    A data scientist earns an average salary of $108,659 in the United States, according to Lightcast™ [1]. Demand is high for data professionals: data scientist occupations are expected to grow by 36 percent in the next 10 years (much faster than average), according to the US Bureau of Labor Statistics (BLS) [2].

  25. CS&E Announces 2024-25 Doctoral Dissertation Fellowship (DDF) Award

    Seven Ph.D. students working with CS&E professors have been named Doctoral Dissertation Fellows for the 2024-25 school year. The Doctoral Dissertation Fellowship is a highly competitive fellowship that gives the University's most accomplished Ph.D. candidates an opportunity to devote full-time effort to an outstanding research project by providing time to finalize and write a dissertation ...