Research Topics

Biomedical Imaging

The current plethora of imaging technologies, such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), optical coherence tomography (OCT), and ultrasound, provides great insight into the different anatomical and functional processes of the human body.

Computer Vision

Computer vision is the science and technology of teaching a computer to interpret images and video as well as a typical human can. Technically, computer vision encompasses the fields of image/video processing, pattern recognition, biological vision, artificial intelligence, augmented reality, mathematical modeling, statistics, probability, optimization, 2D sensors, and photography.

Image Segmentation/Classification

Extracting information from a digital image often depends on first identifying desired objects or breaking down the image into homogeneous regions (a process called 'segmentation') and then assigning these objects to particular classes (a process called 'classification'). This is a fundamental part of computer vision, combining image processing and pattern recognition techniques.
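As a toy illustration of these two steps, the sketch below (plain NumPy; the threshold and the area-based class rule are illustrative choices, not a standard method) segments an image by thresholding, labels the connected regions, and then classifies each region by its size:

```python
import numpy as np

def segment(image, threshold):
    """Segmentation: split the image into foreground/background regions."""
    return image > threshold

def label_regions(mask):
    """Label 4-connected foreground components (tiny flood-fill labeller)."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        current += 1
        stack = [start]
        while stack:
            r, c = stack.pop()
            if (0 <= r < mask.shape[0] and 0 <= c < mask.shape[1]
                    and mask[r, c] and not labels[r, c]):
                labels[r, c] = current
                stack += [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]
    return labels, current

def classify(labels, n, small=3):
    """Classification: assign each region to a class, here simply by its area."""
    return {i: ("small" if (labels == i).sum() <= small else "large")
            for i in range(1, n + 1)}

img = np.array([[0, 9, 9, 0, 0],
                [0, 9, 9, 0, 8],
                [0, 0, 0, 0, 0]], dtype=float)
labels, n = label_regions(segment(img, 5))
print(n, classify(labels, n))  # 2 regions: one "large", one "small"
```

Real systems replace both the threshold and the area rule with learned models, but the segment-then-classify structure is the same.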

Multiresolution Techniques

The VIP lab has a particularly extensive history with multiresolution methods, and a significant number of research students have explored this theme. Multiresolution methods are very broad: essentially, an image or video is modeled, represented, or has features extracted at more than one scale, allowing both local and non-local phenomena to be captured.
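A classic multiresolution representation is the Gaussian pyramid; a minimal NumPy sketch (using a binomial [1, 2, 1] filter as a stand-in for a proper Gaussian) builds one by repeated blur-and-downsample:

```python
import numpy as np

def smooth(img):
    """Separable [1, 2, 1]/4 binomial blur with edge replication."""
    p = np.pad(img, 1, mode="edge")
    img = (p[:-2] + 2 * p[1:-1] + p[2:])[:, 1:-1] / 4.0   # vertical pass
    p = np.pad(img, ((0, 0), (1, 1)), mode="edge")
    return (p[:, :-2] + 2 * p[:, 1:-1] + p[:, 2:]) / 4.0  # horizontal pass

def gaussian_pyramid(img, levels):
    """Fine-to-coarse stack: each level is blurred and downsampled by 2."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(smooth(pyramid[-1])[::2, ::2])
    return pyramid

img = np.random.default_rng(0).random((64, 64))
for level in gaussian_pyramid(img, 4):
    print(level.shape)  # (64, 64) (32, 32) (16, 16) (8, 8)
```

Features computed on the coarse levels capture non-local structure cheaply, while the fine levels retain local detail.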

Remote Sensing

Remote sensing, or the science of capturing data of the earth from airplanes or satellites, enables regular monitoring of land, ocean, and atmosphere expanses, representing data that cannot be captured using any other means. A vast amount of information is generated by remote sensing platforms and there is an obvious need to analyze the data accurately and efficiently.

Scientific Imaging

Scientific imaging refers to working on two- or three-dimensional imagery taken for a scientific purpose, in most cases acquired either through a microscope or via remote sensing at a distance.

Stochastic Models

In many image processing, computer vision, and pattern recognition applications, there is often a large degree of uncertainty associated with factors such as the appearance of the underlying scene within the acquired data, the location and trajectory of the object of interest, and the physical attributes (e.g., size, shape, and color) of the objects being detected.

Video Analysis

Video analysis is a field within computer vision that involves the automatic interpretation of digital video using computer algorithms. Although humans are readily able to interpret digital video, developing algorithms for the computer to perform the same task has proven highly elusive and is now an active research field.

Evolutionary Deep Intelligence

Deep learning has shown considerable promise in recent years, producing tremendous results and significantly improving the accuracy of a variety of challenging problems when compared to other machine learning methods.

Discovery Radiomics

Radiomics, the high-throughput extraction and analysis of large numbers of quantitative features from medical imaging data to characterize tumor phenotype, is ushering in a new era of imaging-driven quantitative personalized cancer decision support and management.

Sports Analytics

Sports Analytics is a growing field in computer vision that analyzes visual cues from images to provide statistical data on players, teams, and games. Want to know how a player's technique improves the quality of the team? Can a team, based on its defensive position, increase its chances of reaching the finals? These are a few of the many questions that sports analytics seeks to answer.


A list of completed theses and new thesis topics from the Computer Vision Group.

Are you about to start a BSc or MSc thesis? Please read our instructions for preparing and delivering your work.

Below we list possible thesis topics for Bachelor and Master students in the areas of Computer Vision, Machine Learning, Deep Learning and Pattern Recognition. The project descriptions leave plenty of room for your own ideas. If you would like to discuss a topic in detail, please contact the supervisor listed below and Prof. Paolo Favaro to schedule a meeting. Note that for MSc students in Computer Science it is required that the official advisor is a professor in CS.

AI deconvolution of light microscopy images

Level: master.

Background Light microscopy has become an indispensable tool in life sciences research. Deconvolution is an important image processing step for improving the quality of microscopy images: it removes out-of-focus light and yields higher resolution and a better signal-to-noise ratio. Classical deconvolution methods, such as regularisation or blind deconvolution, are implemented in numerous commercial software packages and widely used in research. Recently, AI deconvolution algorithms have been introduced and are under active development, as they have shown high application potential.

Aim Adaptation of available AI algorithms for deconvolution of microscopy images, and validation of these methods against state-of-the-art commercially available deconvolution software.

Material and Methods The student will implement and further develop available AI deconvolution methods and acquire test microscopy images of different modalities. The performance of the developed AI algorithms will be validated against available commercial deconvolution software.
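For orientation on the classical baselines mentioned above, here is a minimal NumPy sketch of Richardson-Lucy deconvolution in 1D (the point-spread function and signal are illustrative toys, not real microscopy data):

```python
import numpy as np

def convolve(signal, psf):
    return np.convolve(signal, psf, mode="same")

def richardson_lucy(blurred, psf, iters=50):
    """Classical Richardson-Lucy: multiplicative updates that refit the blur."""
    estimate = np.full_like(blurred, blurred.mean())
    psf_flipped = psf[::-1]
    for _ in range(iters):
        ratio = blurred / np.maximum(convolve(estimate, psf), 1e-12)
        estimate = estimate * convolve(ratio, psf_flipped)
    return estimate

psf = np.array([0.25, 0.5, 0.25])            # toy out-of-focus blur model
sharp = np.zeros(64)
sharp[[20, 40]] = 1.0                        # two point sources
blurred = convolve(sharp, psf)
restored = richardson_lucy(blurred, psf)
print(restored[20] > blurred[20])            # True: the peaks sharpen back up
```

AI deconvolution methods would replace this fixed iterative scheme with a learned mapping, which is exactly what the project benchmarks against tools built on such classical updates.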

Nature of the Thesis:

  • AI algorithm development and implementation: 50%.
  • Data acquisition: 10%.
  • Comparison of performance: 40%.

Requirements

  • Interest in imaging.
  • Solid knowledge of AI.
  • Good programming skills.

Supervisors Paolo Favaro, Guillaume Witz, Yury Belyaev.

Institutes Computer Vision Group, Digital Science Lab, Microscopy Imaging Center.

Contact Yury Belyaev, Microscopy Imaging Center, [email protected] , +41 78 899 0110.

Instance segmentation of cryo-ET images

Level: bachelor/master.

In the 1600s, a pioneering Dutch scientist named Antonie van Leeuwenhoek embarked on a remarkable journey that would forever transform our understanding of the natural world. Armed with a simple yet ingenious invention, the light microscope, he delved into uncharted territory, peering through its lens to reveal the hidden wonders of microscopic structures. Fast forward to today, where cryo-electron tomography (cryo-ET) has emerged as a groundbreaking technique, allowing researchers to study proteins within their natural cellular environments. Proteins, functioning as vital nano-machines, play crucial roles in life, and understanding their localization and interactions is key to both basic research and disease comprehension. However, cryo-ET images pose challenges due to inherent noise and a scarcity of annotated data for training deep learning models.


Credit: S. Albert et al./PNAS (CC BY 4.0)

To address these challenges, this project aims to develop a self-supervised pipeline utilizing diffusion models for instance segmentation in cryo-ET images. By leveraging the power of diffusion models, which iteratively diffuse information to capture underlying patterns, the pipeline aims to refine and accurately segment cryo-ET images. Self-supervised learning, which relies on unlabeled data, reduces the dependence on extensive manual annotations. Successful implementation of this pipeline could revolutionize the field of structural biology, facilitating the analysis of protein distribution and organization within cellular contexts. Moreover, it has the potential to alleviate the limitations posed by limited annotated data, enabling more efficient extraction of valuable information from cryo-ET images and advancing biomedical applications by enhancing our understanding of protein behavior.

Methods The segmentation pipeline for cryo-electron tomography (cryo-ET) images consists of two stages: training a diffusion model for image generation and training an instance segmentation U-Net using synthetic and real segmentation masks.

    1. Diffusion Model Training:
        a. Data Collection: Collect and curate cryo-ET image datasets from the EMPIAR database (https://www.ebi.ac.uk/empiar/).
        b. Architecture Design: Select an appropriate architecture for the diffusion model.
        c. Model Evaluation: Cryo-ET experts will help assess image quality and fidelity through visual inspection and quantitative measures.
    2. Building the Segmentation Dataset:
        a. Synthetic and real mask generation: Use the trained diffusion model to generate synthetic cryo-ET images. The diffusion process will be seeded from either a real or a synthetic segmentation mask, yielding pairs of cryo-ET images and segmentation masks.
    3. Instance Segmentation U-Net Training:
        a. Architecture Design: Choose an appropriate instance segmentation U-Net architecture.
        b. Model Evaluation: Evaluate the trained U-Net using precision, recall, and F1 score metrics.

By combining the diffusion model for cryo-ET image generation and the instance segmentation U-Net, this pipeline provides an efficient and accurate approach to segment structures in cryo-ET images, facilitating further analysis and interpretation.
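For orientation, the forward (noising) half of a diffusion model can be written in closed form. The NumPy sketch below (with an illustrative linear schedule and a random array as a stand-in for a cryo-ET patch) shows how a clean image is progressively destroyed; the model to be trained in stage 1 learns to reverse exactly this process:

```python
import numpy as np

def forward_diffusion(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # illustrative linear noise schedule
x0 = rng.random((32, 32))               # stand-in for a cryo-ET patch
x_early, _ = forward_diffusion(x0, 10, betas, rng)
x_late, _ = forward_diffusion(x0, 999, betas, rng)
corr_early = np.corrcoef(x0.ravel(), x_early.ravel())[0, 1]
corr_late = np.corrcoef(x0.ravel(), x_late.ravel())[0, 1]
print(corr_early > corr_late)  # True: late steps are nearly pure noise
```

Seeding this process from a segmentation mask, as in stage 2, amounts to conditioning the reverse (denoising) direction on that mask.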

References
    1. Kwon, Diana. "The secret lives of cells - as never seen before." Nature 598.7882 (2021): 558-560.
    2. Moebel, Emmanuel, et al. "Deep learning improves macromolecule identification in 3D cellular cryo-electron tomograms." Nature Methods 18.11 (2021): 1386-1394.
    3. Rice, Gavin, et al. "TomoTwin: generalized 3D localization of macromolecules in cryo-electron tomograms with structural data mining." Nature Methods (2023): 1-10.

Contacts Prof. Thomas Lemmin Institute of Biochemistry and Molecular Medicine Bühlstrasse 28, 3012 Bern ( [email protected] )

Prof. Paolo Favaro Institute of Computer Science Neubrückstrasse 10 3012 Bern ( [email protected] )

Adding and removing multiple sclerosis lesions in MR imaging with diffusion networks

Background Multiple sclerosis lesions are the result of demyelination: they appear as dark spots on T1-weighted MRI and as bright spots on FLAIR MRI. Image analysis for MS patients requires both the accurate detection of new and enhancing lesions, and the assessment of atrophy via local thickness and/or volume changes in the cortex. Detection of new and growing lesions is possible using deep learning, but is made difficult by the relative lack of training data; meanwhile, cortical morphometry can be affected by the presence of lesions, meaning that removing lesions prior to morphometry may be more robust. Existing 'lesion filling' methods are rather crude, yielding unrealistic-appearing brains where the borders of the removed lesions are clearly visible.

Aim: Denoising diffusion networks are the current gold standard in MRI image generation [1]: we aim to leverage this technology to remove and add lesions to existing MRI images. This will allow us to create realistic synthetic MRI images for training and validating MS lesion segmentation algorithms, and for investigating the sensitivity of morphometry software to the presence of MS lesions at a variety of lesion load levels.

Materials and Methods: A large, annotated, heterogeneous dataset of MRI data from MS patients, as well as images of healthy controls without white matter lesions, will be available for developing the method. The student will work in a research group with a long track record in applying deep learning methods to neuroimaging data, as well as experience training denoising diffusion networks.

Nature of the Thesis:

  • Literature review: 10%
  • Replication of the blob loss paper: 10%
  • Implementation of the sliding window metrics: 10%
  • Training on MS lesion segmentation task: 30%
  • Extension to other datasets: 20%
  • Results analysis: 20%

Fig. Results of an existing lesion filling algorithm, showing inadequate performance

Requirements:

Interest/Experience with image processing

Python programming knowledge (Pytorch bonus)

Interest in neuroimaging

Supervisor(s):

PD. Dr. Richard McKinley

Institutes: Diagnostic and Interventional Neuroradiology

Center for Artificial Intelligence in Medicine (CAIM), University of Bern

References: [1] Brain Imaging Generation with Latent Diffusion Models , Pinaya et al, Accepted in the Deep Generative Models workshop @ MICCAI 2022 , https://arxiv.org/abs/2209.07162

Contact : PD Dr Richard McKinley, Support Centre for Advanced Neuroimaging ( [email protected] )

Improving metrics and loss functions for targets with imbalanced size: sliding window Dice coefficient and loss.

Background The Dice coefficient is the most commonly used metric for segmentation quality in medical imaging, and a differentiable version of the coefficient is often used as a loss function, in particular for small target classes such as multiple sclerosis lesions. The Dice coefficient has the benefit that it is applicable in instances where the target class is in the minority (for example, when segmenting small lesions). However, if lesion sizes are mixed, the loss and metric are biased towards performance on large lesions, leading smaller lesions to be missed and harming overall lesion detection. A recently proposed loss function (blob loss [1]) aims to combat this by treating each connected component of a lesion mask separately, and claims improvements over Dice loss on lesion detection scores in a variety of tasks.

Aim: The aim of this thesis is twofold. First, to benchmark blob loss against a simple, potentially superior loss for instance detection: sliding window Dice loss, in which the Dice loss is calculated over a sliding window across the area/volume of the medical image. Second, we will investigate whether a sliding window Dice coefficient is better correlated with lesion-wise detection metrics than the Dice coefficient, and may serve as an alternative metric capturing both global and instance-wise detection.
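A minimal NumPy sketch of the proposed metric (the window size and the convention of skipping all-background windows are illustrative choices, not fixed by the project) shows how a missed small lesion barely moves the global Dice but sharply lowers the sliding-window version:

```python
import numpy as np

def dice(pred, target, eps=1e-8):
    """Global Dice coefficient between two binary masks."""
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def sliding_window_dice(pred, target, window=8, stride=8):
    """Average Dice over local windows, so small lesions weigh as much as big ones."""
    scores = []
    for r in range(0, pred.shape[0] - window + 1, stride):
        for c in range(0, pred.shape[1] - window + 1, stride):
            p = pred[r:r + window, c:c + window]
            t = target[r:r + window, c:c + window]
            if p.any() or t.any():          # skip windows with no lesion at all
                scores.append(dice(p, t))
    return float(np.mean(scores)) if scores else 1.0

target = np.zeros((16, 16), bool)
target[0:8, 0:8] = True            # one large lesion
target[12, 12] = True              # one tiny lesion
pred = target.copy()
pred[12, 12] = False               # the tiny lesion is missed
print(round(dice(pred, target), 3),
      round(sliding_window_dice(pred, target), 3))  # 0.992 0.5
```

The global score stays near perfect while the sliding-window score drops to 0.5, which is the instance-sensitivity the thesis will compare against blob loss.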

Materials and Methods: A large, annotated, heterogeneous dataset of MRI data from MS patients will be available for benchmarking the method, as well as our existing codebases for MS lesion segmentation.  Extension of the method to other diseases and datasets (such as covered in the blob loss paper) will make the method more plausible for publication.  The student will work alongside clinicians and engineers carrying out research in multiple sclerosis lesion segmentation, in particular in the context of our running project supported by the CAIM grant.


Fig. An annotated MS lesion case, showing the variety of lesion sizes

References: [1] blob loss: instance imbalance aware loss functions for semantic segmentation, Kofler et al, https://arxiv.org/abs/2205.08209

Idempotent and partial skull-stripping in multispectral MRI imaging

Background Skull stripping (or brain extraction) refers to the masking of non-brain tissue from structural MRI imaging.  Since 3D MRI sequences allow reconstruction of facial features, many data providers supply data only after skull-stripping, making this a vital tool in data sharing.  Furthermore, skull-stripping is an important pre-processing step in many neuroimaging pipelines, even in the deep-learning era: while many methods could now operate on data with skull present, they have been trained only on skull-stripped data and therefore produce spurious results on data with the skull present.

High-quality skull-stripping algorithms based on deep learning are now widely available: the most prominent example is HD-BET [1]. A major downside of HD-BET is its behaviour on datasets to which skull-stripping has already been applied: in this case the algorithm falsely identifies brain tissue as skull and masks it. A skull-stripping algorithm F not exhibiting this behaviour would be idempotent: F(F(x)) = F(x) for any image x. Furthermore, legacy datasets from before the availability of high-quality skull-stripping algorithms may still contain images which have been inadequately skull-stripped: currently the only solution to improve the skull-stripping on this data is to go back to the original data source or to manually correct the skull-stripping, which is time-consuming and prone to error.
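The idempotence property is easy to state in code. The sketch below uses a toy thresholding "stripper" as a stand-in for a real network such as HD-BET; because already-zeroed voxels stay zero, applying it twice changes nothing, which is exactly the property the project asks of the learned model:

```python
import numpy as np

def skull_strip(image, brain_threshold=0.3):
    """Toy stand-in for a stripper F: zero out voxels below a 'brain' threshold.
    The threshold is illustrative, not a real brain-extraction criterion."""
    return np.where(image >= brain_threshold, image, 0.0)

rng = np.random.default_rng(0)
x = rng.random((8, 8, 8))              # toy 3D volume
once = skull_strip(x)
twice = skull_strip(once)
print(np.array_equal(once, twice))     # True: this F satisfies F(F(x)) = F(x)
```

A deep network has no such guarantee by construction, which is why idempotence must be encouraged through training (e.g. by feeding the network its own outputs).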

Aim: In this project, the student will develop an idempotent skull-stripping network which can also handle partially skull-stripped inputs.  In the best case, the network will operate well on a large subset of the data we work with (e.g. structural MRI, diffusion-weighted MRI, Perfusion-weighted MRI,  susceptibility-weighted MRI, at a variety of field strengths) to maximize the future applicability of the network across the teams in our group.

Materials and Methods: Multiple datasets, both publicly available and internal (encompassing thousands of 3D volumes) will be available. Silver standard reference data for standard sequences at 1.5T and 3T can be generated using existing tools such as HD-BET: for other sequences and field strengths semi-supervised learning or methods improving robustness to domain shift may be employed.  Robustness to partial skull-stripping may be induced by a combination of learning theory and model-based approaches.

Nature of the Thesis:

  • Dataset curation: 10%
  • Idempotent skull-stripping model building: 30%
  • Modelling of partial skull-stripping: 10%
  • Extension of model to handle partial skull: 30%
  • Results analysis: 10%

Fig. An example of failed skull-stripping requiring manual correction

References: [1] Isensee F, Schell M, Pflueger I, et al. Automated brain extraction of multisequence MRI using artificial neural networks. Hum Brain Mapp. 2019; 40: 4952-4964. https://doi.org/10.1002/hbm.24750

Automated leaf detection and leaf area estimation (for Arabidopsis thaliana)

Correlating plant phenotypes such as leaf area or number of leaves to the genotype (i.e. changes in DNA) is a common goal for plant breeders and molecular biologists. Such data can not only help us understand fundamental processes in nature, but can also help to improve ecotypes, e.g., to perform better under climate change or to reduce fertiliser input. However, collecting data for many plants is very time-consuming, and automated data acquisition is necessary.

The project aims at building a machine learning model to automatically detect plants in top-view images (see examples below), segment their leaves (see Fig C) and to estimate the leaf area. This information will then be used to determine the leaf area of different Arabidopsis ecotypes. The project will be carried out in collaboration with researchers of the Institute of Plant Sciences at the University of Bern. It will also involve the design and creation of a dataset of plant top-views with the corresponding annotation (provided by experts at the Institute of Plant Sciences).
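As a pre-learning baseline, the pixel-count-to-area pipeline can be sketched with a simple color rule (the green threshold and the pixels-per-cm calibration below are hypothetical stand-ins for the learned segmentation and the real camera calibration):

```python
import numpy as np

def leaf_mask(image, green_threshold=0.5):
    """Toy leaf detector: a pixel is 'leaf' if its green channel dominates."""
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    return (g > green_threshold) & (g > r) & (g > b)

def leaf_area_cm2(mask, pixels_per_cm=50):
    """Convert segmented pixel count to physical area via camera calibration."""
    return mask.sum() / pixels_per_cm ** 2

img = np.zeros((100, 100, 3))
img[20:40, 20:60, 1] = 0.8           # a bright green 20 x 40 pixel 'leaf'
mask = leaf_mask(img)
print(mask.sum(), round(leaf_area_cm2(mask), 2))  # 800 0.32
```

The project replaces the color rule with a trained detector and leaf-instance segmenter, but the final area estimate still reduces to calibrated pixel counting per leaf.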


Contact: Prof. Dr. Paolo Favaro ( [email protected] )

Master Projects at the ARTORG Center

The Gerontechnology and Rehabilitation group at the ARTORG Center for Biomedical Engineering is offering multiple MSc thesis projects to students who are interested in working with real patient data, artificial intelligence and machine learning algorithms. The goal of these projects is to transfer the findings to the clinic in order to solve today’s healthcare problems and thus to improve the quality of life of patients.

  • Assessment of Digital Biomarkers at Home by Radar. [PDF]
  • Comparison of Radar, Seismograph and Ballistocardiography to Monitor Sleep at Home. [PDF]
  • Sentimental Analysis in Speech. [PDF]

Contact: Dr. Stephan Gerber ( [email protected] )

Internship in Computational Imaging at Prophesee

A 6-month internship at Prophesee, Grenoble is offered to a talented Master student.

The topic of the internship is burst imaging, following the work of Sam Hasinoff, and exploring ways to improve it using event-based vision.

A compensation to cover the expenses of living in Grenoble is offered. Only students that have legal rights to work in France can apply.

Anyone interested can send an email with the CV to Daniele Perrone ( [email protected] ).

Using machine learning applied to wearables to predict mental health

This Master’s project lies at the intersection of psychiatry and computer science and aims to use machine learning techniques to improve health. Using sensors to detect sleep and waking behavior has as yet unexplored potential to reveal insights into health.

In this study, we make use of a watch-like device, called an actigraph, which tracks motion to quantify sleep behavior and waking activity. Participants in the study consist of healthy and depressed adolescents who wear actigraphs for a year, during which time we query their mental health status monthly using online questionnaires. For this Master’s thesis we aim to use machine learning methods to predict mental health based on the data from the actigraph. The ability to predict mental health crises based on sleep and wake behavior would provide an opportunity for intervention, significantly impacting the lives of patients and their families.

This Master’s thesis is a collaboration between Professor Paolo Favaro at the Institute of Computer Science ( [email protected] ) and Dr. Leila Tarokh at the Universitäre Psychiatrische Dienste (UPD) ( [email protected] ). We are looking for a highly motivated individual interested in bridging disciplines.

Bachelor or Master Projects at the ARTORG Center

The Gerontechnology and Rehabilitation group at the ARTORG Center for Biomedical Engineering is offering multiple BSc and MSc thesis projects to students who are interested in working with real patient data, artificial intelligence and machine learning algorithms. The goal of these projects is to transfer the findings to the clinic in order to solve today’s healthcare problems and thus to improve the quality of life of patients.

  • Machine Learning Based Gait-Parameter Extraction by Using Simple Rangefinder Technology. [PDF]
  • Detection of Motion in Video Recordings. [PDF]
  • Home-Monitoring of Elderly by Radar. [PDF]
  • Gait feature detection in Parkinson's Disease. [PDF]
  • Development of an arthroscopic training device using virtual reality. [PDF]

Contact: Dr. Stephan Gerber ( [email protected] ), Michael Single ( [email protected]. ch )

Dynamic Transformer

Level: bachelor.

Visual Transformers have obtained state-of-the-art classification accuracies [ViT, DeiT, T2T, BoTNet]. A mixture of experts can increase the capacity of a neural network by learning instance-dependent execution pathways in a network [MoE]. In this research project we aim to push transformers to their limit and combine their dynamic attention with MoEs. Compared to the Switch Transformer [Switch], we will use a much more efficient formulation of mixing [CondConv, DynamicConv], and we will apply this idea in the attention part of the transformer rather than the fully connected layer.

  • Input dependent attention kernel generation for better transformer layers.
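The CondConv/DynamicConv-style mixing referred to above can be sketched in a few lines: routing weights computed from the input mix the expert weight matrices *before* a single matrix multiply, so capacity grows with the number of experts while compute stays near that of one layer (all shapes here are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cond_linear(x, experts, routing_w):
    """CondConv-style layer: input-dependent routing weights mix the expert
    weight matrices into ONE matrix, then a single multiply is performed."""
    alpha = softmax(routing_w @ x)                  # instance-dependent routing
    mixed = np.tensordot(alpha, experts, axes=1)    # (d_out, d_in) combined weight
    return mixed @ x, alpha

rng = np.random.default_rng(0)
d_in, d_out, n_experts = 16, 8, 4
experts = rng.standard_normal((n_experts, d_out, d_in))
routing_w = rng.standard_normal((n_experts, d_in))
y, alpha = cond_linear(rng.standard_normal(d_in), experts, routing_w)
print(y.shape, round(float(alpha.sum()), 6))  # (8,) 1.0
```

In the project the mixed weights would parameterize the attention projections (query/key/value) rather than a plain linear layer.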

Publication Opportunity: Dynamic Neural Networks Meets Computer Vision (a CVPR 2021 Workshop)

Extensions:

  • The same idea could be extended to other ViT/Transformer based models [DETR, SETR, LSTR, TrackFormer, BERT]

Related Papers:

  • Visual Transformers: Token-based Image Representation and Processing for Computer Vision [ViT]
  • DeiT: Data-efficient Image Transformers [DeiT]
  • Bottleneck Transformers for Visual Recognition [BoTNet]
  • Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet [T2TViT]
  • Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer [MoE]
  • Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity [Switch]
  • CondConv: Conditionally Parameterized Convolutions for Efficient Inference [CondConv]
  • Dynamic Convolution: Attention over Convolution Kernels [DynamicConv]
  • End-to-End Object Detection with Transformers [DETR]
  • Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [SETR]
  • End-to-end Lane Shape Prediction with Transformers [LSTR]
  • TrackFormer: Multi-Object Tracking with Transformers [TrackFormer]
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [BERT]

Contact: Sepehr Sameni

Visual Transformers have obtained state-of-the-art classification accuracies for 2D images [ViT, DeiT, T2T, BoTNet]. In this project, we aim to extend the same ideas to 3D data (videos), which requires a more efficient attention mechanism [Performer, Axial, Linformer]. In order to accelerate the training process, we could use the [Multigrid] technique.
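One of the efficient mechanisms cited above, axial attention, simply applies ordinary attention along one axis at a time. A NumPy sketch on a toy (T, H, W, d) video tensor (shapes are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(x):
    """Plain scaled dot-product self-attention over the second-to-last axis."""
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

def axial_attention(x):
    """Attend along W, then H, then T separately: cost grows with T + H + W
    per position instead of the full sequence length T * H * W."""
    x = attention(x)                                                          # width
    x = np.transpose(attention(np.transpose(x, (0, 2, 1, 3))), (0, 2, 1, 3))  # height
    x = np.transpose(attention(np.transpose(x, (1, 2, 0, 3))), (2, 0, 1, 3))  # time
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 4, 8))   # toy (T, H, W, d) video features
out = axial_attention(x)
print(out.shape)  # (2, 4, 4, 8): same shape, much cheaper than dense attention
```

Real implementations add projections, heads, and residual connections, but the axis-by-axis factorization is the core idea.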

  • Better video understanding by attention blocks.

Publication Opportunity: LOVEU (a CVPR workshop) , Holistic Video Understanding (a CVPR workshop) , ActivityNet (a CVPR workshop)

  • Rethinking Attention with Performers [Performer]
  • Axial Attention in Multidimensional Transformers [Axial]
  • Linformer: Self-Attention with Linear Complexity [Linformer]
  • A Multigrid Method for Efficiently Training Video Models [Multigrid]

GIRAFFE is a newly introduced GAN that can generate scenes via composition with minimal supervision [GIRAFFE]. Generative methods can implicitly learn interpretable representation as can be seen in GAN image interpretations [GANSpace, GanLatentDiscovery]. Decoding GIRAFFE could give us per-object interpretable representations that could be used for scene manipulation, data augmentation, scene understanding, semantic segmentation, pose estimation [iNeRF], and more. 

In order to invert a GIRAFFE model, we will first train the generative model on Clevr and CompCars datasets, then we add a decoder to the pipeline and train this autoencoder. We can make the task easier by knowing the number of objects in the scene and/or knowing their positions. 

Goals:  

Scene Manipulation and Decomposition by Inverting the GIRAFFE 

Publication Opportunity:  DynaVis 2021 (a CVPR workshop on Dynamic Scene Reconstruction)  

Related Papers: 

  • GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields [GIRAFFE] 
  • Neural Scene Graphs for Dynamic Scenes 
  • pixelNeRF: Neural Radiance Fields from One or Few Images [pixelNeRF] 
  • NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis [NeRF] 
  • Neural Volume Rendering: NeRF And Beyond 
  • GANSpace: Discovering Interpretable GAN Controls [GANSpace] 
  • Unsupervised Discovery of Interpretable Directions in the GAN Latent Space [GanLatentDiscovery] 
  • Inverting Neural Radiance Fields for Pose Estimation [iNeRF] 

Quantized ViT

Visual Transformers have obtained state-of-the-art classification accuracies [ViT, CLIP, DeiT], but the best ViT models are extremely compute-heavy, and running them even only for inference (without backpropagation) is expensive. Running transformers cheaply via quantization is not a new problem, and it has been tackled before for BERT [BERT] in NLP [Q-BERT, Q8BERT, TernaryBERT, BinaryBERT]. In this project we will try to quantize pretrained ViT models.
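The basic building block of these approaches is uniform quantization of weight tensors. A minimal symmetric int8 sketch (per-tensor scaling, which the cited methods refine considerably with per-channel scales, Hessian information, or distillation):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ~= scale * q, q in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((768, 768)).astype(np.float32)  # a ViT-sized weight
q, scale = quantize_int8(w)
err = np.abs(dequantize(q, scale) - w).max()
print(q.dtype, err <= scale / 2 + 1e-6)  # int8 True: error bounded by half a step
```

Storage drops 4x versus float32, and integer matrix multiplies are what make inference cheaper on supporting hardware; the research question is keeping ViT accuracy at these low bit widths.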

Quantizing ViT models for faster inference and smaller models without losing accuracy 

Publication Opportunity:  Binary Networks for Computer Vision 2021 (a CVPR workshop)  

Extensions:  

  • Having a fast pipeline for image inference with ViT will allow us to dig deep into the attention of ViT and analyze it. We might be able to prune some attention heads or replace them with static patterns (like local convolution or dilated patterns), and we might even be able to replace the transformer with a Performer and increase the throughput even more [Performer].
  • The same idea could be extended to other ViT based models [DETR, SETR, LSTR, TrackFormer, CPTR, BoTNet, T2TViT] 
  • Learning Transferable Visual Models From Natural Language Supervision [CLIP] 
  • Visual Transformers: Token-based Image Representation and Processing for Computer Vision [ViT] 
  • DeiT: Data-efficient Image Transformers [DeiT] 
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [BERT] 
  • Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT [Q-BERT] 
  • Q8BERT: Quantized 8Bit BERT [Q8BERT] 
  • TernaryBERT: Distillation-aware Ultra-low Bit BERT [TernaryBERT] 
  • BinaryBERT: Pushing the Limit of BERT Quantization [BinaryBERT] 
  • Rethinking Attention with Performers [Performer] 
  • End-to-End Object Detection with Transformers [DETR] 
  • Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [SETR] 
  • End-to-end Lane Shape Prediction with Transformers [LSTR] 
  • TrackFormer: Multi-Object Tracking with Transformers [TrackFormer] 
  • CPTR: Full Transformer Network for Image Captioning [CPTR] 
  • Bottleneck Transformers for Visual Recognition [BoTNet] 
  • Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet [T2TViT] 

Multimodal Contrastive Learning

Recently, contrastive learning has gained a lot of attention for self-supervised image representation learning [SimCLR, MoCo]. Contrastive learning can be extended to multimodal data, such as videos (images and audio) [CMC, CoCLR]. Most contrastive methods require large batch sizes (or large memory pools), which makes them expensive to train. In this project we are going to use batch-size-independent contrastive methods [SwAV, BYOL, SimSiam] to train multimodal representation extractors.

Our main goal is to compare the proposed method with the CMC baseline, so we will be working with STL10, ImageNet, UCF101, HMDB51, and NYU Depth-V2 datasets. 

Inspired by the recent works on smaller datasets [ConVIRT, CPD], to accelerate the training speed, we could start with two pretrained single-modal models and finetune them with the proposed method.  
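For reference, the InfoNCE objective underlying CMC-style baselines can be written directly. The sketch below (random embeddings as stand-ins for the two modalities) also makes visible why batch size matters: every other sample in the batch serves as a negative:

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE: each sample's two modality embeddings are positives; the rest
    of the batch provides the negatives (hence the appetite for large batches)."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature             # cosine similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))             # positives on the diagonal

rng = np.random.default_rng(0)
z_img = rng.standard_normal((32, 128))                 # e.g. image embeddings
z_aud = z_img + 0.1 * rng.standard_normal((32, 128))   # aligned audio embeddings
loss_pos = info_nce(z_img, z_aud)
loss_rand = info_nce(z_img, rng.standard_normal((32, 128)))
print(loss_pos < loss_rand)  # True: aligned pairs score a lower loss
```

The methods the project targets (SwAV, BYOL, SimSiam) avoid this explicit dependence on in-batch negatives, which is exactly what makes them attractive for the multimodal setting.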

  • Extending SwAV to multimodal datasets 
  • Grasping a better understanding of the BYOL 

Publication Opportunity:  MULA 2021 (a CVPR workshop on Multimodal Learning and Applications)  

  • Most knowledge distillation methods for contrastive learners also use large batch sizes (or memory pools) [CRD, SEED], the proposed method could be extended for knowledge distillation. 
  • One could easily extend this idea to multiview learning: for example, two different networks working on the same input could be trained with contrastive learning, which may lead to better models [DeiT] through cross-model communication of inductive biases.
  • Self-supervised Co-training for Video Representation Learning [CoCLR] 
  • Learning Spatiotemporal Features via Video and Text Pair Discrimination [CPD] 
  • Audio-Visual Instance Discrimination with Cross-Modal Agreement [AVID-CMA] 
  • Self-Supervised Learning by Cross-Modal Audio-Video Clustering [XDC] 
  • Contrastive Multiview Coding [CMC] 
  • Contrastive Learning of Medical Visual Representations from Paired Images and Text [ConVIRT] 
  • A Simple Framework for Contrastive Learning of Visual Representations [SimCLR] 
  • Momentum Contrast for Unsupervised Visual Representation Learning [MoCo] 
  • Bootstrap your own latent: A new approach to self-supervised Learning [BYOL] 
  • Exploring Simple Siamese Representation Learning [SimSiam] 
  • Unsupervised Learning of Visual Features by Contrasting Cluster Assignments [SwAV] 
  • Contrastive Representation Distillation [CRD] 
  • SEED: Self-supervised Distillation For Visual Representation [SEED] 

Robustness of Neural Networks

Neural networks achieve impressive performance in several tasks such as classification, detection, and segmentation. However, they are also very sensitive to small (adversarially chosen) changes to the input: it has been shown that perturbations of an image that are invisible to the naked eye can lead the network to output an incorrect label. This thesis will study recent progress in this area and aim to build a procedure by which a trained network can self-assess its reliability in classification or another popular computer vision task.
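For intuition, the classic fast gradient sign method (FGSM) shows how a small, controlled perturbation can degrade a classifier. The toy linear "network" below is purely illustrative, not part of the thesis:

```python
import numpy as np

def fgsm_perturb(x, grad, eps=0.03):
    """Fast Gradient Sign Method: take a step of size eps in the
    sign of the loss gradient, i.e. the direction that hurts most."""
    return x + eps * np.sign(grad)

# toy linear classifier: score = w.x, predicted label = sign(score)
w = np.array([0.5, -1.0, 0.25])
x = np.array([0.2, -0.1, 0.4])          # correctly classified: score > 0
# for a loss that penalizes the true-class score, the gradient w.r.t. x is -w
x_adv = fgsm_perturb(x, -w, eps=0.2)

score, score_adv = float(w @ x), float(w @ x_adv)
assert score > 0                # clean input lies on the correct side
assert score_adv < score        # the bounded perturbation lowers the score
```

Each coordinate moves by at most eps, so the perturbation stays small in the max-norm while still being maximally damaging for a linear model.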

Contact: Paolo Favaro

Master's projects at the sitem Center

The Personalised Medicine Research Group at the sitem Center for Translational Medicine and Biomedical Entrepreneurship is offering multiple MSc thesis projects to biomedical engineering MSc students that may also be of interest to computer science students:

  • Automated quantification of cartilage quality for hip treatment decision support. PDF
  • Automated quantification of massive rotator cuff tears from MRI. PDF
  • Deep learning-based segmentation and fat fraction analysis of the shoulder muscles using quantitative MRI. PDF
  • Unsupervised Domain Adaption for Cross-Modality Hip Joint Segmentation. PDF

Contact: Dr. Kate Gerber

Internships/Master thesis @ Chronocam

3-6 month internships on event-based computer vision. Chronocam is a rapidly growing startup developing event-based vision technology, with more than 15 PhDs working on problems such as tracking, detection, classification, SLAM, etc. Event-based computer vision has the potential to solve many long-standing problems in traditional computer vision, and this is an exciting time as this potential becomes more and more tangible in many real-world applications. For next year we are looking for motivated Master and PhD students with good software engineering skills (C++ and/or Python) and preferably a good computer vision and deep learning background. PhD internships will be more research-focused and may lead to a publication. For each intern we offer compensation to cover the expenses of living in Paris. Some of the topics we want to explore:

  • Photo-realistic image synthesis and super-resolution from event-based data (PhD)
  • Self-supervised representation learning (PhD)
  • End-to-end Feature Learning for Event-based Data
  • Bio-inspired Filtering using Spiking Networks
  • On-the-fly Compression of Event-based Streams for Low-Power IoT Cameras
  • Tracking of Multiple Objects with a Dual-Frequency Tracker
  • Event-based Autofocus
  • Stabilizing an Event-based Stream using an IMU
  • Crowd Monitoring for Low-power IoT Cameras
  • Road Extraction from an Event-based Camera Mounted in a Car for Autonomous Driving
  • Sign detection from an Event-based Camera Mounted in a Car for Autonomous Driving
  • High-frequency Eye Tracking

Email your CV to Daniele Perrone at [email protected].

Contact: Daniele Perrone

Object Detection in 3D Point Clouds

Today we have many 3D scanning techniques that allow us to capture the shape and appearance of objects. It is easier than ever to scan real 3D objects and transform them into digital models for further processing, such as modeling, rendering, or animation. However, the output of a 3D scanner is often a raw point cloud with little to no annotation. The unstructured nature of the point cloud representation makes it difficult to process, e.g. for surface reconstruction. One application is the detection and segmentation of an object of interest.  In this project, the student is challenged to design a system that takes a point cloud (a 3D scan) as input and outputs the names of the objects contained in the scan. This output can then be used to eliminate outliers or points that belong to the background. The approach involves collecting a large dataset of 3D scans and training a neural network on it.
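A key design issue for such a system is that a point cloud is an unordered set, so the network must be permutation-invariant. A minimal PointNet-style sketch (random stand-in weights, purely illustrative):

```python
import numpy as np

def pointnet_features(points):
    """Tiny PointNet-style global descriptor: a shared per-point layer
    followed by a symmetric max-pool, so the result does not depend on
    the ordering of the points. Weights are random stand-ins."""
    rng = np.random.default_rng(42)
    W = rng.normal(size=(3, 16))            # shared weights for every point
    per_point = np.maximum(points @ W, 0)   # ReLU applied point-wise
    return per_point.max(axis=0)            # order-invariant global pooling

cloud = np.random.default_rng(0).normal(size=(100, 3))    # 100 xyz points
shuffled = cloud[np.random.default_rng(1).permutation(100)]
# shuffling the points leaves the descriptor unchanged
assert np.allclose(pointnet_features(cloud), pointnet_features(shuffled))
```

A classifier head on top of such a descriptor could then output the object names; the symmetric pooling is what makes the pipeline robust to the raw, unstructured scanner output.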

Contact: Adrian Wälchli

Shape Reconstruction from a Single RGB Image or Depth Map

A photograph accurately captures the world in a moment of time and from a specific perspective. Since it is a projection of the 3D space to a 2D image plane, the depth information is lost. Is it possible to restore it, given only a single photograph? In general, the answer is no. This problem is ill-posed, meaning that many different plausible depth maps exist, and there is no way of telling which one is the correct one.  However, if we cover one of our eyes, we are still able to recognize objects and estimate how far away they are. This motivates the exploration of an approach where prior knowledge can be leveraged to reduce the ill-posedness of the problem. Such a prior could be learned by a deep neural network, trained with many images and depth maps.

CNN Based Deblurring on Mobile

Deblurring finds many applications in our everyday life. It is particularly useful when taking pictures on handheld devices (e.g. smartphones) where camera shake can degrade important details. Therefore, it is desired to have a good deblurring algorithm implemented directly in the device.  In this project, the student will implement and optimize a state-of-the-art deblurring method based on a deep neural network for deployment on mobile phones (Android).  The goal is to reduce the number of network weights in order to reduce the memory footprint while preserving the quality of the deblurred images. The result will be a camera app that automatically deblurs the pictures, giving the user a choice of keeping the original or the deblurred image.
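One common way to shrink such a network (an option for illustration, not necessarily the method the student will adopt) is to replace standard convolutions with depthwise separable ones; the parameter arithmetic is simple:

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (biases ignored)."""
    return c_in * c_out * k * k

def separable_params(c_in, c_out, k):
    """Depthwise k x k convolution plus a 1x1 pointwise convolution."""
    return c_in * k * k + c_in * c_out

std = conv_params(64, 64, 3)        # 64 * 64 * 9  = 36864 weights
sep = separable_params(64, 64, 3)   # 64 * 9 + 64 * 64 = 4672 weights
assert std == 36864 and sep == 4672
print(f"reduction factor: {std / sep:.1f}x")
```

For a typical 64-to-64-channel 3x3 layer this is roughly an 8x reduction in weights, which directly shrinks the memory footprint on the phone.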

Depth from Blur

If an object in front of the camera or the camera itself moves while the aperture is open, the region of motion becomes blurred because the incoming light is accumulated in different positions across the sensor. If there is camera motion, there is also parallax; thus, a motion-blurred image contains depth information.  In this project, the student will tackle the problem of recovering a depth map from a motion-blurred image. This includes the collection of a large dataset of blurred and sharp images or videos using a pair or triplet of GoPro action cameras: two cameras are used in stereo to estimate the depth map, while the third captures the blurred frames. This data is then used to train a convolutional neural network that predicts the depth map from the blurry image.
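The stereo part of the capture rig rests on the pinhole relation depth = f * B / d; a minimal sketch with made-up GoPro-like numbers (the actual calibration values are an assumption):

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Pinhole stereo relation: depth Z = f * B / d, with the focal
    length f in pixels, baseline B in meters, and disparity d in pixels."""
    return focal_px * baseline_m / disparity_px

# hypothetical rig: 1000 px focal length, 10 cm baseline between cameras
z = depth_from_disparity(1000.0, 0.10, disparity_px=25.0)
assert abs(z - 4.0) < 1e-9   # a 25 px disparity corresponds to 4 m depth
```

These stereo depth maps would serve as training targets, while the third camera's blurred frames serve as network input.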

Unsupervised Clustering Based on Pretext Tasks

The idea of this project is that we have two types of neural networks that work together: There is one network A that assigns images to k clusters and k (simple) networks of type B perform a self-supervised task on those clusters. The goal of all the networks is to make the k networks of type B perform well on the task. The assumption is that clustering in semantically similar groups will help the networks of type B to perform well. This could be done on the MNIST dataset with B being linear classifiers and the task being rotation prediction.
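The rotation-prediction pretext task mentioned above can be sketched in a few lines (illustrative NumPy; the real project would feed MNIST images to the type-B networks):

```python
import numpy as np

def make_rotation_task(img, rng):
    """Self-supervised rotation pretext: rotate the image by a random
    multiple of 90 degrees and return (rotated image, class in {0,1,2,3}).
    The label is free -- it is generated by the transformation itself."""
    label = int(rng.integers(4))
    return np.rot90(img, k=label), label

rng = np.random.default_rng(0)
img = np.arange(16).reshape(4, 4)      # stand-in for an MNIST digit
rotated, label = make_rotation_task(img, rng)
# applying the inverse rotation always recovers the original image
assert np.array_equal(np.rot90(rotated, k=4 - label), img)
```

Network A would route each such (image, label) pair to one of the k clusters, and the linear type-B classifier of that cluster would be trained to predict the rotation class.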

Adversarial Data-Augmentation

The student designs a data augmentation network that transforms training images in such a way that image realism is preserved (e.g. with a constrained spatial transformer network) and the transformed images are more difficult to classify (trained via adversarial loss against an image classifier). The model will be evaluated for different data settings (especially in the low data regime), for example on the MNIST and CIFAR datasets.

Unsupervised Learning of Lip-reading from Videos

People with sensory impairment (hearing, speech, vision) depend heavily on assistive technologies to communicate and navigate in everyday life. The mass production of media content today makes it impossible to manually translate everything into a common language for assistive technologies, e.g. captions or sign language.  In this project, the student employs a neural network to learn a representation for lip-movement in videos in an unsupervised fashion, possibly with an encoder-decoder structure where the decoder reconstructs the audio signal. This requires collecting a large dataset of videos (e.g. from YouTube) of speakers or conversations where lip movement is visible. The outcome will be a neural network that learns an audio-visual representation of lip movement in videos, which can then be leveraged to generate captions for hearing impaired persons.

Learning to Generate Topographic Maps from Satellite Images

Satellite images have many applications, e.g. in meteorology, geography, education, cartography, and warfare. They are an accurate and detailed depiction of the surface of the earth from above. Although it is relatively simple to collect many satellite images in an automated way, challenges arise when processing them for use in navigation and cartography. The idea of this project is to automatically convert an arbitrary satellite image, e.g. of a city, to a map of simple 2D shapes (streets, houses, forests) and label them with colors (semantic segmentation). The student will collect a dataset of satellite images and topographic maps and train a deep neural network that learns to map from one domain to the other. The data could be obtained from a Google Maps database or similar.

New Variables of Brain Morphometry: the Potential and Limitations of CNN Regression

Timo Blattner · Sept. 2022.

The calculation of variables of brain morphology is computationally very expensive and time-consuming. A previous work showed the feasibility of extracting the variables directly from T1-weighted brain MRI images using a convolutional neural network. We used significantly more data and extended their model to a new set of neuromorphological variables, which could become interesting biomarkers in the future for the diagnosis of brain diseases. The model shows, for nearly all subjects, a mean relative absolute error of less than 5%. This high relative accuracy can be attributed to the low morphological variance between subjects and the ability of the model to predict the cortical atrophy age trend. The model however fails to capture all the variance in the data and shows large regional differences. We attribute these limitations in part to the moderate to poor reliability of the ground truth generated by FreeSurfer. We further investigated the effects of training data size and model complexity on this regression task and found that the size of the dataset had a significant impact on performance, while deeper models did not perform better. Lack of interpretability and dependence on a silver ground truth are the main drawbacks of this direct regression approach.

Home Monitoring by Radar

Lars Ziegler · Sept. 2022.

Detection and tracking of humans via UWB radar is a promising and continuously evolving field with great potential for medical technology. This contactless method of acquiring data on a patient's movement patterns is ideal for in-home application. As irregularities in a patient's movement patterns are an indicator of various health problems, including neurodegenerative diseases, the insight this data provides may enable earlier detection of such problems. In this thesis, a signal processing pipeline is presented with which a person's movement is modeled. During an experiment, 142 measurements were recorded by two separate radar systems and one lidar system, each consisting of multiple sensors. The models computed from these measurements by the signal processing pipeline were used to predict the times when a person stood up or sat down. The predictions showed an accuracy of 72.2%.

Revisiting non-learning based 3D reconstruction from multiple images

Aaron Sägesser · Oct. 2021.

Arthroscopy consists of challenging tasks and requires skills that, even today, young surgeons still train directly during surgery. Existing simulators are expensive and rarely available. With the growing potential of virtual reality (VR) (head-mounted) devices for simulation and their applicability in the medical context, these devices have become a promising alternative that would be orders of magnitude cheaper and could be made widely available. The overall aim of our project is to build a VR-based training device for arthroscopy, as this would be of great benefit and might even be applicable to other minimally invasive surgery (MIS). This thesis marks a first step of the project, focusing on exploring and comparing well-known algorithms for multi-view stereo (MVS) based 3D reconstruction of imagery acquired by an arthroscopic camera. Alongside this reconstruction, we aim to obtain essential measurements for comparing the VR environment to the real world, as validation of the realism of future VR tasks. We evaluate 3 different feature extraction algorithms with 3 different matching techniques and 2 different algorithms for estimating the fundamental (F) matrix. The evaluation of these 18 setups is performed with a reconstruction pipeline embedded in a Jupyter notebook, implemented in Python on top of common computer vision libraries, and compared with imagery captured with a mobile phone as well as with the reconstruction results of the state-of-the-art (SOTA) structure-from-motion (SfM) software COLMAP and the Multi-View Environment (MVE). Our comparative analysis highlights the challenges posed by the heavy distortion, fish-eye shape, and weak image quality of arthroscopic imagery, as all results are substantially worse on this data. However, there are large differences between the setups. 
Scale-Invariant Feature Transform (SIFT) and Oriented FAST and Rotated BRIEF (ORB) in combination with k-Nearest Neighbour (kNN) matching and Least Median of Squares (LMedS) give the most promising results. Overall, the 3D reconstruction pipeline is a useful tool for obtaining measurements from the arthroscopic exploration device and for complementing comparative research in this context.

Examination of Unsupervised Representation Learning by Predicting Image Rotations

Eric Lagger · Sept. 2020.

In recent years, deep convolutional neural networks have made a lot of progress. Training such a network requires a lot of data, and supervised learning algorithms additionally require that the data is labeled. Labeling data demands a lot of human work, which takes much time and money. To avoid these inconveniences, we would like to find systems that do not need labeled data and are therefore unsupervised learning algorithms. This is the importance of unsupervised algorithms, even though their results are not yet on the same qualitative level as those of supervised algorithms. In this thesis we discuss one such approach and compare the results to other papers. A deep convolutional neural network is trained to recognize the rotations that have been applied to a picture: we take a large number of images, apply simple rotations, and task the network with discovering in which direction each image has been rotated. The data does not need to be labeled with any category. As long as the input pictures are consistently upright, we hope the network finds high-dimensional patterns to learn from.

StitchNet: Image Stitching using Autoencoders and Deep Convolutional Neural Networks

Maurice Rupp · Sept. 2019.

This thesis explores the prospect of artificial neural networks for image processing tasks. More specifically, it aims to stitch multiple overlapping images into a bigger, panoramic picture. Until now, this task has been approached solely with "classical", hardcoded algorithms, while deep learning is at most used for specific subtasks. This thesis introduces StitchNet, a novel end-to-end neural network approach to image stitching that uses a pre-trained autoencoder and deep convolutional networks. In addition to presenting several new datasets for supervised image stitching, each with 120,000 training and 5,000 validation samples, this thesis also conducts various experiments with different kinds of existing networks designed for image super-resolution and image segmentation, adapted to the task of image stitching. StitchNet outperforms most of the adapted networks in both quantitative and qualitative results.

Facial Expression Recognition in the Wild

Luca Rolshoven · Sept. 2019.

The idea of inferring the emotional state of a subject by looking at their face is nothing new, and neither is the idea of automating this process with computers. Researchers used to computationally extract handcrafted features from face images that had proven to be effective and then apply machine learning techniques to classify the facial expressions based on these features. Recently, there has been a trend towards using deep learning, and especially Convolutional Neural Networks (CNNs), for the classification of facial expressions. Researchers have achieved good results on images taken in laboratories under the same or at least similar conditions. However, these models do not perform very well on more arbitrary face images with different head poses and illumination. This thesis aims to show the challenges of Facial Expression Recognition (FER) in this wild setting. It presents the currently used datasets and the state-of-the-art results on one of the biggest facial expression datasets currently available. The contributions of this thesis are twofold. Firstly, I analyze three famous neural network architectures and their effectiveness for the classification of facial expressions. Secondly, I present two modifications of one of these networks that lead to the proposed STN-COV model. While this model does not outperform all of the current state-of-the-art models, it does beat several of them.

A Study of 3D Reconstruction of Varying Objects with Deformable Parts Models

Raoul Grossenbacher · July 2019.

This work covers a new approach to 3D reconstruction. In traditional 3D reconstruction, one uses multiple images of the same object to calculate a 3D model, exploiting the differences between the images (camera position, illumination, rotation of the object, and so on) to compute a point cloud representing the object. The characteristic trait shared by all these approaches is that one can change almost everything about the images, but not the object itself, because one needs to find correspondences between the images. To be able to use different instances of the same object class, we used a 3D DPM model that can find the different parts of an object in an image, thereby detecting correspondences between the different pictures, which we can then use to calculate the 3D model. To put this theory into practice, we gave a 3D DPM model trained to detect cars pictures of different car brands, where no pair of images showed the same vehicle, and used the detected correspondences and the Factorization Method to compute the 3D point cloud. This leads to a new approach to 3D reconstruction in which the object itself may vary across images.

Motion Deblurring in the Wild: Replication and Improvements

Alvaro Juan Lahiguera · Jan. 2019.

Coma Outcome Prediction with Convolutional Neural Networks

Stefan Jonas · Oct. 2018.

Automatic Correction of Self-Introduced Errors in Source Code

Sven Kellenberger · Aug. 2018.

Neural Face Transfer: Training a Deep Neural Network to Face-Swap

Till Nikolaus Schnabel · July 2018.

This thesis explores the field of artificial neural networks with realistic-looking visual outputs. It aims at morphing face pictures of a specific identity to look like another individual by only modifying key features, such as eye color, while leaving identity-independent features unchanged. Prior works have covered the topic of symmetric translation between two specific domains but failed to optimize it on faces where only parts of the image may be changed. This work applies a face masking operation to the output at training time, which forces the image generator to preserve colors while altering the face, fitting it naturally inside the unmorphed surroundings. Various experiments are conducted, including an ablation study on the final setting, which decreases the baseline identity-switching performance from 81.7% to 75.8% while improving the average χ² color distance from 0.551 to 0.434. The provided software gives users easy access to apply this neural face swap to images and videos of arbitrary crop and brings Computer Vision one step closer to replacing Computer Graphics in this specific area.

A Study of the Importance of Parts in the Deformable Parts Model

Sammer Puran · June 2017.

Self-Similarity as a Meta Feature

Lucas Husi · April 2017.

A Study of 3D Deformable Parts Models for Detection and Pose-Estimation

Simon Jenni · March 2015.

Accelerated Federated Learning on Client Silos with Label Noise: RHO Selection in Classification and Segmentation

Irakli Kelbakiani · May 2024.

Federated Learning has recently gained more research interest. This increased attention is driven by factors including the growth of decentralized data, privacy concerns, and new privacy regulations. In Federated Learning, remote servers train a model on their local datasets independently, and the local models are subsequently aggregated into a global model, which achieves better overall performance. Sending local model weights instead of the entire dataset is a significant advantage of Federated Learning over centralized classical machine learning algorithms. Federated Learning involves uploading and downloading model parameters multiple times, so there are multiple communication rounds between the global server and remote client servers, which imposes challenges. The high number of necessary communication rounds not only increases communication overhead but is also a critical limitation for servers with low network bandwidth, leading to latency and a higher probability of training failures caused by communication breakdowns. To mitigate these challenges, we aim to provide a fast-convergent Federated Learning training methodology that decreases the number of necessary communication rounds. We build on the Reducible Holdout Loss Selection (RHO-Loss) batch selection methodology, which ”selects low-noise, task-relevant, non-redundant points for training” [1]. We hypothesize that if client silos employ the RHO-Loss methodology and successfully avoid training their local models on noisy and non-relevant samples, clients may offer stable and consistent updates to the global server, which could lead to faster convergence of the global model. Our contribution focuses on investigating the RHO-Loss method in a simulated federated setting on the Clothing1M dataset. We also examine its applicability to medical datasets and check its effectiveness in a simulated federated environment. 
Our experimental results show a promising outcome, specifically a reduction in communication rounds for the Clothing1M dataset. However, as the success of the RHO-Loss selection method depends on the availability of sufficient training data for the target RHO model and for the Irreducible RHO model, we emphasize that our contribution applies to those Federated Learning scenarios where client silos hold enough training data to successfully train and benefit from their RHO model on their local dataset.

Amodal Leaf Segmentation

Nicolas Maier · Nov. 2023.

Plant phenotyping is the process of measuring and analyzing various traits of plants. It provides essential information on how genetic and environmental factors affect plant growth and development. Manual phenotyping is highly time-consuming; therefore, many computer vision and machine learning based methods have been proposed in the past years to perform this task automatically based on images of the plants. However, the publicly available datasets (in particular, of Arabidopsis thaliana) are limited in size and diversity, making them unsuitable to generalize to new unseen environments. In this work, we propose a complete pipeline able to automatically extract traits of interest from an image of Arabidopsis thaliana. Our method uses a minimal amount of existing annotated data from a source domain to generate a large synthetic dataset adapted to a different target domain (e.g., different backgrounds, lighting conditions, and plant layouts). In addition, unlike the source dataset, the synthetic one provides ground-truth annotations for the occluded parts of the leaves, which are relevant when measuring some characteristics of the plant, e.g., its total area. This synthetic dataset is then used to train a model to perform amodal instance segmentation of the leaves to obtain the total area, leaf count, and color of each plant. To validate our approach, we create a small dataset composed of manually annotated real images of Arabidopsis thaliana, which is used to assess the performance of the models.

Assessment of movement and pose in a hospital bed by ambient and wearable sensor technology in healthy subjects

Tony Licata · Sept. 2022.

The use of automated systems describing human motion has become possible in various domains. Most of the proposed systems are designed to work with people moving around in a standing position. Because such a system could be of interest in a medical environment, we propose in this work a pipeline that can effectively predict human motion for people lying in beds. The proposed pipeline is tested on a dataset composed of 41 participants executing 7 predefined tasks in a bed. The motion of the participants is measured with video cameras, accelerometers, and a pressure mat. Various experiments are carried out on the information retrieved from the dataset, and two approaches combining the data from the different measurement technologies are explored. The performance of the different experiments is measured, and the proposed pipeline is assembled from the components providing the best results. Finally, we show that the proposed pipeline only needs the video cameras, which makes the proposed setup easier to deploy in real-life situations.

Machine Learning Based Prediction of Mental Health Using Wearable-measured Time Series

Seyedeh Sharareh Mirzargar · Sept. 2022.

Depression is the second major cause of years spent in disability and has a growing prevalence in adolescents. The recent Covid-19 pandemic has intensified the situation and limited in-person patient monitoring due to distancing measures. Recent advances in wearable devices have made it possible to record the rest/activity cycle remotely, with high precision and in real-world contexts. We aim to use machine learning methods to predict an individual's mental health based on wearable-measured sleep and physical activity. Predicting an impending mental health crisis of an adolescent allows for prompt intervention, detection of depression onset or its recurrence, and remote monitoring. To achieve this goal, we train three primary forecasting models: linear regression, random forest, and light gradient boosted machine (LightGBM); and two deep learning models: block recurrent neural network (block RNN) and temporal convolutional network (TCN); on Actigraph measurements to forecast mental health in terms of depression, anxiety, sleepiness, stress, sleep quality, and behavioral problems. Our models achieve a high forecasting performance, with random forest the winner, reaching an accuracy of 98% for forecasting trait anxiety. We perform extensive experiments to evaluate the models' performance in accuracy, generalization, and feature utilization, using a naive forecaster as the baseline. Our analysis shows minimal mental health changes over two months, making the prediction task easily achievable. Due to these minimal changes, the models tend to rely primarily on the historical values of the mental health evaluations instead of the Actigraph features. At the time of this master thesis, the data acquisition step is still in progress. In future work, we plan to train the models on the complete dataset using a longer forecasting horizon to increase the level of mental health change, and to perform transfer learning to compensate for the small dataset size. 
This interdisciplinary project demonstrates the opportunities and challenges in machine learning based prediction of mental health, paving the way toward using the same techniques to forecast other mental disorders such as internalizing disorder, Parkinson's disease, Alzheimer's disease, etc. and improving the quality of life for individuals who have some mental disorder.

CNN Spike Detector: Detection of Spikes in Intracranial EEG using Convolutional Neural Networks

Stefan Jonas · Oct. 2021.

The detection of interictal epileptiform discharges in the visual analysis of electroencephalography (EEG) is an important but very difficult, tedious, and time-consuming task. There have been decades of research on computer-assisted detection algorithms, most recently focused on using Convolutional Neural Networks (CNNs). In this thesis, we present the CNN Spike Detector, a convolutional neural network to detect spikes in intracranial EEG. Our dataset of 70 intracranial EEG recordings from 26 subjects with epilepsy introduces new challenges in this research field. We report cross-validation results with a mean AUC of 0.926 (±0.04), an area under the precision-recall curve (AUPRC) of 0.652 (±0.10), and 12.3 (±7.47) false positive epochs per minute at a sensitivity of 80%. A visual examination of false positive segments is performed to understand the model behavior leading to a relatively high false detection rate. We notice issues with the evaluation measures and highlight a major limitation of the common approach of detecting spikes using short segments, namely that the network is not capable of considering the greater context of the segment with regard to its origin. For this reason, we present the Context Model, an extension in which the CNN Spike Detector is supplied with additional information about the channel. Results show promising but limited performance improvements. This thesis provides important findings about the spike detection task for intracranial EEG and lays out promising future research directions to develop a network capable of assisting experts in real-world clinical applications.

PolitBERT - Deepfake Detection of American Politicians using Natural Language Processing

Maurice Rupp · April 2021.

This thesis explores the application of modern Natural Language Processing techniques to the detection of artificially generated videos of popular American politicians. Instead of focusing on detecting anomalies and artifacts in images and sounds, this thesis focuses on detecting irregularities and inconsistencies in the words themselves, opening up a new possibility for detecting fake content. A novel, domain-adapted, pre-trained version of the language model BERT, combined with several mechanisms to overcome severe dataset imbalances, yielded the best quantitative as well as qualitative results. In addition to creating the biggest publicly available dataset of English-speaking politicians, consisting of 1.5M sentences from over 1,000 persons, this thesis conducts various experiments with different kinds of text classification and sequence processing algorithms applied to the political domain. Furthermore, multiple ablations to manage severe data imbalance are presented and evaluated.

A Study on the Inversion of Generative Adversarial Networks

Ramona Beck · March 2021.

The desire to use generative adversarial networks (GANs) for real-world tasks such as object segmentation or image manipulation is increasing as synthesis quality improves, which has given rise to an emerging research area called GAN inversion that focuses on exploring methods for embedding real images into the latent space of a GAN. In this work, we investigate different GAN inversion approaches using an existing generative model architecture that takes a completely unsupervised approach to object segmentation and is based on StyleGAN2. In particular, we propose and analyze algorithms for embedding real images into the different latent spaces Z, W, and W+ of StyleGAN following an optimization-based inversion approach, while also investigating a novel approach that allows fine-tuning of the generator during the inversion process. Furthermore, we investigate a hybrid and a learning-based inversion approach, where in the former we train an encoder with embeddings optimized by our best optimization-based inversion approach, and in the latter we define an autoencoder, consisting of an encoder and the generator of our generative model as a decoder, and train it to map an image into the latent space. We demonstrate the effectiveness of our methods as well as their limitations through a quantitative comparison with existing inversion methods and by conducting extensive qualitative and quantitative experiments with synthetic data as well as real images from a complex image dataset. We show that we achieve qualitatively satisfying embeddings in the W and W+ spaces with our optimization-based algorithms, that fine-tuning the generator during the inversion process leads to qualitatively better embeddings in all latent spaces studied, and that the learning-based approach also benefits from a variable generator as well as a pre-training with our hybrid approach. 
Furthermore, we evaluate our approaches on the object segmentation task and show that both our optimization-based and our hybrid and learning-based methods are able to generate meaningful embeddings that achieve reasonable object segmentations. Overall, our proposed methods illustrate the potential that lies in the GAN inversion and its application to real-world tasks, especially in the relaxed version of the GAN inversion where the weights of the generator are allowed to vary.
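The optimization-based inversion described above can be illustrated with a toy example. The snippet below replaces StyleGAN2 with a fixed linear "generator" (an assumption made solely so the loop stays self-contained) and runs gradient descent on the latent code to minimize the reconstruction error, which is the general shape of such an inversion:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "generator": a fixed linear map from latent space to image space.
# A real inversion would backpropagate through StyleGAN2 instead.
A = rng.standard_normal((64, 8))            # latent dim 8 -> "image" dim 64
w_true = rng.standard_normal(8)
x = A @ w_true                              # target "image" to invert

w = np.zeros(8)                             # initial latent code
lr = 1.0 / (2 * np.linalg.norm(A, 2) ** 2)  # step size safe for this map
for _ in range(500):
    residual = A @ w - x                    # reconstruction error G(w) - x
    grad = 2 * A.T @ residual               # gradient of ||G(w) - x||^2
    w -= lr * grad
# After optimization, G(w) reproduces the target almost exactly.
```

Fine-tuning the generator during inversion, as the thesis proposes, would additionally update the map's parameters alongside the latent code.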

Multi-scale Momentum Contrast for Self-supervised Image Classification

Zhao Xueqi · Dec. 2020.

With the maturity of supervised learning, the research focus has gradually shifted to self-supervised learning. "Momentum Contrast" (MoCo) proposed a new self-supervised learning method and raised the accuracy of self-supervised learning to a new level. Inspired by the article "Representation Learning by Learning to Count", we hypothesize that dividing a picture into four parts and passing them through a neural network can further improve the accuracy of MoCo. Unlike the original MoCo, this MoCo variant (Multi-scale MoCo) does not pass the augmented image directly through the encoder. Instead, Multi-scale MoCo crops and resizes the augmented image, and the four resulting parts are passed through the encoder separately and then summed (the upsampled version does not resize the input but resizes the contrastive samples). This cropping is applied not only to the query q but also to the key queue k, since otherwise the weights of the key encoder might be damaged during the momentum update. This is discussed further in the experiments chapter, comparing the downsampled Multi-scale version with the version that downsamples both. Human object recognition follows the same principle: when humans see something familiar, they can still guess the object with high probability even if it is not fully visible. Multi-scale MoCo applies this concept to the pretext part of MoCo in the hope of obtaining better feature extraction. In this thesis, there are three versions of Multi-scale MoCo: a downsampled-input-samples version, a downsampled-input-and-contrast-samples version, and an upsampled-input-samples version. The differences between these versions are described in more detail later. The network architecture used for comparison is ResNet50, and the test dataset is STL-10. The weights obtained in the pretext task are then transferred to the downstream evaluation, during which the weights of all layers except the final linear layer are frozen (these weights come from the pretext task).
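The quadrant-based data flow described above might be sketched as follows; the flatten-and-project "encoder" stands in for the ResNet50 and is purely illustrative:

```python
import numpy as np

def quadrants(img):
    """Split an image (H, W) into its four equal quadrants."""
    h, w = img.shape[0] // 2, img.shape[1] // 2
    return [img[:h, :w], img[:h, w:], img[h:, :w], img[h:, w:]]

def encode(patch, proj):
    """Toy stand-in encoder: flatten the patch and project it.
    (The thesis uses a ResNet50; this only shows the data flow.)"""
    return proj @ patch.ravel()

rng = np.random.default_rng(1)
img = rng.standard_normal((32, 32))        # one augmented view
proj = rng.standard_normal((16, 16 * 16))  # weights shared across the 4 crops

# Each quadrant passes through the same encoder; the embeddings are summed.
q = sum(encode(p, proj) for p in quadrants(img))
```

The summed embedding `q` would then enter the usual MoCo contrastive loss against the key queue.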

Self-Supervised Learning Using Siamese Networks and Binary Classifier

Dušan Mihajlov · March 2020.

In this thesis, we present several approaches for training a convolutional neural network using only unlabeled data. Our self-supervised learning algorithms are based on the connection between an image patch, i.e. a zoomed view, and its original image. Using a siamese neural network architecture, we aim to recognize whether the image patch that is input to the first branch comes from the same image presented to the second branch. By applying transformations to both images, and different zoom sizes at different positions, we force the network to extract high-level features using its convolutional layers. On top of our siamese architecture, a simple binary classifier measures the difference between the extracted feature maps and makes the decision. Thus, the classifier can only solve the task correctly if our convolutional layers extract useful representations. These representations can then be used to solve many different tasks related to the data used for unsupervised training. As the main benchmark for all of our models we use the STL-10 dataset, where we train a linear classifier on top of our convolutional layers with a small amount of manually labeled images; this is a widely used benchmark for unsupervised learning. We also combine our idea with recent work on the same topic, the network called RotNet, which makes use of image rotations and therefore forces the network to learn rotation-dependent features from the dataset. This combination yields a new procedure that outperforms the original RotNet.
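The pair-construction step described above (a patch from the same image gets label 1, a patch from a different image gets label 0) could be sketched like this; the patch size and sampling details are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_zoom_patch(img, size):
    """Crop a random patch (a "zoomed" view) from an image."""
    y = rng.integers(0, img.shape[0] - size + 1)
    x = rng.integers(0, img.shape[1] - size + 1)
    return img[y:y + size, x:x + size]

def make_pair(images, positive, size=8):
    """Build one siamese training pair: a patch and a full image,
    labeled 1 if the patch was cropped from that image, else 0."""
    i = rng.integers(len(images))
    # For a negative pair, pick any *other* image.
    j = i if positive else (i + 1 + rng.integers(len(images) - 1)) % len(images)
    return random_zoom_patch(images[i], size), images[j], int(positive)

images = [rng.standard_normal((32, 32)) for _ in range(10)]
patch, full, label = make_pair(images, positive=True)
```

Both elements of each pair would additionally be transformed (as the abstract notes) before entering the two siamese branches.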

Learning Object Representations by Mixing Scenes

Lukas Zbinden · May 2019.

In the digital age of ever-increasing data amassment and accessibility, the demand for scalable machine learning models effective at refining the new oil is unprecedented. Unsupervised representation learning methods present a promising approach to exploiting this invaluable yet unlabeled digital resource at scale. However, the majority of these approaches focus on synthetic or simplified datasets of images. What if a method could learn directly from natural Internet-scale image data? In this thesis, we propose a novel approach for unsupervised learning of object representations by mixing natural image scenes. Without any human help, our method mixes visually similar images to synthesize new realistic scenes using adversarial training. In this process the model learns to represent and understand the objects prevalent in natural image data and makes them available for downstream applications. For example, it enables the transfer of objects from one scene to another. Through qualitative experiments on complex image data we show the effectiveness of our method along with its limitations. Moreover, we benchmark our approach quantitatively against state-of-the-art works on the STL-10 dataset. Our proposed method demonstrates the potential that lies in learning representations directly from natural image data and reinforces it as a promising avenue for future research.

Representation Learning using Semantic Distances

Markus Roth · May 2019.

Zero-Shot Learning Using Generative Adversarial Networks

Hamed Hemati · Dec. 2018.

Dimensionality Reduction via CNNs - Learning the Distance Between Images

Ioannis Glampedakis · Sept. 2018.

Learning to Play Othello Using Deep Reinforcement Learning and Self Play

Thomas Simon Steinmann · Sept. 2018.

ABA-J Interactive Multi-modality Tissue Section-to-Volume Alignment: A Brain Atlasing Toolkit for ImageJ

Felix Meyenhofer · March 2018.

Learning Visual Odometry with Recurrent Neural Networks

Adrian Wälchli · Feb. 2018.

In computer vision, Visual Odometry is the problem of recovering the camera motion from a video. It is related to Structure from Motion, the problem of reconstructing the 3D geometry from a collection of images. Decades of research in these areas have brought successful algorithms that are used in applications like autonomous navigation, motion capture, augmented reality and others. Despite the success of these prior works in real-world environments, their robustness is highly dependent on manual calibration and the magnitude of noise present in the images in form of, e.g., non-Lambertian surfaces, dynamic motion and other forms of ambiguity. This thesis explores an alternative approach to the Visual Odometry problem via Deep Learning, that is, a specific form of machine learning with artificial neural networks. It describes and focuses on the implementation of a recent work that proposes the use of Recurrent Neural Networks to learn dependencies over time due to the sequential nature of the input. Together with a convolutional neural network that extracts motion features from the input stream, the recurrent part accumulates knowledge from the past to make camera pose estimations at each point in time. An analysis on the performance of this system is carried out on real and synthetic data. The evaluation covers several ways of training the network as well as the impact and limitations of the recurrent connection for Visual Odometry.
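The architecture described above, a CNN feature extractor feeding a recurrent state that emits a camera pose at each time step, has roughly the following shape. The weights here are untrained stand-ins and the dimensions are assumptions, chosen only to show the data flow:

```python
import numpy as np

rng = np.random.default_rng(3)
feat_dim, hid_dim, pose_dim = 32, 64, 6   # pose: 3 translation + 3 rotation

# Untrained stand-in weights (a real system learns these end to end,
# with the features coming from a convolutional network).
Wx = rng.standard_normal((hid_dim, feat_dim)) * 0.1
Wh = rng.standard_normal((hid_dim, hid_dim)) * 0.1
Wo = rng.standard_normal((pose_dim, hid_dim)) * 0.1

def rollout(motion_features):
    """Recurrent pose estimation: the hidden state accumulates knowledge
    from past frames; a pose estimate is emitted at every time step."""
    h = np.zeros(hid_dim)
    poses = []
    for f in motion_features:             # f: CNN features of a frame pair
        h = np.tanh(Wx @ f + Wh @ h)      # recurrent state update
        poses.append(Wo @ h)              # per-step pose readout
    return np.stack(poses)

video = [rng.standard_normal(feat_dim) for _ in range(20)]
poses = rollout(video)
```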

Crime location and timing prediction

Bernard Swart · Jan. 2018.

From Cartoons to Real Images: An Approach to Unsupervised Visual Representation Learning

Simon Jenni · Feb. 2017.

Automatic and Large-scale Assessment of Fluid in Retinal OCT Volume

Nina Mujkanovic · Dec. 2016.

Segmentation in 3D Using Eye-tracking Technology

Michele Wyss · July 2016.

Accurate Scale Thresholding via Logarithmic Total Variation Prior

Remo Diethelm · Aug. 2014.

Novel Techniques for Robust and Generalizable Machine Learning

Abdelhak Lemkhenter · Sept. 2023.

Neural networks have transcended their status as a powerful proof of concept and become a highly disruptive technology that has revolutionized many quantitative fields such as drug discovery, autonomous vehicles, and machine translation. Today, it is nearly impossible to go a single day without interacting with a neural network-powered application. From search engines to on-device photo processing, neural networks have become the go-to solution thanks to recent advances in computational hardware and an unprecedented scale of training data. Larger and less curated datasets, typically obtained through web crawling, have greatly propelled the capabilities of neural networks forward. However, this increase in scale amplifies certain challenges associated with training such models. Beyond toy or carefully curated datasets, data in the wild is plagued with biases, imbalances, and various noisy components. Given the larger size of modern neural networks, such models run the risk of learning spurious correlations that fail to generalize beyond their training data. This thesis addresses the problem of training more robust and generalizable machine learning models across a wide range of learning paradigms for medical time series and computer vision tasks. The former is a typical example of a low signal-to-noise ratio data modality with a high degree of variability between subjects and datasets. There, we tailor the training scheme to focus on robust patterns that generalize to new subjects and ignore the noisier and subject-specific patterns. To achieve this, we first introduce a physiologically inspired unsupervised training task and then extend it by explicitly optimizing for cross-dataset generalization using meta-learning.
In the context of image classification, we address the challenge of training semi-supervised models under class imbalance by designing a novel label refinement strategy with higher local sensitivity to minority class samples while preserving the global data distribution. Lastly, we introduce a new Generative Adversarial Network (GAN) training loss. Such generative models could be applied to improve the training of subsequent models in the low data regime by augmenting the dataset with generated samples. Unfortunately, GAN training relies on a delicate balance between its components, making it prone to mode collapse. Our contribution consists of defining a more principled GAN loss whose gradients incentivize the generator model to seek out missing modes in its distribution. All in all, this thesis tackles the challenge of training more robust machine learning models that can generalize beyond their training data. This necessitates the development of methods specifically tailored to handle the diverse biases and spurious correlations inherent in the data. It is important to note that achieving greater generalizability in models goes beyond simply increasing the volume of data; it requires meticulous consideration of training objectives and model architecture. By tackling these challenges, this research contributes to advancing the field of machine learning and underscores the significance of thoughtful design in obtaining more resilient and versatile models.

Automated Sleep Scoring, Deep Learning and Physician Supervision

Luigi Fiorillo · Oct. 2022.

Sleep plays a crucial role in human well-being. Polysomnography is used in sleep medicine as a diagnostic tool to objectively analyze the quality of sleep. Sleep scoring is the procedure of extracting sleep cycle information from the whole-night electrophysiological signals. Scoring is performed worldwide by sleep physicians according to the official American Academy of Sleep Medicine (AASM) scoring manual. In the last decades, a wide variety of deep learning based algorithms have been proposed to automatise the sleep scoring task. In this thesis we study the reasons why these algorithms fail to be introduced in the daily clinical routine, with the perspective of bridging the existing gap between automatic sleep scoring models and sleep physicians. In this light, the primary step is the design of a simplified sleep scoring architecture that also provides an estimate of the model uncertainty. Besides achieving results on par with the most up-to-date scoring systems, we demonstrate the efficiency of ensemble learning based algorithms, together with label smoothing techniques, in both enhancing the performance and calibrating the simplified scoring model. We introduce an uncertainty estimation procedure to identify the most challenging sleep stage predictions and to quantify the disagreement between the predictions given by the model and the annotations given by the physicians. In this thesis we also propose a novel method to integrate the inter-scorer variability into the training procedure of a sleep scoring model. We clearly show that a deep learning model is able to encode this variability and thus better adapt to the consensus of a group of scoring physicians. We finally address the generalization ability of a deep learning based sleep scoring system, further studying its resilience to sleep complexity and to the AASM scoring rules. We can state that there is no need to train the algorithm strictly following the AASM guidelines.
Most importantly, using data from multiple data centers results in a better performing model compared with training on a single data cohort. The variability among different scorers and data centers needs to be taken into account, more than the variability among sleep disorders.
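One of the techniques mentioned above, label smoothing, is simple to state concretely. This sketch (the five-stage encoding and the smoothing factor are assumed values, not taken from the thesis) softens a scorer's hard stage label by redistributing a small fraction of the probability mass:

```python
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    """Soften a hard one-hot stage label: move eps of the probability
    mass from the annotated stage to all stages uniformly."""
    k = one_hot.shape[-1]
    return (1.0 - eps) * one_hot + eps / k

# Five sleep stages (W, N1, N2, N3, REM); the scorer chose N2.
y = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
y_s = smooth_labels(y, eps=0.1)
# y_s still sums to 1 and keeps most mass on the annotated stage.
```

Training against such soft targets tends to produce better-calibrated probabilities, which fits the thesis's goal of quantifying model uncertainty.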

Learning Representations for Controllable Image Restoration

Givi Meishvili · March 2022.

Deep Convolutional Neural Networks have sparked a renaissance in all the sub-fields of computer vision. Tremendous progress has been made in the area of image restoration. The research community has pushed the boundaries of image deblurring, super-resolution, and denoising. However, given a distorted image, most existing methods typically produce a single restored output. The tasks mentioned above are inherently ill-posed, leading to an infinite number of plausible solutions. This thesis focuses on designing image restoration techniques capable of producing multiple restored results and granting users more control over the restoration process. Towards this goal, we demonstrate how one could leverage the power of unsupervised representation learning. Image restoration is vital when applied to distorted images of human faces due to their social significance. Generative Adversarial Networks enable an unprecedented level of generated facial details combined with smooth latent space. We leverage the power of GANs towards the goal of learning controllable neural face representations. We demonstrate how to learn an inverse mapping from image space to these latent representations, tuning these representations towards a specific task, and finally manipulating latent codes in these spaces. For example, we show how GANs and their inverse mappings enable the restoration and editing of faces in the context of extreme face super-resolution and the generation of novel view sharp videos from a single motion-blurred image of a face. This thesis also addresses more general blind super-resolution, denoising, and scratch removal problems, where blur kernels and noise levels are unknown. We resort to contrastive representation learning and first learn the latent space of degradations. We demonstrate that the learned representation allows inference of ground-truth degradation parameters and can guide the restoration process. 
Moreover, it enables control over the amount of deblurring and denoising in the restoration via manipulation of latent degradation features.

Learning Generalizable Visual Patterns Without Human Supervision

Simon Jenni · Oct. 2021.

Owing to the existence of large labeled datasets, Deep Convolutional Neural Networks have ushered in a renaissance in computer vision. However, almost all of the visual data we generate daily - several human lives worth of it - remains unlabeled and thus out of reach of today’s dominant supervised learning paradigm. This thesis focuses on techniques that steer deep models towards learning generalizable visual patterns without human supervision. Our primary tool in this endeavor is the design of Self-Supervised Learning tasks, i.e., pretext tasks for which labels do not involve human labor. Besides enabling learning from large amounts of unlabeled data, we demonstrate how self-supervision can capture relevant patterns that supervised learning largely misses. For example, we design learning tasks that capture shape from images, motion from video, and 3D pose features from multi-view data. Notably, these tasks’ design follows a common principle: the recognition of data transformations. The strong performance of the learned representations on downstream vision tasks such as classification, segmentation, action recognition, or pose estimation validates this pretext-task design. This thesis also explores the use of Generative Adversarial Networks (GANs) for unsupervised representation learning. Besides leveraging generative adversarial learning to define image transformations for self-supervised learning tasks, we also address training instabilities of GANs through the use of noise. While unsupervised techniques can significantly reduce the burden of supervision, in the end we still rely on some annotated examples to fine-tune learned representations towards a target task. To improve learning from scarce or noisy labels, we describe a supervised learning algorithm with improved generalization in these challenging settings.

Learning Interpretable Representations of Images

Attila Szabó · June 2019.

Computers represent images with pixels, and each pixel contains three numbers for red, green and blue colour values. These numbers are meaningless for humans and they are mostly useless when used directly with classical machine learning techniques like linear classifiers. Interpretable representations are the attributes that humans understand: the colour of the hair, the viewpoint of a car or the 3D shape of the object in the scene. Many computer vision tasks can be viewed as learning interpretable representations; for example, a supervised classification algorithm directly learns to represent images with their class labels. In this work we aim to learn interpretable representations (or features) indirectly with lower levels of supervision. This approach has the advantage of cost savings on dataset annotations and the flexibility of using the features for multiple follow-up tasks. We made contributions in three main areas: weakly supervised learning, unsupervised learning and 3D reconstruction. In the weakly supervised case we use image pairs as supervision. Each pair shares a common attribute and differs in a varying attribute. We propose a training method that learns to separate the attributes into separate feature vectors. These features are then used for attribute transfer and classification. We also show theoretical results on the ambiguities of the learning task and the ways to avoid degenerate solutions. We show a method for unsupervised representation learning that separates semantically meaningful concepts. We explain, and show through ablation studies, how the components of our proposed method work: a mixing autoencoder, a generative adversarial net and a classifier. We propose a method for learning single-image 3D reconstruction. It is done using only images: no human annotations, stereo, synthetic renderings or ground-truth depth maps are needed. We train a generative model that learns the 3D shape distribution and an encoder to reconstruct the 3D shape.
For that we exploit the notion of image realism: the 3D reconstruction of the object has to look realistic when it is rendered from different random angles. We prove the efficacy of our method from first principles.

Learning Controllable Representations for Image Synthesis

Qiyang Hu · June 2019.

In this thesis, our focus is learning a controllable representation and applying the learned controllable feature representation to image synthesis, video generation, and even 3D reconstruction. We propose different methods to disentangle the feature representation in neural networks and analyze the challenges in disentanglement, such as the reference ambiguity and the shortcut problem, when using weak labels. We use the disentangled feature representation to transfer attributes between images, such as exchanging hairstyles between two face images. Furthermore, we study how another type of feature, the sketch, works in a neural network. A sketch can provide the shape and contour of an object, such as the silhouette of a side-view face. We leverage the silhouette constraint to improve 3D face reconstruction from 2D images. A sketch can also provide the moving direction of an object; we therefore investigate how one can manipulate an object to follow the trajectory provided by a user sketch. We propose a method to automatically generate video clips from a single image input, using the sketch as motion and trajectory guidance to animate the object in that image. We demonstrate the effectiveness of our approaches on several synthetic and real datasets.

Beyond Supervised Representation Learning

Mehdi Noroozi · Jan. 2019.

The complexity of any information processing task is highly dependent on the space where data is represented. Unfortunately, pixel space is not appropriate for computer vision tasks such as object classification. Traditional computer vision approaches involve a multi-stage pipeline where images are first transformed to a feature space through a handcrafted function and the solution is then computed in the feature space. The challenge with this approach is the complexity of designing handcrafted functions that extract robust features. Deep learning based approaches address this issue by end-to-end training of a neural network on some task, letting the network discover the appropriate representation for the training task automatically. It turns out that the image classification task on large-scale annotated datasets yields a representation transferable to other computer vision tasks. However, supervised representation learning is limited by the need for annotations. In this thesis we study self-supervised representation learning, where the goal is to alleviate these limitations by substituting the classification task with pseudo tasks for which the labels come for free. We discuss self-supervised learning by solving jigsaw puzzles, which uses context as a supervisory signal. The rationale behind this task is that the network must extract features about object parts and their spatial configurations to solve the jigsaw puzzles. We also discuss a method for representation learning that uses an artificial supervisory signal based on counting visual primitives. This supervisory signal is obtained from an equivariance relation. We use two image transformations in the context of counting: scaling and tiling. The first transformation exploits the fact that the number of visual primitives should be invariant to scale. The second transformation allows us to equate the sum of visual primitives over the tiles to the number in the whole image.
The most effective transfer strategy is fine-tuning, which restricts one to using the same model, or parts thereof, for both the pretext and target tasks. We discuss a novel framework for self-supervised learning that overcomes limitations in designing and comparing different tasks, models, and data domains. In particular, our framework decouples the structure of the self-supervised model from the final task-specific fine-tuned model. Finally, we study the problem of multi-task representation learning. A naive approach to enhancing the representation learned by a task is to train it jointly with other tasks that capture orthogonal attributes. Having a diverse set of auxiliary tasks imposes challenges on multi-task training from scratch. We propose a framework that allows us to combine arbitrarily different feature spaces into a single deep neural network. We reduce the auxiliary tasks to classification tasks and consequently the multi-task learning to a multi-label classification task. Nevertheless, combining multiple representation spaces without being aware of the target task might be suboptimal. As our second contribution, we show empirically that this is indeed the case and propose to combine multiple tasks after fine-tuning on the target task.
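The tiling relation described above, that the counts of the four tiles should add up to the count of the (downsampled) whole image, can be written directly as a self-supervised loss. The "counting network" below is a toy stand-in for illustration, not the CNN trained in the thesis:

```python
import numpy as np

rng = np.random.default_rng(4)

def tiles(img):
    """Split an image into its four equal tiles."""
    h, w = img.shape[0] // 2, img.shape[1] // 2
    return [img[:h, :w], img[:h, w:], img[h:, :w], img[h:, w:]]

def downsample(img):
    """2x2 average pooling, so the whole image matches the tile size."""
    return 0.25 * (img[::2, ::2] + img[1::2, ::2]
                   + img[::2, 1::2] + img[1::2, 1::2])

def count(img, proj):
    """Toy counting network: non-negative "primitive counts" per image.
    (The thesis trains a CNN; any map to non-negative vectors fits here.)"""
    return np.maximum(proj @ img.ravel(), 0.0)

img = rng.standard_normal((32, 32))
proj = rng.standard_normal((10, 16 * 16)) * 0.1

# Self-supervised counting loss: count of the downsampled whole image
# should equal the sum of the counts of its four tiles.
whole = count(downsample(img), proj)
parts = sum(count(t, proj) for t in tiles(img))
loss = np.sum((whole - parts) ** 2)
```

Minimizing this loss over many images (while avoiding the trivial all-zero solution) is what drives the network toward count-like features.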

Motion Deblurring from a Single Image

Meiguang Jin · Dec. 2018.

With the information explosion, a tremendous number of photos are captured and shared via social media every day. Technically, a photo requires a finite exposure to accumulate light from the scene. Thus, objects moving during the exposure generate motion blur in a photo. Motion blur is an image degradation that makes visual content less interpretable and is therefore often seen as a nuisance. Although motion blur can be reduced by setting a short exposure time, the insufficient amount of light then has to be compensated by increasing the sensor's sensitivity, which inevitably introduces a large amount of sensor noise. This motivates the need to remove motion blur computationally. Motion deblurring is an important problem in computer vision and is challenging due to its ill-posed nature, which means the solution is not well defined. Mathematically, a blurry image caused by uniform motion is formed by the convolution of a blur kernel with a latent sharp image. There are potentially infinite pairs of blur kernel and latent sharp image that result in the same blurry image. Hence, some prior knowledge or regularization is required to address this problem. Even if the blur kernel is known, restoring the latent sharp image is still difficult, as high-frequency information has been removed. Although we can model the uniform motion deblurring problem mathematically, it only covers camera in-plane translational motion. In practice, motion is more complicated and can be non-uniform. Non-uniform motion blur can come from many sources: camera out-of-plane rotation, scene depth change, object motion and so on. It is therefore more challenging to remove non-uniform motion blur. In this thesis, our focus is motion blur removal. We aim to address four challenging motion deblurring problems. We start from the noise-blind image deblurring scenario, where the blur kernel is known but the noise level is unknown.
We introduce an efficient and robust solution based on a Bayesian framework using a smooth generalization of the 0−1 loss to address this problem. Then we study the blind uniform motion deblurring scenario where both the blur kernel and the latent sharp image are unknown. We explore the relative scale ambiguity between the latent sharp image and blur kernel to address this issue. Moreover, we study the face deblurring problem and introduce a novel deep learning network architecture to solve it. We also address the general motion deblurring problem and particularly we aim at recovering a sequence of 7 frames each depicting some instantaneous motion of the objects in the scene.
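The uniform blur model stated above, a blurry image as the convolution of a blur kernel with a latent sharp image plus noise, can be sketched directly; the horizontal box kernel and noise level are arbitrary example values:

```python
import numpy as np

def convolve2d_same(img, kernel):
    """Plain 2D convolution ('same' output size, zero padding)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            # Convolution flips the kernel before the sliding dot product.
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel[::-1, ::-1])
    return out

rng = np.random.default_rng(5)
sharp = rng.random((24, 24))              # latent sharp image x

# Horizontal motion blur kernel k: energy spread along the motion path,
# normalized so overall image brightness is preserved.
kernel = np.zeros((5, 5))
kernel[2, :] = 1.0 / 5.0

noise = 0.01 * rng.standard_normal(sharp.shape)
blurry = convolve2d_same(sharp, kernel) + noise   # b = k * x + n
```

Note that a delta kernel (all mass at the center) leaves the image unchanged, which is exactly the trivial "no-blur" solution that makes blind deconvolution ill-posed.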

Towards a Novel Paradigm in Blind Deconvolution: From Natural to Cartooned Image Statistics

Daniele Perrone · July 2015.

In this thesis we study the blind deconvolution problem. Blind deconvolution consists in the estimation of a sharp image and a blur kernel from an observed blurry image. Because the blur model admits several solutions it is necessary to devise an image prior that favors the true blur kernel and sharp image. Recently it has been shown that a class of blind deconvolution formulations and image priors has the no-blur solution as global minimum. Despite this shortcoming, algorithms based on these formulations and priors can successfully solve blind deconvolution. In this thesis we show that a suitable initialization can exploit the non-convexity of the problem and yield the desired solution. Based on these conclusions, we propose a novel “vanilla” algorithm stripped of any enhancement typically used in the literature. Our algorithm, despite its simplicity, is able to compete with the top performers on several datasets. We have also investigated a remarkable behavior of a 1998 algorithm, whose formulation has the no-blur solution as global minimum: even when initialized at the no-blur solution, it converges to the correct solution. We show that this behavior is caused by an apparently insignificant implementation strategy that makes the algorithm no longer minimize the original cost functional. We also demonstrate that this strategy improves the results of our “vanilla” algorithm. Finally, we present a study of image priors for blind deconvolution. We provide experimental evidence supporting the recent belief that a good image prior is one that leads to a good blur estimate rather than being a good natural image statistical model. By focusing the attention on the blur estimation alone, we show that good blur estimates can be obtained even when using images quite different from the true sharp image. This allows using image priors, such as those leading to “cartooned” images, that avoid the no-blur solution. 
By using an image prior that produces “cartooned” images we achieve state-of-the-art results on different publicly available datasets. We therefore suggest a paradigm shift in blind deconvolution: from modeling natural image statistics to modeling cartooned image statistics.

New Perspectives on Uncalibrated Photometric Stereo

Thoma Papadhimitri · June 2014.

This thesis investigates the problem of 3D reconstruction of a scene from 2D images. In particular, we focus on photometric stereo, a technique that computes the 3D geometry from at least three images taken from the same viewpoint under different illumination conditions. When the illumination is unknown (uncalibrated photometric stereo) the problem is ambiguous: different combinations of geometry and illumination can generate the same images. First, we resolve the ambiguity by exploiting Lambertian reflectance maxima. These are points on curved surfaces where the normals are parallel to the light direction. We then propose a solution that can be computed in closed form and is thus very efficient. Our algorithm is also very robust and always yields the same estimate regardless of the initial ambiguity. We validate our method in real-world experiments and achieve state-of-the-art results. In this thesis we also solve, for the first time, the uncalibrated photometric stereo problem under the perspective projection model. We show that, unlike in the orthographic case, one can uniquely reconstruct the normals of the object and the lights given only the input images and the camera calibration (focal length and image center). We also propose a very efficient algorithm, which we validate on synthetic and real-world experiments, and show that the proposed technique is a generalization of the orthographic case. Finally, we investigate the uncalibrated photometric stereo problem in the case where the lights are distributed near the scene. Here we propose an alternating minimization technique that converges quickly and overcomes the limitations of prior work that assumed distant illumination. We show experimentally that adopting a near-light model for real-world scenes yields very accurate reconstructions.
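The Lambertian model underlying photometric stereo can be sketched for the calibrated case, where the light directions are known. This is an illustrative per-pixel example (all numbers and names are made up, and it omits the uncalibrated ambiguity the thesis actually addresses): with at least three known, non-coplanar light directions, the albedo-scaled normal follows from a linear least-squares fit to the observed intensities.

```python
import numpy as np

# Calibrated Lambertian photometric stereo for a single pixel:
# intensity_i = albedo * dot(light_i, normal).
L = np.array([[0.0, 0.0, 1.0],
              [0.7, 0.0, 0.714],
              [0.0, 0.7, 0.714]])      # known light directions (one per row)
n_true = np.array([0.2, -0.3, 0.933])  # surface normal (approximately unit)
albedo = 0.8
I = albedo * L @ n_true                # observed intensities under the 3 lights

# Solve L g = I for g = albedo * normal, then factor out the albedo.
g, *_ = np.linalg.lstsq(L, I, rcond=None)
albedo_est = np.linalg.norm(g)
n_est = g / albedo_est

assert np.allclose(n_est, n_true / np.linalg.norm(n_true), atol=1e-6)
```

In the uncalibrated setting studied in the thesis, L is unknown, so geometry and illumination can only be recovered up to an ambiguity; the abstract's contribution is resolving that ambiguity (via reflectance maxima in the orthographic case, and uniquely under perspective projection).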


T4Tutorials.com

Computer Vision Research Topics Ideas

List of Computer Vision Research Topics Ideas for MS and Ph.D.

1. Deep learning-enabled medical computer vision – Research questions
2. Deep learning, computer vision, and entomology – Research questions
3. Exploring human–nature interactions in national parks with social media photographs and computer vision
4. Assessing the potential for deep learning and computer vision to identify bumble bee species from images
6. Duplicate detection of images using computer vision techniques – Research questions
7. Noncontact cable force estimation with unmanned aerial vehicle and computer vision
8. Computer vision based two-stage waste recognition-retrieval algorithm for waste classification
9. A survey on generative adversarial networks for imbalance problems in computer vision tasks
10. Deception in the eyes of deceiver: A computer vision and machine learning based automated deception detection
11. A computer vision approach based on deep learning for the detection of dairy cows in free stall barn
12. Classification of fermented cocoa beans (cut test) using computer vision
13. Real-time water level monitoring using live cameras and computer vision techniques
14. Aeroelastic Vibration Measurement Based on Laser and Computer Vision Technique
15. Individualized SAR calculations using computer vision-based MR segmentation and a fast electromagnetic solver
16. Crop Nutrition and Computer Vision Technology
17. Advancing Eosinophilic Esophagitis Diagnosis and Phenotype Assessment with Deep Learning Computer Vision
18. Computer Vision-Based Bridge Damage Detection Using Deep Convolutional Networks with Expectation Maximum Attention Module
19. Analysis of ultrasonic vocalizations from mice using computer vision and machine learning
20. Developing a mold-free approach for complex glulam production with the assist of computer vision technologies
21. Decoding depressive disorder using computer vision
22. Assessment of Computer Vision Syndrome and Personal Risk Factors among Employees of Commercial Bank of Ethiopia in Addis Ababa, Ethiopia
23. One Label, One Billion Faces: Usage and Consistency of Racial Categories in Computer Vision
24. A survey of image labelling for computer vision applications
25. Development of Kid Height Measurement Application based on Image using Computer Vision
26. Computer vision AC-STEM automated image analysis for 2D nanopore applications
27. Displacement Identification by Computer Vision for Condition Monitoring of Rail Vehicle Bearings
28. An Open-Source Computer Vision Tool for Automated Vocal Fold Tracking From Videoendoscopy
29. Computer Vision and Human Behaviour, Emotion and Cognition Detection: A Use Case on Student Engagement
30. Computer vision-based tree trunk and branch identification and shaking points detection in Dense-Foliage canopy for automated harvesting of apples
31. Computer Vision–Based Estimation of Flood Depth in Flooded-Vehicle Images
32. An automated light trap to monitor moths (Lepidoptera) using computer vision-based tracking and deep learning
33. The Use of Saliency in Underwater Computer Vision: A Review
34. Computer vision for liquid samples in hospitals and medical labs using hierarchical image segmentation and relations prediction
35. Computer vision syndrome prevalence according to individual and video display terminal exposure characteristics in Spanish university students
36. Computer vision and unsupervised machine learning for pore-scale structural analysis of fractured porous media
37. Research on computer vision enhancement in intelligent robot based on machine learning and deep learning
38. Deformable Scintillation Dosimeter I: Challenges and Implementation using Computer Vision Techniques
39. Use of Computer Vision to Identify the Frequency and Magnitude of Insulin Syringe Preparation Errors
40. Action recognition of dance video learning based on embedded system and computer vision image
41. Frontiers of computer vision technologies on real estate property photographs and floorplans
42. Analysis of UAV-Acquired Wetland Orthomosaics Using GIS, Computer Vision, Computational Topology and Deep Learning
43. Computer vision applied to dual-energy computed tomography images for precise calcinosis cutis quantification in patients with systemic sclerosis
44. Human Motion Gesture Recognition Based on Computer Vision
45. Application of computer vision in fish intelligent feeding system—A review
46. Application of Computer Vision in 3D Film
47. WiCV 2020: The Seventh Women In Computer Vision Workshop
48. Computer vision based obstacle detection and target tracking for autonomous vehicles
49. Evaluating Congou black tea quality using a lab-made computer vision system coupled with morphological features and chemometrics
50. Research on Key Technologies in the Field of Computer Vision Based on Deep Learning
51. Online detection of naturally DON contaminated wheat grains from China using Vis-NIR spectroscopy and computer vision
52. A Computer Vision-Based Occupancy and Equipment Usage Detection Approach for Reducing Building Energy Demand
53. Application of Computer Vision Technology in Agricultural Products and Food Inspection
54. Automatic Evaluation of Wheat Resistance to Fusarium Head Blight Using Dual Mask-RCNN Deep Learning Frameworks in Computer Vision
55. A computer vision algorithm for locating and recognizing traffic signal control light status and countdown time
56. Microplastic abundance quantification via a computer-vision-based chemometrics-assisted approach
57. Computer Vision for Dietary Assessment
58. Determinants of computer vision system’s technology acceptance to improve incoming cargo receiving at Eastern European and Central Asian transportation …
59. CONSTRUCTION OF A SOMATOSENSORY INTERACTIVE SYSTEM BASED ON COMPUTER VISION AND AUGMENTED REALITY TECHNIQUES USI…
60. Estimating California’s Solar and Wind Energy Production using Computer Vision Deep Learning Techniques on Weather Images
61. Leaf disease segmentation and classification of Jatropha Curcas L. and Pongamia Pinnata L. biofuel plants using computer vision based approaches
62. Automated correlation of petrographic images of sandstones to a textural properties database extracted with computer vision techniques
63. Computer Vision-based Intelligent Bookshelf System
64. Computer Vision Techniques for Crowd Density and Motion Direction Analysis
65. Computer Vision System for Landing Platform State Assessment Onboard of Unmanned Aerial Vehicle in Case of Input Visual Information Distortion
66. Research on Bridge Deck Health Assessment System Based on BIM and Computer Vision Technology
67. Computer Vision for Dynamic Student Data Management in Higher Education Platform
68. Tulipp and ClickCV: How the Future Demands of Computer Vision Can Be Met Using FPGAs
69. Stripenn detects architectural stripes from chromatin conformation data using computer vision
70. Having Fun with Computer Vision
71. Application of Computer Vision in Pipeline Inspection Robot
72. Design of Digital Museum Narrative Space Based on Perceptual Experience Data Mining and Computer Vision
73. Study on Pipelined Parallel Processing Architectures for Imaging and Computer Vision
74. Research on fire inspection robot based on computer vision
75. ActiveNet: A computer-vision based approach to determine lethargy
76. Individual Wave Detection and Tracking within a Rotating Detonation Engine through Computer Vision Object Detection applied to High-Speed Images
77. Human Thorax Parametric Reconstruction Using Computer Vision
78. Advancing Eosinophilic Esophagitis Diagnosis and Phenotype Assessment with Deep Learning Computer Vision
79. LANE DETECTION USING COMPUTER VISION FOR SELF-DRIVING CARS
80. Automatic Gear Sorting Using Wireless PLC Based on Computer Vision
81. Computer Vision-based Marker-less Real Time Motion Analysis for Rehabilitation–An Interdisciplinary Research Project
82. Computer vision in surgery
83. GUIs for Computer Vision
84. Surgical navigation technology based on computer vision and VR towards IoT
85. A web-based survey on various symptoms of computer vision syndrome and the genetic understanding based on a multi-trait genome-wide association study
86. Automated Classification and Detection of Malaria Cell Using Computer Vision
87. Computer-Assisted Self-Training for Kyudo Posture Rectification Using Computer Vision Methods
88. Comparison of Computer Vision Techniques for Drowsiness Detection While Driving
89. A Real-Time Computer Vision System for Workers’ PPE and Posture Detection in Actual Construction Site Environment
90. Embedded Computer Vision System Applied to a Four-Legged Line Follower Robot
91. Computer Vision in Industry, Practice in the Czech Republic
92. Computer vision for microscopic skin cancer diagnosis using handcrafted and non-handcrafted features
93. Deep nets: What have they ever done for vision?
94. Analysis of Traditional Computer Vision Techniques Used for Hemp Leaf Water Stress Detection and Classification
95. Embedded Computer Vision System Applied to a Four-Legged Line Follower Robot
96. Deep Learning and Computer Vision Strategies for Automated Gene Editing with a Single-Cell Electroporation Platform
97. A computer vision-based approach for behavior recognition of gestating sows fed different fiber levels during high ambient temperature
98. Swin transformer: Hierarchical vision transformer using shifted windows
99. Computer Vision Syndrome Symptoms Experienced by Employees of Financial-Sector State-Owned Enterprises in Tasikmalaya
100. Non-Destructive Computer Vision-Based Pineapple Ripeness Identification System
101. Field-programmable gate arrays in a low power vision system
102. SiT: Self-supervised vIsion Transformer
103. Face Mask and Body Temperature Detection System Using Computer Vision Techniques and a Non-Contact Infrared Sensor
104. Do we really need explicit position encodings for vision transformers?
105. Cvt: Introducing convolutions to vision transformers
106. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions
107. Training vision transformers for image retrieval
108. Transformers in Vision: A Survey
109. Uncertainty-assisted deep vision structural health monitoring
110. Twins: Revisiting spatial attention design in vision transformers
111. Crossvit: Cross-attention multi-scale vision transformer for image classification
112. Smart Computer Laboratory: IoT Based Smartphone Application
113. A Dense Tensor Accelerator with Data Exchange Mesh for DNN and Vision Workloads
114. Physics-based vision meets deep learning
115. Neural vision-based semantic 3D world modeling
116. Scaling up visual and vision-language representation learning with noisy text supervision
117. Deep Learning–Based Scene Simplification for Bionic Vision
118. Techniques To Improve Machine Vision In Robots
119. Future Vision Exhibition: Artificial Landscapes
120. Enabling energy efficient machine learning on a Ultra-Low-Power vision sensor for IoT
121. Tokens-to-token vit: Training vision transformers from scratch on imagenet
122. Vision-based Sensors for Production Control
123. Machine Learning and Deep Learning Applications-A Vision
124. The quiet revolution in machine vision-a state-of-the-art survey paper, including historical review, perspectives, and future directions
125. Synthesizing Pose Sequences from 3D Assets for Vision-Based Activity Analysis
126. Applying Mobile Intelligent API Vision Kit and Normalized Features for Face Recognition Using Live Cameras
127. A New Approach for Fire Pixel Detection in Building Environment Using Vision Sensor
128. A Vision-Based Parameter Estimation for an Aircraft in Approach Phase
129. ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
130. Analysis of Target Detection and Tracking for Intelligent Vision System
131. Test Automation with Grad-CAM Heatmaps-A Future Pipe Segment in MLOps for Vision AI?
132. Investigating the Vision Transformer Model for Image Retrieval Tasks
133. Vision-based adjusting of a digital model to real-world conditions for wire insertion tasks
134. Combining brief and ad for edge-preserved dense stereo matching
135. Elf: accelerate high-resolution mobile deep vision with content-aware parallel offloading
136. Shallow Convolution Neural Network for an Industrial Robot Real Time Vision System
137. Research Status of Gesture Recognition Based on Vision: A Review
138. Multi-scale vision longformer: A new vision transformer for high-resolution image encoding
139. Road Peculiarities Detection using Deep Learning for Vehicle Vision System
140. Reinforcement learning applied to machine vision: state of the art
141. Vision based inspection system for leather surface defect detection using fast convergence particle swarm optimization ensemble classifier approach
142. Evaluation of visual complications among professional computer users
143. Brain Tumor Segmentation: A Comparative Analysis
144. Detection of Atlantic salmon bone residues using machine vision technology
145. Understanding Perceptual Bias in Machine Vision Systems
146. The MVTec anomaly detection dataset: a comprehensive real-world dataset for unsupervised anomaly detection
147. Scene text detection and recognition: The deep learning era
148. Vision-based continuous sign language recognition using multimodal sensor fusion
149. Egocentric Vision for Dog Behavioral Analysis
150. Vision-based Docking of a Mobile Robot
151. VLGrammar: Grounded Grammar Induction of Vision and Language
152. Mask-aware photorealistic facial attribute manipulation
153. Real-time plant phenomics under robotic farming setup: A vision-based platform for complex plant phenotyping tasks
154. VTGAN: Semi-supervised Retinal Image Synthesis and Disease Prediction using Vision Transformers
155. Deep Vision Based Surveillance System to Prevent Train-Elephant Collisions
156. Boosting High-Level Vision with Joint Compression Artifacts Reduction and Super-Resolution
157. Autonomous, onboard vision-based trash and litter detection in low altitude aerial images collected by an unmanned aerial vehicle
158. Research on the algorithm of painting image style feature extraction based on intelligent vision
159. Fast semantic segmentation method for machine vision inspection based on a fewer-parameters atrous convolution neural network
160. Investigating Bi-Level Optimization for Learning and Vision from a Unified Perspective: A Survey and Beyond
161. Vision-Based Full-Field Sensing for Condition Assessment of Structural Systems
162. New method of traffic flow forecasting based on quantum particle swarm optimization strategy for intelligent transportation system
163. Same-different conceptualization: a machine vision perspective
164. Toward high-quality magnetic data survey using UAV: development of a magnetic-isolated vision-based positioning system
165. Urban landscape ecological design and stereo vision based on 3D mesh simplification algorithm and artificial intelligence
166. Transformer in transformer
167. A Robotic grinding station based on an industrial manipulator and vision system
168. Applications of Internet of Things (IoT) in Agriculture-The Potential and Challenges in Smart Farm in Uganda
169. Vision-Based Patient Monitoring and Management in Mental Health Settings
170. Mitigating Demographic Bias in Facial Datasets with Style-Based Multi-attribute Transfer
171. Dynamic tree branch tracking for aerial canopy sampling using stereo vision
172. Smart Office Model Based on Internet of Things
173. How to construct low-altitude aerial image datasets for deep learning [J]
174. Pretraining boosts out-of-domain robustness for pose estimation
175. A benchmark and evaluation of non-rigid structure from motion
176. The devil is in the boundary: Exploiting boundary representation for basis-based instance segmentation
177. Vision based collision detection for a safe collaborative industrial manipulator
178. Eden: Multimodal synthetic dataset of enclosed garden scenes
179. Pixel-wise crowd understanding via synthetic data
180. Vision-Based Framework for Automatic Progress Monitoring of Precast Walls by Using Surveillance Videos during the Construction Phase
181. LSPnet: A 2D Localization-oriented Spacecraft Pose Estimation Neural Network
182. Transfer of Learning from Vision to Touch: A Hybrid Deep Convolutional Neural Network for Visuo-Tactile 3D Object Recognition
183. UAV Use Case: Real-Time Obstacle Avoidance System for Unmanned Aerial Vehicles Based on Stereo Vision
184. Improving grain size analysis using computer vision techniques and implications for grain growth kinetics
185. Comparison of full-reference image quality models for optimization of image processing systems
186. High Precision Medicine Bottles Vision Online Inspection System and Classification Based on Multi-Features and Ensemble Learning via Independence Test
187. Efficient attention: Attention with linear complexities
188. Fusion Learning Using Semantics and Graph Convolutional Network for Visual Food Recognition
189. The ikea asm dataset: Understanding people assembling furniture through actions, objects and pose
190. DualSR: Zero-Shot Dual Learning for Real-World Super-Resolution
191. Modelling and Analysis of Facial Expressions Using Optical Flow Derived Divergence and Curl Templates
192. Barlow twins: Self-supervised learning via redundancy reduction
193. Facial expression recognition in the wild via deep attentive center loss
194. Facial Beauty Prediction and Analysis Based on Deep Convolutional Neural Network: A Review
195. Rodnet: Radar object detection using cross-modal supervision
196. Using open-source computer vision software for identification and tracking of convective storms
197. Improving Robustness and Uncertainty Modelling in Neural Ordinary Differential Equations
198. Vision and Inertial Sensor Fusion for Terrain Relative Navigation
199. Domain-Aware Unsupervised Hyperspectral Reconstruction for Aerial Image Dehazing
200. A method for classifying citrus surface defects based on machine vision
201. Transgan: Two transformers can make one strong gan
202. Improving Point Cloud Semantic Segmentation by Learning 3D Object Detection
203. Information Systems Integration to Enhance Operational Customer Relationship Management in the Pharmaceutical Industry
204. Attentional feature fusion
205. The Isowarp: The Template-Based Visual Geometry of Isometric Surfaces
206. Vision-Based Diagnosis and Location of Insulator Self-Explosion Defects
207. Activity Recognition with Moving Cameras and Few Training Examples: Applications for Detection of Autism-Related Headbanging
208. Development and Validation of an Unsupervised Feature Learning System for Leukocyte Characterization and Classification: A Multi-Hospital Study
209. DualSANet: Dual Spatial Attention Network for Iris Recognition
210. Long-Range Attention Network for Multi-View Stereo
211. Towards visually explaining video understanding networks with perturbation
212. MinkLoc3D: Point Cloud Based Large-Scale Place Recognition
213. p-RT: A Runtime Framework to Enable Energy-Efficient Real-Time Robotic Vision Applications on Heterogeneous Architectures
214. JOLO-GCN: Mining Joint-Centered Light-Weight Information for Skeleton-Based Action Recognition
215. Binarized neural architecture search for efficient object recognition
216. Learning transferable visual models from natural language supervision
217. Improved techniques for training single-image gans
218. Machine Vision
219. SSDMNV2: A real time DNN-based face mask detection system using single shot multibox detector and MobileNetV2
220. Pervasive label errors in test sets destabilize machine learning benchmarks
221. MPRNet: Multi-Path Residual Network for Lightweight Image Super Resolution
222. Fuzzy-aided solution for out-of-view challenge in visual tracking under IoT-assisted complex environment
223. Deep Learning in X-ray Testing
224. Tresnet: High performance gpu-dedicated architecture
225. ATM: Attentional Text Matting
226. Classmix: Segmentation-based data augmentation for semi-supervised learning
227. Videossl: Semi-supervised learning for video classification
228. Stratified rule-aware network for abstract visual reasoning
229. A hierarchical privacy-preserving IoT architecture for vision-based hand rehabilitation assessment
230. Adversarial reinforcement learning for unsupervised domain adaptation
231. Visual question answering model based on graph neural network and contextual attention
232. Towards Balanced Learning for Instance Recognition
233. The automatic detection of pedestrians under the high-density conditions by deep learning techniques
234. RGB-D salient object detection: A survey
235. A deep active learning system for species identification and counting in camera trap images
236. Domain Impression: A Source Data Free Domain Adaptation Method
237. Robust feature learning for adversarial defense via hierarchical feature alignment
238. HADEM-MACS: a hybrid approach for detection and extraction of objects in movement by multimedia autonomous computer systems
239. Vision-Based Method Integrating Deep Learning Detection for Tracking Multiple Construction Machines
240. Bottleneck transformers for visual recognition
241. Single Image Human Proxemics Estimation for Visual Social Distancing
242. SoFA: Source-Data-Free Feature Alignment for Unsupervised Domain Adaptation
243. Contrastive learning of general-purpose audio representations
244. Knowledge distillation: A survey
245. Let’s Get Dirty: GAN Based Data Augmentation for Camera Lens Soiling Detection in Autonomous Driving
246. Video Semantic Analysis: The Sparsity Based Locality-Sensitive Discriminative Dictionary Learning Factor
247. Vision-Based Tactile Sensor Mechanism for the Estimation of Contact Position and Force Distribution Using Deep Learning
248. A Discriminative Model for Multiple People Detection
249. Generative adversarial networks and their application to 3D face generation: A survey
250. Semantic hierarchy emerges in deep generative representations for scene synthesis
251. Fake face detection via adaptive manipulation traces extraction network
252. Accuracy of smartphone video for contactless measurement of hand tremor frequency
253. Video Captioning of Future Frames
254. Disentangled Contour Learning for Quadrilateral Text Detection
255. Visual Structure Constraint for Transductive Zero-Shot Learning in the Wild
256. CNN features with bi-directional LSTM for real-time anomaly detection in surveillance networks
257. CityFlow-NL: Tracking and Retrieval of Vehicles at City Scale by Natural Language Descriptions
258. 3D Head Pose Estimation through Facial Features and Deep Convolutional Neural Networks
259. Rescuenet: Joint building segmentation and damage assessment from satellite imagery
260. Alleviating over-segmentation errors by detecting action boundaries
261. Multiresolution Adaptive Threshold Based Segmentation of Real-Time Vision-Based Database for Human Motion Estimation
262. Simplifying dependent reductions in the polyhedral model
263. Multi-camera traffic scene mosaic based on camera calibration
264. Novel View Synthesis via Depth-guided Skip Connections
265. Self-Supervised Pretraining Improves Self-Supervised Pretraining
266. Weakly Supervised Multi-Object Tracking and Segmentation
267. Proposal learning for semi-supervised object detection
268. You only look yourself: Unsupervised and untrained single image dehazing neural network
269. Algorithm for epipolar geometry and correcting monocular stereo vision based on a plane mirror
270. Mutual Information Maximization on Disentangled Representations for Differential Morph Detection
271. Route planning methods in indoor navigation tools for vision impaired persons: a systematic review
272. Subject Guided Eye Image Synthesis with Application to Gaze Redirection
273. Roles of artificial intelligence in construction engineering and management: A critical review and future trends
274. PI-Net: Pose Interacting Network for Multi-Person Monocular 3D Pose Estimation
275. Swag: Superpixels weighted by average gradients for explanations of cnns
276. Impact diagnosis in stiffened structural panels using a deep learning approach
277. X-ray Testing
278. Computer Assisted Classification Framework for Detection of Acute Myeloid Leukemia in Peripheral Blood Smear Images
279. IGSSTRCF: Importance Guided Sparse Spatio-Temporal Regularized Correlation Filters for Tracking
280. A Unified Learning Approach for Hand Gesture Recognition and Fingertip Detection
281. EventAnchor: Reducing Human Interactions in Event Annotation of Racket Sports Videos
282. An Evolution of CNN Object Classifiers on Low-Resolution Images
283. A vector-based representation to enhance head pose estimation
284. Automatic Defect Detection of Print Fabric Using Convolutional Neural Network
285. Multi-Scale Voxel Class Balanced ASPP for LIDAR Pointcloud Semantic Segmentation
286. Non-Destructive Quality Inspection of Potato Tubers Using Automated Vision System
287. Layering Defects Detection in Laser Powder Bed Fusion using Embedded Vision System
288. Label-Free Robustness Estimation of Object Detection CNNs for Autonomous Driving Applications
289. Convolutional Neural Networks and Transfer Learning for Quality Inspection of Different Sugarcane Varieties
290. Vision-Based Suture Tensile Force Estimation in Robotic Surgery
291. FEANet: Foreground-edge-aware network with DenseASPOC for human parsing
292. Application of a convolutional neural network for detection of ignition sources and smoke
293. Image matching across wide baselines: From paper to practice
294. Towards Annotation-free Instance Segmentation and Tracking with Adversarial Simulations
295. Self-Supervised Learning for Domain Adaptation on Point Clouds
296. Ontology-driven event type classification in images
297. Apple Ripeness Identification Using Deep Learning
298. Deep sparse transfer learning for remote smart tongue diagnosis [J]
299. CONVERSATION OF REAL IMAGES INTO CARTOONIZE IMAGE FORMAT USING GENERATIVE ADVERSARIAL NETWORK
300. RICORD: A Precedent for Open AI in COVID-19 Image Analytics
301. CapGen: A Neural Image Caption Generator with Speech Synthesis
302. Computer-Aided Diagnosis of Alzheimer’s Disease through Weak Supervision Deep Learning Framework with Attention Mechanism
303. A novel and intelligent vision-based tutor for Yogasana: e-YogaGuru
304. Plant Trait Estimation and Classification Studies in Plant Phenotyping Using Machine Vision-A Review
305. Iranis: A Large-scale Dataset of Farsi License Plate Characters
306. A numerical framework for elastic surface matching, comparison, and interpolation
307. Sign language recognition from digital videos using deep learning methods
308. Enhanced Information Fusion Network for Crowd Counting
309. This Face Does Not Exist… But It Might Be Yours! Identity Leakage in Generative Models
310. Defense-friendly Images in Adversarial Attacks: Dataset and Metrics for Perturbation Difficulty
311. Guided attentive feature fusion for multispectral pedestrian detection
312. Vision-Based Guidance for Tracking Dynamic Objects
313. Single-shot fringe projection profilometry based on Deep Learning and Computer Graphics
314. Improving Video Captioning with Temporal Composition of a Visual-Syntactic Embedding
315. Adversarial deepfakes: Evaluating vulnerability of deepfake detectors to adversarial examples
316. Vision-Aided 6G Wireless Communications: Blockage Prediction and Proactive Handoff
317. FACEGAN: Facial Attribute Controllable rEenactment GAN
318. Benchmarking the robustness of semantic segmentation models with respect to common corruptions
319. Vision-based egg quality prediction in Pacific bluefin tuna (Thunnus orientalis) by deep neural network
320. Diagnosing colorectal abnormalities using scattering coefficient maps acquired from optical coherence tomography
321. Recovering Trajectories of Unmarked Joints in 3D Human Actions Using Latent Space Optimization
322. Only time can tell: Discovering temporal data for temporal modeling
323. Deep Learning applications for COVID-19
324. PointCutMix: Regularization Strategy for Point Cloud Classification
325. A Survey on the Usage of Pattern Recognition and Image Analysis Methods for the Lifestyle Improvement on Low Vision and Visually Impaired People
326. Mechanical System Control by RGB-D Device
327. DACS: Domain Adaptation via Cross-domain Mixed Sampling
328. Vision-Based Vibration Monitoring of Structures and Infrastructures: An Overview of Recent Applications
329. CycleSegNet: Object Co-segmentation with Cycle Refinement and Region Correspondence
330. Explainable Fingerprint ROI Segmentation Using Monte Carlo Dropout
331. Self-supervised pretraining of visual features in the wild
332. Optimized Z-Buffer Using Divide and Conquer
333. Efficient and robust unsupervised inverse intensity compensation for stereo image registration under radiometric changes
334. Zero-shot text-to-image generation
335. PoseRBPF: A Rao–Blackwellized Particle Filter for 6-D Object Pose Tracking
336. MobiSamadhaan—Intelligent Vision-Based Smart City Solution
337. A Training Method for Low Rank Convolutional Neural Networks Based on Alternating Tensor Compose-Decompose Method
338. Individual Sick Fir Tree (Abies mariesii) Identification in Insect Infested Forests by Means of UAV Images and Deep Learning
339. Fighting against COVID-19: A novel deep learning model based on YOLO-v2 with ResNet-50 for medical face mask detection
340. 3D Image Conversion of a Scene from Multiple 2D Images with Background Depth Profile
341. Falsification of a Vision-based Automatic Landing System
342. Generating Physically Sound Training Data for Image Recognition of Additively Manufactured Parts
343. Shelf Auditing Based on Image Classification Using Semi-Supervised Deep Learning to Increase On-Shelf Availability in Grocery Stores
344. Continuous 3D Multi-Channel Sign Language Production via Progressive Transformers and Mixture Density Networks
345. Research on Detection Method of Sheet Surface Defects Based on Machine Vision
346. Low-Resolution LiDAR Upsampling Using Weighted Median Filter
347. RGB-D Human Action Recognition of Deep Feature Enhancement and Fusion Using Two-Stream ConvNet
348. Breaking Shortcuts by Masking for Robust Visual Reasoning
349. Correlation filter tracking based on superpixel and multifeature fusion
350. Segmentation of body parts of cows in RGB-depth images based on template matching
351. Electric Scooter and Its Rider Detection Framework Based on Deep Learning for Supporting Scooter-Related Injury Emergency Services
352. A Model of Diameter Measurement Based on the Machine Vision
353. Transreid: Transformer-based object re-identification
354. LightLayers: Parameter Efficient Dense and Convolutional Layers for Image Classification
355. Residual Dual Scale Scene Text Spotting by Fusing Bottom-Up and Top-Down Processing
356. Pedestrian Detection on Multispectral Images in Different Lighting Conditions
357. CovidSens: a vision on reliable social sensing for COVID-19
358. An Improved Approach for Face Detection
359. Classification and Measuring Accuracy of Lenses Using Inception Model V3
360. Nighttime image dehazing based on Retinex and dark channel prior using Taylor series expansion
361. Multiple Object Tracking Using Convolutional Neural Network on Aerial Imagery Sequences
362. Piano Skills Assessment
363. Automatic recognition of surface cracks in bridges based on 2D-APES and mobile machine vision
364. Real-time Navigation for Drogue-Type Autonomous Aerial Refueling Using Vision-Based Deep Learning Detection
365. Robust 3D Reconstruction Through Noise Reduction of Ultra-Fast Images
366. Survey of Occluded and Unoccluded Face Recognition
367. Cell tracking in time-lapse microscopy image sequences
368. Depth Estimation Using Blob Detection for Stereo Vision Images
369. Fast human activity recognition
370. Single Shot Multitask Pedestrian Detection and Behavior Prediction
371. The Vision of Digital Surgery
372. Modeling of Potato Slice Drying Process in a Microwave Dryer using Artificial Neural Network and Machine Vision
373. VinVL: Making Visual Representations Matter in Vision-Language Models
374. Quality safety monitoring of LED chips using deep learning-based vision inspection methods
375. Machine Vision Based Phenotype Recognition of Plant and Animal
376. Towards manufacturing robotics accuracy degradation assessment: A vision-based data-driven implementation
377. Attention guided low-light image enhancement with a large scale low-light simulation dataset
378. Human action identification by a quality-guided fusion of multi-model feature
379. Optimal quantization using scaled codebook
380. A Robust Illumination-Invariant Camera System for Agricultural Applications
381. Real-Time Gait-Based Age Estimation and Gender Classification From a Single Image
382. Counting and Tracking of Vehicles and Pedestrians in Real Time Using You Only Look Once V3
383. Style Normalization and Restitution for Domain Generalization and Adaptation
384. A Machine Vision-Based Method Optimized for Restoring Broiler Chicken Images Occluded by Feeding and Drinking Equipment
385. Face Recognition for Surveillance Systems using SRGAN
386. FACIAL RECOGNITION AND ATTENDANCE SYSTEM USING DLIB AND FACE RECOGNITION LIBRARIES
387. Deep Preset: Blending and Retouching Photos with Color Style Transfer
388. Transunet: Transformers make strong encoders for medical image segmentation
389. List-wise learning-to-rank with convolutional neural networks for person re-identification
390. Intra-Camera Supervised Person Re-Identification
391. Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos
392. A Deep Convolutional Encoder-Decoder Architecture Approach for Sheep Weight Estimation
393. Automatic Borescope Damage Assessments for Gas Turbine Blades via Deep Learning
394. Foreground-aware Semantic Representations for Image Harmonization
395. A Learning-Based Approach to Parametric Rotoscoping of Multi-Shape Systems
396. Energy-efficient cluster-based unmanned aerial vehicle networks with deep learning-based scene classification model
397. Mobile-Aware Deep Learning Algorithms for Malaria Parasites and White Blood Cells Localization in Thick Blood Smears
398. SChISM: Semantic Clustering via Image Sequence Merging for Images of Human-Decomposition
399. Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models
400. Locate Globally, Segment Locally: A Progressive Architecture With Knowledge Review Network for Salient Object Detection
401. Using Feature Selection Based on Multi-view for Rice Seed Images Classification
402. Road images augmentation with synthetic traffic signs using neural networks
403. Java Tools For Image Understanding: The Java Imaging and Vision Environment (JIVE)
404. Improved ECO Algorithm Based on Residual Neural Network
405. Viewpoint and Scale Consistency Reinforcement for UAV Vehicle Re-Identification
Automated Surveillance Model for Video-Based Anomalous Activity Detection Using Deep Learning Architecture 407. Real-Time, YOLO-Based Intelligent Surveillance and Monitoring System Using Jetson TX2 408. Countering Inconsistent Labelling by Google’s Vision API for Rotated Images 409. Towards Accurate Camouflaged Object Detection with Mixture Convolution and Interactive Fusion 410. Effectiveness of arbitrary transfer sets for data-free knowledge distillation 411. Accelerated High-Level Synthesis Feature Detection for FPGAs Using HiFlipVX 412. Supervised deep learning of elastic SRV distances on the shape space of curves 413. Evaluating GAN-Based Image Augmentation for Threat Detection in Large-Scale Xray Security Images 414. Where to Start Your Deep Learning 415. Real-Time Detection and Spatial Localization of Insulators for UAV Inspection Based on Binocular Stereo Vision 416. Excitation dropout: Encouraging plasticity in deep neural networks 417. The Edge Computing Cloud Architecture Based on 5G Network for Industrial Vision Detection 418. Estimating Galactic Distances From Images Using Self-supervised Representation Learning 419. Real-Time Hair Segmentation Using Mobile-Unet 420. Driving among Flatmobiles: Bird-Eye-View occupancy grids from a monocular camera for holistic trajectory planning 421. Attention-based context aggregation network for monocular depth estimation 422. A Large-Scale, Time-Synchronized Visible and Thermal Face Dataset 423. Identification of Suitable Contrast Enhancement Technique for Improving the Quality of Astrocytoma Histopathological Images. 424. Run-Time Monitoring of Machine Learning for Robotic Perception: A Survey of Emerging Trends 425. Prevalence and risk factor assessment of digital eye strain among children using online e-learning during the COVID-19 pandemic: Digital eye strain among … 426. Pedestrian Detection: Unification of Global and Local Features 427. 
A comprehensive analysis of weakly-supervised semantic segmentation in different image domains 428. Deep learning assisted vision inspection of resistance spot welds 429. Identifying centres of interest in paintings using alignment and edge detection: Case studies on works by Luc Tuymans 430. Going deeper with image transformers 431. Spike-thrift: Towards energy-efficient deep spiking neural networks by limiting spiking activity via attention-guided compression 432. Adversarial feature distribution alignment for semi-supervised learning 433. G2d: Generate to detect anomaly 434. Anomaly Detection in Crowded Scenes Using Motion Influence Map and Convolutional Autoencoder 435. A study on attention-based LSTM for abnormal behavior recognition with variable pooling 436. Deep convolutional neural network based autonomous drone navigation 437. A survey on contrastive self-supervised learning 438. Independent Learning of Motion Parameters for Deep Visual Odometry 439. Image segmentation using deep learning: A survey 440. Using Interaction Protocols to Control Vision Systems 441. TGCN: Time Domain Graph Convolutional Network for Multiple Objects Tracking 442. A Survey on Crowd Counting Methods and Datasets 443. RGSR: A two-step lossy JPG image super-resolution based on noise reduction 444. Image processing effects on the deep face recognition system [J] 445. A weakly supervised consistency-based learning method for covid-19 segmentation in ct images 446. Improving Few-Shot Learning using Composite Rotation based Auxiliary Task 447. Investigating large-scale graphs for community detection 448. Estimating Galactic Distances From Images Using Self-supervised Representation Learning 449. Pig Breed Detection Using Faster R-CNN 450. A Vision-Based System For Non-Intrusive Posture Correction Notifications 451. Inception recurrent convolutional neural network for object recognition 452. A Meta-Q-Learning Approach to Discriminative Correlation Filter based Visual Tracking 453. 
SWD: Low-Compute Real-Time Object Detection Architecture 454. An empirical study of multi-scale object detection in high resolution UAV images 455. Vision Guided Robots. Calibration and Motion Correction 456. Deep-emotion: Facial expression recognition using attentional convolutional network 457. Single-Object Tracking Algorithm Based on Two-Step Spatiotemporal Deep Feature Fusion in a Complex Surveillance Scenario 458. A Focus-Measurement Based 3D Surface Reconstruction System for Dimensional Metrology 459. Image Pre-Processing Method of Machine Learning for Edge Detection with Image Signal Processor Enhancement 460. Distress Recognition in Unpaved Roads Using Unmanned Aerial Systems and Deep Learning Segmentation 461. Noise density range sensitive mean-median filter for impulse noise removal 462. A Novel Fusion of Deep Learning and Android Application for Real-Time Mango Fruits Disease Detection 463. Deep Gaussian Denoiser Epistemic Uncertainty and Decoupled Dual-Attention Fusion 464. Patient Emotion Recognition in Human Computer Interaction System Based on Machine Learning Method and Interactive Design Theory 465. Real-Time Hair Segmentation Using Mobile-Unet. Electronics 2021, 10, 99 466. Diagonal-kernel convolutional neural networks for image classification 467. Dense-Resolution Network for Point Cloud Classification and Segmentation 468. U2-ONet: A Two-Level Nested Octave U-Structure Network with a Multi-Scale Attention Mechanism for Moving Object Segmentation 469. Anonymous Person Tracking Across Multiple Camera Using Color Histogram and Body Pose Estimation 470. On-line three-dimensional coordinate measurement of dynamic binocular stereo vision based on rotating camera in large FOV 471. Image Compression and Reconstruction Using Encoder–Decoder Convolutional Neural Network 472. Extracting Effective Image Attributes with Refined Universal Detection 473. Self-supervised training for blind multi-frame video denoising 474. 
On the Tightness of Semidefinite Relaxations for Rotation Estimation 475. Poly Scale Space Technique for Feature Extraction in Lip Reading: A New Strategy 476. Influence of phosphate concentration on amine, amide, and hydroxyl CEST contrast 477. Controlling biases and diversity in diverse image-to-image translation 478. A New Feature Fusion Network for Student Behavior Recognition in Education 479. Localizing License Plates in Real Time with RetinaNet Object Detector 480. Application of a Vision-Based Single Target on Robot Positioning System 481. A predictive machine learning application in agriculture: Cassava disease detection and classification with imbalanced dataset using convolutional neural … 482. Threat Detection in Social Media Images Using the Inception-v3 Model 483. Augmenting Crop Detection for Precision Agriculture with Deep Visual Transfer Learning—A Case Study of Bale Detection 484. Evolving Smooth Manifolds of Arbitrary Codimen-sion in Rn 485. A Unified Learning Approach for Hand Gesture Recognition and Fingertip Detection 486. CNN-Based RGB-D Salient Object Detection: Learn, Select, and Fuse 487. A multi-platform comparison of local feature description methods 488. Smart Vehicle Tracker for Parking System 489. Knowledge distillation for incremental learning in semantic segmentation 490. MSLp: Deep Superresolution for Meteorological Satellite Image 491. Improving Object Detection Quality by Incorporating Global Contexts via Self-Attention 492. Self-Supervised Pretraining of 3D Features on any Point-Cloud 493. Beyond covariance: Sice and kernel based visual feature representation 494. Attention-based VGG-16 model for COVID-19 chest X-ray image classification 495. High-performance large-scale image recognition without normalization 496. Learning data augmentation with online bilevel optimization for image classification 497. DEEP LEARNING IN LANDCOVER CLASSIFICATION 498. 
Cobot User Frame Calibration: Evaluation and Comparison between Positioning Repeatability Performances Achieved by Traditional and Vision-Based Methods 499. A Survey of Image Enhancement and Object Detection Methods 500. Multi-object Tracking with a Hierarchical Single-branch Network 501. A Hybrid Approach Based on Lp1 Norm-Based Filters and Normalized Cut Segmentation for Salient Object Detection 502. Skin Lesion Classification Using Deep Learning 503. S-VVAD: Visual Voice Activity Detection by Motion Segmentation 504. Spatio-temporal attention on manifold space for 3D human action recognition 505. Mixup Without Hesitation 506. Deep learning for real-time semantic segmentation: Application in ultrasound imaging 507. Self-supervised monocular depth estimation with direct methods 508. Small object detection using context and attention 509. Resolution invariant person reid based on feature transformation and self-weighted attention 510. Deep learning enables accurate diagnosis of novel coronavirus (COVID-19) with CT images 511. Steel bridge corrosion inspection with combined vision and thermographic images 512. Instance Segmentation for Direct Measurements of Satellites in Metal Powders and Automated Microstructural Characterization from Image Data 513. Minimal solution for estimating fundamental matrix under planar motion 514. Component-level Script Classification Benchmark with CNN on AUTNT Dataset 515. Jaa-net: Joint facial action unit detection and face alignment via adaptive attention 516. Machine Learning Techniques for Predicting Crop Production in India 517. Introduction to Natural Language Processing 518. Structured Scene Memory for Vision-Language Navigation 519. Comparison-Based Study to Predict Breast Cancer: A Survey 520. Aadhaar-Based Authentication and Authorization Scheme for Remote Healthcare Monitoring 521. Nonlinear Approximation and (Deep) ReLU Networks 522. 
A Workflow Allocation Strategy Under Precedence Constraints for IaaS Cloud Environment 523. Person Identification Using Histogram of Gradient and Support Vector Machine on GEI 524. Adaptive streaming of 360-degree videos with reinforcement learning 525. Video Tagging and Recommender System Using Deep Learning 526. Scale variance minimization for unsupervised domain adaptation in image segmentation 527. Convolutional Elman Jordan Neural Network for Reconstruction and Classification Using Attention Window 528. Channel Capacity in Psychovisual Deep-Nets: Gaussianization Versus Kozachenko-Leonenko 529. Assessing the Viability of Visual Vibrometry for Use in Structural Engineering 530. A Deep Learning-Based Hotel Image Classifier for Online Travel Agencies 531. UVCE-IIITT@ DravidianLangTech-EACL2021: Tamil Troll Meme Classification: You need to Pay more Attention 532. Through-Wall Human Pose Reconstruction via UWB MIMO Radar and 3D CNN 533. Novel Assessments of Technical and Nontechnical Cardiac Surgery Quality: Protocol for a Mixed Methods Study 534. Biologically inspired visual computing: the state of the art 535. A Robust Surf-Based Online Human Tracking Algorithm Using Adaptive Object Model 536. Attention based pruning for shift networks 537. Rectifying pseudo label learning via uncertainty estimation for domain adaptive semantic segmentation 538. Hexacopter design for carrying payload for warehouse applications 539. Implementation of Recommender System Using Neural Networks and Deep Learning 540. A multi-scale and multi-level feature aggregation network for crowd counting 541. Foodborne Disease Outbreak Prediction Using Deep Learning 542. LRA-Net: local region attention network for 3D point cloud completion 543. Partial Domain Adaptation Using Selective Representation Learning For Class-Weight Computation 544. Estimating Body Pose from A Single Image 545. Hybrid Feature Selection Method for Predicting the Kidney Disease Membranous Nephropathy 546. 
Prognosis of Breast Cancer by Implementing Machine Learning Algorithms Using Modified Bootstrap Aggregating 547. Robust local binary descriptor in rotation change using polar location 548. A Multimodal Biometric System Based on Finger Knuckle Print, Fingerprint, and Palmprint Traits 549. Efficientps: Efficient panoptic segmentation 550. ShadingNet: image intrinsics by fine-grained shading decomposition 551. Perception of Plant Diseases in Color Images Through Adaboost 552. Data Mining in Cloud Computing: Survey 553. Artificial Neural Network Analysis for Predicting Spatial Patterns of Urbanization in India 554. Navier–Stokes-Based Image Inpainting for Restoration of Missing Data Due to Clouds 555. An Attribute-Based Break-Glass Access Control Framework for Medical Emergencies 556. Detection of Life Threatening ECG Arrhythmias Using Morphological Patterns and Wavelet Transform Method 557. Improvement of Identity Recognition with Occlusion Detection-Based Feature Selection 558. Generative Video Compression as Hierarchical Variational Inference 559. SLiKER: Sparse loss induced kernel ensemble regression 560. Is Space-Time Attention All You Need for Video Understanding? 561. Cityguide: A seamless indoor-outdoor wayfinding system for people with vision impairments 562. An Efficient Multimodal Biometric System Integrated with Liveness Detection Technique 563. Microscopic brain tumor detection and classification using 3D CNN and feature selection architecture 564. A Survey of Brain-Inspired Intelligent Robots: Integration of Vision, Decision, Motion Control, and Musculoskeletal Systems 565. Scale selection 566. Automated detection and classification of spilled loads on freeways based on improved YOLO network 567. Information Granules and Granular Computing 568. DAN : Breast Cancer Classification from High-Resolution Histology Images Using Deep Attention Network 569. Image Segmentation of MR Images with Multi-directional Region Growing Algorithm 570. 
Satellite Radar Interferometry for DEM Generation Using Sentinel-1A Imagery 571. On the Effect of Training Convolution Neural Network for Millimeter-Wave Radar-Based Hand Gesture Recognition 572. Synthetic Vision for Virtual Character Guidance 573. Evaluation of AOMDV Routing Protocol for Optimum Transmitted Power in a Designed Ad-hoc Wireless Sensor Network 574. Evaluation metrics for conditional image generation 575. Image Inpainting Using Double Discriminator Generative Adversarial Networks 576. Crowd counting method based on the self-attention residual network 577. Cyber Espionage—An Ethical Analysis 578. Analysis of Gait and Face Biometric Traits from CCTV Streams for Forensics 579. A facial expression recognition model using hybrid feature selection and support vector machines 580. Predicting the Big-Five Personality Traits from Handwriting 581. Image engineering 582. Learning efficient text-to-image synthesis via interstage cross-sample similarity distillation 583. Faster and Secured Web Services Communication Using Modified IDEA and Custom-Level Security 584. Improved Image Deblurring Using GANs 585. Hand-Based Person Identification using Global and Part-Aware Deep Feature Representation Learning 586. Identification for Recycling Polyethylene Terephthalate (PET) Plastic Bottles by Polarization Vision 587. Holistic filter pruning for efficient deep neural networks 588. Vision-based vehicle speed estimation: A survey 589. Learning Temporal Dynamics from Cycles in Narrated Video 590. Analysis of MRI Image Compression Using Compressive Sensing 591. Analysis of PQ Disturbances in Renewable Grid Integration System Using Non-parametric Spectral Estimation Approach 592. A Porcine Abdomen Cutting Robot System Using Binocular Vision Techniques Based on Kernel Principal Component Analysis 593. Arc Length method for extracting crack pattern characteristics 594. AF-EMS Detector: Improve the Multi-Scale Detection Performance of the Anchor-Free Detector 595. 
A Transfer Learning Approach for Drowsiness Detection from EEG Signals 596. Deep audio-visual learning: A survey 597. MAFNet: Multi-style attention fusion network for salient object detection 598. HAVANA: Hierarchical and Variation-Normalized Autoencoder for Person Re-identification 599. An Empirical Comparison of Generative Adversarial Network (GAN) Measures 600. An enhanced 3DCNN-ConvLSTM for spatiotemporal multimedia data analysis

More Research Topics on Image Processing

  • Image Based Rendering Research Topics
  • Hyperspectral image analysis Research Topics 
  • Medical Imaging Research Topics
  • Computer vision Research Topics



Research Topics of the Computer Vision & Graphics Group (Fraunhofer HHI)

Seeing, modelling and animating humans.


Realistic human modelling is a challenging task in Computer Vision and Graphics. We investigate new methods for capturing and analyzing human bodies and faces in images and videos, along with compact models for representing facial expressions, human bodies, and their motion. We combine model-based and image- and video-based representations with generative AI models and neural rendering.

Read more about current research projects in this field.

Scenes, Structure and Motion


We have a long tradition in 3D scene analysis and continuously pursue innovative research in 3D capturing and 3D reconstruction. Our work ranges from highly detailed stereo and multi-view reconstruction of static objects and scenes, including complex surface and shape properties, through monocular shape-from-X methods, to the analysis of deforming objects in monocular video.
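As a toy illustration of the geometry underlying stereo reconstruction (a textbook relation, not code from the group): for a rectified stereo pair with focal length f in pixels and baseline B in metres, a pixel observed with disparity d lies at depth Z = f·B/d. The function name below is ours and purely illustrative.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Rectified-stereo depth from the pinhole model: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# A 50-pixel disparity with a 1000-pixel focal length and a 10 cm baseline
# corresponds to a depth of 2 metres; larger disparities mean closer points.
```

This inverse relationship is why stereo depth accuracy degrades quadratically with distance: at large Z, a one-pixel disparity error corresponds to a large depth error.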

Computational Imaging and Video


We perform innovative research in video processing and computational video, opening up new opportunities for how dynamic scenes can be analyzed and how video footage can be represented, edited, and seamlessly augmented with new content.

Learning and Inference


Our research combines computer vision, computer graphics, and machine learning to understand images and video data. We focus on combining deep learning with strong models or physical constraints, uniting the advantages of model-based and data-driven methods.
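A minimal sketch of this model-plus-data idea (our illustration under simplified assumptions, not the group's code): fit a 1-D signal to noisy observations while a model-based smoothness prior regularizes the solution, so the objective combines a data term with a constraint term weighted by lam.

```python
def fit_with_smoothness_prior(y, lam=1.0, steps=300, lr=0.1):
    """Minimise  sum_i (x_i - y_i)^2 + lam * sum_i (x_{i+1} - x_i)^2
    by plain gradient descent; lam trades data fidelity against smoothness."""
    x = list(y)
    for _ in range(steps):
        # gradient of the data-fidelity term
        grad = [2.0 * (xi - yi) for xi, yi in zip(x, y)]
        # gradient of the smoothness (model/constraint) term
        for i in range(len(x) - 1):
            d = 2.0 * lam * (x[i + 1] - x[i])
            grad[i] -= d
            grad[i + 1] += d
        x = [xi - lr * g for xi, g in zip(x, grad)]
    return x
```

An already-smooth input is a fixed point of the descent, while an oscillating input is pulled toward a smoother compromise between the two terms.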

Augmented and Mixed Reality


Our experience in tracking dynamic scenes and objects, together with photorealistic rendering, enables new augmented reality solutions in which virtual content is seamlessly blended into real video footage, with applications in, e.g., multimedia, industry, and medicine.
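At the pixel level, the blending step reduces to standard alpha compositing (a textbook operation, shown only as an illustration): the tracked pose determines where the virtual content is rendered, and its alpha matte controls the per-pixel mix with the real frame.

```python
def over(virtual, alpha, real):
    """Porter-Duff 'over' operator for one pixel:
    out = alpha * virtual + (1 - alpha) * real, applied per channel."""
    return tuple(alpha * v + (1.0 - alpha) * r for v, r in zip(virtual, real))
```

With alpha = 1 the virtual pixel fully replaces the real one; fractional alphas at object edges are what make the insertion look seamless rather than cut-out.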

Previous Research Projects


We have performed various research projects in the above fields over the years.

Read more about older research projects here.

Subscribe to the PwC Newsletter

Join the community, computer vision, semantic segmentation.

computer vision dissertation topics

Tumor Segmentation

computer vision dissertation topics

Panoptic Segmentation

computer vision dissertation topics

3D Semantic Segmentation

computer vision dissertation topics

Weakly-Supervised Semantic Segmentation

Representation learning.

computer vision dissertation topics

Disentanglement

Graph representation learning, sentence embeddings.

computer vision dissertation topics

Network Embedding

Classification.

computer vision dissertation topics

Text Classification

computer vision dissertation topics

Graph Classification

computer vision dissertation topics

Audio Classification

computer vision dissertation topics

Medical Image Classification

Object detection.

computer vision dissertation topics

3D Object Detection

computer vision dissertation topics

Real-Time Object Detection

computer vision dissertation topics

RGB Salient Object Detection

computer vision dissertation topics

Few-Shot Object Detection

Image classification.

computer vision dissertation topics

Out of Distribution (OOD) Detection

computer vision dissertation topics

Few-Shot Image Classification

computer vision dissertation topics

Fine-Grained Image Classification

computer vision dissertation topics

Semi-Supervised Image Classification

2d object detection.

computer vision dissertation topics

Edge Detection

Thermal image segmentation.

computer vision dissertation topics

Open Vocabulary Object Detection

Reinforcement learning (rl), off-policy evaluation, multi-objective reinforcement learning, 3d point cloud reinforcement learning, deep hashing, table retrieval, domain adaptation.

computer vision dissertation topics

Unsupervised Domain Adaptation

computer vision dissertation topics

Domain Generalization

computer vision dissertation topics

Test-time Adaptation

Source-free domain adaptation, image generation.

computer vision dissertation topics

Image-to-Image Translation

computer vision dissertation topics

Text-to-Image Generation

computer vision dissertation topics

Image Inpainting

computer vision dissertation topics

Conditional Image Generation

Data augmentation.

computer vision dissertation topics

Image Augmentation

computer vision dissertation topics

Text Augmentation

Autonomous vehicles.

computer vision dissertation topics

Autonomous Driving

computer vision dissertation topics

Self-Driving Cars

computer vision dissertation topics

Simultaneous Localization and Mapping

computer vision dissertation topics

Autonomous Navigation

computer vision dissertation topics

Image Denoising

computer vision dissertation topics

Color Image Denoising

computer vision dissertation topics

Sar Image Despeckling

Grayscale image denoising, meta-learning.

computer vision dissertation topics

Few-Shot Learning

computer vision dissertation topics

Sample Probing

Universal meta-learning, contrastive learning.

computer vision dissertation topics

Super-Resolution

computer vision dissertation topics

Image Super-Resolution

computer vision dissertation topics

Video Super-Resolution

computer vision dissertation topics

Multi-Frame Super-Resolution

computer vision dissertation topics

Reference-based Super-Resolution

Pose estimation.

computer vision dissertation topics

3D Human Pose Estimation

computer vision dissertation topics

Keypoint Detection

computer vision dissertation topics

3D Pose Estimation

computer vision dissertation topics

6D Pose Estimation

Self-supervised learning.

computer vision dissertation topics

Point Cloud Pre-training

Unsupervised video clustering, 2d semantic segmentation, image segmentation, text style transfer.

computer vision dissertation topics

Scene Parsing

computer vision dissertation topics

Reflection Removal

Visual question answering (vqa).

computer vision dissertation topics

Visual Question Answering

computer vision dissertation topics

Machine Reading Comprehension

computer vision dissertation topics

Chart Question Answering

computer vision dissertation topics

Embodied Question Answering

computer vision dissertation topics

Depth Estimation

computer vision dissertation topics

3D Reconstruction

computer vision dissertation topics

Neural Rendering

computer vision dissertation topics

3D Face Reconstruction

computer vision dissertation topics

3D Shape Reconstruction

Sentiment analysis.

computer vision dissertation topics

Aspect-Based Sentiment Analysis (ABSA)

computer vision dissertation topics

Multimodal Sentiment Analysis

computer vision dissertation topics

Aspect Sentiment Triplet Extraction

computer vision dissertation topics

Twitter Sentiment Analysis

Anomaly detection.

computer vision dissertation topics

Unsupervised Anomaly Detection

computer vision dissertation topics

One-Class Classification

Supervised anomaly detection, anomaly detection in surveillance videos.

computer vision dissertation topics

Temporal Action Localization

computer vision dissertation topics

Video Understanding

Video generation.

computer vision dissertation topics

Video Object Segmentation

computer vision dissertation topics

Action Classification

Activity recognition.

computer vision dissertation topics

Action Recognition

computer vision dissertation topics

Human Activity Recognition

Egocentric activity recognition.

computer vision dissertation topics

Group Activity Recognition

3d object super-resolution.

computer vision dissertation topics

One-Shot Learning

computer vision dissertation topics

Few-Shot Semantic Segmentation

Cross-domain few-shot.

computer vision dissertation topics

Unsupervised Few-Shot Learning

Medical image segmentation.

computer vision dissertation topics

Lesion Segmentation

computer vision dissertation topics

Brain Tumor Segmentation

computer vision dissertation topics

Cell Segmentation

computer vision dissertation topics

Brain Segmentation

Monocular depth estimation.

computer vision dissertation topics

Stereo Depth Estimation

Depth and camera motion.

computer vision dissertation topics

3D Depth Estimation

Exposure fairness, optical character recognition (ocr).

computer vision dissertation topics

Active Learning

computer vision dissertation topics

Handwriting Recognition

Handwritten digit recognition, irregular text recognition, instance segmentation.

computer vision dissertation topics

Referring Expression Segmentation

computer vision dissertation topics

3D Instance Segmentation

computer vision dissertation topics

Real-time Instance Segmentation

computer vision dissertation topics

Unsupervised Object Segmentation

Facial recognition and modelling.

computer vision dissertation topics

Face Recognition

computer vision dissertation topics

Face Swapping

computer vision dissertation topics

Face Detection

computer vision dissertation topics

Facial Expression Recognition (FER)

computer vision dissertation topics

Face Verification

Object tracking.

computer vision dissertation topics

Multi-Object Tracking

computer vision dissertation topics

Visual Object Tracking

computer vision dissertation topics

Multiple Object Tracking

computer vision dissertation topics

Cell Tracking

Zero-shot learning.

computer vision dissertation topics

Generalized Zero-Shot Learning

computer vision dissertation topics

Compositional Zero-Shot Learning

Multi-label zero-shot learning, quantization, data free quantization, unet quantization, continual learning.

computer vision dissertation topics

Class Incremental Learning

Continual named entity recognition, unsupervised class-incremental learning.

computer vision dissertation topics

Action Recognition In Videos

computer vision dissertation topics

3D Action Recognition

Self-supervised action recognition, few shot action recognition.

computer vision dissertation topics

Scene Understanding

computer vision dissertation topics

Scene Text Recognition

computer vision dissertation topics

Scene Graph Generation

computer vision dissertation topics

Scene Recognition

Adversarial attack.

computer vision dissertation topics

Backdoor Attack

computer vision dissertation topics

Adversarial Text

Adversarial attack detection, real-world adversarial attack, active object detection, image retrieval.

Sketch-Based Image Retrieval

Content-Based Image Retrieval

Composed Image Retrieval (CoIR)

Medical Image Retrieval

Dimensionality reduction.

Supervised dimensionality reduction

Online nonnegative cp decomposition, emotion recognition.

Speech Emotion Recognition

Emotion Recognition in Conversation

Multimodal Emotion Recognition

Emotion-cause pair extraction.

Monocular 3D Object Detection

3D Object Detection From Stereo Images

Multiview Detection

Robust 3d object detection, style transfer.

Image Stylization

Font style transfer, style generalization, face transfer, image reconstruction.

MRI Reconstruction

Film Removal

Optical flow estimation.

Video Stabilization

Action localization.

Action Segmentation

Spatio-temporal action localization, image captioning.

3D dense captioning

Controllable image captioning, aesthetic image captioning.

Relational Captioning

Person re-identification.

Unsupervised Person Re-Identification

Video-based person re-identification, generalizable person re-identification, cloth-changing person re-identification, image restoration.

Demosaicking

Spectral reconstruction, underwater image restoration.

JPEG Artifact Correction

Visual relationship detection, lighting estimation.

3D Room Layouts From A Single RGB Panorama

Road scene understanding, action detection.

Skeleton Based Action Recognition

Online Action Detection

Audio-visual active speaker detection, metric learning.

Object Recognition

3D Object Recognition

Continuous object recognition.

Depiction Invariant Object Recognition

Monocular 3D Human Pose Estimation

Pose prediction.

3D Multi-Person Pose Estimation

3d human pose and shape estimation, image enhancement.

Low-Light Image Enhancement

Image relighting, de-aliasing, multi-label classification.

Missing Labels

Extreme multi-label classification, hierarchical multi-label classification, medical code prediction, continuous control.

Steering Control

Drone controller.

Semi-Supervised Video Object Segmentation

Unsupervised Video Object Segmentation

Referring Video Object Segmentation

Video Salient Object Detection

3d face modelling.

Trajectory Prediction

Trajectory Forecasting

Human motion prediction, out-of-sight trajectory prediction.

Multivariate Time Series Imputation

Object localization.

Weakly-Supervised Object Localization

Image-based localization, unsupervised object localization, monocular 3d object localization.

Blind Image Deblurring

Single-image blind deblurring, novel view synthesis.

Novel LiDAR View Synthesis

Ground video synthesis from satellite image

Image quality assessment, no-reference image quality assessment, blind image quality assessment.

Aesthetics Quality Assessment

Stereoscopic image quality assessment, out-of-distribution detection, video semantic segmentation.

Camera shot segmentation

Cloud removal.

Facial Inpainting

Fine-Grained Image Inpainting

Instruction following, visual instruction following, change detection.

Semi-supervised Change Detection

Saliency detection.

Saliency Prediction

Co-Salient Object Detection

Video saliency detection, unsupervised saliency detection, image compression.

Feature Compression

Jpeg compression artifact reduction.

Lossy-Compression Artifact Reduction

Color image compression artifact reduction, explainable artificial intelligence, explainable models, explanation fidelity evaluation, fad curve analysis, image registration.

Unsupervised Image Registration

Visual reasoning.

Visual Commonsense Reasoning

Ensemble learning, prompt engineering.

Visual Prompting

Salient object detection, saliency ranking, visual tracking.

Point Tracking

Rgb-t tracking, real-time visual tracking.

RF-based Visual Tracking

3d point cloud classification.

3D Object Classification

Few-Shot 3D Point Cloud Classification

Supervised only 3d point cloud classification, zero-shot transfer 3d point cloud classification, 2d classification.

Neural Network Compression

Music Source Separation

Cell detection.

Plant Phenotyping

Open-set classification, motion estimation, image manipulation detection.

Zero Shot Skeletal Action Recognition

Generalized zero shot skeletal action recognition, whole slide images, activity prediction, motion prediction, cyber attack detection, sequential skip prediction, video captioning.

Dense Video Captioning

Boundary captioning, visual text correction, audio-visual video captioning, point cloud registration.

Image to Point Cloud Registration

Robust 3D Semantic Segmentation

Real-Time 3D Semantic Segmentation

Unsupervised 3D Semantic Segmentation

Furniture segmentation, text detection, gesture recognition.

Hand Gesture Recognition

Hand-Gesture Recognition

RF-based Gesture Recognition

Video question answering.

Zero-Shot Video Question Answer

Few-shot video question answering, medical diagnosis.

Alzheimer's Disease Detection

Retinal OCT Disease Classification

Blood cell count, thoracic disease classification, 3d point cloud interpolation, visual grounding.

Person-centric Visual Grounding

Phrase Extraction and Grounding (PEG)

Visual odometry.

Face Anti-Spoofing

Monocular visual odometry.

Hand Pose Estimation

Hand Segmentation

Gesture-to-gesture translation, rain removal.

Single Image Deraining

Image clustering.

Online Clustering

Face Clustering

Multi-view subspace clustering, multi-modal subspace clustering, colorization.

Line Art Colorization

Point-interactive Image Colorization

Color Mismatch Correction

Image Dehazing

Single Image Dehazing

Robot navigation.

PointGoal Navigation

Social navigation.

Sequential Place Learning

Image manipulation.

Unsupervised Image-To-Image Translation

Synthetic-to-Real Translation

Multimodal Unsupervised Image-To-Image Translation

Cross-View Image-to-Image Translation

Fundus to Angiography Generation

Visual place recognition.

Indoor Localization

3d place recognition, image editing, rolling shutter correction, shadow removal, multimodel-guided image editing, joint deblur and frame interpolation, multimodal fashion image editing, conformal prediction, visual localization.

Stereo Matching

Deepfake detection.

Synthetic Speech Detection

Human detection of deepfakes, multimodal forgery detection.

Visual Crowd Analysis

Group detection in crowds, object reconstruction.

3D Object Reconstruction

Human-object interaction detection.

Affordance Recognition

Point cloud classification, jet tagging, few-shot point cloud classification, image deblurring, low-light image deblurring and enhancement, earth observation, image matching.

Semantic correspondence

Patch matching, set matching.

Matching Disparate Images

Video quality assessment, video alignment, temporal sentence grounding, long-video activity recognition, hyperspectral.

Hyperspectral Image Classification

Hyperspectral unmixing, hyperspectral image segmentation, classification of hyperspectral images, 3d point cloud reconstruction, document text classification, learning with noisy labels, multi-label classification of biomedical texts, political salient issue orientation detection.

Weakly Supervised Action Localization

Weakly-supervised temporal action localization.

Temporal Action Proposal Generation

Activity recognition in videos, scene classification.

2D Human Pose Estimation

Action anticipation.

3D Face Animation

Semi-supervised human pose estimation, point cloud generation, point cloud completion, referring expression, reconstruction, 3d human reconstruction.

Single-View 3D Reconstruction

4d reconstruction, single-image-based hdr reconstruction, compressive sensing, keyword spotting.

Small-Footprint Keyword Spotting

Visual keyword spotting, scene text detection.

Curved Text Detection

Multi-oriented scene text detection, camera calibration, boundary detection.

Junction Detection

Image matting.

Semantic Image Matting

Video retrieval, video-text retrieval, video grounding, video-adverb retrieval, replay grounding, composed video retrieval (covr), motion synthesis.

Motion Style Transfer

Temporal human motion composition, emotion classification.

Document AI

Document understanding, sensor fusion, superpixels, remote sensing.

Remote Sensing Image Classification

Change detection for remote sensing images, building change detection for remote sensing images.

Segmentation Of Remote Sensing Imagery

The Semantic Segmentation Of Remote Sensing Imagery

Video summarization.

Unsupervised Video Summarization

Supervised video summarization, point cloud segmentation.

Few-Shot Transfer Learning for Saliency Prediction

Aerial Video Saliency Prediction

Document layout analysis.

3D Anomaly Detection

Video anomaly detection, artifact detection.

Point cloud reconstruction

3D Semantic Scene Completion

3D Semantic Scene Completion from a single RGB image

Garment reconstruction, cross-modal retrieval, image-text matching, multilingual cross-modal retrieval.

Zero-shot Composed Person Retrieval

Cross-modal retrieval on rsitmd, face generation.

Talking Head Generation

Talking face generation.

Face Age Editing

Facial expression generation, kinship face generation, video instance segmentation.

Human Detection

Privacy Preserving Deep Learning

Membership inference attack, virtual try-on.

Generalized Few-Shot Semantic Segmentation

3d classification, depth completion.

Motion Forecasting

Multi-Person Pose forecasting

Multiple Object Forecasting

Scene flow estimation.

Self-supervised Scene Flow Estimation

Video editing, video temporal consistency, face reconstruction, object discovery, carla map leaderboard, dead-reckoning prediction.

Generalized Referring Expression Segmentation

Gaze estimation.

Texture Synthesis

Text-based Image Editing

Text-guided-image-editing.

Zero-Shot Text-to-Image Generation

Concept alignment, conditional text-to-image synthesis, sign language recognition.

Image Recognition

Fine-grained image recognition, license plate recognition, material recognition, multi-view learning, incomplete multi-view clustering, human parsing.

Multi-Human Parsing

Machine unlearning, continual forgetting, pose tracking.

3D Human Pose Tracking

Interactive segmentation.

Breast Cancer Detection

Skin cancer classification.

Breast Cancer Histology Image Classification

Lung cancer diagnosis, classification of breast cancer histology images.

3D Multi-Person Pose Estimation (absolute)

3D Multi-Person Pose Estimation (root-relative)

3D Multi-Person Mesh Recovery

Event-based vision.

Event-based Optical Flow

Event-Based Video Reconstruction

Event-based motion estimation, gait recognition.

Multiview Gait Recognition

Gait recognition in the wild.

3D Hand Pose Estimation

Disease prediction, disease trajectory forecasting, facial landmark detection.

Unsupervised Facial Landmark Detection

3D Facial Landmark Localization

Interest point detection, homography estimation, 3d character animation from a single photo, scene segmentation, weakly supervised segmentation, object counting, training-free object counting, open-vocabulary object counting.

Dichotomous Image Segmentation

Activity detection, inverse rendering, scene generation, temporal localization.

Language-Based Temporal Localization

Temporal defect localization, template matching, 3d object tracking.

3D Single Object Tracking

Camera localization.

Camera Relocalization

Multi-label image classification.

Multi-label Image Recognition with Partial Labels

Lidar semantic segmentation, motion segmentation, relation network, visual dialog.

Text-to-Video Generation

Text-to-video editing, subject-driven video generation, intelligent surveillance.

Vehicle Re-Identification

Text spotting.

Disparity Estimation

Handwritten Text Recognition

Handwritten document recognition, unsupervised text recognition, knowledge distillation.

Data-free Knowledge Distillation

Self-knowledge distillation, few-shot class-incremental learning, class-incremental semantic segmentation, non-exemplar-based class incremental learning, moment retrieval.

Zero-shot Moment Retrieval

Text to video retrieval, partially relevant video retrieval, decision making under uncertainty.

Uncertainty Visualization

Person search, shadow detection.

Shadow Detection And Removal

Semi-supervised object detection.

Unconstrained Lip-synchronization

Mixed reality, video inpainting.

Cross-corpus

Micro-expression recognition, micro-expression spotting.

3D Facial Expression Recognition

Smile Recognition

Future prediction, human mesh recovery.

Face Image Quality Assessment

Lightweight face recognition.

Age-Invariant Face Recognition

Synthetic face recognition, face quality assessment, video enhancement.

3D Multi-Object Tracking

Real-time multi-object tracking, multi-animal tracking with identification, trajectory long-tail distribution for multi-object tracking, grounded multiple object tracking, open vocabulary semantic segmentation, zero-guidance segmentation, overlapped 10-1, overlapped 15-1, overlapped 15-5, disjoint 10-1, disjoint 15-1, color constancy.

Few-Shot Camera-Adaptive Color Constancy

Image categorization, fine-grained visual categorization, physics-informed machine learning, soil moisture estimation, deep attention, zero shot segmentation.

Stereo Image Super-Resolution

Burst image super-resolution, satellite image super-resolution, multispectral image super-resolution, hdr reconstruction, multi-exposure image fusion, line detection, video reconstruction.

Visual Recognition

Fine-Grained Visual Recognition

Image cropping, sign language translation.

Stereo Matching Hand

3D Absolute Human Pose Estimation

Text-to-Face Generation

Image forensics, tone mapping, zero-shot action recognition, video restoration.

Analog Video Restoration

Natural language transduction, transparent object detection, transparent objects, novel class discovery.

Surface Normals Estimation

hand-object pose

Grasp Generation

3D Canonical Hand Pose Estimation

Cross-domain few-shot learning, texture classification, vision-language navigation.

Breast Cancer Histology Image Classification (20% labels)

Infrared and visible image fusion.

Image Animation

Pedestrian Attribute Recognition

Probabilistic Deep Learning

Unsupervised few-shot image classification, generalized few-shot classification, abnormal event detection in video.

Semi-supervised Anomaly Detection

Image to 3d, steganalysis.

Sketch Recognition

Face Sketch Synthesis

Drawing pictures.

Photo-To-Caricature Translation

Spoof detection, face presentation attack detection, detecting image manipulation, cross-domain iris presentation attack detection, finger dorsal image spoof detection, computer vision techniques adopted in 3d cryogenic electron microscopy, single particle analysis, cryogenic electron tomography, highlight detection, iris recognition, pupil dilation.

One-shot visual object segmentation

Image to video generation.

Unconditional Video Generation

Action quality assessment, automatic post-editing.

Image Stitching

Multi-View 3D Reconstruction

Person retrieval, universal domain adaptation.

Unbiased Scene Graph Generation

Panoptic Scene Graph Generation

Action understanding, blind face restoration.

Dense Captioning

Document image classification.

Face Reenactment

Geometric Matching

Human action generation.

Action Generation

Object categorization, text based person retrieval, human dynamics.

3D Human Dynamics

Meme classification, hateful meme classification, severity prediction, intubation support prediction, text-to-image, story visualization, complex scene breaking and synthesis, image fusion, pansharpening, cloud detection.

Image Deconvolution

Image Outpainting

Diffusion Personalization

Diffusion Personalization Tuning Free

Efficient Diffusion Personalization

Object segmentation.

Camouflaged Object Segmentation

Landslide segmentation, text-line extraction, surgical phase recognition, online surgical phase recognition, offline surgical phase recognition.

Semantic SLAM

Object SLAM

Intrinsic image decomposition, table recognition, point clouds, point cloud video understanding, point cloud representation learning, situation recognition, grounded situation recognition, line segment detection, motion detection, multi-target domain adaptation.

Robot Pose Estimation

Camouflaged Object Segmentation with a Single Task-generic Prompt

Image morphing, image shadow removal, sports analytics, visual prompt tuning, weakly-supervised instance segmentation, image smoothing, fake image detection.

GAN image forensics

Fake Image Attribution

Image steganography, person identification, rotated mnist, contour detection.

Face Image Quality

Lane detection.

3D Lane Detection

Layout design, license plate detection.

Video Panoptic Segmentation

Viewpoint estimation.

Drone navigation

Drone-view target localization, value prediction, body mass index (bmi) prediction, multi-object tracking and segmentation.

Occlusion Handling

Zero-shot transfer image classification.

3D Object Reconstruction From A Single Image

CAD Reconstruction

3d point cloud linear classification, crop classification, crop yield prediction, photo retouching, motion retargeting, shape representation of 3d point clouds, bird's-eye view semantic segmentation.

Dense Pixel Correspondence Estimation

Human part segmentation.

Multiview Learning

Person recognition.

Document Shadow Removal

Symmetry detection, traffic sign detection, video style transfer, referring image matting.

Referring Image Matting (Expression-based)

Referring Image Matting (Keyword-based)

Referring Image Matting (RefMatte-RW100)

Referring image matting (prompt-based), human interaction recognition, one-shot 3d action recognition, mutual gaze, affordance detection.

Gaze Prediction

Image forgery detection, image instance retrieval, amodal instance segmentation, image quality estimation.

Image Similarity Search

Precipitation Forecasting

Referring expression generation, road damage detection.

Space-time Video Super-resolution

Video matting.

Open-World Semi-Supervised Learning

Semi-supervised image classification (cold start), hand detection, material classification.

Open Vocabulary Attribute Detection

Inverse tone mapping, image/document clustering, self-organized clustering, 3d shape modeling.

Action Analysis

Facial editing.

Food Recognition

Holdout Set

Motion magnification, semi-supervised instance segmentation, video segmentation, camera shot boundary detection, open-vocabulary video segmentation, open-world video segmentation, instance search.

Audio Fingerprint

Lung nodule detection, lung nodule 3d detection, art analysis.

Zero-Shot Composed Image Retrieval (ZS-CIR)

Event segmentation, generic event boundary detection, image retouching, image-variation, jpeg artifact removal, multispectral object detection, point cloud super resolution, skills assessment.

Sensor Modeling

Binary classification, llm-generated text detection, cancer-no cancer per breast classification, cancer-no cancer per image classification, suspicious (birads 4,5)-no suspicious (birads 1,2,3) per image classification, cancer-no cancer per view classification, lung nodule classification, lung nodule 3d classification, video prediction, earth surface forecasting, predict future video frames, 3d scene reconstruction, audio-visual synchronization, handwriting generation, pose retrieval, scanpath prediction, scene change detection.

Sketch-to-Image Translation

Skills evaluation, highlight removal, 3d shape reconstruction from a single 2d image.

Shape from Texture

Deception detection, deception detection in videos, handwriting verification, bangla spelling error correction, 3d open-vocabulary instance segmentation.

3D Shape Representation

3D Dense Shape Correspondence

Birds eye view object detection.

Multiple People Tracking

Network Interpretation

Rgb-d reconstruction, seeing beyond the visible, semi-supervised domain generalization, unsupervised semantic segmentation.

Unsupervised Semantic Segmentation with Language-image Pre-training

Multiple object tracking with transformer.

Multiple Object Track and Segmentation

Constrained lip-synchronization, face dubbing, vietnamese visual question answering, explanatory visual question answering.

Video Visual Relation Detection

Human-object relationship detection, ad-hoc video search, defocus blur detection, event data classification, image comprehension, image manipulation localization, instance shadow detection, kinship verification, medical image enhancement, open vocabulary panoptic segmentation, single-object discovery, synthetic image detection, training-free 3d point cloud classification.

Sequential Place Recognition

Autonomous flight (dense forest), autonomous web navigation.

Generative 3D Object Classification

Cube engraving classification, multimodal machine translation.

computer vision dissertation topics

Face to Face Translation

Multimodal lexical translation, 10-shot image generation, 2d semantic segmentation task 3 (25 classes), document enhancement, action assessment, bokeh effect rendering, drivable area detection, face anonymization, font recognition, horizon line estimation, image imputation.

computer vision dissertation topics

Long Video Retrieval (Background Removed)

Medical image denoising.

computer vision dissertation topics

Occlusion Estimation

Physiological computing.

computer vision dissertation topics

Lake Ice Monitoring

Short-term object interaction anticipation, spatio-temporal video grounding, unsupervised 3d point cloud linear evaluation, video forensics, wireframe parsing, single-image-generation, unsupervised anomaly detection with specified settings -- 30% anomaly, root cause ranking, anomaly detection at 30% anomaly, anomaly detection at various anomaly percentages.

computer vision dissertation topics

Unsupervised Contextual Anomaly Detection

2d pose estimation, category-agnostic pose estimation, overlapping pose estimation, facial expression recognition, cross-domain facial expression recognition, zero-shot facial expression recognition, landmark tracking, muscle tendon junction identification, 3d object captioning, animated gif generation, generalized referring expression comprehension, image deblocking, motion disentanglement, persuasion strategies, scene text editing, traffic accident detection, accident anticipation, unsupervised landmark detection, visual speech recognition, lip to speech synthesis, continual anomaly detection, gaze redirection, weakly supervised action segmentation (transcript), weakly supervised action segmentation (action set)), calving front delineation in synthetic aperture radar imagery, calving front delineation in synthetic aperture radar imagery with fixed training amount.

Handwritten Line Segmentation

Handwritten word segmentation.

General Action Video Anomaly Detection

Physical video anomaly detection, monocular cross-view road scene parsing (road), monocular cross-view road scene parsing (vehicle).

Transparent Object Depth Estimation

3d semantic occupancy prediction, 3d scene editing, 4d panoptic segmentation, age and gender estimation, data ablation.

Occluded Face Detection

Gait identification, historical color image dating, stochastic human motion prediction, image retargeting, image and video forgery detection, infrared image super-resolution, motion captioning, personality trait recognition, personalized segmentation, scene-aware dialogue, spatial relation recognition, spatial token mixer, steganographics, story continuation.

Unsupervised Anomaly Detection with Specified Settings -- 0.1% anomaly

Unsupervised anomaly detection with specified settings -- 1% anomaly, unsupervised anomaly detection with specified settings -- 10% anomaly, unsupervised anomaly detection with specified settings -- 20% anomaly, vehicle speed estimation, visual social relationship recognition, zero-shot text-to-video generation, text-guided-generation, video frame interpolation, 3d video frame interpolation, unsupervised video frame interpolation.

eXtreme-Video-Frame-Interpolation

Continual semantic segmentation, overlapped 5-3, overlapped 25-25, evolving domain generalization, source-free domain generalization, micro-expression generation, micro-expression generation (megc2021), mistake detection, online mistake detection, unsupervised panoptic segmentation, unsupervised zero-shot panoptic segmentation, 3d rotation estimation, camera auto-calibration, defocus estimation, derendering, fingertip detection, hierarchical text segmentation, human-object interaction concept discovery.

One-Shot Face Stylization

Speaker-specific lip to speech synthesis, multi-person pose estimation, neural stylization.

Part-aware Panoptic Segmentation

Population Mapping

Pornography detection, prediction of occupancy grid maps, raw reconstruction, svbrdf estimation, semi-supervised video classification, spectrum cartography, supervised image retrieval, synthetic image attribution, training-free 3d part segmentation, unsupervised image decomposition, video propagation, vietnamese multimodal learning, visual analogies, weakly supervised 3d point cloud segmentation, weakly-supervised panoptic segmentation, drone-based object tracking, brain visual reconstruction, brain visual reconstruction from fmri.

Human-Object Interaction Generation

Image-guided composition, fashion understanding, semi-supervised fashion compatibility.

Intensity Image Denoising

Lifetime image denoising, observation completion, active observation completion, boundary grounding.

Video Narrative Grounding

3d inpainting, 3d scene graph alignment, 4d spatio temporal semantic segmentation.

Age Estimation

Few-shot Age Estimation

Brdf estimation, camouflage segmentation, clothing attribute recognition, damaged building detection, depth image estimation, detecting shadows, dynamic texture recognition.

Disguised Face Verification

Few shot open set object detection, gaze target estimation, generalized zero-shot learning - unseen, hd semantic map learning, human-object interaction anticipation, image deep networks, keypoint detection and image matching, manufacturing quality control, materials imaging, micro-gesture recognition, multi-person pose estimation and tracking.

Multi-modal image segmentation

Multi-object discovery, neural radiance caching.

Parking Space Occupancy

Partial Video Copy Detection

Multimodal Patch Matching

Perpetual view generation, procedure learning, prompt-driven zero-shot domain adaptation, repetitive action counting, single-shot hdr reconstruction, on-the-fly sketch based image retrieval, thermal image denoising, trademark retrieval, unsupervised instance segmentation, unsupervised zero-shot instance segmentation, vehicle key-point and orientation estimation.

Video Individual Counting

Video-adverb retrieval (unseen compositions), video-to-image affordance grounding.

Vietnamese Scene Text

Visual sentiment prediction, human-scene contact detection, localization in video forgery, 3d canonicalization, 3d surface generation.

Visibility Estimation from Point Cloud

Amodal layout estimation, blink estimation, camera absolute pose regression, change data generation, constrained diffeomorphic image registration, continuous affect estimation, deep feature inversion, document image skew estimation, earthquake prediction, fashion compatibility learning.

Displaced People Recognition

Finger vein recognition, flooded building segmentation.

Future Hand Prediction

Generative temporal nursing, house generation, human fmri response prediction, hurricane forecasting, ifc entity classification, image declipping, image similarity detection.

Image Text Removal

Image-to-gps verification.

Image-based Automatic Meter Reading

Dial meter reading, indoor scene reconstruction, jpeg decompression.

Kiss Detection

Laminar-turbulent flow localisation.

Landmark Recognition

Brain landmark detection, corpus video moment retrieval, mllm evaluation: aesthetics, medical image deblurring, mental workload estimation, meter reading, motion expressions guided video segmentation, natural image orientation angle detection, multi-object colocalization, multilingual text-to-image generation, video emotion detection, nwp post-processing, occluded 3d object symmetry detection, open set video captioning, pso-convnets dynamics 1, pso-convnets dynamics 2, partial point cloud matching.

Partially View-aligned Multi-view Learning

Pedestrian Detection

Thermal Infrared Pedestrian Detection

Personality trait recognition by face, physical attribute prediction, point cloud semantic completion, point cloud classification dataset, point-of-no-return (pnr) temporal localization, pose contrastive learning, portrait generation, prostate zones segmentation, pulmonary vessel segmentation, pulmonary artery–vein classification, reference expression generation, safety perception recognition, interspecies facial keypoint transfer, specular reflection mitigation, specular segmentation, state change object detection, surface normals estimation from point clouds, train ego-path detection.

Transform A Video Into A Comics

Transparency separation, typeface completion.

Unbalanced Segmentation

Unsupervised Long Term Person Re-Identification

Video correspondence flow.

Key-Frame-based Video Super-Resolution (K = 15)

Zero-shot single object tracking, yield mapping in apple orchards, lidar absolute pose regression, opd: single-view 3d openable part detection, self-supervised scene text recognition, video narration captioning, period estimation, art period estimation (544 artists), spectral estimation, spectral estimation from a single rgb image, 3d prostate segmentation, aggregate xview3 metric, atomic action recognition, composite action recognition, calving front delineation from synthetic aperture radar imagery, computer vision transduction, crosslingual text-to-image generation, zero-shot dense video captioning, document to image conversion, frame duplication detection, geometrical view, hyperview challenge.

Image Operation Chain Detection

Kinematic based workflow recognition, logo recognition.

MLLM Aesthetic Evaluation

Motion detection in non-stationary scenes, open-set video tagging, satellite orbit determination.

Segmentation Based Workflow Recognition

2d particle picking, small object detection.

Rice Grain Disease Detection

Sperm morphology classification, video & kinematic base workflow recognition, video based workflow recognition, video, kinematic & segmentation base workflow recognition, animal pose estimation.

10 Cutting Edge Research Papers In Computer Vision & Image Generation

January 24, 2019 by Mariya Yao

Computer Vision Research Papers

UPDATE: We’ve also summarized the top 2019 and top 2020 Computer Vision research papers. 

Ever since convolutional neural networks began outperforming humans in specific image recognition tasks, research in the field of computer vision has proceeded at a breakneck pace.

The basic architecture of CNNs (or ConvNets) was developed in the 1980s. Yann LeCun improved upon the original design in 1989 by using backpropagation to train models to recognize handwritten digits.

We’ve come a long way since then.

In 2018, we saw novel architecture designs that improve upon performance benchmarks and also expand the range of media that machine learning models can analyze.  We also saw a number of breakthroughs with media generation which enable photorealistic style transfer, high-resolution image generation, and video-to-video synthesis.

Due to the importance and prevalence of computer vision and image generation for applied and enterprise AI, we did feature some of the papers below in our previous article summarizing the top overall machine learning papers of 2018. Since you might not have read that piece, we chose to highlight the vision-related research again here.

We’ve done our best to summarize these papers correctly, but if we’ve made any mistakes, please contact us to request a fix. Special thanks also go to computer vision specialist Rebecca BurWei for generously offering her expertise in editing and revising drafts of this article.

If these summaries of scientific AI research papers are useful for you, you can subscribe to our AI Research mailing list at the bottom of this article to be alerted when we release new summaries.  We’re planning to release summaries of important papers in computer vision, reinforcement learning, and conversational AI in the next few weeks.

If you’d like to skip around, here are the papers we featured:

  • Spherical CNNs
  • Adversarial Examples that Fool both Computer Vision and Time-Limited Humans
  • A Closed-form Solution to Photorealistic Image Stylization
  • Group Normalization
  • Taskonomy: Disentangling Task Transfer Learning
  • Self-Attention Generative Adversarial Networks
  • GANimation: Anatomically-aware Facial Animation from a Single Image
  • Video-to-Video Synthesis
  • Everybody Dance Now
  • Large Scale GAN Training for High Fidelity Natural Image Synthesis

Important Computer Vision Research Papers of 2018

1. Spherical CNNs, by Taco S. Cohen, Mario Geiger, Jonas Koehler, and Max Welling

Original Abstract

Convolutional Neural Networks (CNNs) have become the method of choice for learning problems involving 2D planar images. However, a number of problems of recent interest have created a demand for models that can analyze spherical images. Examples include omnidirectional vision for drones, robots, and autonomous cars, molecular regression problems, and global weather and climate modelling. A naive application of convolutional networks to a planar projection of the spherical signal is destined to fail, because the space-varying distortions introduced by such a projection will make translational weight sharing ineffective.

In this paper we introduce the building blocks for constructing spherical CNNs. We propose a definition for the spherical cross-correlation that is both expressive and rotation-equivariant. The spherical correlation satisfies a generalized Fourier theorem, which allows us to compute it efficiently using a generalized (non-commutative) Fast Fourier Transform (FFT) algorithm. We demonstrate the computational efficiency, numerical accuracy, and effectiveness of spherical CNNs applied to 3D model recognition and atomization energy regression.

Our Summary

Omnidirectional cameras that are already used by cars, drones, and other robots capture a spherical image of their entire surroundings. We could analyze such spherical signals by projecting them to the plane and using CNNs. However, any planar projection of a spherical signal results in distortions. To overcome this problem, the group of researchers from the University of Amsterdam introduces the theory of spherical CNNs, the networks that can analyze spherical images without being fooled by distortions.  The approach demonstrates its effectiveness for classifying 3D shapes and Spherical MNIST images as well as for molecular energy regression, an important problem in computational chemistry.

What’s the core idea of this paper?

  • Planar projections of spherical signals result in significant distortions as some areas look larger or smaller than they really are.
  • Traditional CNNs are ineffective for spherical images because as objects move around the sphere, they also appear to shrink and stretch (think maps where Greenland looks much bigger than it actually is).
  • The solution is to use a spherical CNN which is robust to spherical rotations in the input data. By preserving the original shape of the input data, spherical CNNs treat all objects on the sphere equally without distortion.
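The distortion argument is easy to quantify. As a toy illustration (not from the paper), the sketch below computes how much an equirectangular projection stretches a pixel horizontally at a given latitude; the roughly 3x stretch near Greenland's latitude is exactly the map effect mentioned above:

```python
import math

def area_scale(latitude_deg):
    """Relative horizontal stretch of an equirectangular pixel at a given
    latitude: a pixel near the pole covers far less physical width than the
    same pixel at the equator, yet occupies the same image width."""
    return 1.0 / math.cos(math.radians(latitude_deg))

# Greenland sits near 72°N; the equator is at 0°.
print(round(area_scale(0), 2))   # 1.0 (no stretch at the equator)
print(round(area_scale(72), 2))  # ~3.24x stretch near Greenland's latitude
```

A planar CNN sees that 3x stretch as a genuine change in object size, which is why translational weight sharing fails on projected spherical signals.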

What’s the key achievement?

  • Introducing a mathematical framework for building spherical CNNs.
  • Providing easy-to-use, fast, and memory-efficient PyTorch code for implementing these CNNs.
  • Demonstrating the framework's effectiveness on three tasks: classification of Spherical MNIST images, classification of 3D shapes, and molecular energy regression.

What does the AI community think?

  • The paper won the Best Paper Award at ICLR 2018, one of the leading machine learning conferences.

What are future research areas?

  • Development of a Steerable CNN for the sphere to analyze sections of vector bundles over the sphere (e.g., wind directions).
  • Expanding the mathematical theory from 2D spheres to 3D point clouds for classification tasks that are invariant under reflections as well as rotations.

What are possible business applications?

  • omnidirectional vision for drones, robots, and autonomous cars;
  • molecular regression problems in computational chemistry;
  • global weather and climate modeling.

Where can you get implementation code?

  • The authors provide the original implementation for this research paper on GitHub.

2. Adversarial Examples that Fool both Computer Vision and Time-Limited Humans , by Gamaleldin F. Elsayed, Shreya Shankar, Brian Cheung, Nicolas Papernot, Alex Kurakin, Ian Goodfellow, Jascha Sohl-Dickstein

Machine learning models are vulnerable to adversarial examples: small changes to images can cause computer vision models to make mistakes such as identifying a school bus as an ostrich. However, it is still an open question whether humans are prone to similar mistakes. Here, we address this question by leveraging recent techniques that transfer adversarial examples from computer vision models with known parameters and architecture to other models with unknown parameters and architecture, and by matching the initial processing of the human visual system. We find that adversarial examples that strongly transfer across computer vision models influence the classifications made by time-limited human observers.

Google Brain researchers seek an answer to the question: can adversarial examples that are not model-specific, and that fool different computer vision models without access to their parameters and architectures, also fool time-limited humans? They leverage key ideas from machine learning, neuroscience, and psychophysics to create adversarial examples that do in fact impact human perception in a time-limited setting. Thus, the paper introduces a new class of illusions that are shared between machines and humans.

  • As the first step, the researchers use black-box adversarial example construction techniques that create adversarial examples without access to the model’s architecture or parameters.
  • They then match the initial processing of the human visual system by prepending each model with a retinal layer that pre-processes the input to incorporate some of the transformations performed by the human eye, and by performing an eccentricity-dependent blurring of the image to approximate the input received by the visual cortex of human subjects through their retinal lattice.
  • Classification decisions of humans are evaluated in a time-limited setting to detect even subtle effects in human perception.
  • Showing that adversarial examples that transfer across computer vision models do also successfully influence the perception of humans.
  • Demonstrating the similarity between convolutional neural networks and the human visual system.
  • The paper is widely discussed by the AI community. While most researchers are stunned by the results, some argue that we need a stricter definition of adversarial image, because if humans classify the perturbed picture of a cat as a dog, then it’s probably already a dog, not a cat.
  • Researching which techniques are crucial for the transfer of adversarial examples to humans (i.e., retinal preprocessing, model ensembling).
  • Practitioners should consider the risk that imagery could be manipulated to cause human observers to have unusual reactions, because adversarial images can affect us below the horizon of awareness.
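For intuition about how adversarial perturbations are constructed at all, here is a minimal sketch of the Fast Gradient Sign Method on a toy linear model. This is a deliberate simplification: FGSM is a white-box attack, whereas the paper relies on black-box techniques that transfer examples crafted on surrogate models. The function and toy weights are illustrative only:

```python
import numpy as np

def fgsm_perturb(x, grad, eps):
    """Fast Gradient Sign Method: nudge every input coordinate by eps in the
    direction that increases the loss. A white-box toy for intuition only;
    the paper's attacks transfer examples built on surrogate models."""
    return x + eps * np.sign(grad)

# Toy linear classifier: score = w . x, so the loss gradient wrt x is just w.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 1.0, 1.0])
x_adv = fgsm_perturb(x, grad=w, eps=0.1)
print(x_adv)  # each coordinate moved by +/-0.1 following sign(w): [1.1 0.9 1.1]
```

The perturbation is tiny per pixel, which is why such examples can slip below the threshold of human awareness.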

3. A Closed-form Solution to Photorealistic Image Stylization , by Yijun Li, Ming-Yu Liu, Xueting Li, Ming-Hsuan Yang, Jan Kautz

Photorealistic image stylization concerns transferring style of a reference photo to a content photo with the constraint that the stylized photo should remain photorealistic. While several photorealistic image stylization methods exist, they tend to generate spatially inconsistent stylizations with noticeable artifacts. In this paper, we propose a method to address these issues. The proposed method consists of a stylization step and a smoothing step. While the stylization step transfers the style of the reference photo to the content photo, the smoothing step ensures spatially consistent stylizations. Each of the steps has a closed-form solution and can be computed efficiently. We conduct extensive experimental validations. The results show that the proposed method generates photorealistic stylization outputs that are more preferred by human subjects as compared to those by the competing methods while running much faster. Source code and additional results are available at https://github.com/NVIDIA/FastPhotoStyle.

The team of scientists at NVIDIA and the University of California, Merced proposes a new solution to photorealistic image stylization, FastPhotoStyle. The method consists of two steps: stylization and smoothing. Extensive experiments show that the suggested approach generates more realistic and compelling images than the previous state of the art. What’s more, thanks to the closed-form solution, FastPhotoStyle can produce the stylized image 49 times faster than traditional methods.

  • The goal of photorealistic image stylization is to transfer style of a reference photo to a content photo while keeping the stylized image photorealistic.
  • The stylization step is based on the whitening and coloring transform (WCT), which processes images via feature projections. However, WCT was developed for artistic image stylizations, and thus, often generates structural artifacts for photorealistic image stylization. To overcome this problem, the paper introduces PhotoWCT method, which replaces the upsampling layers in the WCT with unpooling layers, and so, preserves more spatial information.
  • The smoothing step is required to solve spatially inconsistent stylizations that could arise after the first step. Smoothing is based on a manifold ranking algorithm.
  • Both steps have a closed-form solution, which means that the solution can be obtained in a fixed number of operations (i.e., convolutions, max-pooling, whitening, etc.). Thus, computations are much more efficient compared to the traditional methods.
  • Compared with prior approaches, FastPhotoStyle outperforms artistic stylization algorithms by rendering far fewer structural artifacts and inconsistent stylizations, and outperforms photorealistic stylization algorithms by synthesizing not only the colors but also the patterns of the style photos.
  • The experiments demonstrate that users prefer FastPhotoStyle results over the previous state-of-the-art in terms of both stylization effects (63.1%) and photorealism (73.5%).
  • FastPhotoStyle can synthesize an image of 1024 x 512 resolution in only 13 seconds, while the previous state-of-the-art method needs 650 seconds for the same task.
  • The paper was presented at ECCV 2018, leading European Conference on Computer Vision.
  • Finding the way to transfer small patterns from the style photo as they are smoothed away by the suggested method.
  • Exploring the possibilities to further reduce the number of structural artifacts in the stylized photos.
  • Content creators in the business settings can largely benefit from photorealistic image stylization as the tool basically allows you to automatically change the style of any photo based on what fits the narrative.
  • Photographers also discuss the tremendous impact that this technology can have on real estate photography.
  • The NVIDIA team provides the original implementation for this research paper on GitHub.
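The whitening and coloring transform (WCT) that the stylization step builds on can be sketched in a few lines of NumPy. This is a simplified feature-level illustration under the assumption of small (channels x pixels) feature maps, not the authors' PhotoWCT implementation:

```python
import numpy as np

def whiten_color(content_feats, style_feats, eps=1e-5):
    """Whitening-and-coloring transform (WCT): strip the content features'
    covariance (whitening), then impose the style features' covariance and
    mean (coloring). Features are (channels, pixels) matrices."""
    def center(f):
        mean = f.mean(axis=1, keepdims=True)
        return f - mean, mean

    fc, _ = center(content_feats)
    fs, ms = center(style_feats)

    def cov_pow(f, power):
        # Symmetric matrix power via eigendecomposition of the covariance.
        c = f @ f.T / (f.shape[1] - 1) + eps * np.eye(f.shape[0])
        vals, vecs = np.linalg.eigh(c)
        return vecs @ np.diag(vals ** power) @ vecs.T

    whitened = cov_pow(fc, -0.5) @ fc      # now ~identity covariance
    colored = cov_pow(fs, 0.5) @ whitened  # now ~style covariance
    return colored + ms                    # restore the style mean

rng = np.random.default_rng(0)
content = rng.normal(size=(4, 100))
style = rng.normal(size=(4, 100)) * 3.0 + 1.0
out = whiten_color(content, style)
# The output's channel covariance now approximately matches the style's.
```

PhotoWCT's contribution is applying this transform with unpooling layers so spatial detail survives; the closed-form algebra above is why the stylization step needs no iterative optimization.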

4. Group Normalization , by Yuxin Wu and Kaiming He

Batch Normalization (BN) is a milestone technique in the development of deep learning, enabling various networks to train. However, normalizing along the batch dimension introduces problems – BN’s error increases rapidly when the batch size becomes smaller, caused by inaccurate batch statistics estimation. This limits BN’s usage for training larger models and transferring features to computer vision tasks including detection, segmentation, and video, which require small batches constrained by memory consumption. In this paper, we present Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes within each group the mean and variance for normalization. GN’s computation is independent of batch sizes, and its accuracy is stable in a wide range of batch sizes. On ResNet-50 trained in ImageNet, GN has 10.6% lower error than its BN counterpart when using a batch size of 2; when using typical batch sizes, GN is comparably good with BN and outperforms other normalization variants. Moreover, GN can be naturally transferred from pre-training to fine-tuning. GN can outperform its BN-based counterparts for object detection and segmentation in COCO, and for video classification in Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks. GN can be easily implemented by a few lines of code in modern libraries.

Facebook AI research team suggest Group Normalization (GN) as an alternative to Batch Normalization (BN). They argue that BN’s error increases dramatically for small batch sizes. This limits the usage of BN when working with large models to solve computer vision tasks that require small batches due to memory constraints. On the contrary, Group Normalization is independent of batch sizes as it divides the channels into groups and computes the mean and variance for normalization within each group. The experiments confirm that GN outperforms BN in a variety of tasks, including object detection, segmentation, and video classification.

  • Group Normalization is a simple alternative to Batch Normalization, especially in the scenarios where batch size tends to be small, for example, computer vision tasks, requiring high-resolution input.
  • GN explores only the layer dimensions, and thus, its computation is independent of batch size. Specifically, GN divides channels, or feature maps, into groups and normalizes the features within each group.
  • Group Normalization can be easily implemented by a few lines of code in PyTorch and TensorFlow.
  • Introducing Group Normalization, new effective normalization method.
  • GN’s accuracy is stable in a wide range of batch sizes as its computation is independent of batch size. For example, GN demonstrated a 10.6% lower error rate than its BN-based counterpart for ResNet-50 in ImageNet with a batch size of 2.
  • GN can be also transferred to fine-tuning. The experiments show that GN can outperform BN counterparts for object detection and segmentation in COCO dataset and video classification in Kinetics dataset.
  • The paper received an honorable mention at ECCV 2018, leading European Conference on Computer Vision.
  • It is also the second most popular paper in 2018 based on the people’s libraries at Arxiv Sanity Preserver.
  • Applying group normalization to sequential or generative models.
  • Investigating GN’s performance on learning representations for reinforcement learning.
  • Exploring if GN combined with a suitable regularizer will improve results.
  • Business applications that rely on BN-based models for object detection, segmentation, video classification and other computer vision tasks that require high-resolution input may benefit from moving to GN-based models as they are more accurate in these settings.
  • Facebook AI research team provides Mask R-CNN baseline results and models trained with Group Normalization .
  • PyTorch implementation of group normalization is also available on GitHub.
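The "few lines of code" claim is easy to make concrete. Below is a minimal NumPy sketch of group normalization over an (N, C, H, W) tensor; it mirrors the short reference implementation described in the paper but is an independent illustration, not the authors' code:

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5, gamma=1.0, beta=0.0):
    """Group Normalization: split the C channels into groups and normalize
    with each group's own mean/variance, so the statistics never depend on
    the batch size N."""
    n, c, h, w = x.shape
    assert c % num_groups == 0, "channels must divide evenly into groups"
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    g = (g - mean) / np.sqrt(var + eps)
    return gamma * g.reshape(n, c, h, w) + beta

x = np.random.default_rng(1).normal(loc=5.0, scale=3.0, size=(2, 8, 4, 4))
y = group_norm(x, num_groups=4)
# Each (sample, group) slice now has ~zero mean and ~unit variance,
# regardless of how many samples are in the batch.
```

Because the reduction axes exclude the batch axis, the same code behaves identically at batch size 2 or 256, which is the whole point of GN.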

5. Taskonomy: Disentangling Task Transfer Learning , by Amir R. Zamir, Alexander Sax, William Shen, Leonidas J. Guibas, Jitendra Malik, and Silvio Savarese

Do visual tasks have a relationship, or are they unrelated? For instance, could having surface normals simplify estimating the depth of an image? Intuition answers these questions positively, implying existence of a structure among visual tasks. Knowing this structure has notable values; it is the concept underlying transfer learning and provides a principled way for identifying redundancies across tasks, e.g., to seamlessly reuse supervision among related tasks or solve many tasks in one system without piling up the complexity.

We propose a fully computational approach for modeling the structure of space of visual tasks. This is done via finding (first and higher-order) transfer learning dependencies across a dictionary of twenty six 2D, 2.5D, 3D, and semantic tasks in a latent space. The product is a computational taxonomic map for task transfer learning. We study the consequences of this structure, e.g. nontrivial emerged relationships, and exploit them to reduce the demand for labeled data. For example, we show that the total number of labeled datapoints needed for solving a set of 10 tasks can be reduced by roughly 2/3 (compared to training independently) while keeping the performance nearly the same. We provide a set of tools for computing and probing this taxonomical structure including a solver that users can employ to devise efficient supervision policies for their use cases.

Assertions of the existence of a structure among visual tasks have been made by many researchers since the early years of modern computer science. And now Amir Zamir and his team make an attempt to actually find this structure. They model it using a fully computational approach and discover lots of useful relationships between different visual tasks, including the nontrivial ones. They also show that by taking advantage of these interdependencies, it is possible to achieve the same model performance with the labeled data requirements reduced by roughly ⅔.

  • A model aware of the relationships among different visual tasks demands less supervision, uses less computation, and behaves in more predictable ways.
  • A fully computational approach to discovering the relationships between visual tasks is preferable because it avoids imposing prior, and possibly incorrect, assumptions: the priors are derived from either human intuition or analytical knowledge, while neural networks might operate on different principles.
  • Identifying relationships between 26 common visual tasks.
  • Showing how this structure helps in discovering types of transfer learning that will be most effective for each visual task.
  • Creating a new dataset of 4 million images of indoor scenes including 600 buildings annotated with 26 tasks.
  • The paper won the Best Paper Award at CVPR 2018, the key conference on computer vision and pattern recognition.
  • The results are very important, as for most real-world tasks large-scale labeled datasets are not available.
  • To move from a model where common visual tasks are entirely defined by humans and try an approach where human-defined visual tasks are viewed as observed samples which are composed of computationally found latent subtasks.
  • Exploring the possibility to transfer the findings to not entirely visual tasks, e.g. robotic manipulation.
  • Relationships discovered in this paper can be used to build more effective visual systems that will require less labeled data and lower computational costs.
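To make the idea of a computational task taxonomy concrete, here is a toy sketch of reading a best transfer source for each target task off a pairwise transfer-performance matrix. The task names and affinity numbers are entirely made up for illustration, not the paper's measured values:

```python
import numpy as np

# Hypothetical transfer-performance matrix: rows are source tasks, columns
# are target tasks; entry [i, j] scores how well features learned for task i
# transfer to task j (higher is better). The real taxonomy is computed from
# trained networks; these numbers are invented for illustration.
tasks = ["surface_normals", "depth", "edges", "semantics"]
affinity = np.array([
    [1.00, 0.82, 0.40, 0.30],
    [0.78, 1.00, 0.35, 0.28],
    [0.42, 0.37, 1.00, 0.25],
    [0.33, 0.31, 0.22, 1.00],
])

def best_sources(affinity, tasks):
    """For each target task, pick the most useful *other* task to transfer
    from, masking out the trivial self-transfer on the diagonal."""
    masked = affinity.copy()
    np.fill_diagonal(masked, -np.inf)
    return {tasks[j]: tasks[int(np.argmax(masked[:, j]))]
            for j in range(len(tasks))}

print(best_sources(affinity, tasks))
# In this toy matrix, depth's best source is surface_normals, echoing the
# paper's intuition that surface normals simplify depth estimation.
```

The actual taxonomy generalizes this to higher-order transfers (several sources at once) and solves for a supervision policy under a labeling budget.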

6. Self-Attention Generative Adversarial Networks , by Han Zhang, Ian Goodfellow, Dimitris Metaxas, Augustus Odena

In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN) which allows attention-driven, long-range dependency modeling for image generation tasks. Traditional convolutional GANs generate high-resolution details as a function of only spatially local points in lower-resolution feature maps. In SAGAN, details can be generated using cues from all feature locations. Moreover, the discriminator can check that highly detailed features in distant portions of the image are consistent with each other. Furthermore, recent work has shown that generator conditioning affects GAN performance. Leveraging this insight, we apply spectral normalization to the GAN generator and find that this improves training dynamics. The proposed SAGAN achieves the state-of-the-art results, boosting the best published Inception score from 36.8 to 52.52 and reducing Frechet Inception distance from 27.62 to 18.65 on the challenging ImageNet dataset. Visualization of the attention layers shows that the generator leverages neighborhoods that correspond to object shapes rather than local regions of fixed shape.

Traditional convolutional GANs demonstrated some very promising results with respect to image synthesis. However, they have at least one important weakness – convolutional layers alone fail to capture geometrical and structural patterns in the images. Since convolution is a local operation, it is hardly possible for an output at the top-left position to have any relation to the output at the bottom-right. The paper introduces a simple solution to this problem – incorporating the self-attention mechanism into the GAN framework. This solution, combined with several stabilization techniques, helps the Self-Attention Generative Adversarial Networks (SAGANs) achieve state-of-the-art results in image synthesis.

TOP Computer Vision papers

  • Convolutional layers alone are computationally inefficient for modeling long-range dependencies in images. In contrast, a self-attention mechanism incorporated into the GAN framework enables both the generator and the discriminator to efficiently model relationships between widely separated spatial regions.
  • The self-attention module calculates response at a position as a weighted sum of the features at all positions.
  • Applying spectral normalization for both generator and discriminator – the researchers argue that not only the discriminator but also the generator can benefit from spectral normalization, as it can prevent the escalation of parameter magnitudes and avoid unusual gradients.
  • Using separate learning rates for the generator and the discriminator to compensate for the problem of slow learning in a regularized discriminator and make it possible to use fewer generator steps per discriminator step.
  • Showing that a self-attention module incorporated into the GAN framework is, in fact, effective in modeling long-range dependencies.
  • Spectral normalization applied to the generator stabilizes GAN training.
  • Utilizing imbalanced learning rates speeds up training of regularized discriminators.
  • Achieving state-of-the-art results in image synthesis by boosting the Inception Score from 36.8 to 52.52 and reducing Fréchet Inception Distance from 27.62 to 18.65.
  • “The idea is simple and intuitive yet very effective, plus easy to implement.” – Sebastian Raschka, assistant professor of Statistics at the University of Wisconsin-Madison.
  • Exploring the possibilities to reduce the number of weird samples generated by GANs.
  • Image synthesis with GANs can replace expensive manual media creation for advertising and e-commerce purposes.
  • PyTorch and TensorFlow implementations of Self-Attention GANs are available on GitHub.
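The spectral normalization mentioned above divides a weight matrix by an estimate of its largest singular value, which bounds how much the layer can amplify its input. A minimal stdlib-only sketch using power iteration follows; real implementations (including the paper's) amortize a single power-iteration step per training update rather than iterating to convergence.

```python
import math
import random

def spectral_norm(w, n_iters=50):
    """Divide matrix `w` (a list of rows) by an estimate of its largest
    singular value, obtained by power iteration. Illustrative sketch of
    spectral normalization, not a production implementation."""
    rows, cols = len(w), len(w[0])
    random.seed(0)
    u = [random.gauss(0, 1) for _ in range(rows)]
    for _ in range(n_iters):
        # v = normalize(W^T u); u = normalize(W v)
        v = [sum(w[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        nv = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / nv for x in v]
        u = [sum(w[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        nu = math.sqrt(sum(x * x for x in u)) or 1.0
        u = [x / nu for x in u]
    # sigma = u^T W v approximates the spectral norm of W.
    sigma = sum(u[i] * w[i][j] * v[j]
                for i in range(rows) for j in range(cols))
    return [[w[i][j] / sigma for j in range(cols)] for i in range(rows)]
```

For a diagonal matrix with entries 3 and 1, the spectral norm is 3, so the normalized matrix has entries 1 and 1/3: the largest singular value of the result is always (approximately) one.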

7. GANimation: Anatomically-aware Facial Animation from a Single Image, by Albert Pumarola, Antonio Agudo, Aleix M. Martinez, Alberto Sanfeliu, Francesc Moreno-Noguer

Recent advances in Generative Adversarial Networks (GANs) have shown impressive results for the task of facial expression synthesis. The most successful architecture is StarGAN, which conditions GANs' generation process with images of a specific domain, namely a set of images of persons sharing the same expression. While effective, this approach can only generate a discrete number of expressions, determined by the content of the dataset. To address this limitation, in this paper, we introduce a novel GAN conditioning scheme based on Action Unit (AU) annotations, which describe in a continuous manifold the anatomical facial movements defining a human expression. Our approach allows controlling the magnitude of activation of each AU and combining several of them. Additionally, we propose a fully unsupervised strategy to train the model, which only requires images annotated with their activated AUs, and exploits attention mechanisms that make our network robust to changing backgrounds and lighting conditions. Extensive evaluation shows that our approach goes beyond competing conditional generators both in the capability to synthesize a much wider range of expressions ruled by anatomically feasible muscle movements, and in the capacity to deal with images in the wild.

The paper introduces a novel GAN model that is able to generate anatomically-aware facial animations from a single image under changing backgrounds and illumination conditions. It advances current works, which had only addressed the problem for discrete emotions category editing and portrait images. The approach renders a wide range of emotions by encoding facial deformations as Action Units. The resulting animations demonstrate a remarkably smooth and consistent transformation across frames even with challenging light conditions and backgrounds.

  • Facial expressions can be described in terms of Action Units (AUs), which anatomically describe the contractions of specific facial muscles. For example, the facial expression for ‘fear’ is generally produced with the following activations: Inner Brow Raiser (AU1), Outer Brow Raiser (AU2), Brow Lowerer (AU4), Upper Lid Raiser (AU5), Lid Tightener (AU7), Lip Stretcher (AU20) and Jaw Drop (AU26). The magnitude of each AU defines the extent of emotion.
  • A model for synthetic facial animation is based on the GAN architecture, which is conditioned on a one-dimensional vector indicating the presence/absence and the magnitude of each Action Unit.
  • To circumvent the need for pairs of training images of the same person under different expressions, a bidirectional generator is used to both transform an image into a desired expression and transform the synthesized image back into the original pose.
  • To handle images under changing backgrounds and illumination conditions, the model includes an attention layer that focuses the action of the network only in those regions of the image that are relevant to convey the novel expression.
  • Introducing a novel GAN model for face animation in the wild that can be trained in a fully unsupervised manner and generate visually compelling images with remarkably smooth and consistent transformation across frames even with challenging light conditions and non-real world data.
  • Demonstrating how a wider range of emotions can be generated by interpolating between emotions the GAN has already seen.
  • Applying the introduced approach to video sequences.
  • The technology that automatically animates facial expressions from a single image can be applied in several areas, including fashion and e-commerce, the movie industry, and photography technologies.
  • The authors provide the original implementation of this research paper on GitHub.
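The attention layer described above can be sketched as a per-pixel blend: the generator predicts a color image and a smooth attention mask, and the mask decides which pixels to keep from the input so that backgrounds and lighting pass through untouched. This is a toy illustration of the blending idea; the function and variable names are illustrative, not the paper's code.

```python
def blend_with_attention(original, color, attention):
    """Per-pixel attention blend over flat lists of pixel intensities:
        output = A * original + (1 - A) * color
    where A in [0, 1] is 1.0 for pixels to keep from the input image
    (e.g., background) and 0.0 where the new expression is rendered."""
    return [a * o + (1.0 - a) * c
            for o, c, a in zip(original, color, attention)]
```

A mask value of 1.0 copies the original pixel verbatim, 0.0 takes the generated pixel, and intermediate values fade smoothly between the two.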

8. Video-to-Video Synthesis, by Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao, Jan Kautz, Bryan Catanzaro

We study the problem of video-to-video synthesis, whose goal is to learn a mapping function from an input source video (e.g., a sequence of semantic segmentation masks) to an output photorealistic video that precisely depicts the content of the source video. While its image counterpart, the image-to-image synthesis problem, is a popular topic, the video-to-video synthesis problem is less explored in the literature. Without understanding temporal dynamics, directly applying existing image synthesis approaches to an input video often results in temporally incoherent videos of low visual quality. In this paper, we propose a novel video-to-video synthesis approach under the generative adversarial learning framework. Through carefully-designed generator and discriminator architectures, coupled with a spatio-temporal adversarial objective, we achieve high-resolution, photorealistic, temporally coherent video results on a diverse set of input formats including segmentation masks, sketches, and poses. Experiments on multiple benchmarks show the advantage of our method compared to strong baselines. In particular, our model is capable of synthesizing 2K resolution videos of street scenes up to 30 seconds long, which significantly advances the state-of-the-art of video synthesis. Finally, we apply our approach to future video prediction, outperforming several state-of-the-art competing systems.

Researchers from NVIDIA have introduced a novel video-to-video synthesis approach. The framework is based on conditional GANs. Specifically, the method couples carefully-designed generator and discriminator with a spatio-temporal adversarial objective. The experiments demonstrate that the suggested vid2vid approach can synthesize high-resolution, photorealistic, temporally coherent videos on a diverse set of input formats including segmentation masks, sketches, and poses. It can also predict the next frames with far superior results than the baseline models.

  • To generate each output frame, the sequential generator is conditioned on the current source frame, the past two source frames, and the past two generated frames.
  • Conditional image discriminator ensures that each output frame resembles a real image given the same source image.
  • Conditional video discriminator ensures that consecutive output frames resemble the temporal dynamics of a real video given the same optical flow.
  • Foreground-background prior in the generator design further improves the synthesis performance of the proposed model.
  • Using a soft occlusion mask instead of a binary one allows the model to better handle the “zoom in” scenario: details can be added by gradually blending the warped pixels and the newly synthesized pixels.
  • Generating high-resolution (2048×2048), photorealistic, temporally coherent videos up to 30 seconds long.
  • Outputting several videos with different visual appearances depending on sampling different feature vectors.
  • Outperforming the baseline models in future video prediction.
  • Converting semantic labels into realistic real-world videos.
  • Generating multiple outputs of talking people from edge maps.
  • Generating an entire human body given a pose.
  • “NVIDIA’s new vid2vid is the first open-source code that lets you fake anybody’s face convincingly from one source video. […] interesting times ahead…” – Gene Kogan, an artist and programmer.
  • The paper has also received some criticism over the concern that it can be used to create deepfakes or tampered videos which can deceive people.
  • Using object tracking information to make sure that each object has a consistent appearance across the whole video.
  • Researching if training the model with coarser semantic labels will help reduce the visible artifacts that appear after semantic manipulations (e.g., turning trees into buildings).
  • Adding additional 3D cues, such as depth maps, to enable synthesis of turning cars.
  • Marketing and advertising can benefit from the opportunities created by the vid2vid method (e.g., replacing the face or even the entire body in the video). However, this should be used with caution, keeping in mind the ethical considerations.
  • The NVIDIA team provides the original implementation of this research paper on GitHub.
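The soft occlusion mask described in the bullets above can be sketched as a per-pixel convex combination of optical-flow-warped pixels from the previous frame and newly synthesized pixels. The function and variable names below are illustrative, not the vid2vid code.

```python
def combine_frames(warped, hallucinated, mask):
    """Blend flow-warped pixels with newly synthesized ones using a soft
    occlusion mask in [0, 1]. A soft mask, rather than a binary one,
    lets "zoom in" regions fade new detail in gradually instead of
    switching between the two sources abruptly."""
    return [m * w + (1.0 - m) * h
            for w, h, m in zip(warped, hallucinated, mask)]
```

A mask value of 1.0 reuses the warped pixel, 0.0 takes the hallucinated pixel, and 0.5 averages them.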

9. Everybody Dance Now, by Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros

This paper presents a simple method for “do as I do” motion transfer: given a source video of a person dancing, we can transfer that performance to a novel (amateur) target after only a few minutes of the target subject performing standard moves. We pose this problem as a per-frame image-to-image translation with spatio-temporal smoothing. Using pose detections as an intermediate representation between source and target, we learn a mapping from pose images to a target subject’s appearance. We adapt this setup for temporally coherent video generation including realistic face synthesis. Our video demo can be found at https://youtu.be/PCBTZh41Ris.

UC Berkeley researchers present a simple method for generating videos with amateur dancers performing like professional dancers. If you want to take part in the experiment, all you need to do is record a few minutes of yourself performing some standard moves and then pick the video with the dance you want to repeat. The neural network does the main job: it solves the problem as a per-frame image-to-image translation with spatio-temporal smoothing. By conditioning the prediction at each frame on that of the previous time step for temporal smoothness, and applying a specialized GAN for realistic face synthesis, the method achieves remarkably convincing results.

  • A pre-trained state-of-the-art pose detector creates pose stick figures from the source video.
  • Global pose normalization is applied to account for differences between the source and target subjects in body shapes and locations within the frame.
  • Normalized pose stick figures are mapped to the target subject.
  • To make videos smooth, the researchers suggest conditioning the generator on the previously generated frame and then giving both images to the discriminator. Gaussian smoothing on the pose keypoints further reduces jitter.
  • To generate more realistic faces, the method includes an additional face-specific GAN that brushes up the face after the main generation is finished.
  • Suggesting a novel approach to motion transfer that outperforms a strong baseline (pix2pixHD), according to both qualitative and quantitative assessments.
  • Demonstrating that the face-specific GAN adds considerable detail to the output video.
  • “Overall I thought this was really fun and well executed. Looking forward to the code release so that I can start training my dance moves.” – Tom Brown, member of technical staff at Google Brain.
  • “‘Everybody Dance Now’ from Caroline Chan, Alyosha Efros and team transfers dance moves from one subject to another. The only way I’ll ever dance well. Amazing work!!!” – Soumith Chintala, AI Research Engineer at Facebook.
  • Replacing pose stick figures with temporally coherent inputs and representation specifically optimized for motion transfer.
  • “Do as I do” motion transfer might be applied to replace subjects when creating marketing and promotional videos.
  • A PyTorch implementation of this research paper is available on GitHub.

10. Large Scale GAN Training for High Fidelity Natural Image Synthesis, by Andrew Brock, Jeff Donahue, and Karen Simonyan

Despite recent progress in generative image modeling, successfully generating high-resolution, diverse samples from complex datasets such as ImageNet remains an elusive goal. To this end, we train Generative Adversarial Networks at the largest scale yet attempted, and study the instabilities specific to such scale. We find that applying orthogonal regularization to the generator renders it amenable to a simple “truncation trick”, allowing fine control over the trade-off between sample fidelity and variety by truncating the latent space. Our modifications lead to models which set the new state of the art in class-conditional image synthesis. When trained on ImageNet at 128×128 resolution, our models (BigGANs) achieve an Inception Score (IS) of 166.3 and Frechet Inception Distance (FID) of 9.6, improving over the previous best IS of 52.52 and FID of 18.65.

The DeepMind team finds that current techniques are sufficient for synthesizing high-resolution, diverse images from available datasets such as ImageNet and JFT-300M. In particular, they show that Generative Adversarial Networks (GANs) can generate images that look very realistic if they are trained at very large scale, i.e., using two to four times as many parameters and eight times the batch size compared to prior art. These large-scale GANs, or BigGANs, are the new state of the art in class-conditional image synthesis.

  • GANs perform much better with the increased batch size and number of parameters.
  • Applying orthogonal regularization to the generator makes the model responsive to a specific technique (“truncation trick”), which provides control over the trade-off between sample fidelity and variety.
  • Demonstrating that GANs can benefit significantly from scaling.
  • Building models that allow explicit, fine-grained control of the trade-off between sample variety and fidelity.
  • Discovering instabilities of large-scale GANs and characterizing them empirically.
  • Achieving an Inception Score (IS) of 166.3, compared with the previous best IS of 52.52.
  • Achieving a Fréchet Inception Distance (FID) of 9.6, compared with the previous best FID of 18.65.
  • The paper is under review for ICLR 2019.
  • After BigGAN generators became available on TF Hub, AI researchers from all over the world began playing with BigGANs to generate dogs, watches, bikini images, Mona Lisa, seashores, and much more.
  • Moving to larger datasets to mitigate GAN stability issues.
  • Replacing expensive manual media creation for advertising and e-commerce purposes.
  • A BigGAN demo implemented in TensorFlow is available to use on Google’s Colab tool.
  • Aaron Leong maintains a GitHub repository with BigGAN implemented in PyTorch.
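The “truncation trick” mentioned above can be sketched as rejection sampling of the latent vector: draw each coordinate from a standard normal, but resample any value whose magnitude exceeds a threshold. This is an illustrative stdlib-only sketch, not the BigGAN implementation.

```python
import random

def truncated_normal(dim, threshold, seed=None):
    """Sample a `dim`-dimensional latent vector from a standard normal
    truncated to [-threshold, threshold]. Lowering the threshold trades
    sample variety for fidelity, which is the knob the trick provides."""
    rng = random.Random(seed)
    z = []
    for _ in range(dim):
        x = rng.gauss(0.0, 1.0)
        while abs(x) > threshold:        # reject and redraw out-of-range values
            x = rng.gauss(0.0, 1.0)
        z.append(x)
    return z
```

Feeding such truncated latents to a generator concentrates samples in high-density regions of the latent space, which is why fidelity improves as the threshold shrinks.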

Want Deeper Dives Into Specific AI Research Topics?

Due to popular demand, we’ve released several of these easy-to-read summaries and syntheses of major research papers for different subtopics within AI and machine learning.

  • Top 10 machine learning & AI research papers of 2018
  • Top 10 AI fairness, accountability, transparency, and ethics (FATE) papers of 2018
  • Top 14 natural language processing (NLP) research papers of 2018
  • Top 10 computer vision and image generation research papers of 2018
  • Top 10 conversational AI and dialog systems research papers of 2018
  • Top 10 deep reinforcement learning research papers of 2018

Update: 2019 Research Summaries Are Released

  • Top 10 AI & machine learning research papers from 2019
  • Top 11 NLP achievements & papers from 2019
  • Top 10 research papers in conversational AI from 2019
  • Top 10 computer vision research papers from 2019
  • Top 12 AI ethics research papers introduced in 2019
  • Top 10 reinforcement learning research papers from 2019

About Mariya Yao

Mariya is the co-author of Applied AI: A Handbook For Business Leaders and former CTO at Metamaven. She "translates" arcane technical concepts into actionable business advice for executives and designs lovable products people actually want to use. Follow her on Twitter at @thinkmariya to raise your AI IQ.


Center for Security and Emerging Technology

Computer Vision Research Clusters

Data Snapshot

Concentrations of AI-Related Topics in Research: Computer Vision

Autumn Toney

Data Snapshots are informative descriptions and quick analyses that dig into CSET’s unique data resources. Our first series of Snapshots introduced CSET’s Map of Science and explored the underlying data and analytic utility of this new tool, which enables users to interact with the Map directly.

After defining AI-related research topics across all of science in Defining Computer Vision, Natural Language Processing, and Robotics Research Clusters, here we explore the 1,105 research clusters (RCs) that are labeled as computer vision RCs (as of February 2021).¹ RCs are assigned a CV label if they have at least 25 percent AI-related papers and 25 percent CV-related papers, with CV being the dominant AI-related topic (i.e., natural language processing and robotics have lower percentages). We investigate how artificial intelligence-related CV research is developed and applied across all of science and provide details on RCs with low and high CV-related paper concentrations.
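The labeling rule described above can be sketched as a small predicate over a cluster's paper-share statistics (the function and argument names are illustrative; the 25 percent thresholds and the dominance condition are taken from the text):

```python
def label_cluster(ai_frac, cv_frac, nlp_frac, robotics_frac):
    """Return True if a research cluster qualifies for the CV label:
    at least 25% of its papers are AI-related, at least 25% are
    CV-related, and CV is the dominant AI-related topic (a higher
    share than both NLP and robotics)."""
    return (ai_frac >= 0.25 and cv_frac >= 0.25
            and cv_frac > nlp_frac and cv_frac > robotics_frac)
```

For instance, a cluster with 50 percent AI-related and 30 percent CV-related papers is labeled CV only if its NLP and robotics shares are both below 30 percent.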

Applied in a wide range of domains, from autonomous vehicles to medical imaging, CV technologies utilize AI/machine learning methods on visual inputs, such as images or videos. Tumor recognition in medical imaging, for instance, is an example of applied CV technology.² Figure 1 displays CV RCs highlighted in the Map of Science, where each RC is colored by its broad area of research.

Figure 1. CV RCs Highlighted in the Map of Science. 

Table 1 provides the breakdown of CV RCs by their broad area of research, with the overwhelming majority falling under computer science. 

Table 1. Number of CV RCs by Broad Research Area

Table 2 provides the breakdown of CV RCs by their concentration of CV-related publications. We find that 43 percent of CV RCs are made up of anywhere between 25 percent and 50 percent CV-related papers.

Table 2. CV-related Publication Concentrations Across CV RCs 

In order to understand the range of RCs that can be assigned the CV label, we provide details on four RCs: 

  • The CV RC with the highest percentage of CV-related publications
  • The CV RC with the lowest percentage of CV-related publications
  • A CV RC in a non-computer science STEM field
  • A CV RC in a non-STEM field 

For each of these RCs, we provide the top five core papers. Core papers are publications that have strong citation links within an RC, meaning that they have high citation counts from the other publications in that cluster. Since RCs do not necessarily represent a homogeneous area of research, we can review the member publications to identify the central areas of research that an RC is focused on.

CV-related RC with the highest percentage of CV-related publications: 

With 679 papers between 2015 and 2021, RC 35520 focuses on object segmentation in video footage and, more generally, on pattern recognition. Papers in this RC range from data set creation to proposing new methods of object segmentation.³ Since its main area of research, pattern recognition, is a subset of CV research, 98 percent of the papers in RC 35520 are CV-related.

RC 35520 Top Five Core Papers:

  • Learning Video Object Segmentation from Static Images
  • One-Shot Video Object Segmentation
  • A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation
  • Video Object Segmentation without Temporal Information
  • Fast Video Object Segmentation by Reference-Guided Mask Propagation

CV-related RC with the lowest percentage of CV-related publications: 

With 617 papers between 2015 and 2021, RC 54261 focuses on real-time computing, specifically using distributed computing on the cloud for mobile devices. Papers in this RC range from improvements to real-time learning to mobile augmented reality.⁴ RC 54261 is a cross-disciplinary subset of research, as it contains scientific publications from different research areas (e.g., distributed computing, real-time computing, computer vision). While computer vision is a contributing area of research to this RC, with 25 percent of papers being CV-related, RC 54261 is not strictly CV research.

RC 54261 Top Five Core Papers:

  • Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge
  • MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints
  • DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications
  • DeepDecision: A Mobile Deep Learning Framework for Edge Video Analytics
  • Chameleon: scalable adaptation of video analytics

CV-related RC in Mathematics

With 217 papers between 2015 and 2021, RC 87188 focuses on noise removal in images, using Poissonian image deconvolution and Cauchy noise removal. RC 87188 has 66 percent CV-related publications. As a mathematics-dominant RC, papers in this cluster focus on mathematical approaches to computer vision.

RC 87188 Top Five Core Papers:

  • Multiplicative noise removal in imaging: An exp-model and its fixed-point proximity algorithm
  • A new variational approach for restoring images with multiplicative noise
  • A convex total generalized variation regularized model for multiplicative noise and blur removal
  • Multiplicative noise removal via using nonconvex regularizers based on total variation and wavelet frame
  • Cauchy Noise Removal by Nonconvex ADMM with Convergence Guarantees

CV-related RC in Social Science

With 518 papers between 2015 and 2021, RC 55088 focuses on face-processing systems as learned by primates. RC 55088 has 49 percent CV-related publications. As a social science-dominant RC, papers in this cluster explore facial processing and recognition in primates from lab experiments. Because this RC focuses on an important CV topic, facial recognition, it gets associated with CV even though many of its papers are not direct CV implementations.

RC 55088 Top Five Core Papers:

  • A Revised Neural Framework for Face Processing
  • The Code for Facial Identity in the Primate Brain
  • Anatomical Connections of the Functionally Defined “Face Patches” in the Macaque Monkey
  • What can we learn about human individual face recognition from experimental studies in monkeys?
  • Transformation of Visual Representations Across Ventral Stream Body-selective Patches

Parts three and four of this Snapshot mini-series will explore RCs labeled as “NLP” and “Robotics.”

In August 2021, CSET updated the Map of Science, linking more data to the research clusters and implementing a more stable clustering method. With this update, research clusters were assigned new IDs, so the cluster IDs reported in this Snapshot will not match IDs in the current Map of Science user interface. If you are interested in knowing which clusters in the updated Map are most similar to those reported here, or have general questions about our methodology or want to discuss this research, you can email  [email protected] .

Download Related Data Brief

  • Autumn Toney, “Defining Computer Vision, Natural Language Processing, and Robotics Research Clusters” (Center for Security and Emerging Technology: August 2021).
  • Svoboda, E. (2020). Artificial intelligence is improving the detection of lung cancer. Nature, 587(7834), S20–S22. https://doi.org/10.1038/d41586-020-03157-9
  • See for example: Perazzi, Federico, et al. “A benchmark dataset and evaluation methodology for video object segmentation.” Proceedings of the IEEE conference on computer vision and pattern recognition . 2016 and Cao, Xiaochun, et al. “Unsupervised pixel-level video foreground object segmentation via shortest path algorithm.” Neurocomputing 172 (2016): 235-243.
  • See for example: Lee, Kyungmin, et al. “Outatime: Using speculation to enable low-latency continuous interaction for mobile cloud gaming.” Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services . 2015. And Jain, Puneet, Justin Manweiler, and Romit Roy Choudhury. “Overlay: Practical mobile augmented reality.” Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services . 2015.


MIT Libraries home DSpace@MIT

Learning to solve problems in computer vision with synthetic data

Computer Vision Group, TUM School of Computation, Information and Technology, Technical University of Munich

Princeton University

Undergraduate Research Topics


Suggested Undergraduate Research Topics

How to Contact Faculty for IW/Thesis Advising

Send the professor an e-mail. When you write to a professor, be clear that you want a meeting regarding a senior thesis or a one-on-one IW project, and briefly describe the topic or idea that you want to work on. Check the faculty listing for e-mail addresses.

Parastoo Abtahi, Room 419

Available for single-semester IW and senior thesis advising, 2024-2025

  • Research Areas: Human-Computer Interaction (HCI), Augmented Reality (AR), and Spatial Computing
  • Input techniques for on-the-go interaction (e.g., eye-gaze, microgestures, voice) with a focus on uncertainty, disambiguation, and privacy.
  • Minimal and timely multisensory output (e.g., spatial audio, haptics) that enables users to attend to their physical environment and the people around them, instead of a 2D screen.
  • Interaction with intelligent systems (e.g., IoT, robots) situated in physical spaces with a focus on updating users’ mental model despite the complexity and dynamicity of these systems.

Ryan Adams, Room 411

Research areas:

  • Machine learning driven design
  • Generative models for structured discrete objects
  • Approximate inference in probabilistic models
  • Accelerating solutions to partial differential equations
  • Innovative uses of automatic differentiation
  • Modeling and optimizing 3d printing and CNC machining

Andrew Appel, Room 209

Available for Fall 2024 IW advising only

  • Research Areas: Formal methods, programming languages, compilers, computer security.
  • Software verification (for which taking COS 326 / COS 510 is helpful preparation)
  • Game theory of poker or other games (for which COS 217 / 226 are helpful)
  • Computer game-playing programs (for which COS 217 / 226 are helpful)
  • Risk-limiting audits of elections (for which ORF 245 or other knowledge of probability is useful)

Sanjeev Arora, Room 407

  • Theoretical machine learning, deep learning and its analysis, natural language processing. My advisees would typically have taken a course in algorithms (COS423 or COS 521 or equivalent) and a course in machine learning.
  • Show that finding approximate solutions to NP-complete problems is also NP-complete (i.e., come up with NP-completeness reductions à la COS 487).
  • Experimental Algorithms: Implementing and Evaluating Algorithms using existing software packages. 
  • Studying/designing provable algorithms for machine learning and implementations using packages like SciPy and MATLAB, including applications in natural language processing and deep learning.
  • Any topic in theoretical computer science.

David August, Room 221

Not available for IW or thesis advising, 2024-2025

  • Research Areas: Computer Architecture, Compilers, Parallelism
  • Containment-based approaches to security: We have designed and tested a simple hardware+software containment mechanism that stops incorrect communication resulting from faults, bugs, or exploits from leaving the system. Let's explore ways to use containment to solve real problems. Expect to work with corporate security and technology decision-makers.
  • Parallelism: Studies show much more parallelism than is currently realized in compilers and architectures.  Let's find ways to realize this parallelism.
  • Any other interesting topic in computer architecture or compilers. 

Mark Braverman, 194 Nassau St., Room 231

  • Research Areas: computational complexity, algorithms, applied probability, computability over the real numbers, game theory and mechanism design, information theory.
  • Topics in computational and communication complexity.
  • Applications of information theory in complexity theory.
  • Algorithms for problems under real-life assumptions.
  • Game theory, network effects
  • Mechanism design (could be on a problem proposed by the student)

Sebastian Caldas, 221 Nassau Street, Room 105

  • Research Areas: collaborative learning, machine learning for healthcare. Typically, I will work with students that have taken COS324.
  • Methods for collaborative and continual learning.
  • Machine learning for healthcare applications.

Bernard Chazelle, 194 Nassau St., Room 301

  • Research Areas: Natural Algorithms, Computational Geometry, Sublinear Algorithms. 
  • Natural algorithms (flocking, swarming, social networks, etc).
  • Sublinear algorithms
  • Self-improving algorithms
  • Markov data structures

Danqi Chen, Room 412

  • My advisees would be expected to have taken a course in machine learning and ideally have taken COS484 or an NLP graduate seminar.
  • Representation learning for text and knowledge bases
  • Pre-training and transfer learning
  • Question answering and reading comprehension
  • Information extraction
  • Text summarization
  • Any other interesting topics related to natural language understanding/generation

Marcel Dall'Agnol, Corwin 034

  • Research Areas: Theoretical computer science. (Specifically, quantum computation, sublinear algorithms, complexity theory, interactive proofs and cryptography)
  • Research Areas: Machine learning

Jia Deng, Room 423

  • Research Areas: Computer Vision, Machine Learning.
  • Object recognition and action recognition
  • Deep Learning, autoML, meta-learning
  • Geometric reasoning, logical reasoning

Adji Bousso Dieng, Room 406

  • Research areas: Vertaix is a research lab at Princeton University led by Professor Adji Bousso Dieng. We work at the intersection of artificial intelligence (AI) and the natural sciences. The models and algorithms we develop are motivated by problems in those domains and contribute to advancing methodological research in AI. We leverage tools in statistical machine learning and deep learning in developing methods for learning with the data, of various modalities, arising from the natural sciences.

Robert Dondero, Corwin Hall, Room 038

  • Research Areas:  Software engineering; software engineering education.
  • Develop or evaluate tools to facilitate student learning in undergraduate computer science courses at Princeton, and beyond.
  • In particular, can code critiquing tools help students learn about software quality?

Zeev Dvir, 194 Nassau St., Room 250

  • Research Areas: computational complexity, pseudo-randomness, coding theory and discrete mathematics.
  • Independent Research: I have various research problems related to Pseudorandomness, Coding theory, Complexity and Discrete mathematics - all of which require strong mathematical background. A project could also be based on writing a survey paper describing results from a few theory papers revolving around some particular subject.

Benjamin Eysenbach, Room 416

  • Research areas: reinforcement learning, machine learning. My advisees would typically have taken COS324.
  • Applying RL algorithms to problems in science and engineering.
  • Emergent behavior of RL algorithms on high-fidelity robotic simulators.
  • Studying how architectures and representations can facilitate generalization.

Christiane Fellbaum, 1-S-14 Green

  • Research Areas: theoretical and computational linguistics, word sense disambiguation, lexical resource construction, English and multilingual WordNet(s), ontology
  • Anything having to do with natural language--come and see me with/for ideas suitable to your background and interests. Some topics students have worked on in the past:
  • Developing parsers, part-of-speech taggers, morphological analyzers for underrepresented languages (you don't have to know the language to develop such tools!)
  • Quantitative approaches to theoretical linguistics questions
  • Extensions and interfaces for WordNet (English and WN in other languages)
  • Applications of WordNet(s), including:
  • Foreign language tutoring systems
  • Spelling correction software
  • Word-finding/suggestion software for ordinary users and people with memory problems
  • Machine Translation 
  • Sentiment and Opinion detection
  • Automatic reasoning and inferencing
  • Collaboration with professors in the social sciences and humanities ("Digital Humanities")

Adam Finkelstein, Room 424 

  • Research Areas: computer graphics, audio.

Robert S. Fish, Corwin Hall, Room 037

  • Networking and telecommunications
  • Learning, perception, and intelligence, artificial and otherwise;
  • Human-computer interaction and computer-supported cooperative work
  • Online education, especially in Computer Science Education
  • Topics in research and development innovation methodologies including standards, open-source, and entrepreneurship
  • Distributed autonomous organizations and related blockchain technologies

Michael Freedman, Room 308 

  • Research Areas: Distributed systems, security, networking
  • Projects related to streaming data analysis, datacenter systems and networks, untrusted cloud storage and applications. Please see my group website at http://sns.cs.princeton.edu/ for current research projects.

Ruth Fong, Room 032

  • Research Areas: computer vision, machine learning, deep learning, interpretability, explainable AI, fairness and bias in AI
  • Develop a technique for understanding AI models
  • Design an AI model that is interpretable by design
  • Build a paradigm for detecting and/or correcting failure points in an AI model
  • Analyze an existing AI model and/or dataset to better understand its failure points
  • Build a computer vision system for another domain (e.g., medical imaging, satellite data, etc.)
  • Develop a software package for explainable AI
  • Adapt explainable AI research to a consumer-facing problem

Note: I am happy to advise any project if there's a sufficient overlap in interest and/or expertise; please reach out via email to chat about project ideas.

Tom Griffiths, Room 405

Available for Fall 2024 single-semester IW advising, only

Research areas: computational cognitive science, computational social science, machine learning and artificial intelligence

Note: I am open to projects that apply ideas from computer science to understanding aspects of human cognition in a wide range of areas, from decision-making to cultural evolution and everything in between. For example, we have current projects analyzing chess game data and magic tricks, both of which give us clues about how human minds work. Students who have expertise in or access to data related to games, magic, strategic sports like fencing, or other quantifiable domains of human behavior should feel free to get in touch.

Aarti Gupta, Room 220

  • Research Areas: Formal methods, program analysis, logic decision procedures
  • Finding bugs in open source software using automatic verification tools
  • Software verification (program analysis, model checking, test generation)
  • Decision procedures for logical reasoning (SAT solvers, SMT solvers)

Elad Hazan, Room 409  

  • Research interests: machine learning methods and algorithms, efficient methods for mathematical optimization, regret minimization in games, reinforcement learning, control theory and practice
  • Machine learning, efficient methods for mathematical optimization, statistical and computational learning theory, regret minimization in games.
  • Implementation and algorithm engineering for control, reinforcement learning and robotics
  • Implementation and algorithm engineering for time series prediction

Felix Heide, Room 410

  • Research Areas: Computational Imaging, Computer Vision, Machine Learning (focus on Optimization and Approximate Inference).
  • Optical Neural Networks
  • Hardware-in-the-loop Holography
  • Zero-shot and Simulation-only Learning
  • Object recognition in extreme conditions
  • 3D Scene Representations for View Generation and Inverse Problems
  • Long-range Imaging in Scattering Media
  • Hardware-in-the-loop Illumination and Sensor Optimization
  • Inverse Lidar Design
  • Phase Retrieval Algorithms
  • Proximal Algorithms for Learning and Inference
  • Domain-Specific Language for Optics Design

Peter Henderson , 302 Sherrerd Hall

  • Research Areas: Machine learning, law, and policy

Kyle Jamieson, Room 306

  • Research areas: Wireless and mobile networking; indoor radar and indoor localization; Internet of Things
  • See other topics on my independent work ideas page (campus IP and CS dept. login req'd)

Alan Kaplan, 221 Nassau Street, Room 105

Research Areas:

  • Random apps of kindness - mobile application/technology frameworks used to help individuals or communities; topic areas include, but are not limited to: first response, accessibility, environment, sustainability, social activism, civic computing, tele-health, remote learning, crowdsourcing, etc.
  • Tools automating programming language interoperability - Java/C++, React Native/Java, etc.
  • Software visualization tools for education
  • Connected consumer devices, applications and protocols

Brian Kernighan, Room 311

  • Research Areas: application-specific languages, document preparation, user interfaces, software tools, programming methodology
  • Application-oriented languages, scripting languages.
  • Tools; user interfaces
  • Digital humanities

Zachary Kincaid, Room 219

  • Research areas: programming languages, program analysis, program verification, automated reasoning
  • Independent Research Topics:
  • Develop a practical algorithm for an intractable problem (e.g., by developing practical search heuristics, by reducing it to a more tractable problem, or by identifying a tractable sub-problem, ...).
  • Design a domain-specific programming language, or prototype a new feature for an existing language.
  • Any interesting project related to programming languages or logic.

Gillat Kol, Room 316

Aleksandra Korolova, 309 Sherrerd Hall

  • Research areas: Societal impacts of algorithms and AI; privacy; fair and privacy-preserving machine learning; algorithm auditing.

Advisees typically have taken one or more of COS 226, COS 324, COS 423, COS 424 or COS 445.

Pravesh Kothari, Room 320

  • Research areas: Theory

Amit Levy, Room 307

  • Research Areas: Operating Systems, Distributed Systems, Embedded Systems, Internet of Things
  • Distributed hardware testing infrastructure
  • Second factor security tokens
  • Low-power wireless network protocol implementation
  • USB device driver implementation

Kai Li, Room 321

  • Research Areas: Distributed systems; storage systems; content-based search and data analysis of large datasets.
  • Fast communication mechanisms for heterogeneous clusters.
  • Approximate nearest-neighbor search for high dimensional data.
  • Data analysis and prediction of in-patient medical data.
  • Optimized implementation of classification algorithms on manycore processors.

Xiaoyan Li, 221 Nassau Street, Room 104

  • Research areas: Information retrieval, novelty detection, question answering, AI, machine learning and data analysis.
  • Explore new statistical retrieval models for document retrieval and question answering.
  • Apply AI in various fields.
  • Apply supervised or unsupervised learning in health, education, finance, and social networks, etc.
  • Any interesting project related to AI, machine learning, and data analysis.

Lydia Liu, Room 414

  • Research Areas: algorithmic decision making, machine learning and society
  • Theoretical foundations for algorithmic decision making (e.g. mathematical modeling of data-driven decision processes, societal level dynamics)
  • Societal impacts of algorithms and AI through a socio-technical lens (e.g. normative implications of worst case ML metrics, prediction and model arbitrariness)
  • Machine learning for social impact domains, especially education (e.g. responsible development and use of LLMs for education equity and access)
  • Evaluation of human-AI decision making using statistical methods (e.g. causal inference of long term impact)

Wyatt Lloyd, Room 323

  • Research areas: Distributed Systems
  • Caching algorithms and implementations
  • Storage systems
  • Distributed transaction algorithms and implementations

Alex Lombardi, Room 312

  • Research Areas: Theory

Margaret Martonosi, Room 208

  • Quantum Computing research, particularly related to architecture and compiler issues for QC.
  • Computer architectures specialized for modern workloads (e.g., graph analytics, machine learning algorithms, mobile applications)
  • Investigating security and privacy vulnerabilities in computer systems, particularly IoT devices.
  • Other topics in computer architecture or mobile / IoT systems also possible.

Jonathan Mayer, Sherrerd Hall, Room 307 

Available for Spring 2025 single-semester IW, only

  • Research areas: Technology law and policy, with emphasis on national security, criminal procedure, consumer privacy, network management, and online speech.
  • Assessing the effects of government policies, both in the public and private sectors.
  • Collecting new data that relates to government decision making, including surveying current business practices and studying user behavior.
  • Developing new tools to improve government processes and offer policy alternatives.

Mae Milano, Room 307

  • Local-first / peer-to-peer systems
  • Wide-area storage systems
  • Consistency and protocol design
  • Type-safe concurrency
  • Language design
  • Gradual typing
  • Domain-specific languages
  • Languages for distributed systems

Andrés Monroy-Hernández, Room 405

  • Research Areas: Human-Computer Interaction, Social Computing, Public-Interest Technology, Augmented Reality, Urban Computing
  • Research interests: developing public-interest socio-technical systems. We are currently creating alternatives to gig work platforms that are more equitable for all stakeholders. For instance, we are investigating the socio-technical affordances necessary to support a co-op food delivery network owned and managed by workers and restaurants. We are exploring novel system designs that support self-governance, decentralized/federated models, community-centered data ownership, and portable reputation systems. We have opportunities for students interested in human-centered computing, UI/UX design, full-stack software development, and qualitative/quantitative user research.
  • Beyond our core projects, we are open to working on research projects that explore the use of emerging technologies, such as AR, wearables, NFTs, and DAOs, for creative and out-of-the-box applications.

Christopher Moretti, Corwin Hall, Room 036

  • Research areas: Distributed systems, high-throughput computing, computer science/engineering education
  • Expansion, improvement, and evaluation of open-source distributed computing software.
  • Applications of distributed computing for "big science" (e.g. biometrics, data mining, bioinformatics)
  • Software and best practices for computer science education and study, especially Princeton's 126/217/226 sequence or MOOCs development
  • Sports analytics and/or crowd-sourced computing

Radhika Nagpal, F316 Engineering Quadrangle

  • Research areas: control, robotics and dynamical systems

Karthik Narasimhan, Room 422

  • Research areas: Natural Language Processing, Reinforcement Learning
  • Autonomous agents for text-based games ( https://www.microsoft.com/en-us/research/project/textworld/ )
  • Transfer learning/generalization in NLP
  • Techniques for generating natural language
  • Model-based reinforcement learning

Arvind Narayanan, 308 Sherrerd Hall 

Research Areas: fair machine learning (and AI ethics more broadly), the social impact of algorithmic systems, tech policy

Pedro Paredes, Corwin Hall, Room 041

My primary research work is in Theoretical Computer Science.

 * Research Interests: Spectral Graph Theory, Pseudorandomness, Complexity Theory, Coding Theory, Quantum Information Theory, Combinatorics.

The IW projects I am interested in advising can be divided into three categories:

 1. Theoretical research

I am open to advise work on research projects in any topic in one of my research areas of interest. A project could also be based on writing a survey given results from a few papers. Students should have a solid background in math (e.g., elementary combinatorics, graph theory, discrete probability, basic algebra/calculus) and theoretical computer science (226 and 240 material, like big-O/Omega/Theta, basic complexity theory, basic fundamental algorithms). Mathematical maturity is a must.

A (non-exhaustive) list of topics of projects I'm interested in:

 * Explicit constructions of better vertex expanders and/or unique neighbor expanders.
 * Constructing deterministic or random high-dimensional expanders.
 * Pseudorandom generators for different problems.
 * Topics around the quantum PCP conjecture.
 * Topics around quantum error correcting codes and locally testable codes, including constructions, encoding and decoding algorithms.

 2. Theory-informed practical implementations of algorithms

Very often, advances in theoretical research are either not tested in practice or not even feasible to implement in practice. Thus, I am interested in any project that tries to make theoretical ideas applicable in practice. This includes coming up with new algorithms that trade some theoretical guarantees for a feasible implementation while trying to retain the soul of the original idea; implementing new algorithms in a suitable programming language; and empirically testing practical implementations and comparing them with benchmarks / theoretical expectations. A project in this area doesn't have to be in my main areas of research; any theoretical result could be suitable for such a project.

Some examples of areas of interest:

 * Streaming algorithms.
 * Numeric linear algebra.
 * Property testing.
 * Parallel / Distributed algorithms.
 * Online algorithms.

 3. Machine learning with a theoretical foundation

I am interested in projects in machine learning that have some mathematical/theoretical component, even if most of the project is applied. This includes topics like mathematical optimization, statistical learning, fairness, and privacy.

One particular area I have recently been interested in is rating systems (e.g., chess Elo) and applications of these to experts problems.
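
As a point of reference for what a rating-system project might start from, here is a minimal sketch of the standard Elo update rule (this is the textbook formulation, not necessarily the variant studied in Professor Paredes's research; the initial rating of 1500 and K-factor of 32 are common conventional choices):

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated ratings after one game; score_a is 1 (A wins), 0.5 (draw), or 0."""
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b
```

Note that the two updates are symmetric, so the total rating in the pool is conserved after every game.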

Final Note: I am also willing to advise any project with any mathematical/theoretical component, even if it's not the main one; please reach out via email to chat about project ideas.

Iasonas Petras, Corwin Hall, Room 033

  • Research Areas: Information Based Complexity, Numerical Analysis, Quantum Computation.
  • Prerequisites: Reasonable mathematical maturity. In case of a project related to Quantum Computation a certain familiarity with quantum mechanics is required (related courses: ELE 396/PHY 208).
  • Possible research topics include:

1.   Quantum algorithms and circuits:

  • i. Design or simulation of quantum circuits implementing quantum algorithms.
  • ii. Design of quantum algorithms solving/approximating continuous problems (such as Eigenvalue problems for Partial Differential Equations).

2.   Information Based Complexity:

  • i. Necessary and sufficient conditions for tractability of Linear and Linear Tensor Product Problems in various settings (for example worst case or average case). 
  • ii. Necessary and sufficient conditions for tractability of Linear and Linear Tensor Product Problems under new tractability and error criteria.
  • iii. Necessary and sufficient conditions for tractability of Weighted problems.
  • iv. Necessary and sufficient conditions for tractability of Weighted Problems under new tractability and error criteria.

3. Topics in Scientific Computation:

  • i. Randomness, Pseudorandomness, MC and QMC methods and their applications (Finance, etc)

Yuri Pritykin, 245 Carl Icahn Lab

  • Research interests: Computational biology; Cancer immunology; Regulation of gene expression; Functional genomics; Single-cell technologies.
  • Potential research projects: Development, implementation, assessment and/or application of algorithms for analysis, integration, interpretation and visualization of multi-dimensional data in molecular biology, particularly single-cell and spatial genomics data.

Benjamin Raphael, Room 309  

  • Research interests: Computational biology and bioinformatics; Cancer genomics; Algorithms and machine learning approaches for analysis of large-scale datasets
  • Implementation and application of algorithms to infer evolutionary processes in cancer
  • Identifying correlations between combinations of genomic mutations in human and cancer genomes
  • Design and implementation of algorithms for genome sequencing from new DNA sequencing technologies
  • Graph clustering and network anomaly detection, particularly using diffusion processes and methods from spectral graph theory

Vikram Ramaswamy, 035 Corwin Hall

  • Research areas: Interpretability of AI systems, Fairness in AI systems, Computer vision.
  • Constructing a new method to explain a model / create an interpretable by design model
  • Analyzing a current model / dataset to understand bias within the model/dataset
  • Proposing new fairness evaluations
  • Proposing new methods to train to improve fairness
  • Developing synthetic datasets for fairness / interpretability benchmarks
  • Understanding robustness of models

Ran Raz, Room 240

  • Research Area: Computational Complexity
  • Independent Research Topics: Computational Complexity, Information Theory, Quantum Computation, Theoretical Computer Science

Szymon Rusinkiewicz, Room 406

  • Research Areas: computer graphics; computer vision; 3D scanning; 3D printing; robotics; documentation and visualization of cultural heritage artifacts
  • Research ways of incorporating rotation invariance into computer vision tasks such as feature matching and classification
  • Investigate approaches to robust 3D scan matching
  • Model and compensate for imperfections in 3D printing
  • Given a collection of small mobile robots, apply control policies learned in simulation to the real robots.

Olga Russakovsky, Room 408

  • Research Areas: computer vision, machine learning, deep learning, crowdsourcing, fairness & bias in AI
  • Design a semantic segmentation deep learning model that can operate in a zero-shot setting (i.e., recognize and segment objects not seen during training)
  • Develop a deep learning classifier that is impervious to protected attributes (such as gender or race) that may be erroneously correlated with target classes
  • Build a computer vision system for the novel task of inferring what object (or part of an object) a human is referring to when pointing to a single pixel in the image. This includes both collecting an appropriate dataset using crowdsourcing on Amazon Mechanical Turk, creating a new deep learning formulation for this task, and running extensive analysis of both the data and the model

Sebastian Seung, Princeton Neuroscience Institute, Room 153

  • Research Areas: computational neuroscience, connectomics, "deep learning" neural networks, social computing, crowdsourcing, citizen science
  • Gamification of neuroscience (EyeWire 2.0)
  • Semantic segmentation and object detection in brain images from microscopy
  • Computational analysis of brain structure and function
  • Neural network theories of brain function

Jaswinder Pal Singh, Room 324

  • Research Areas: Boundary of technology and business/applications; building and scaling technology companies with special focus at that boundary; parallel computing systems and applications: parallel and distributed applications and their implications for software and architectural design; system software and programming environments for multiprocessors.
  • Develop a startup company idea, and build a plan/prototype for it.
  • Explore tradeoffs at the boundary of technology/product and business/applications in a chosen area.
  • Study and develop methods to infer insights from data in different application areas, from science to search to finance to others. 
  • Design and implement a parallel application. Possible areas include graphics, compression, biology, among many others. Analyze performance bottlenecks using existing tools, and compare programming models/languages.
  • Design and implement a scalable distributed algorithm.

Mona Singh, Room 420

  • Research Areas: computational molecular biology, as well as its interface with machine learning and algorithms.
  • Whole and cross-genome methods for predicting protein function and protein-protein interactions.
  • Analysis and prediction of biological networks.
  • Computational methods for inferring specific aspects of protein structure from protein sequence data.
  • Any other interesting project in computational molecular biology.

Robert Tarjan, 194 Nassau St., Room 308

  • Research Areas: Data structures; graph algorithms; combinatorial optimization; computational complexity; computational geometry; parallel algorithms.
  • Implement one or more data structures or combinatorial algorithms to provide insight into their empirical behavior.
  • Design and/or analyze various data structures and combinatorial algorithms.

Olga Troyanskaya, Room 320

  • Research Areas: Bioinformatics; analysis of large-scale biological data sets (genomics, gene expression, proteomics, biological networks); algorithms for integration of data from multiple data sources; visualization of biological data; machine learning methods in bioinformatics.
  • Implement and evaluate one or more gene expression analysis algorithm.
  • Develop algorithms for assessment of performance of genomic analysis methods.
  • Develop, implement, and evaluate visualization tools for heterogeneous biological data.

David Walker, Room 211

  • Research Areas: Programming languages, type systems, compilers, domain-specific languages, software-defined networking and security
  • Independent Research Topics:  Any other interesting project that involves humanitarian hacking, functional programming, domain-specific programming languages, type systems, compilers, software-defined networking, fault tolerance, language-based security, theorem proving, logic or logical frameworks.

Shengyi Wang, Postdoctoral Research Associate, Room 216

Available for Fall 2024 single-semester IW, only

  • Independent Research topics: Explore Escher-style tilings using (introductory) group theory and automata theory to produce beautiful pictures.

Kevin Wayne, Corwin Hall, Room 040

  • Research Areas: design, analysis, and implementation of algorithms; data structures; combinatorial optimization; graphs and networks.
  • Design and implement computer visualizations of algorithms or data structures.
  • Develop pedagogical tools or programming assignments for the computer science curriculum at Princeton and beyond.
  • Develop assessment infrastructure and assessments for MOOCs.

Matt Weinberg, 194 Nassau St., Room 222

  • Research Areas: algorithms, algorithmic game theory, mechanism design, game theoretical problems in {Bitcoin, networking, healthcare}.
  • Theoretical questions related to COS 445 topics such as matching theory, voting theory, auction design, etc. 
  • Theoretical questions related to incentives in applications like Bitcoin, the Internet, health care, etc. In a little bit more detail: protocols for these systems are often designed assuming that users will follow them. But often, users will actually be strictly happier to deviate from the intended protocol. How should we reason about user behavior in these protocols? How should we design protocols in these settings?
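
The deviation question in the bullet above can be made concrete with a toy example (my illustration, not a protocol from this research group): given a two-player normal-form game, check whether any player is strictly happier unilaterally deviating from a prescribed strategy profile. The payoff matrices below are the classic prisoner's dilemma:

```python
# Action 0 = cooperate, 1 = defect.
U1 = [[3, 0], [5, 1]]  # row player's payoff for (row action, column action)
U2 = [[3, 5], [0, 1]]  # column player's payoff

def profitable_deviation(u1, u2, i, j):
    """True if some player strictly gains by unilaterally deviating from profile (i, j)."""
    row_gains = any(u1[k][j] > u1[i][j] for k in range(len(u1)))
    col_gains = any(u2[i][k] > u2[i][j] for k in range(len(u2[i])))
    return row_gains or col_gains
```

Here mutual cooperation (0, 0) is not stable, since either player gains by defecting, while mutual defection (1, 1) admits no profitable unilateral deviation and is therefore a Nash equilibrium; protocols behave like the first profile when following them is merely assumed rather than incentivized.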

Huacheng Yu, Room 310

  • data structures
  • streaming algorithms
  • design and analyze data structures / streaming algorithms
  • prove impossibility results (lower bounds)
  • implement and evaluate data structures / streaming algorithms

Ellen Zhong, Room 314

Opportunities outside the department.

We encourage students to look into doing interdisciplinary computer science research and to work with professors in departments other than computer science. However, every CS independent work project must have a strong computer science element (even if it has other scientific or artistic elements as well). To do a project with an adviser outside of computer science, you must have permission of the department. This can be accomplished by having a second co-adviser within the computer science department, or by contacting the independent work supervisor about the project and having him or her sign the independent work proposal form.

Here is a list of professors outside the computer science department who are eager to work with computer science undergraduates.

Maria Apostolaki, Engineering Quadrangle, C330

  • Research areas: Computing & Networking, Data & Information Science, Security & Privacy

Branko Glisic, Engineering Quadrangle, Room E330

  • Documentation of historic structures
  • Cyber physical systems for structural health monitoring
  • Developing virtual and augmented reality applications for documenting structures
  • Applying machine learning techniques to generate 3D models from 2D plans of buildings
  • Contact: Rebecca Napolitano, rkn2 (@princeton.edu)

Mihir Kshirsagar, Sherrerd Hall, Room 315

Center for Information Technology Policy.

  • Consumer protection
  • Content regulation
  • Competition law
  • Economic development
  • Surveillance and discrimination

Sharad Malik, Engineering Quadrangle, Room B224


  • Design of reliable hardware systems
  • Verifying complex software and hardware systems

Prateek Mittal, Engineering Quadrangle, Room B236

  • Internet security and privacy: The insecurity of Internet protocols and services threatens the safety of our critical network infrastructure and billions of end users. How can we defend end users as well as our critical network infrastructure from attacks?
  • Trustworthy social systems: Online social networks (OSNs) such as Facebook, Google+, and Twitter have revolutionized the way our society communicates. How can we leverage social connections between users to design the next generation of communication systems?
  • Privacy technologies, anonymous communication: Privacy on the Internet is eroding rapidly, with businesses and governments mining sensitive user information. How can we protect the privacy of our online communications? The Tor project (https://www.torproject.org/) is a potential application of interest.
  • Network science

Ken Norman, Psychology Dept, PNI 137

  • Research areas: Memory, the brain, and computation
  • Lab: Princeton Computational Memory Lab

Potential research topics

  • Methods for decoding cognitive state information from neuroimaging data (fMRI and EEG) 
  • Neural network simulations of learning and memory

Caroline Savage

Office of Sustainability, Phone: (609) 258-7513, Email: cs35 (@princeton.edu)

The Campus as Lab program supports students using the Princeton campus as a living laboratory to solve sustainability challenges. The Office of Sustainability has created a list of campus-as-lab research questions, filterable by discipline and topic, on its website.

An example from Computer Science could include using TigerEnergy, a platform which provides real-time data on campus energy generation and consumption, to study one of the many energy systems or buildings on campus. Three CS students used TigerEnergy to create a live energy heatmap of campus.

Other potential projects include:

  • Apply game theory to sustainability challenges
  • Develop a tool to help visualize interactions between complex campus systems, e.g. energy and water use, transportation and storm water runoff, purchasing and waste, etc.
  • How can we learn (in aggregate) about individuals’ waste, energy, transportation, and other behaviors without impinging on privacy?

Janet Vertesi, Sociology Dept, Wallace Hall, Room 122

  • Research areas: Sociology of technology; Human-computer interaction; Ubiquitous computing.
  • Possible projects: At the intersection of computer science and social science, my students have built mixed reality games, produced artistic and interactive installations, and studied mixed human-robot teams, among other projects.

David Wentzlaff, Engineering Quadrangle, Room 228

Research areas: Computing, operating systems, sustainable computing.

  • Instrument Princeton's Green (HPCRC) data center
  • Investigate power utilization of a processor core implemented in an FPGA
  • Dismantle and document all of the components in modern electronics. Invent new ways to build computers that can be recycled more easily.
  • Other topics in parallel computer architecture or operating systems

Digital Commons @ University of South Florida

Computer Science and Engineering Theses and Dissertations

Theses/Dissertations from 2023

Refining the Machine Learning Pipeline for US-based Public Transit Systems , Jennifer Adorno

Insect Classification and Explainability from Image Data via Deep Learning Techniques , Tanvir Hossain Bhuiyan

Brain-Inspired Spatio-Temporal Learning with Application to Robotics , Thiago André Ferreira Medeiros

Evaluating Methods for Improving DNN Robustness Against Adversarial Attacks , Laureano Griffin

Analyzing Multi-Robot Leader-Follower Formations in Obstacle-Laden Environments , Zachary J. Hinnen

Secure Lightweight Cryptographic Hardware Constructions for Deeply Embedded Systems , Jasmin Kaur

A Psychometric Analysis of Natural Language Inference Using Transformer Language Models , Antonio Laverghetta Jr.

Graph Analysis on Social Networks , Shen Lu

Deep Learning-based Automatic Stereology for High- and Low-magnification Images , Hunter Morera

Deciphering Trends and Tactics: Data-driven Techniques for Forecasting Information Spread and Detecting Coordinated Campaigns in Social Media , Kin Wai Ng Lugo

Automated Approaches to Enable Innovative Civic Applications from Citizen Generated Imagery , Hye Seon Yi

Theses/Dissertations from 2022

Towards High Performing and Reliable Deep Convolutional Neural Network Models for Typically Limited Medical Imaging Datasets , Kaoutar Ben Ahmed

Task Progress Assessment and Monitoring Using Self-Supervised Learning , Sainath Reddy Bobbala

Towards More Task-Generalized and Explainable AI Through Psychometrics , Alec Braynen

A Multiple Input Multiple Output Framework for the Automatic Optical Fractionator-based Cell Counting in Z-Stacks Using Deep Learning , Palak Dave

On the Reliability of Wearable Sensors for Assessing Movement Disorder-Related Gait Quality and Imbalance: A Case Study of Multiple Sclerosis , Steven Díaz Hernández

Securing Critical Cyber Infrastructures and Functionalities via Machine Learning Empowered Strategies , Tao Hou

Social Media Time Series Forecasting and User-Level Activity Prediction with Gradient Boosting, Deep Learning, and Data Augmentation , Fred Mubang

A Study of Deep Learning Silhouette Extractors for Gait Recognition , Sneha Oladhri

Analyzing Decision-making in Robot Soccer for Attacking Behaviors , Justin Rodney

Generative Spatio-Temporal and Multimodal Analysis of Neonatal Pain , Md Sirajus Salekin

Secure Hardware Constructions for Fault Detection of Lattice-based Post-quantum Cryptosystems , Ausmita Sarker

Adaptive Multi-scale Place Cell Representations and Replay for Spatial Navigation and Learning in Autonomous Robots , Pablo Scleidorovich

Predicting the Number of Objects in a Robotic Grasp , Utkarsh Tamrakar

Humanoid Robot Motion Control for Ramps and Stairs , Tommy Truong

Preventing Variadic Function Attacks Through Argument Width Counting , Brennan Ward

Theses/Dissertations from 2021

Knowledge Extraction and Inference Based on Visual Understanding of Cooking Contents , Ahmad Babaeian Babaeian Jelodar

Efficient Post-Quantum and Compact Cryptographic Constructions for the Internet of Things , Rouzbeh Behnia

Efficient Hardware Constructions for Error Detection of Post-Quantum Cryptographic Schemes , Alvaro Cintas Canto

Using Hyper-Dimensional Spanning Trees to Improve Structure Preservation During Dimensionality Reduction , Curtis Thomas Davis

Design, Deployment, and Validation of Computer Vision Techniques for Societal Scale Applications , Arup Kanti Dey

AffectiveTDA: Using Topological Data Analysis to Improve Analysis and Explainability in Affective Computing , Hamza Elhamdadi

Automatic Detection of Vehicles in Satellite Images for Economic Monitoring , Cole Hill

Analysis of Contextual Emotions Using Multimodal Data , Saurabh Hinduja

Data-driven Studies on Social Networks: Privacy and Simulation , Yasanka Sameera Horawalavithana

Automated Identification of Stages in Gonotrophic Cycle of Mosquitoes Using Computer Vision Techniques , Sherzod Kariev

Exploring the Use of Neural Transformers for Psycholinguistics , Antonio Laverghetta Jr.

Secure VLSI Hardware Design Against Intellectual Property (IP) Theft and Cryptographic Vulnerabilities , Matthew Dean Lewandowski

Turkic Interlingua: A Case Study of Machine Translation in Low-resource Languages , Jamshidbek Mirzakhalov

Automated Wound Segmentation and Dimension Measurement Using RGB-D Image , Chih-Yun Pai

Constructing Frameworks for Task-Optimized Visualizations , Ghulam Jilani Abdul Rahim Quadri

Trilateration-Based Localization in Known Environments with Object Detection , Valeria M. Salas Pacheco

Recognizing Patterns from Vital Signs Using Spectrograms , Sidharth Srivatsav Sribhashyam

Recognizing Emotion in the Wild Using Multimodal Data , Shivam Srivastava

A Modular Framework for Multi-Rotor Unmanned Aerial Vehicles for Military Operations , Dante Tezza

Human-centered Cybersecurity Research — Anthropological Findings from Two Longitudinal Studies , Anwesh Tuladhar

Learning State-Dependent Sensor Measurement Models To Improve Robot Localization Accuracy , Troi André Williams

Human-centric Cybersecurity Research: From Trapping the Bad Guys to Helping the Good Ones , Armin Ziaie Tabari

Theses/Dissertations from 2020

Classifying Emotions with EEG and Peripheral Physiological Data Using 1D Convolutional Long Short-Term Memory Neural Network , Rupal Agarwal

Keyless Anti-Jamming Communication via Randomized DSSS , Ahmad Alagil

Active Deep Learning Method to Automate Unbiased Stereology Cell Counting , Saeed Alahmari

Composition of Atomic-Obligation Security Policies , Yan Cao Albright

Action Recognition Using the Motion Taxonomy , Maxat Alibayev

Sentiment Analysis in Peer Review , Zachariah J. Beasley

Spatial Heterogeneity Utilization in CT Images for Lung Nodule Classication , Dmitrii Cherezov

Feature Selection Via Random Subsets Of Uncorrelated Features , Long Kim Dang

Unifying Security Policy Enforcement: Theory and Practice , Shamaria Engram

PsiDB: A Framework for Batched Query Processing and Optimization , Mehrad Eslami

Composition of Atomic-Obligation Security Policies , Danielle Ferguson

Algorithms To Profile Driver Behavior From Zero-permission Embedded Sensors , Bharti Goel

The Efficiency and Accuracy of YOLO for Neonate Face Detection in the Clinical Setting , Jacqueline Hausmann

Beyond the Hype: Challenges of Neural Networks as Applied to Social Networks , Anthony Hernandez

Privacy-Preserving and Functional Information Systems , Thang Hoang

Managing Off-Grid Power Use for Solar Fueled Residences with Smart Appliances, Prices-to-Devices and IoT , Donnelle L. January

Novel Bit-Sliced In-Memory Computing Based VLSI Architecture for Fast Sobel Edge Detection in IoT Edge Devices , Rajeev Joshi

Edge Computing for Deep Learning-Based Distributed Real-time Object Detection on IoT Constrained Platforms at Low Frame Rate , Lakshmikavya Kalyanam

Establishing Topological Data Analysis: A Comparison of Visualization Techniques , Tanmay J. Kotha

Machine Learning for the Internet of Things: Applications, Implementation, and Security , Vishalini Laguduva Ramnath

System Support of Concurrent Database Query Processing on a GPU , Hao Li

Deep Learning Predictive Modeling with Data Challenges (Small, Big, or Imbalanced) , Renhao Liu

Countermeasures Against Various Network Attacks Using Machine Learning Methods , Yi Li

Towards Safe Power Oversubscription and Energy Efficiency of Data Centers , Sulav Malla

Design of Support Measures for Counting Frequent Patterns in Graphs , Jinghan Meng

Automating the Classification of Mosquito Specimens Using Image Processing Techniques , Mona Minakshi

Models of Secure Software Enforcement and Development , Hernan M. Palombo

Functional Object-Oriented Network: A Knowledge Representation for Service Robotics , David Andrés Paulius Ramos

Lung Nodule Malignancy Prediction from Computed Tomography Images Using Deep Learning , Rahul Paul

Algorithms and Framework for Computing 2-body Statistics on Graphics Processing Units , Napath Pitaksirianan

Efficient Viewshed Computation Algorithms On GPUs and CPUs , Faisal F. Qarah

Relational Joins on GPUs for In-Memory Database Query Processing , Ran Rui

Micro-architectural Countermeasures for Control Flow and Misspeculation Based Software Attacks , Love Kumar Sah

Efficient Forward-Secure and Compact Signatures for the Internet of Things (IoT) , Efe Ulas Akay Seyitoglu

Detecting Symptoms of Chronic Obstructive Pulmonary Disease and Congestive Heart Failure via Cough and Wheezing Sounds Using Smart-Phones and Machine Learning , Anthony Windmon

Toward Culturally Relevant Emotion Detection Using Physiological Signals , Khadija Zanna

Theses/Dissertations from 2019

Beyond Labels and Captions: Contextualizing Grounded Semantics for Explainable Visual Interpretation , Sathyanarayanan Narasimhan Aakur

Empirical Analysis of a Cybersecurity Scoring System , Jaleel Ahmed

Phenomena of Social Dynamics in Online Games , Essa Alhazmi

A Machine Learning Approach to Predicting Community Engagement on Social Media During Disasters , Adel Alshehri

Interactive Fitness Domains in Competitive Coevolutionary Algorithm , ATM Golam Bari

Measuring Influence Across Social Media Platforms: Empirical Analysis Using Symbolic Transfer Entropy , Abhishek Bhattacharjee

A Communication-Centric Framework for Post-Silicon System-on-chip Integration Debug , Yuting Cao

Authentication and SQL-Injection Prevention Techniques in Web Applications , Cagri Cetin

Multimodal Emotion Recognition Using 3D Facial Landmarks, Action Units, and Physiological Data , Diego Fabiano

Robotic Motion Generation by Using Spatial-Temporal Patterns from Human Demonstrations , Yongqiang Huang

A GPU-Based Framework for Parallel Spatial Indexing and Query Processing , Zhila Nouri Lewis

A Flexible, Natural Deduction, Automated Reasoner for Quick Deployment of Non-Classical Logic , Trisha Mukhopadhyay

An Efficient Run-time CFI Check for Embedded Processors to Detect and Prevent Control Flow Based Attacks , Srivarsha Polnati

Force Feedback and Intelligent Workspace Selection for Legged Locomotion Over Uneven Terrain , John Rippetoe

Detecting Digitally Forged Faces in Online Videos , Neilesh Sambhu

Malicious Manipulation in Service-Oriented Network, Software, and Mobile Systems: Threats and Defenses , Dakun Shen

ScholarWorks@UMass Amherst

Computer Science Department Dissertations Collection

Dissertations from 2024

Enabling Privacy and Trust in Edge AI Systems , Akanksha Atrey, Computer Science

Generative Language Models for Personalized Information Understanding , Pengshan Cai, Computer Science

Towards Automatic and Robust Variational Inference , Tomas Geffner, Computer Science

Multi-SLAM Systems for Fault-Tolerant Simultaneous Localization and Mapping , Samer Nashed, Computer Science

Policy Gradient Methods: Analysis, Misconceptions, and Improvements , Christopher P. Nota, Computer Science

Data to science with AI and human-in-the-loop , Gustavo Perez Sarabia, Computer Science

Question Answering By Case-Based Reasoning With Textual Evidence , Dung N. Thai, Computer Science

Dissertations from 2023

An Introspective Approach for Competence-Aware Autonomy , Connor Basich, Computer Science

Foundations of Node Representation Learning , Sudhanshu Chanpuriya, Computer Science

Learning to See with Minimal Human Supervision , Zezhou Cheng, Computer Science

IMPROVING USER EXPERIENCE BY OPTIMIZING CLOUD SERVICES , Ishita Dasgupta, Computer Science

Automating the Formal Verification of Software , Emily First, Computer Science

Learning from Sequential User Data: Models and Sample-efficient Algorithms , Aritra Ghosh, Computer Science

Human-Centered Technologies for Inclusive Collection and Analysis of Public-Generated Data , Mahmood Jasim, Computer Science

Rigorous Experimentation For Reinforcement Learning , Scott M. Jordan, Computer Science

Towards Robust Long-form Text Generation Systems , Kalpesh Krishna, Computer Science

Emerging Trustworthiness Issues in Distributed Learning Systems , Hamid Mozaffari, Computer Science

TOWARDS RELIABLE CIRCUMVENTION OF INTERNET CENSORSHIP , Milad nasresfahani, Computer Science

Evidence Assisted Learning for Clinical Decision Support Systems , Bhanu Pratap Singh Rawat, Computer Science

DESIGN AND ANALYSIS OF CONTENT CACHING SYSTEMS , Anirudh Sabnis, Computer Science

Quantifying and Enhancing the Security of Federated Learning , Virat Vishnu Shejwalkar, Computer Science

Effective and Efficient Transfer Learning in the Era of Large Language Models , Tu Vu, Computer Science

Data-driven Modeling and Analytics for Greening the Energy Ecosystem , John Wamburu, Computer Science

Bayesian Structural Causal Inference with Probabilistic Programming , Sam A. Witty, Computer Science

LEARNING TO RIG CHARACTERS , Zhan Xu, Computer Science

GRAPH REPRESENTATION LEARNING WITH BOX EMBEDDINGS , Dongxu Zhang, Computer Science

Dissertations from 2022

COMBINATORIAL ALGORITHMS FOR GRAPH DISCOVERY AND EXPERIMENTAL DESIGN , Raghavendra K. Addanki, Computer Science

MEASURING NETWORK INTERFERENCE AND MITIGATING IT WITH DNS ENCRYPTION , Seyed Arian Akhavan Niaki, Computer Science

Few-Shot Natural Language Processing by Meta-Learning Without Labeled Data , Trapit Bansal, Computer Science

Communicative Information Visualizations: How to make data more understandable by the general public , Alyxander Burns, Computer Science

REINFORCEMENT LEARNING FOR NON-STATIONARY PROBLEMS , Yash Chandak, Computer Science

Modeling the Multi-mode Distribution in Self-Supervised Language Models , Haw-Shiuan Chang, Computer Science

Nonparametric Contextual Reasoning for Question Answering over Large Knowledge Bases , Rajarshi Das, Computer Science

Languages and Compilers for Writing Efficient High-Performance Computing Applications , Abhinav Jangda, Computer Science

Controllable Neural Synthesis for Natural Images and Vector Art , Difan Liu, Computer Science

Probabilistic Commonsense Knowledge , Xiang Li, Computer Science

DISTRIBUTED LEARNING ALGORITHMS: COMMUNICATION EFFICIENCY AND ERROR RESILIENCE , Raj Kumar Maity, Computer Science

Practical Methods for High-Dimensional Data Publication with Differential Privacy , Ryan H. McKenna, Computer Science

Incremental Non-Greedy Clustering at Scale , Nicholas Monath, Computer Science

High-Quality Automatic Program Repair , Manish Motwani, Computer Science

Unobtrusive Assessment of Upper-Limb Motor Impairment Using Wearable Inertial Sensors , Brandon R. Oubre, Computer Science

Mixture Models in Machine Learning , Soumyabrata Pal, Computer Science

Decision Making with Limited Data , Kieu My Phan, Computer Science

Neural Approaches for Language-Agnostic Search and Recommendation , Hamed Rezanejad Asl Bonab, Computer Science

Low Resource Language Understanding in Voice Assistants , Subendhu Rongali, Computer Science

Enabling Daily Tracking of Individual’s Cognitive State With Eyewear , Soha Rostaminia, Computer Science

LABELED MODULES IN PROGRAMS THAT EVOLVE , Anil K. Saini, Computer Science

Reliable Decision-Making with Imprecise Models , Sandhya Saisubramanian, Computer Science

Data Scarcity in Event Analysis and Abusive Language Detection , Sheikh Muhammad Sarwar, Computer Science

Representation Learning for Shape Decomposition, By Shape Decomposition , Gopal Sharma, Computer Science

Metareasoning for Planning and Execution in Autonomous Systems , Justin Svegliato, Computer Science

Approximate Bayesian Deep Learning for Resource-Constrained Environments , Meet Prakash Vadera, Computer Science

ANSWER SIMILARITY GROUPING AND DIVERSIFICATION IN QUESTION ANSWERING SYSTEMS , Lakshmi Nair Vikraman, Computer Science

Dissertations from 2021

Neural Approaches to Feedback in Information Retrieval , Keping Bi, Computer Science

Sociolinguistically Driven Approaches for Just Natural Language Processing , Su Lin Blodgett, Computer Science

Enabling Declarative and Scalable Prescriptive Analytics in Relational Data , Matteo Brucato, Computer Science

Neural Methods for Answer Passage Retrieval over Sparse Collections , Daniel Cohen, Computer Science

Utilizing Graph Structure for Machine Learning , Stefan Dernbach, Computer Science

Enhancing Usability and Explainability of Data Systems , Anna Fariha, Computer Science

Algorithms to Exploit Data Sparsity , Larkin H. Flodin, Computer Science

3D Shape Understanding and Generation , Matheus Gadelha, Computer Science

Robust Algorithms for Clustering with Applications to Data Integration , Sainyam Galhotra, Computer Science

Improving Evaluation Methods for Causal Modeling , Amanda Gentzel, Computer Science

SAFE AND PRACTICAL MACHINE LEARNING , Stephen J. Giguere, Computer Science

COMPACT REPRESENTATIONS OF UNCERTAINTY IN CLUSTERING , Craig Stuart Greenberg, Computer Science

Natural Language Processing for Lexical Corpus Analysis , Abram Kaufman Handler, Computer Science

Social Measurement and Causal Inference with Text , Katherine A. Keith, Computer Science

Concentration Inequalities in the Wild: Case Studies in Blockchain & Reinforcement Learning , A. Pinar Ozisik, Computer Science

Resource Allocation in Distributed Service Networks , Nitish Kumar Panigrahy, Computer Science

History Modeling for Conversational Information Retrieval , Chen Qu, Computer Science

Design and Implementation of Algorithms for Traffic Classification , Fatemeh Rezaei, Computer Science

SCALING DOWN THE ENERGY COST OF CONNECTING EVERYDAY OBJECTS TO THE INTERNET , Mohammad Rostami, Computer Science

Deep Learning Models for Irregularly Sampled and Incomplete Time Series , Satya Narayan Shukla, Computer Science

Traffic engineering in planet-scale cloud networks , Rachee Singh, Computer Science

Video Adaptation for High-Quality Content Delivery , Kevin Spiteri, Computer Science

Learning from Limited Labeled Data for Visual Recognition , Jong-Chyi Su, Computer Science

Human Mobility Monitoring using WiFi: Analysis, Modeling, and Applications , Amee Trivedi, Computer Science

Geometric Representation Learning , Luke Vilnis, Computer Science

Understanding of Visual Domains via the Lens of Natural Language , Chenyun Wu, Computer Science

Towards Practical Differentially Private Mechanism Design and Deployment , Dan Zhang, Computer Science

Audio-driven Character Animation , Yang Zhou, Computer Science

Dissertations from 2020

Noise-Aware Inference for Differential Privacy , Garrett Bernstein, Computer Science

Motion Segmentation - Segmentation of Independently Moving Objects in Video , Pia Katalin Bideau, Computer Science

An Empirical Assessment of the Effectiveness of Deception for Cyber Defense , Kimberly J. Ferguson-Walter, Computer Science

Integrating Recognition and Decision Making to Close the Interaction Loop for Autonomous Systems , Richard Freedman, Computer Science

Improving Reinforcement Learning Techniques by Leveraging Prior Experience , Francisco M. Garcia, Computer Science

Optimization and Training of Generational Garbage Collectors , Nicholas Jacek, Computer Science

Understanding the Dynamic Visual World: From Motion to Semantics , Huaizu Jiang, Computer Science

Improving Face Clustering in Videos , SouYoung Jin, Computer Science

Reasoning About User Feedback Under Identity Uncertainty in Knowledge Base Construction , Ariel Kobren, Computer Science

Learning Latent Characteristics of Data and Models using Item Response Theory , John P. Lalor, Computer Science

Higher-Order Representations for Visual Recognition , Tsung-Yu Lin, Computer Science

Learning from Irregularly-Sampled Time Series , Steven Cheng-Xian Li, Computer Science

Dynamic Composition of Functions for Modular Learning , Clemens GB Rosenbaum, Computer Science

Improving Visual Recognition With Unlabeled Data , Aruni Roy Chowdhury, Computer Science

Deep Neural Networks for 3D Processing and High-Dimensional Filtering , Hang Su, Computer Science

Towards Optimized Traffic Provisioning and Adaptive Cache Management for Content Delivery , Aditya Sundarrajan, Computer Science

The Limits of Location Privacy in Mobile Devices , Keen Yuun Sung, Computer Science

ALGORITHMS FOR MASSIVE, EXPENSIVE, OR OTHERWISE INCONVENIENT GRAPHS , David Tench, Computer Science

System Design for Digital Experimentation and Explanation Generation , Emma Tosch, Computer Science

Dissertations / Theses on the topic 'Computer vision;Machine learning'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles.

Consult the top 50 dissertations / theses for your research on the topic 'Computer vision;Machine learning.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

Chen, Zhilu. "Computer Vision and Machine Learning for Autonomous Vehicles." Digital WPI, 2017. https://digitalcommons.wpi.edu/etd-dissertations/488.

Öberg, Filip. "Football analysis using machine learning and computer vision." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-85276.

Yang, Chen. "Machine Learning and Computer Vision for PCB Verification." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-290370.

Arteta, Carlos Federico. "Computer vision and machine learning for microscopy image analysis." Thesis, University of Oxford, 2015. https://ora.ox.ac.uk/objects/uuid:62a03c45-2616-49a4-8976-fb1ff481915f.

Landecker, Will. "Interpretable Machine Learning and Sparse Coding for Computer Vision." PDXScholar, 2014. https://pdxscholar.library.pdx.edu/open_access_etds/1937.

Burns, James Ian. "Agricultural Crop Monitoring with Computer Vision." Thesis, Virginia Tech, 2014. http://hdl.handle.net/10919/52563.

Mairal, Julien. "Sparse coding for machine learning, image processing and computer vision." Phd thesis, École normale supérieure de Cachan - ENS Cachan, 2010. http://tel.archives-ouvertes.fr/tel-00595312.

Jonsson, Erik. "Channel-Coded Feature Maps for Computer Vision and Machine Learning." Doctoral thesis, Linköping : Department of Electrical Engineering, Linköpings universitet, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-11040.

Kendall, Alex Guy. "Geometry and uncertainty in deep learning for computer vision." Thesis, University of Cambridge, 2019. https://www.repository.cam.ac.uk/handle/1810/287944.

Borngrund, Carl. "Machine vision for automation of earth-moving machines : Transfer learning experiments with YOLOv3." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-75169.

La, Alex W. "Eigenblades: Application of Computer Vision and Machine Learning for Mode Shape Identification." BYU ScholarsArchive, 2017. https://scholarsarchive.byu.edu/etd/7228.

Yu, Haiyue. "Quantitative analysis of TMA images using computer vision and machine learning approaches." Thesis, University of Oxford, 2016. https://ora.ox.ac.uk/objects/uuid:e4cc472e-8c01-4121-b044-f3b4b19a8742.

Kulkarni, Amruta Kiran. "Classification of Faults in Railway Ties Using Computer Vision and Machine Learning." Thesis, Virginia Tech, 2017. http://hdl.handle.net/10919/86522.

Lan, Xiangyuan. "Multi-cue visual tracking: feature learning and fusion." HKBU Institutional Repository, 2016. https://repository.hkbu.edu.hk/etd_oa/319.

Cao, Qiong. "Some topics on similarity metric learning." Thesis, University of Exeter, 2015. http://hdl.handle.net/10871/18662.

Miller, Erik G. (Erik Gundersen). "Learning from one example in machine vision by sharing probability densities." Thesis, Massachusetts Institute of Technology, 2002. http://hdl.handle.net/1721.1/29902.

Kulkarni, Sanjeev R. (Sanjeev Ramesh). "Problems of computational and informational complexity in machine vision and learning." Thesis, Massachusetts Institute of Technology, 1991. http://hdl.handle.net/1721.1/13878.

Billings, Rachel Mae. "On Efficient Computer Vision Applications for Neural Networks." Thesis, Virginia Tech, 2021. http://hdl.handle.net/10919/102957.

Stoddart, Evan. "Computer Vision Techniques for Automotive Perception Systems." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1555357244145006.

Thomure, Michael David. "The Role of Prototype Learning in Hierarchical Models of Vision." PDXScholar, 2014. https://pdxscholar.library.pdx.edu/open_access_etds/1665.

Papaioannou, Athanasios. "Component analysis of complex-valued data for machine learning and computer vision tasks." Thesis, Imperial College London, 2017. http://hdl.handle.net/10044/1/49235.

Niemi, Mikael. "Machine Learning for Rapid Image Classification." Thesis, Linköpings universitet, Institutionen för medicinsk teknik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-97375.

Beale, Dan. "Autonomous visual learning for robotic systems." Thesis, University of Bath, 2012. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.558886.

Cleland, Andrew Lewis. "Bounding Box Improvement with Reinforcement Learning." PDXScholar, 2018. https://pdxscholar.library.pdx.edu/open_access_etds/4438.

Mohan, Vandana. "Computer vision and machine learning methods for the analysis of brain and cardiac imagery." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/39628.

Kaloskampis, Ioannis. "Recognition of complex human activities in multimedia streams using machine learning and computer vision." Thesis, Cardiff University, 2013. http://orca.cf.ac.uk/59377/.

Harekoppa, Pooja Puttaswamygowda. "Application of Computer Vision Techniques for Railroad Inspection using UAVs." Thesis, Virginia Tech, 2016. http://hdl.handle.net/10919/72273.

Arthur, Richard B. "Vision-Based Human Directed Robot Guidance." Diss., CLICK HERE for online access, 2004. http://contentdm.lib.byu.edu/ETD/image/etd564.pdf.

Serafini, Sara. "Machine Learning applied to OCR tasks." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019.

Annavarjula, Vaishnavi. "Computer-Vision Based Retinal Image Analysis for Diagnosis and Treatment." Thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-14979.

Ding, Lei. "From Pixels to People: Graph Based Methods for Grouping Problems in Computer Vision." The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1289845859.

Nichols, Scott A. "Improvement of the camera calibration through the use of machine learning techniques." [Gainesville, Fla.] : University of Florida, 2001. http://etd.fcla.edu/etd/uf/2001/anp1587/nichols%5Fthesis.pdf.

Kuang, Zhanghui, and 旷章辉. "Learning structural SVMs and its applications in computer vision." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2014. http://hdl.handle.net/10722/206663.

Agarwal, Ankur. "Machine Learning for Image Based Motion Capture." Phd thesis, Grenoble INPG, 2006. http://tel.archives-ouvertes.fr/tel-00390301.

Erlandsson, Niklas. "Utilizing machine learning in wildlife camera traps for automatic classification of animal species : An application of machine learning on edge devices." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-104952.

Stewart, Kendall Lee. "The Performance of Random Prototypes in Hierarchical Models of Vision." Thesis, Portland State University, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=1605894.

I investigate properties of HMAX, a computational model of hierarchical processing in the primate visual cortex. High-level cortical neurons have been shown to respond strongly to particular natural shapes, such as faces. HMAX models this property with a dictionary of natural shapes, called prototypes, that respond to the presence of those shapes. The resulting set of similarity measurements is an effective descriptor for classifying images. Curiously, prior work has shown that replacing the dictionary of natural shapes with entirely random prototypes has little impact on classification performance. This work explores that phenomenon by studying the performance of random prototypes on natural scenes, and by comparing their performance to that of sparse random projections of low-level image features.
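The random-prototype descriptor described in the abstract can be sketched in a few lines. The following is a hypothetical, simplified illustration in plain Python (not the author's HMAX implementation): sample random prototype patches, then describe an image by its maximum Gaussian similarity to each prototype over all image patches.

```python
import random
import math

def random_prototypes(n, size, seed=0):
    """Sample n random patch prototypes with values in [0, 1]."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(size * size)] for _ in range(n)]

def patches(image, size):
    """Yield every size x size patch of a 2D image (list of lists), flattened."""
    h, w = len(image), len(image[0])
    for r in range(h - size + 1):
        for c in range(w - size + 1):
            yield [image[r + dr][c + dc] for dr in range(size) for dc in range(size)]

def similarity(patch, proto, sigma=1.0):
    """Gaussian radial-basis similarity between a patch and a prototype."""
    d2 = sum((a - b) ** 2 for a, b in zip(patch, proto))
    return math.exp(-d2 / (2 * sigma ** 2))

def describe(image, protos, size):
    """HMAX-style descriptor: max similarity to each prototype over all patches."""
    ps = list(patches(image, size))
    return [max(similarity(p, proto) for p in ps) for proto in protos]

# Toy usage: a random 6x6 "image" described by 4 random 3x3 prototypes.
rng = random.Random(1)
img = [[rng.random() for _ in range(6)] for _ in range(6)]
protos = random_prototypes(4, 3)
desc = describe(img, protos, 3)
print(len(desc))  # → 4, one similarity score per prototype
```

The descriptor vector `desc` would then be fed to an ordinary classifier; the thesis's observation is that random `protos` work nearly as well as prototypes sampled from natural images.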

Pradeep, Subramanian. "A Computer-Aided Framework for Cell Phenotype Identification, Analysis and Classification." Thesis, Virginia Tech, 2017. http://hdl.handle.net/10919/78859.

Komodakis, Nikos. "Graphical Model Inference and Learning for Visual Computing." Habilitation à diriger des recherches, Université Paris-Est, 2013. http://tel.archives-ouvertes.fr/tel-00866078.

Wallenberg, Marcus. "Embodied Visual Object Recognition." Doctoral thesis, Linköpings universitet, Datorseende, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-132762.

Zhu, Yuehan. "Automated Supply-Chain Quality Inspection Using Image Analysis and Machine Learning." Thesis, Högskolan Kristianstad, Fakulteten för naturvetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:hkr:diva-20069.

Bartoli, Giacomo. "Edge AI: Deep Learning techniques for Computer Vision applied to embedded systems." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/16820/.

Almin, Fredrik. "Detection of Non-Ferrous Materials with Computer Vision." Thesis, Linköpings universitet, Datorseende, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-175519.

Minakshi, Mona. "A Machine Learning Framework to Classify Mosquito Species from Smart-phone Images." Scholar Commons, 2018. https://scholarcommons.usf.edu/etd/7340.

Tydén, Amanda, and Sara Olsson. "Edge Machine Learning for Animal Detection, Classification, and Tracking." Thesis, Linköpings universitet, Reglerteknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166572.

Han, Ji Wan. "A multi-level machine learning system for attention-based object recognition." Thesis, University of Hertfordshire, 2011. http://hdl.handle.net/2299/6377.

Jan, Asim. "Deep learning based facial expression recognition and its applications." Thesis, Brunel University, 2017. http://bura.brunel.ac.uk/handle/2438/15944.

Bohg, Jeannette. "Multi-Modal Scene Understanding for Robotic Grasping." Doctoral thesis, KTH, Datorseende och robotik, CVAP, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-49062.

Häggström, Frida. "/Maybe/Probably/Certainly." Thesis, Konstfack, Grafisk design & illustration, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:konstfack:diva-7400.

Mohapatra, Akrit. "Natural Language Driven Image Edits using a Semantic Image Manipulation Language." Thesis, Virginia Tech, 2018. http://hdl.handle.net/10919/83452.

Cogswell, Michael Andrew. "Understanding Representations and Reducing their Redundancy in Deep Networks." Thesis, Virginia Tech, 2016. http://hdl.handle.net/10919/78167.

Computer vision

Computer vision is an interdisciplinary field that deals with how computers can be made to gain high-level understanding of digital images and videos.

A selection of the 26,796 public repositories matching this topic:

opencv / opencv

Open Source Computer Vision Library

Developer-Y / cs-video-courses

List of Computer Science courses with video lectures.

d2l-ai / d2l-zh

Dive into Deep Learning (《动手学深度学习》): an interactive deep learning book for Chinese readers, with runnable code and open discussion. The Chinese and English editions are used for teaching at over 500 universities in more than 70 countries.

microsoft / AI-For-Beginners

12 Weeks, 24 Lessons, AI for All!

CMU-Perceptual-Computing-Lab / openpose

OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation

eugeneyan / applied-ml

📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.

google / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.

junyanz / pytorch-CycleGAN-and-pix2pix

Image-to-Image Translation in PyTorch

d2l-ai / d2l-en

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)

spmallick / learnopencv

Learn OpenCV: C++ and Python Examples

huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

lucidrains / vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

ShusenTang / Dive-into-DL-PyTorch

This project ports the MXNet implementations in the original Dive into Deep Learning (《动手学深度学习》) book to PyTorch.

ashishpatel26 / 500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code

500 AI Machine learning Deep learning Computer vision NLP Projects with code

HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format

amusi / CVPR2024-Papers-with-Code

A collection of CVPR 2024 papers and open-source projects.

microsoft / AirSim

Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research

pytorch / vision

Datasets, Transforms and Models specific to Computer Vision

NVlabs / instant-ngp

Instant neural graphics primitives: lightning fast NeRF and more

RELATED RESOURCES

  1. Research Topics

Research Topics. Biomedical Imaging. The current plethora of imaging technologies such as magnetic resonance imaging (MR), computed tomography (CT), positron emission tomography (PET), optical coherence tomography (OCT), and ultrasound provides great insight into the different anatomical and functional processes of the human body. Computer Vision.

  2. Theses

A list of completed theses and new thesis topics from the Computer Vision Group. Are you about to start a BSc or MSc thesis? Please read our instructions for preparing and delivering your work. Example topic: Novel Techniques for Robust and Generalizable Machine Learning.

  3. Computer vision Research Topics Ideas

A list of computer vision research topic ideas for MS and PhD students: 1. Deep learning-enabled medical computer vision. 2. Deep learning, computer vision, and entomology. 3. Exploring human-nature interactions in national parks with social media photographs and computer vision. ...

  4. Research Topics of the Computer Vision & Graphics Group

Our research combines computer vision, computer graphics, and machine learning to understand images and video data. We focus on combining deep learning with strong models or physical constraints in order to unite the advantages of model-based and data-driven methods. ...

  5. Deep learning in computer vision: A critical review of emerging

The features of big data can be captured by DL automatically and efficiently. Current applications of DL include computer vision (CV), natural language processing (NLP), video/speech recognition (V/SP), and finance and banking (F&B). Chai and Li (2019) provided a survey of DL on NLP and the advances in V/SP. ...

  6. Computer Vision

Computer Vision: 4,631 benchmarks, 1,426 tasks, 2,993 datasets, and 47,139 papers with code. Example areas include semantic segmentation (305 benchmarks, 5,243 papers with code) and tumor segmentation (3 benchmarks). ...

  7. 10 Cutting Edge Research Papers In Computer Vision & Image ...

    Ever since convolutional neural networks began outperforming humans in specific image recognition tasks, research in the field of computer vision has proceeded at breakneck pace. The basic architecture of CNNs (or ConvNets) was developed in the 1980s. Yann LeCun improved upon the original design in 1989 by using backpropagation to train models ...

  8. Concentrations of AI-related Topics in Research: Computer Vision

After defining AI-related research topics across all of science in Defining Computer Vision, Natural Language Processing, and Robotics Research Clusters, here we explore the 1,105 research clusters labeled as computer vision RCs (as of February 2021). RCs are assigned a CV label if they have at least 25 percent AI-related papers and 25 percent CV-related papers. ...

  9. (PDF) Deep learning for computer vision: a comparison between

The dissertation is focused on trying to answer these key questions in the context of computer vision and, in particular, object recognition, a task that has been revolutionized by recent ...

  10. Learning to solve problems in computer vision with synthetic data

    This thesis considers the use of synthetic data to allow the use of DNN to solve problems in computer vision. First, we consider using synthetic data for problems where collection of real data is not feasible. We focus on the problem of magnifying small motion in videos. Using synthetic data allows us to train DNN models that magnify motion ...

  11. Computer Vision Group

Research Areas. Our research group works on a range of topics in computer vision and image processing, many of which use artificial intelligence. Computer vision is about interpreting images; more specifically, the goal is to infer properties of the observed world from an image or a collection of images. Our work combines a range of mathematical domains including ...

  12. Undergraduate Research Topics

How to Contact Faculty for IW/Thesis Advising: send the professor an e-mail. When you write a professor, be clear that you want a meeting regarding a senior thesis or one-on-one IW project, and briefly describe the topic or idea that you want to work on. ... Computer vision. Independent Work Topics: constructing a new method to explain a model ...

  13. Computer Science Dissertations and Theses

Theses/Dissertations from 2019: A Secure Anti-Counterfeiting System using Near Field Communication, Public Key Cryptography, Blockchain, and Bayesian Games, Naif Saeed Alzahrani (Dissertation); Spectral Clustering for Electrical Phase Identification Using Advanced Metering Infrastructure Voltage Time Series, Logan Blakely (Thesis).

  14. Computer Science and Engineering Theses and Dissertations

Design, Deployment, and Validation of Computer Vision Techniques for Societal Scale Applications, Arup Kanti Dey; AffectiveTDA: Using Topological Data Analysis to Improve Analysis and Explainability in Affective Computing, Hamza Elhamdadi; Automatic Detection of Vehicles in Satellite Images for Economic Monitoring, Cole Hill.

  15. Computer vision research topic ideas (UNDERGRAD)

Pattern recognition, machine learning, and deep learning are topics with much recent work, and the three are closely related to one another. Retrieval systems, like the search ...

  16. Computer Vision really cool ideas for a thesis? : r/computervision

Your thesis could be based on UI and computer vision, as they really are changing the landscape, and you would help an open source project in the process. We also want to add image homography and feature tracking to the next release (1.3). We have quick release cycles as well (about every 3 months).

  17. Dissertations / Theses: 'Computer Science. Computer vision ...

Consult the top 50 dissertations / theses for your research on the topic 'Computer Science. Computer vision.' Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA ...

  18. Struggling to find a research topic in computer vision for ...

I am struggling to find a research topic for my master's thesis in Artificial Intelligence (computer vision topics). With a plethora of research already published and the requirement of novelty, it's a real struggle finding a proper and practical research topic. Some topics I've currently shortlisted are facial expression / emotion recognition ...

  19. PDF Lecture 1: Introduction to "Computer Vision"

Our job is to interpret the cues! Depth cues: linear perspective; aerial perspective. Depth ordering cues: occlusion. Shape cues: texture gradient. Shape and lighting cues: shading. Position and lighting cues: cast shadows. Grouping cues: similarity (color, texture, proximity); "common fate."

  20. Computer Science Department Dissertations Collection

Geometric Representation Learning, Luke Vilnis, Computer Science; Understanding of Visual Domains via the Lens of Natural Language, Chenyun Wu, Computer Science; Towards Practical Differentially Private Mechanism Design and Deployment, Dan Zhang, Computer Science; Audio-driven Character Animation, Yang Zhou, Computer Science.

  21. Dissertations / Theses: 'Computer vision;Machine learning ...

As an important and active research topic in the computer vision community, visual tracking is a key component in many applications ranging from video surveillance and robotics to human-computer interaction. In this thesis, we propose new appearance models based on multiple visual cues and address several research issues in feature learning and fusion for ...

  22. computer-vision · GitHub Topics · GitHub

Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

  23. Top 25 Computer Vision Project Ideas for 2023

For example, with a round shape you can detect all the coins present in an image. The project is a good way to understand how to detect objects with different kinds of shapes. 4. Collage Mosaic Generator. Computer Vision Project Idea: a collage mosaic is an image that is made up of thousands of small images.
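Detecting round shapes such as coins is classically done with a circle Hough transform. A minimal, hypothetical sketch in plain Python follows (real projects would typically use a library routine such as OpenCV's HoughCircles): each edge pixel votes for every centre that would place it on a circle of a known radius, and the centre with the most votes wins.

```python
import math
from collections import Counter

def hough_circle_centers(edge_points, radius, width, height, steps=60):
    """Accumulate votes for circle centres: each edge pixel votes for
    the candidate centres lying at `radius` away from it."""
    votes = Counter()
    for (x, y) in edge_points:
        for i in range(steps):
            t = 2 * math.pi * i / steps
            cx = round(x - radius * math.cos(t))
            cy = round(y - radius * math.sin(t))
            if 0 <= cx < width and 0 <= cy < height:
                votes[(cx, cy)] += 1
    return votes

# Toy usage: synthetic edge pixels of a circle of radius 5 centred at (10, 10).
pts = {(round(10 + 5 * math.cos(2 * math.pi * k / 40)),
        round(10 + 5 * math.sin(2 * math.pi * k / 40))) for k in range(40)}
votes = hough_circle_centers(pts, 5, 21, 21)
center, _ = votes.most_common(1)[0]
print(center)  # the winning cell lands at or next to the true centre (10, 10)
```

A full coin detector would first run an edge detector to obtain `edge_points` and would sweep over a range of radii, keeping local maxima of the accumulator as detected coins.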