
Speech Processing

Our goal in Speech Technology Research is to make speaking to the devices around you (home, in car), the devices you wear (watch), and the devices you carry with you (phone, tablet) ubiquitous and seamless.

Our research focuses on what makes Google unique: computing scale and data. Using large-scale computing resources pushes us to rethink the architecture and algorithms of speech recognition, and to experiment with the kinds of methods that have in the past been considered prohibitively expensive. We also look at parallelism and cluster computing in a new light to change the way experiments are run, algorithms are developed and research is conducted. The field of speech recognition is data-hungry, and using more and more data to tackle a problem tends to help performance but poses new challenges: how do you deal with data overload? How do you leverage unsupervised and semi-supervised techniques at scale? Which classes of algorithms merely compensate for a lack of data, and which scale well with the task at hand? Increasingly, we find that the answers to these questions are surprising, and steer the whole field in directions that would never have been considered were it not for the availability of significantly higher orders of magnitude of data.
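One way to make the semi-supervised question concrete is confidence-based data selection: decode unlabeled audio with an existing recognizer, keep only the utterances whose machine transcripts the model is confident about, and retrain on those. The sketch below is illustrative only; the types, names, and threshold are assumptions, not a description of any production system.

    # Minimal sketch of confidence-based semi-supervised data selection.
    # `Hypothesis` and the 0.9 threshold are illustrative assumptions.
    from dataclasses import dataclass
    from typing import Iterable, List, Tuple

    @dataclass
    class Hypothesis:
        transcript: str        # best transcript from an existing recognizer
        confidence: float      # per-utterance confidence in [0, 1]

    def select_training_data(
        decoded: Iterable[Tuple[str, Hypothesis]],  # (audio_id, hypothesis)
        min_confidence: float = 0.9,
    ) -> List[Tuple[str, str]]:
        """Keep only utterances whose machine transcript is high-confidence;
        the surviving (audio_id, transcript) pairs become cheap, if noisy,
        training labels for the next model."""
        return [
            (audio_id, hyp.transcript)
            for audio_id, hyp in decoded
            if hyp.confidence >= min_confidence
        ]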

We are also in a unique position to deliver very user-centric research. Researchers can draw on the wealth of data from millions of users talking to Voice Search or Android Voice Input every day, and can conduct live experiments to test and benchmark new algorithms directly in a realistic, controlled environment. Whether these are algorithmic performance improvements or user experience and human-computer interaction studies, we keep our users very close to make sure we solve real problems and have real impact.
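Live experiments of this kind typically rely on deterministic traffic bucketing, so that each user consistently sees the same variant. A minimal sketch of that idea follows; the hashing scheme, bucket count, and percentage are illustrative assumptions, not a description of Google's experiment framework.

    # Minimal sketch of deterministic bucketing for live A/B experiments.
    import hashlib

    def assign_bucket(user_id: str, experiment: str, num_buckets: int = 100) -> int:
        """Hash (experiment, user) so each user lands in a stable bucket
        that is independent across experiments."""
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
        return int(digest, 16) % num_buckets

    def in_treatment(user_id: str, experiment: str, treatment_pct: int = 1) -> bool:
        """Route a small, fixed slice of traffic to the new algorithm;
        everyone else stays on the production system as the control."""
        return assign_bucket(user_id, experiment) < treatment_pct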

We have a deep commitment to the diversity of our users, and have made it a priority to deliver the best performance to every language on the planet. We currently have systems operating in more than 55 languages, and we keep expanding our reach to more and more users. The challenge of internationalizing at this scale is immense and rewarding. Many speakers of the languages we reach have never had the experience of speaking to a computer before, and breaking this new ground brings up new research on how to better serve this wide variety of users. Combined with the unprecedented translation capabilities of Google Translate, we are now at the forefront of research in speech-to-speech translation and one step closer to a universal translator.

Indexing and transcribing the web's audio content is another challenge we have set for ourselves, and it is nothing short of gargantuan, both in scope and difficulty. The videos uploaded to YouTube every day range from lectures to newscasts, music videos and, of course, cat videos. Making sense of them takes the challenges of noise robustness, music recognition, speaker segmentation and language detection to new levels of difficulty. The payoff is immense: imagine making every lecture on the web accessible in every language; this is the kind of impact we are striving for.

264 Publications

(Almost) Zero-Shot Cross-Lingual Spoken Language Understanding

Shyam Upadhyay, Manaal Faruqui , Gokhan Tur , Dilek Hakkani-Tur , Larry Heck

Proceedings of the IEEE ICASSP (2018)

An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model

Anjuli Kannan, Yonghui Wu, Patrick Nguyen, Tara N. Sainath, Zhifeng Chen, Rohit Prabhavalkar

ICASSP (2018)

Decoding the auditory brain with canonical component analysis

Alain de Cheveigné, Daniel D. E. Wong, Giovanni M. Di Liberto, Jens Hjortkjaer, Malcolm Slaney , Edmund Lalor

NeuroImage (2018)

Minimum Word Error Rate Training for Attention-based Sequence-to-Sequence Models

Rohit Prabhavalkar , Tara Sainath , Yonghui Wu , Patrick Nguyen, Zhifeng Chen , Chung-Cheng Chiu , Anjuli Kannan

ICASSP 2018 (to appear)

Multilingual Speech Recognition with a Single End-to-End Model

Shubham Toshniwal, Tara N. Sainath, Ron Weiss, Bo Li, Pedro Moreno, Eugene Weinstein, Kanishka Rao

ON USING BACKPROPAGATION FOR SPEECH TEXTURE GENERATION AND VOICE CONVERSION

Jan Chorowski, Ron J. Weiss , Rif A. Saurous , Samy Bengio

Sound source separation using phase difference and reliable mask selection

Chanwoo Kim , Anjali Menon, Michiel Bacchiani , Richard M. Stern

ICASSP (2018) (to appear)

Spectral distortion model for training phase-sensitive deep-neural networks for far-field speech recognition

Chanwoo Kim , Tara Sainath , Arun Narayanan , Ananya Misra , Rajeev Nongpiur, Michiel Bacchiani

ICASSP 2018 (2018)

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

Chung-Cheng Chiu , Tara Sainath , Yonghui Wu , Rohit Prabhavalkar , Patrick Nguyen, Zhifeng Chen , Anjuli Kannan , Ron J. Weiss , Kanishka Rao , Katya Gonina, Navdeep Jaitly, Bo Li , Jan Chorowski, Michiel Bacchiani

A Cascade Architecture for Keyword Spotting on Mobile Devices

Alexander Gruenstein , Raziel Alvarez , Chris Thornton, Mohammadali Ghodrat

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA (2017)

A Comparison of Sequence-to-Sequence Models for Speech Recognition

Rohit Prabhavalkar , Kanishka Rao , Tara Sainath , Bo Li , Leif Johnson , Navdeep Jaitly

Interspeech 2017, ISCA (2017)

A Segmental Framework for Fully-Unsupervised Large-Vocabulary Speech Recognition

Herman Kamper, Aren Jansen , Sharon Goldwater

Computer Speech and Language (2017) (to appear)

A more general method for pronunciation learning

Antoine Bruguier , Dan Gnanapragasam , Francoise Beaufays , Kanishka Rao , Leif Johnson

Interspeech 2017 (2017)

Acoustic Modeling for Google Home

Bo Li , Tara Sainath , Arun Narayanan , Joe Caroselli, Michiel Bacchiani , Ananya Misra , Izhak Shafran , Hasim Sak , Golan Pundak , Kean Chin, Khe Chai Sim, Ron J. Weiss , Kevin Wilson , Ehsan Variani , Chanwoo Kim , Olivier Siohan , Mitchel Weintraub, Erik McDermott , Rick Rose , Matt Shannon

INTERSPEECH 2017 (2017)

An Analysis of "Attention" in Sequence-to-Sequence Models

Rohit Prabhavalkar , Tara Sainath , Bo Li , Kanishka Rao , Navdeep Jaitly

Approaches for Neural-Network Language Model Adaptation

Fadi Biadsy , Michael Alexander Nirschl , Min Ma, Shankar Kumar

Interspeech 2017, Stockholm, Sweden (2017)

Areal and Phylogenetic Features for Multilingual Speech Synthesis

Alexander Gutkin , Richard Sproat

Proc. of Interspeech 2017, ISCA, August 20–24, 2017, Stockholm, Sweden, pp. 2078-2082

Attention-Based Models for Text-Dependent Speaker Verification

F A Rezaur Rahman Chowdhury, Quan Wang , Ignacio Lopez Moreno , Li Wan

Binaural processing for robust speech recognition of degraded speech

Anjali Menon, Chanwoo Kim , Umpei Kurokawa, Richard M. Stern

IEEE Automatic Speech Recognition and Understanding Workshop (2017)

Effectively Building Tera Scale MaxEnt Language Models Incorporating Non-Linguistic Signals

Fadi Biadsy , Mohammadreza Ghodsi , Diamantino Caseiro

Interspeech 2017 (2017)

Efficient Implementation of the Room Simulator for Training Deep Neural Network Acoustic Models

Chanwoo Kim , Ehsan Variani , Arun Narayanan , Michiel Bacchiani

arXiv (2017)

End-to-End Training of Acoustic Models for Large Vocabulary Continuous Speech Recognition with TensorFlow

Ehsan Variani , Tom Bagby, Erik McDermott , Michiel Bacchiani

Endpoint detection using grid long short-term memory networks for streaming speech recognition

Bo Li , Carolina Parada , Gabor Simko , Shuo-yiin Chang , Tara Sainath

In Proc. Interspeech 2017 (to appear)

Generalized End-to-End Loss for Speaker Verification

Li Wan , Quan Wang , Alan Papir , Ignacio Lopez Moreno

Generation of large-scale simulated utterances in virtual rooms to train deep-neural networks for far-field speech recognition in Google Home

Chanwoo Kim , Ananya Misra , Kean Chin, Thad Hughes , Arun Narayanan , Tara Sainath , Michiel Bacchiani

Interspeech 2017 (2017), pp. 379-383

Generative Model-Based Text-to-Speech Synthesis

Google's next-generation real-time unit-selection synthesizer using sequence-to-sequence LSTM-based autoencoders

Vincent Wan , Yannis Agiomyrgiannakis , Hanna Silen, Jakub Vit

Interspeech (2017)

Highway-LSTM and Recurrent Highway Networks for Speech Recognition

Golan Pundak , Tara Sainath

Proc. Interspeech 2017, ISCA

Human and Machine Hearing: Extracting Meaning from Sound

Richard F. Lyon

Cambridge University Press (2017)

Improved end-of-query detection for streaming speech recognition

Carolina Parada , Gabor Simko , Matt Shannon, Shuo-yiin Chang

Proc. Interspeech 2017 (2017) (to appear)

Incoherent idempotent ambisonics rendering

W. Bastiaan Kleijn, Andrew Allen , Jan Skoglund , Felicia Lim

2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (2017)

Joint Wideband Source Localization and Acquisition Based on a Grid-Shift Approach

Christos Tzagkarakis, Bastiaan Kleijn, Jan Skoglund

Keyword Spotting for Google Assistant Using Contextual Speech Recognition

Assaf Michaely , Carolina Parada , Frank Zhang, Gabor Simko , Petar Aleksic

ASRU 2017, IEEE

Language Modeling in the Era of Abundant Data

Ciprian Chelba

AI With the Best online conference. (2017)

Latent Sequence Decompositions

William Chan , Yu Zhang , Quoc Le , Navdeep Jaitly

ICLR (2017)

Multi-Accent Speech Recognition with Hierarchical Grapheme Based Models

Hasim Sak , Kanishka Rao

ICASSP 2017 (to appear)

Multichannel Signal Processing with Deep Neural Networks for Automatic Speech Recognition

Tara Sainath , Ron J. Weiss , Kevin Wilson , Bo Li , Arun Narayanan , Ehsan Variani , Michiel Bacchiani , Izhak Shafran , Andrew Senior , Kean Chin, Ananya Misra , Chanwoo Kim

IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25 (2017), pp. 965-979

On Lattice Generation for Large Vocabulary Speech Recognition

David Rybach , Johan Schalkwyk, Michael Riley

IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Okinawa, Japan (2017)

Optimizing expected word error rate via sampling for speech recognition

Matt Shannon

Parallel WaveNet: Fast High-Fidelity Speech Synthesis

Aäron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals , Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis Carlos Cobo Rus, Florian Stimberg, Norman Casagrande, Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen , Nal Kalchbrenner, Heiga Zen , Alexander Graves, Helen King, Thomas Walters , Dan Belov, Demis Hassabis

Google DeepMind (2017)

Practically Efficient Nonlinear Acoustic Echo Cancellers Using Cascaded Block RLS and FLMS Adaptive Filters

Yiteng (Arden) Huang, Jan Skoglund , Alejandro Luebs

ICASSP (2017)

Raw Multichannel Processing Using Deep Neural Networks

Tara N. Sainath , Ron J. Weiss , Kevin W. Wilson , Arun Narayanan , Michiel Bacchiani , Bo Li , Ehsan Variani , Izhak Shafran , Andrew Senior , Kean Chin, Ananya Misra , Chanwoo Kim

New Era for Robust Speech Recognition: Exploiting Deep Learning, Springer (2017)

Robust Speech Recognition Based on Binaural Auditory Processing

Anjali Menon, Chanwoo Kim , Richard M. Stern

INTERSPEECH 2017 (2017), pp. 3872-3876

Robust and low-complexity blind source separation for meeting rooms

W. Bastiaan Kleijn, Felicia Lim

Proceedings Fifth Joint Workshop on Hands-free Speech Communication and Microphone Arrays (2017)

Sparse Non-negative Matrix Language Modeling: Maximum Entropy Flexibility on the Cheap

Ciprian Chelba , Diamantino Caseiro, Fadi Biadsy

The 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, pp. 2725-2729 (to appear)

Speaker Diarization with LSTM

Quan Wang , Carlton Downey, Li Wan , Philip Andrew Mansfield, Ignacio Lopez Moreno

Streaming Small-Footprint Keyword Spotting Using Sequence-to-Sequence Models

Yanzhang (Ryan) He, Rohit Prabhavalkar , Kanishka Rao , Wei Li, Anton Bakhtin , Ian McGraw

Automatic Speech Recognition and Understanding (ASRU), 2017 IEEE Workshop on

Syllable-Based Acoustic Modeling with CTC-SMBR-LSTM

Zhongdi Qu, Parisa Haghani, Eugene Weinstein , Pedro Moreno

Tacotron: Towards End-to-End Speech Synthesis

Yuxuan Wang , RJ Skerry-Ryan , Daisy Stanton , Yonghui Wu , Ron J. Weiss , Navdeep Jaitly, Zongheng Yang, Ying Xiao , Zhifeng Chen , Samy Bengio , Quoc Le , Yannis Agiomyrgiannakis , Rob Clark , Rif A. Saurous

Trainable Frontend For Robust and Far-Field Keyword Spotting

Yuxuan Wang , Pascal Getreuer , Thad Hughes , Richard F. Lyon , Rif A. Saurous

Proc. IEEE ICASSP 2017, New Orleans, LA

Uncovering Latent Style Factors for Expressive Speech Synthesis

Yuxuan Wang , RJ Skerry-Ryan , Ying Xiao , Daisy Stanton , Joel Shor , Eric Battenberg , Rob Clark , Rif A. Saurous

NIPS Workshop on Machine Learning for Audio Signal Processing (ML4Audio) (2017) (to appear)

Uniform Multilingual Multi-Speaker Acoustic Model for Statistical Parametric Speech Synthesis of Low-Resourced Languages

Alexander Gutkin

Proc. of Interspeech 2017, ISCA, August 20–24, Stockholm, Sweden, pp. 2183-2187

Very Deep Convolutional Networks for End-to-End Speech Recognition

Yu Zhang , William Chan , Navdeep Jaitly

Wavenet based low rate speech coding

W. Bastiaan Kleijn, Felicia S. C. Lim , Alejandro Luebs , Jan Skoglund , Florian Stimberg, Quan Wang , Thomas C. Walters

arXiv preprint arXiv:1712.01120 (2017)

A subband-based stationary-component suppression method using harmonics and power ratio for reverberant speech recognition

Byung Joon Cho, Haeyong Kwon, Ji-Won Cho, Chanwoo Kim , Richard M. Stern, Hyung-Min Park

IEEE Signal Processing Letters, vol. 23 (2016), pp. 780-784

AN ACOUSTIC KEYSTROKE TRANSIENT CANCELER FOR SPEECH COMMUNICATION TERMINALS USING A SEMI-BLIND ADAPTIVE FILTER MODEL

Herbert Buchner, Simon Godsill, Jan Skoglund

ICASSP (2016)

AutoMOS: Learning a non-intrusive assessor of naturalness-of-speech

Brian Patton , Yannis Agiomyrgiannakis , Michael Terry, Kevin Wilson , Rif A. Saurous , D. Sculley

NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop (to appear)

Automatic Optimization of Data Perturbation Distributions for Multi-Style Training in Speech Recognition

Mortaza Doulaty, Richard Rose , Olivier Siohan

Proceedings of the IEEE 2016 Workshop on Spoken Language Technology (SLT2016)

BI-MAGNITUDE PROCESSING FRAMEWORK FOR NONLINEAR ACOUSTIC ECHO CANCELLATION ON ANDROID DEVICES

Yiteng (Arden) Huang , Jan Skoglund , Alejandro Luebs

International Workshop on Acoustic Signal Enhancement 2016 (IWAENC2016)

Building Statistical Parametric Multi-speaker Synthesis for Bangladeshi Bangla

Alexander Gutkin , Linne Ha, Martin Jansche , Oddur Kjartansson, Knot Pipatsrisawat, Richard Sproat

SLTU-2016 5th Workshop on Spoken Language Technologies for Under-resourced languages, 09-12 May 2016, Yogyakarta, Indonesia; Procedia Computer Science, Elsevier B.V., pp. 194-200

Complex Linear Projection (CLP): A Discriminative Approach to Joint Feature Extraction and Acoustic Modeling

Ehsan Variani , Tara N. Sainath , Izhak Shafran , Michiel Bacchiani

Interspeech 2016 (2016)

Contextual prediction models for speech recognition

Yoni Halpern, Keith Hall , Vlad Schogol, Michael Riley , Brian Roark , Gleb Skobeltsyn , Martin Baeuml

Proceedings of Interspeech 2016

Cross-lingual projection for class-based language models

Beat Gfeller, Vlad Schogol, Keith Hall

Directly Modeling Voiced and Unvoiced Components in Speech Waveforms by Neural Networks

Keiichi Tokuda, Heiga Zen

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2016), pp. 5640-5644

Distilling Knowledge from Ensembles of Neural Networks for Speech Recognition

Austin Waters , Yevgen Chebotar

Interspeech (2016)

Distributed representation and estimation of WFST-based n-gram models

Cyril Allauzen , Michael Riley , Brian Roark

Proceedings of the ACL Workshop on Statistical NLP and Weighted Automata (StatFSM) (2016), pp. 32-41

End-to-End Text-Dependent Speaker Verification

Georg Heigold , Ignacio Moreno , Samy Bengio , Noam M. Shazeer

International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016)

Factored Spatial and Spectral Multichannel Raw Waveform CLDNNs

Tara N. Sainath , Ron J. Weiss , Kevin W. Wilson , Arun Narayanan , Michiel Bacchiani

Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices

Heiga Zen , Yannis Agiomyrgiannakis , Niels Egberts, Fergus Henderson , Przemysław Szczepaniak

Proc. Interspeech, San Francisco, CA, USA (2016)

Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection

Ruben Zazo, Tara N. Sainath , Gabor Simko , Carolina Parada

Flatstart-CTC: a new acoustic model training procedure for speech recognition

Andrew Senior , Hasim Sak , Kanishka Rao

ICASSP 2016

GLOBALLY OPTIMIZED LEAST-SQUARES POST-FILTERING FOR MICROPHONE ARRAY SPEECH ENHANCEMENT

Yiteng (Arden) Huang , Alejandro Luebs , Jan Skoglund , W. Bastiaan Kleijn

High quality agreement-based semi-supervised training data for acoustic modeling

Félix de Chaumont Quitry , Asa Oines, Pedro Moreno , Eugene Weinstein

2016 IEEE Workshop on Spoken Language Technology

Learning Compact Recurrent Neural Networks

Zhiyun Lu, Vikas Sindhwani , Tara Sainath

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2016

Learning N-gram Language Models from Uncertain Data

Vitaly Kuznetsov , Hank Liao , Mehryar Mohri , Michael Riley , Brian Roark

Learning Personalized Pronunciations for Contact Names Recognition

Tony Bruguier , Fuchun Peng , Francoise Beaufays

Interspeech 2016 (to appear)

Listen, Attend and Spell: A Neural Network for Large Vocabulary Conversational Speech Recognition

William Chan , Navdeep Jaitly, Quoc V. Le , Oriol Vinyals

Lower Frame Rate Neural Network Acoustic Models

Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks

Tara N. Sainath , Bo Li

Proc. Interspeech, ISCA (2016) (to appear)

Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN based Statistical Parametric Speech Synthesis

Bo Li , Heiga Zen

Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition

Bo Li , Tara N. Sainath , Ron J. Weiss , Kevin W. Wilson , Michiel Bacchiani

Proc. Interspeech, ISCA (2016)

Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition

Hagen Soltau, Hank Liao , Hasim Sak

ArXiv e-prints (2016)

ON PRE-FILTERING STRATEGIES FOR THE GCC-PHAT ALGORITHM

Hong-Goo Kang, Michael Graczyk, Jan Skoglund

International Workshop on Acoustic Signal Enhancement 2016 (IWAENC 2016)

On The Compression Of Recurrent Neural Networks With An Application To LVCSR Acoustic Modeling For Embedded Speech Recognition

Rohit Prabhavalkar , Ouais Alsharif , Antoine Bruguier , Ian McGraw

Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016)

On the Efficient Representation and Execution of Deep Acoustic Models

Raziel Alvarez , Rohit Prabhavalkar , Anton Bakhtin

Proceedings of Annual Conference of the International Speech Communication Association (Interspeech) (2016)

Personalized Speech Recognition On Mobile Devices

Ian McGraw, Rohit Prabhavalkar , Raziel Alvarez , Montse Gonzalez Arenas, Kanishka Rao , David Rybach , Ouais Alsharif , Hasim Sak , Alexander Gruenstein , Françoise Beaufays , Carolina Parada

Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition

Chanwoo Kim , Richard M. Stern

IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24 (2016), pp. 1315-1329

Predicting Pronunciations with Syllabification and Stress with Recurrent Neural Networks

Daan van Esch, Kanishka Rao , Mason Chua

Proceedings of InterSpeech 2016 (to appear)

Pynini: A Python library for weighted finite-state grammar compilation

Kyle Gorman

Proceedings of the ACL Workshop on Statistical NLP and Weighted Automata (2016), pp. 75-80

Recent Advances in Google Real-time HMM-driven Unit Selection Synthesizer

Xavi Gonzalvo , Siamak Tazari, Chun-an Chan, Markus Becker, Alexander Gutkin , Hanna Silen

INTERSPEECH 2016, Sep 8-12, San Francisco, USA, pp. 2238-2242

Reducing the Computational Complexity of Multimicrophone Acoustic Models with Integrated Feature Extraction

Tara N. Sainath , Arun Narayanan , Ron J. Weiss , Ehsan Variani , Kevin W. Wilson , Michiel Bacchiani , Izhak Shafran

Robust Estimation of Reverberation Time Using Polynomial Roots

Ian Kelly , Francis Boland, Jan Skoglund

AES 60th Conference on Dereverberation and Reverberation of Audio, Music, and Speech, Google Ireland Ltd. (2016)

Selection and Combination of Hypotheses for Dialectal Speech Recognition

Victor Soto, Olivier Siohan , Mohamed Elfeky , Pedro J. Moreno

Semantic Model for Fast Tagging of Word Lattices

Leonid Velikovich

IEEE Spoken Language Technology (SLT) Workshop (2016) (to appear)

THE MATCHING-MINIMIZATION ALGORITHM, THE INCA ALGORITHM AND A MATHEMATICAL FRAMEWORK FOR VOICE CONVERSION WITH UNALIGNED CORPORA.

Yannis Agiomyrgiannakis

ICASSP, IEEE (2016)

TTS for Low Resource Languages: A Bangla Synthesizer

Alexander Gutkin , Linne Ha, Martin Jansche , Knot Pipatsrisawat, Richard Sproat

10th edition of the Language Resources and Evaluation Conference, 23-28 May 2016, European Language Resources Association (ELRA), Portorož, Slovenia, pp. 2005-2010

Towards Acoustic Model Unification Across Dialects

Austin Waters , Meysam Bastani, Mohamed G. Elfeky , Pedro Moreno , Xavier Velez

Unsupervised Context Learning For Speech Recognition

Assaf Michaely , Justin Scheiner, Mohammadreza Ghodsi , Petar Aleksic , Zelin Wu

Spoken Language Technology (SLT) Workshop, IEEE (2016)

Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings

Aren Jansen , Herman Kamper, Sharon Goldwater

IEEE Transactions on Audio, Speech, and Language Processing (2016)

Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis

Hideki Kawahara, Yannis Agiomyrgiannakis , Heiga Zen

Proc. ISCA SSW9 (2016), pp. 238-245

VOICE MORPHING THAT IMPROVES TTS QUALITY USING AN OPTIMAL DYNAMIC FREQUENCY WARPING-AND-WEIGHTING TRANSFORM

Yannis Agiomyrgiannakis , Zoe Roupakia

A 6 µW per Channel Analog Biomimetic Cochlear Implant Processor Filterbank Architecture With Across Channels AGC

Guang Wang, Richard F. Lyon , Emmanuel M. Drakakis

IEEE Transactions on Biomedical Circuits and Systems, vol. 9 (2015), pp. 72-86

A Gaussian Mixture Model Layer Jointly Optimized with Discriminative Features within A Deep Neural Network Architecture

Ehsan Variani , Erik McDermott , Georg Heigold

ICASSP, IEEE (2015)

Acoustic Modeling for Speech Synthesis: from HMM to RNN

IEEE ASRU, Scottsdale, Arizona, U.S.A. (2015)

Acoustic Modeling in Statistical Parametric Speech Synthesis - From HMM to LSTM-RNN

Proc. MLSLP (2015)

Acoustic Modelling with CD-CTC-SMBR LSTM RNNS

Andrew Senior , Hasim Sak , Felix de Chaumont Quitry , Tara N. Sainath , Kanishka Rao

ASRU (2015)

Automatic Gain Control and Multi-style Training for Robust Small-Footprint Keyword Spotting with Deep Neural Networks

Rohit Prabhavalkar , Raziel Alvarez , Carolina Parada , Preetum Nakkiran, Tara Sainath

Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2015), pp. 4704-4708

Automatic Pronunciation Verification for Speech Recognition

Kanishka Rao , Fuchun Peng , Françoise Beaufays

ICASSP (2015)

Bringing Contextual Information to Google Speech Recognition

Petar Aleksic , Mohammadreza Ghodsi , Assaf Michaely , Cyril Allauzen , Keith Hall , Brian Roark , David Rybach , Pedro Moreno

Interspeech 2015, International Speech Communications Association

Composition-based on-the-fly rescoring for salient n-gram biasing

Keith Hall , Eunjoon Cho, Cyril Allauzen , Francoise Beaufays , Noah Coccaro, Kaisuke Nakajima, Michael Riley , Brian Roark , David Rybach , Linda Zhang

Compressing Deep Neural Networks using a Rank-Constrained Topology

Preetum Nakkiran, Raziel Alvarez , Rohit Prabhavalkar , Carolina Parada

Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), ISCA (2015), pp. 1473-1477

Context dependent phone models for LSTM RNN acoustic modelling

Andrew W. Senior , Hasim Sak , Izhak Shafran

ICASSP (2015), pp. 4585-4589

Convolutional Neural Networks for Small-Footprint Keyword Spotting

Tara Sainath , Carolina Parada

Interspeech (2015)

Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks

Tara Sainath , Oriol Vinyals , Andrew Senior , Hasim Sak

DETECTION AND SUPPRESSION OF KEYBOARD TRANSIENT NOISE IN AUDIO STREAMS WITH AUXILIARY KEYBED MICROPHONE

Simon Godsill, Herbert Buchner, Jan Skoglund

ICASSP 2015, IEEE

DIRECT-TO-REVERBERANT RATIO ESTIMATION USING A NULL-STEERED BEAMFORMER

James Eaton, Alastair Moore, Patrick Naylor, Jan Skoglund

Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends

Zhen-Hua Ling, Shiyin Kang, Heiga Zen , Andrew Senior , Mike Schuster , Xiao-Jun Qian, Helen Meng, Li Deng

IEEE Signal Processing Magazine, vol. 32 (2015), pp. 35-52

Directly Modeling Speech Waveforms by Neural Networks for Statistical Parametric Speech Synthesis

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2015), pp. 4215-4219

Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition

Hasim Sak , Andrew W. Senior , Kanishka Rao , Françoise Beaufays

CoRR, vol. abs/1507.06947 (2015)

Fix It Where It Fails: Pronunciation Learning by Mining Error Corrections from Speech Logs

Zhenzhen Kou, Daisy Stanton , Fuchun Peng , Françoise Beaufays , Trevor Strohman

Garbage Modeling for On-device Speech Recognition

Christophe Van Gysel, Leonid Velikovich , Ian McGraw, Françoise Beaufays

Interspeech 2015, International Speech Communications Association (to appear)

Geo-location for Voice Search Language Modeling

Ciprian Chelba , Xuedong Zhang, Keith Hall

Interspeech 2015, International Speech Communications Association, pp. 1438-1442

Grapheme-to-Phoneme Conversion Using Long Short-Term Memory Recurrent Neural Networks

Kanishka Rao , Fuchun Peng , Hasim Sak , Françoise Beaufays

Improved recognition of contact names in voice commands

Petar Aleksic , Cyril Allauzen , David Elson, Aleks Kracun, Diego Melendo Casado, Pedro J. Moreno

ICASSP 2015

Stanford Information Theory Forum (2015)

Large Vocabulary Automatic Speech Recognition for Children

Hank Liao , Golan Pundak , Olivier Siohan , Melissa Carroll, Noah Coccaro, Qi-Ming Jiang, Tara N. Sainath , Andrew Senior , Françoise Beaufays , Michiel Bacchiani

Large-scale, sequence-discriminative, joint adaptive training for masking-based robust ASR

Arun Narayanan , Ananya Misra , Kean Chin

INTERSPEECH-2015, ISCA, pp. 3571-3575

Learning acoustic frame labeling for speech recognition with recurrent neural networks

Hasim Sak , Andrew W. Senior , Kanishka Rao , Ozan Irsoy, Alex Graves, Françoise Beaufays, Johan Schalkwyk

ICASSP (2015), pp. 4280-4284

Learning the Speech Front-end with Raw Waveform CLDNNs

Tara Sainath , Ron J. Weiss , Kevin Wilson , Andrew W. Senior , Oriol Vinyals

Listen, Attend and Spell

CoRR, vol. abs/1508.01211 (2015)

Locally-Connected and Convolutional Neural Networks for Small Footprint Speaker Recognition

Yu-hsin Chen, Ignacio Lopez Moreno , Tara Sainath , Mirkó Visontai, Raziel Alvarez , Carolina Parada

Long Short-Term Memory Language Models with Additive Morphological Features for Automatic Speech Recognition

Daniel Renshaw, Keith B. Hall

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2015)

Multi-Dialectical Languages Effect on Speech Recognition

Mohamed Elfeky , Pedro J. Moreno , Victor Soto

International Conference on Natural Language and Speech Processing (2015)

Multitask learning and system combination for automatic speech recognition

Olivier Siohan , David Rybach

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

Pruning Sparse Non-negative Matrix N-gram Language Models

Joris Pelemans, Noam M. Shazeer, Ciprian Chelba

Proceedings of Interspeech 2015, ISCA, pp. 1433-1437

Query-by-Example Keyword Spotting Using Long Short-Term Memory Networks

Guoguo Chen, Carolina Parada , Tara N. Sainath

Rapid Vocabulary Addition to Context-Dependent Decoder Graphs

Cyril Allauzen , Michael Riley

Interspeech 2015

Sequence-based Class Tagging for Robust Transcription in ASR

Lucy Vasserman , Vlad Schogol, Keith Hall

Sound source separation algorithm using phase difference and angle distribution modeling near the target

Chanwoo Kim , Kean Chin

INTERSPEECH 2015, pp. 751-755

Sparse Non-negative Matrix Language Modeling for Geo-annotated Query Session Data

Ciprian Chelba , Noam M. Shazeer

Automatic Speech Recognition and Understanding Workshop (ASRU 2015) Proceedings, IEEE (to appear)

Speaker Location and Microphone Spacing Invariant Acoustic Modeling from Raw Multichannel Waveforms

Tara N. Sainath , Ron J. Weiss , Kevin Wilson , Arun Narayanan , Michiel Bacchiani , Andrew Senior

Speech Acoustic Modeling from Raw Multichannel Waveforms

Yedid Hoshen, Ron Weiss , Kevin W Wilson

International Conference on Acoustics, Speech, and Signal Processing, IEEE (2015)

Statistical parametric speech synthesis: from HMM to LSTM-RNN

RTTH Summer School on Speech Technology -- A Deep Learning Perspective, Barcelona, Spain (2015)

Telluride Decoding Toolbox

Sahar Akram, Alain de Cheveigné, Peter Udo Diehl, Emily Graber, Carina Graversen, Jens Hjortkjaer, Nima Mesgarani, Lucas Parra, Ulrich Pomper, Shihab Shamma, Jonathan Simon, Malcolm Slaney , Daniel Wong

Institute for Neuroinformatics (2015)

Unidirectional Long Short-Term Memory Recurrent Neural Network with Recurrent Output Layer for Low-Latency Speech Synthesis

Heiga Zen , Hasim Sak

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2015), pp. 4470-4474

ViSQOL: an objective speech quality model

Andrew Hines, Jan Skoglund , Anil Kokaram , Naomi Harte

EURASIP Journal on Audio, Speech, and Music Processing, vol. 2015 (13) (2015), pp. 1-18

Vocaine the Vocoder and Applications in Speech Synthesis

ICASSP, IEEE (2015) (to appear)

A big data approach to acoustic model training corpus selection

Olga Kapralova , John Alex, Eugene Weinstein , Pedro Moreno , Olivier Siohan

Conference of the International Speech Communication Association (Interspeech) (2014)

An Analysis of the Effect of Larynx-Synchronous Averaging on Dereverberation of Voiced Speech

Alastair H Moore, Patrick A Naylor, Jan Skoglund

Proceedings of European Signal Processing Conference (EUSIPCO) 2014

Asynchronous Stochastic Optimization for Sequence Training of Deep Neural Networks

Georg Heigold , Erik McDermott , Vincent Vanhoucke , Andrew Senior , Michiel Bacchiani

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Firenze, Italy (2014)

Asynchronous Stochastic Optimization for Sequence Training of Deep Neural Networks: Towards Big Data

Erik McDermott , Georg Heigold , Pedro Moreno , Andrew Senior, Michiel Bacchiani

Interspeech, ISCA (2014)

Asynchronous, Online, GMM-free Training of a Context Dependent Acoustic Model for Speech Recognition

M. Bacchiani , A. Senior , G. Heigold

Proceedings of the European Conference on Speech Communication and Technology (2014) (to appear)

Automatic Language Identification Using Deep Neural Networks

Ignacio Lopez-Moreno , Javier Gonzalez-Dominguez, Oldrich Plchot

Proc. ICASSP, IEEE (2014)

Automatic Language Identification using Long Short-Term Memory Recurrent Neural Networks

Javier Gonzalez-Dominguez, Ignacio Lopez-Moreno , Hasim Sak

Interspeech (2014)

Autoregressive Product of Multi-frame Predictions Can Improve the Accuracy of Hybrid Models

Navdeep Jaitly, Vincent Vanhoucke , Geoffrey Hinton

Proceedings of Interspeech 2014

Backoff Inspired Features for Maximum Entropy Language Models

Fadi Biadsy , Keith Hall , Pedro Moreno , Brian Roark

Proceedings of Interspeech, ISCA (2014)

Computer-aided quality assurance of an Icelandic pronunciation dictionary

Martin Jansche

LREC 2014, Reykjavik

Context Dependent State Tying for Speech Recognition using Deep Neural Network Acoustic Models

M. Bacchiani , D. Rybach

Proceedings of the International Conference on Acoustics,Speech and Signal Processing (2014)

Deep Mixture Density Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis

Heiga Zen , Andrew Senior

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2014), pp. 3872-3876

Deep Neural Networks for Small Footprint Text-dependent Speaker Verification

Ehsan Variani , Xin Lei, Erik McDermott , Ignacio Lopez Moreno , Javier Gonzalez-Dominguez

Direct construction of compact context-dependency transducers from data

David Rybach , Michael Riley , Chris Alberti

Computer Speech & Language, vol. 28 (2014), pp. 177-191

Discriminative pronunciation modeling for dialectal speech recognition

Maider Lehr, Kyle Gorman , Izhak Shafran

Proc. Interspeech (2014) (to appear)

Encoding Linear Models As Weighted Finite-State Transducers

Ke Wu, Cyril Allauzen , Keith Hall , Michael Riley , Brian Roark

Interspeech 2014, ISCA, pp. 1258-1262

Fine Context, Low-rank, Softplus Deep Neural Networks for Mobile Speech Recognition

Andrew Senior , Xin Lei

Proc. ICASSP (2014) (to appear)

Frame by Frame Language Identification in Short Utterances using Deep Neural Networks

Javier Gonzalez-Dominguez, Ignacio Lopez-Moreno , Pedro J. Moreno , Joaquin Gonzalez-Rodriguez

Neural Networks Special Issue: Neural Network Learning in Big Data (2014)

GMM-Free DNN Training

A. Senior , G. Heigold , M. Bacchiani , H. Liao

Improving DNN Speaker Independence with I-vector Inputs

Andrew Senior , Ignacio Lopez-Moreno

JustSpeak: Enabling Universal Voice Control on Android

Yu Zhong , T. V. Raman , Casey Burkhardt , Fadi Biadsy , Jeffrey P. Bigham

Large-Scale Speaker Identification

Ludwig Schmidt, Matthew Sharifi, Ignacio Lopez-Moreno

Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition

Hasim Sak , Andrew W. Senior , Françoise Beaufays

CoRR, vol. abs/1402.1128 (2014)

Long short-term memory recurrent neural network architectures for large scale acoustic modeling

INTERSPEECH (2014), pp. 338-342

Pronunciation Learning for Named-Entities through Crowd-Sourcing

Attapol Rutherford, Fuchun Peng , Françoise Beaufays

Proceedings of Interspeech (2014)

Robust speech recognition in reverberant environments using subband-based steady-state monaural and binaural suppression

Hyung-Min Park, Matthew Maciejewski, Chanwoo Kim , Richard M. Stern

INTERSPEECH (2014), pp. 2715-2718

Robust speech recognition using temporal masking and thresholding algorithm

Chanwoo Kim , Kean Chin, Michiel Bacchiani , R. M. Stern

INTERSPEECH-2014, pp. 2734-2738

Sequence Discriminative Distributed Training of Long Short-Term Memory Recurrent Neural Networks

Hasim Sak , Oriol Vinyals , Georg Heigold , Andrew Senior, Erik McDermott , Rajat Monga , Mark Mao

Sinusoidal Interpolation Across Missing Data

W. Bastiaan Kleijn, Turaj Zakizadeh Shabestary, Jan Skoglund

International Workshop on Acoustic Signal Enhancement 2014 (IWAENC 2014), pp. 71-75

Small-Footprint Keyword Spotting using Deep Neural Networks

Guoguo Chen, Carolina Parada , Georg Heigold

ICASSP, IEEE (2014)

Statistical Parametric Speech Synthesis

UKSpeech Conference, Edinburgh, UK (2014)

Text-To-Speech with cross-lingual Neural Network-based grapheme-to-phoneme models

Xavi Gonzalvo , Monika Podsiadlo

Training Data Selection Based On Context-Dependent State Matching

Olivier Siohan

Proceedings of ICASSP 2014

Word Embeddings for Speech Recognition

Samy Bengio , Georg Heigold

Proceedings of the 15th Conference of the International Speech Communication Association, Interspeech (2014)

A FREQUENCY-WEIGHTED POST-FILTERING TRANSFORM FOR COMPENSATION OF THE OVER-SMOOTHING EFFECT IN HMM-BASED SPEECH SYNTHESIS

Yannis Agiomyrgiannakis , Florian Eyben

ICASSP, IEEE (2013)

Accurate and Compact Large Vocabulary Speech Recognition on Mobile Devices

Xin Lei, Andrew Senior , Alexander Gruenstein , Jeffrey Sorensen

Interspeech (2013)

An Empirical study of learning rates in deep neural networks for speech recognition

Andrew Senior , Georg Heigold , Marc'aurelio Ranzato, Ke Yang

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, CA (2013) (to appear)

Deep Learning in Speech Synthesis

8th ISCA Speech Synthesis Workshop, Barcelona, Spain (2013)

Deep Neural Networks with Auxiliary Gaussian Mixture Models for Real-Time Speech Recognition

Xin Lei, Hui Lin , Georg Heigold

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, CA (2013)

Empirical Exploration of Language Modeling for the google.com Query Stream as Applied to Mobile Voice Search

Ciprian Chelba , Johan Schalkwyk

Mobile Speech and Advanced Natural Language Solutions, Springer Science+Business Media, New York (2013), pp. 197-229

Language Model Verbalization for Automatic Speech Recognition

Hasim Sak , Françoise Beaufays , Kaisuke Nakajima, Cyril Allauzen

Proc ICASSP, IEEE (2013)

Language Modeling Capitalization

Françoise Beaufays , Brian Strope

Proc ICASSP, IEEE (2013) (to appear)

Large Scale Distributed Acoustic Modeling With Back-off N-grams

Ciprian Chelba , Peng Xu , Fernando Pereira , Thomas Richardson

IEEE Transactions on Audio, Speech and Language Processing, vol. 21 (2013), pp. 1158-1169

ICSI, Berkeley, California (2013)

Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription

Hank Liao , Erik McDermott , Andrew Senior

ASRU (2013)

Mixture of mixture n-gram language models

Hasim Sak , Cyril Allauzen , Kaisuke Nakajima, Françoise Beaufays

ASRU (2013), pp. 31-36

Monitoring the Effects of Temporal Clipping on VoIP Speech Quality

Interspeech 2013, pp. 1188-1192

Multiframe Deep Neural Networks for Acoustic Modeling

Vincent Vanhoucke , Matthieu Devin , Georg Heigold

Multilingual acoustic models using distributed deep neural networks

Georg Heigold , Vincent Vanhoucke , Andrew Senior , Patrick Nguyen, Marc'aurelio Ranzato, Matthieu Devin , Jeff Dean

On Rectified Linear Units For Speech Processing

M.D. Zeiler, M. Ranzato, R. Monga , M. Mao, K. Yang , Q.V. Le , P. Nguyen, A. Senior , V. Vanhoucke , J. Dean , G.E. Hinton

38th International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver (2013)

Pre-Initialized Composition for Large-Vocabulary Speech Recognition

Interspeech 2013, pp. 666-670

RAPID ADAPTATION FOR MOBILE SPEECH APPLICATIONS

M. Bacchiani

Proceedings of the International Conference on Acoustics,Speech and Signal Processing (2013)

Rate-Distortion Optimization for Multichannel Audio Compression

Minyue Li, Jan Skoglund , W. Bastiaan Kleijn

2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

Recurrent Neural Networks for Voice Activity Detection

Thad Hughes , Keir Mierle

ICASSP, IEEE (2013), pp. 7378-7382

Robustness of Speech Quality Metrics to Background Noise and Network Degradations: Comparing VISQOL, PESQ and POLQA

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2013), pp. 3697-3701

Search Results Based N-Best Hypothesis Rescoring With Maximum Entropy Classification

Fuchun Peng , Scott Roy, Ben Shahshahani, Françoise Beaufays

Proceedings of ASRU (2013)

Smoothed marginal distribution constraints for language modeling

Brian Roark , Cyril Allauzen , Michael Riley

Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL) (2013), pp. 43-52

Speaker Adaptation of Context Dependent Deep Neural Networks

International Conference of Acoustics, Speech, and Signal Processing. (2013)

Speech and Natural Language: Where Are We Now And Where Are We Headed?

Mobile Voice Conference, San Francisco (2013)

Statistical Parametric Speech Synthesis Using Deep Neural Networks

Heiga Zen , Andrew Senior , Mike Schuster

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2013), pp. 7962-7966

Written-Domain Language Modeling for Automatic Speech Recognition

Hasim Sak , Yun-hsuan Sung , Françoise Beaufays , Cyril Allauzen

iVector-based Acoustic Data Selection

Olivier Siohan , Michiel Bacchiani

Proceedings of Interspeech (2013)

Application Of Pretrained Deep Neural Networks To Large Vocabulary Speech Recognition

Navdeep Jaitly, Patrick Nguyen, Andrew Senior , Vincent Vanhoucke

Proceedings of Interspeech 2012

Building adaptive dialogue systems via Bayes-adaptive POMDP

Shaowei Png , Joelle Pineau, B. Chaib-draa

IEEE Journal of Selected Topics in Signal Processing, vol. 6(8) (2012), pp. 917-927

Chapter 17: Uncertainty Decoding, In Virtanen, Singh, & Raj (Eds.) Techniques for Noise Robustness in Automatic Speech Recognition.

Wiley (2012), pp. 463-485

Continuous Space Discriminative Language Modeling

Puyang Xu, Sanjeev Khudanpur, Maider Lehr, Emily Prud’hommeaux, Nathan Glenn, Damianos Karakos, Brian Roark , Kenji Sagae, Murat Saraclar, Izhak Shafran , Dan Bikel, Chris Callison-Burch, Yuan Cao, Keith Hall , Eva Hasler, Philipp Koehn, Adam Lopez, Matt Post, Darcey Riley

ICASSP 2012

Deep Neural Networks for Acoustic Modeling in Speech Recognition

Geoffrey Hinton , Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior , Vincent Vanhoucke , Patrick Nguyen, Tara Sainath , Brian Kingsbury

Signal Processing Magazine (2012)

Distributed Acoustic Modeling with Back-off N-grams

Proceedings of ICASSP 2012, IEEE, pp. 4129-4132

Distributed Discriminative Language Models for Google Voice Search

Preethi Jyothi, Leif Johnson , Ciprian Chelba , Brian Strope

Proceedings of ICASSP 2012, IEEE, pp. 5017-5021

Estimating Word-Stability During Incremental Speech Recognition

Ian McGraw, Alexander Gruenstein

Interspeech (2012)

Exemplar-Based Processing for Speech Recognition: An Overview

Tara N. Sainath , Bhuvana Ramabhadran, David Nahamoo, Dimitri Kanevsky, Dirk Van Compernolle, Kris Demuynck, Jort F. Gemmeke , Jerome R. Bellegarda, Shiva Sundaram

IEEE Signal Process. Mag., vol. 29 (2012), pp. 98-113

Google's Cross-Dialect Arabic Voice Search

Fadi Biadsy , Pedro J. Moreno , Martin Jansche

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), pp. 4441-4444

Hallucinated N-Best Lists for Discriminative Language Modeling

Kenji Sagae, Maider Lehr, Emily Tucker Prud’hommeaux, Puyang Xu, Nathan Glenn, Damianos Karakos, Sanjeev Khudanpur, Brian Roark , Murat Saraçlar, Izhak Shafran , Daniel M. Bikel, Chris Callison-Burch, Yuan Cao, Keith Hall , Eva Hassler, Philipp Koehn, Adam Lopez, Matt Post, Darcey Riley

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2012)

Haptic Voice Recognition Grand Challenge

K. Sim, S. Zhao, K. Yu, H. Liao

14th ACM International Conference on Multimodal Interaction. (2012)

IMPROVED PREDICTION OF NEARLY-PERIODIC SIGNALS

Bastiaan Kleijn, Jan Skoglund

International Workshop on Acoustic Signal Enhancement 2012 (IWAENC2012)

Investigations on Exemplar-Based Features for Speech Recognition Towards Thousands of Hours of Unsupervised, Noisy Data

Georg Heigold , Patrick Nguyen, Mitchel Weintraub, Vincent Vanhoucke

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Kyoto, Japan (2012), pp. 4437-4440

Japanese and Korean Voice Search

Mike Schuster , Kaisuke Nakajima

International Conference on Acoustics, Speech and Signal Processing, IEEE (2012), pp. 5149-5152

Language Modeling for Automatic Speech Recognition Meets the Web: Google Search by Voice

Ciprian Chelba , Johan Schalkwyk, Boulos Harb , Carolina Parada , Cyril Allauzen , Leif Johnson , Michael Riley , Peng Xu , Preethi Jyothi, Thorsten Brants, Vida Ha, Will Neveitt

University of Toronto (2012)

Large Scale Language Modeling in Automatic Speech Recognition

Ciprian Chelba , Dan Bikel, Maria Shugrina, Patrick Nguyen, Shankar Kumar

Google (2012)

Large-scale Discriminative Language Model Reranking for Voice Search

Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, Association for Computational Linguistics, pp. 41-49

Learning improved linear transforms for speech recognition

Andrew Senior , Youngmin Cho, Jason Weston

ICASSP, IEEE (2012)

Music Models for Music-Speech Separation

Thad Hughes , Trausti Kristjansson

ICASSP, IEEE (2012), pp. 4917-4920

Optimal Size, Freshness and Time-frame for Voice Search Vocabulary

Maryam Kamvar , Ciprian Chelba

Recognition of Multilingual Speech in Mobile Applications

Hui Lin , Jui-Ting Huang, Francoise Beaufays , Brian Strope, Yun-hsuan Sung

ICASSP (2012)

Recurrent Neural Networks for Noise Reduction in Robust ASR

Andrew Maas, Quoc V. Le , Tyler M. O’Neil, Oriol Vinyals , Patrick Nguyen, Andrew Y. Ng

INTERSPEECH (2012)

Semi-supervised Discriminative Language Modeling for Turkish ASR

Murat Saraçlar, Daniel M. Bikel, Keith Hall , Kenji Sagae

2012 IEEE International Conference on Acoustics, Speech, and Signal Processing Proceedings, IEEE, Kyoto, Japan

Spectral Intersections for Non-Stationary Signal Separation

Trausti Kristjansson, Thad Hughes

Proceedings of InterSpeech 2012, Portland, OR

Speech/Nonspeech Segmentation in Web Videos

Ananya Misra

Proceedings of InterSpeech 2012

VISQOL: THE VIRTUAL SPEECH QUALITY OBJECTIVE LISTENER

Voice Query Refinement

Cyril Allauzen , Edward Benson, Ciprian Chelba , Michael Riley , Johan Schalkwyk

A Web-Based Tool for Developing Multilingual Pronunciation Lexicons

Samantha Ainsley , Linne Ha, Martin Jansche , Ara Kim, Masayuki Nanzawa

12th Annual Conference of the International Speech Communication Association (Interspeech 2011), pp. 3331-3332

Bayesian Language Model Interpolation for Mobile Speech Input

Interspeech 2011, pp. 1429-1432

Deploying Google Search by Voice in Cantonese

Yun-hsuan Sung , Martin Jansche , Pedro Moreno

12th Annual Conference of the International Speech Communication Association (Interspeech 2011), pp. 2865-2868

Discriminative Features for Language Identification

C. Alberti, M. Bacchiani

INTERSPEECH (2011)

Improving the speed of neural networks on CPUs

Vincent Vanhoucke , Andrew Senior , Mark Z. Mao

Deep Learning and Unsupervised Feature Learning Workshop, NIPS 2011

Ciprian Chelba , Johan Schalkwyk, Boulos Harb , Carolina Parada , Cyril Allauzen , Michael Riley , Peng Xu , Thorsten Brants, Vida Ha, Will Neveitt

OGI/OHSU Seminar Series, Portland, Oregon, USA (2011)

Recognizing English Queries in Mandarin Voice Search

Hung-An Chang, Yun-hsuan Sung , Brian Strope, Francoise Beaufays

ICASSP (2011)

Speech Retrieval

Ciprian Chelba , Timothy J. Hazen, Bhuvana Ramabhadran, Murat Saraçlar

Spoken Language Understanding, John Wiley and Sons, Ltd (2011), pp. 417-446

Summary of Opus listening test results

Christian Hoene, Jean-Marc Valin, Koen Vos, Jan Skoglund

IETF, IETF (2011)

TechWare: Mobile Media Search Resources [Best of the Web]

Z. Liu, M. Bacchiani

IEEE Signal Processing Magazine, vol. 28 (2011), pp. 142-145

Unsupervised Testing Strategies for ASR

Brian Strope, Doug Beeferman, Alexander Gruenstein , Xin Lei

Interspeech 2011, pp. 1685-1688

Challenges in Automatic Speech Recognition

Ciprian Chelba , Johan Schalkwyk, Michiel Bacchiani

Interspeech 2010

Decision Tree State Clustering with Word and Syllable Features

Hank Liao , Chris Alberti , Michiel Bacchiani , Olivier Siohan

Interspeech, ISCA (2010), pp. 2958-2961

Discriminative Topic Segmentation of Text and Speech

Mehryar Mohri , Pedro Moreno , Eugene Weinstein

International Conference on Artificial Intelligence and Statistics (AISTATS) (2010)

Google Search by Voice: A Case Study

Johan Schalkwyk, Doug Beeferman, Francoise Beaufays , Bill Byrne , Ciprian Chelba , Mike Cohen, Maryam Garrett , Brian Strope

Advances in Speech Recognition: Mobile Environments, Call Centers and Clinics, Springer (2010), pp. 61-90

On-Demand Language Model Interpolation for Mobile Speech Input

Brandon Ballinger, Cyril Allauzen , Alexander Gruenstein , Johan Schalkwyk

Interspeech (2010), pp. 1812-1815

Search by Voice in Mandarin Chinese

Jiulong Shan, Genqing Wu, Zhihong Hu, Xiliu Tang, Martin Jansche , Pedro J. Moreno

Interspeech 2010, pp. 354-357

Unsupervised Discovery and Training of Maximally Dissimilar Cluster Models

Francoise Beaufays , Vincent Vanhoucke , Brian Strope

Proc Interspeech (2010)

A new quality measure for topic segmentation of text and speech

Mehryar Mohri , Pedro J. Moreno , Eugene Weinstein

Conference of the International Speech Communication Association (Interspeech) (2009)

Restoring Punctuation and Capitalization in Transcribed Speech

Agustín Gravano, Martin Jansche , Michiel Bacchiani

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2009), pp. 4741-4744

Revisiting Graphemes with Increasing Amounts of Data

Yun-Hsuan Sung , Thad Hughes , Francoise Beaufays , Brian Strope

ICASSP, IEEE (2009)

Web-derived Pronunciations

Arnab Ghoshal, Martin Jansche , Sanjeev Khudanpur, Michael Riley , Morgan Ulinski

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2009), pp. 4289-4292

Confidence Scores for Acoustic Model Adaptation

C. Gollan, M. Bacchiani

Proceedings of the International Conference on Acoustics,Speech and Signal Processing (2008)

Deploying GOOG-411: Early Lessons in Data, Measurement, and Testing

Michiel Bacchiani , Francoise Beaufays , Johan Schalkwyk, Mike Schuster , Brian Strope

Proc. ICASSP (2008)

Retrieval and Browsing of Spoken Content

Ciprian Chelba , Timothy J. Hazen, Murat Saraçlar

Signal Processing Magazine, IEEE, vol. 25 (2008), pp. 39-49

Speech Recognition with Weighted Finite-State Transducers

Mehryar Mohri , Fernando C. N. Pereira , Michael Riley

Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2008)



Sensors (Basel), vol. 21 (2021)

On the Security and Privacy Challenges of Virtual Assistants

Tooska Dargahi, Sana Belguith, Mabrook S. Al-Rakhami, Ali Hassan Sodhro

1 School of Science, Environment and Engineering, The University of Salford, Salford M5 4WT, UK; [email protected] (T.B.); [email protected] (T.D.); [email protected] (S.B.)

2 Research Chair of Pervasive and Mobile Computing, Information Systems Department, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia

3 Department of Computer and System Science, Mid Sweden University, SE-831 25 Östersund, Sweden; [email protected]

4 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518000, China

5 Department of Electrical Engineering, Sukkur IBA University, Sukkur 65200, Pakistan

Since the purchase of Siri by Apple, and its release with the iPhone 4S in 2011, virtual assistants (VAs) have grown in number and popularity. The sophisticated natural language processing and speech recognition employed by VAs enables users to interact with them conversationally, almost as they would with another human. To service user voice requests, VAs transmit large amounts of data to their vendors; these data are processed and stored in the Cloud. The potential data security and privacy issues involved in this process provided the motivation to examine the current state of the art in VA research. In this study, we identify peer-reviewed literature that focuses on security and privacy concerns surrounding these assistants, including current trends in addressing how voice assistants are vulnerable to malicious attacks and worries that the VA is recording without the user’s knowledge or consent. The findings show that not only are these worries manifold, but there is a gap in the current state of the art, and no current literature reviews on the topic exist. This review sheds light on future research directions, such as providing solutions to perform voice authentication without an external device, and the compliance of VAs with privacy regulations.

1. Introduction

Within the last decade, there has been an increasing interest by governments and industry in developing smart homes. Houses are equipped with several internet-connected devices, such as smart meters, smart locks, and smart speakers, to offer a range of services that improve quality of life. Virtual assistants (VAs)—often termed ‘smart speakers’—such as Amazon’s Alexa, Microsoft’s Cortana, and Apple’s Siri are, simply described, software applications that can interpret human speech as a question or instruction, perform tasks, and respond using synthesised voices. These applications can run on personal computers, smartphones, tablets, and their own dedicated hardware [1]. The user can interact with the VA in a natural and conversational manner: “Cortana, what is the weather forecast for Manchester tomorrow?”, “Alexa, set a reminder for the dentist”. The process requires no keyboards, mice, or touchscreens [1]. This friction-free mode of operation is certainly gaining traction with users. In December 2017 there were 37 million smart speakers installed in the US alone; 12 months later this figure had risen to 66 million [2].

VAs and the companies behind them are not without their bad publicity. In 2018 the Guardian reported that an Alexa user from Portland, Oregon, asked Amazon to investigate when her device recorded a private conversation between her and her husband on the subject of hardwood floors and sent the audio to a contact in her address book—all without her knowing [3]. In 2019, the Daily Telegraph reported that Amazon employees were listening to Alexa users’ audio—including that which was recorded accidentally—at a rate of up to 1000 recordings per day [4]. As well as concerns about snooping by the VA, there are several privacy and security concerns around the information that VA companies store on their servers. The software application on the VA device is only a client—the bulk of the assistant’s work is done on a remote server, and every transaction and recording is kept by the VA company [5]. VAs have little in the way of voice authentication; they will respond to any voice that utters the wake word, meaning that one user could quite easily interrogate another’s VA to mine the stored personal information [1]. Additionally, Internet of Things (IoT) malware is becoming more common and more sophisticated [6]. There have been no reports yet of malware specifically targeting VAs ‘in the wild’ but it is surely a matter of time. A systematic review of research literature written on the security and privacy challenges of VAs and a critical analysis of these studies would give an insight into the current state of the art, and provide an understanding of any future directions new research might take.
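To make the voice-authentication gap concrete, the sketch below shows the usual embedding-and-threshold framing of speaker verification. It is a minimal sketch, assuming a pretrained speaker-embedding model exists (it is not implemented here); the function names and threshold are illustrative, not any vendor's implementation.

    # Minimal sketch of embedding-based speaker verification; the speaker
    # embeddings are assumed to come from a pretrained model (not shown).
    import math
    from typing import List, Sequence

    def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    def is_enrolled_speaker(
        utterance_embedding: Sequence[float],
        enrolled_embeddings: List[Sequence[float]],
        threshold: float = 0.75,    # would be tuned on held-out trials
    ) -> bool:
        """Accept a command only if the utterance embedding is close to the
        centroid of the enrolled user's embeddings; otherwise treat the
        speaker as unknown instead of answering any voice that wakes the VA."""
        dim = len(utterance_embedding)
        centroid = [
            sum(emb[i] for emb in enrolled_embeddings) / len(enrolled_embeddings)
            for i in range(dim)
        ]
        return cosine_similarity(utterance_embedding, centroid) >= threshold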

1.1. Background

The most popular VAs on the market are Apple’s Siri, Amazon’s Alexa, Microsoft’s Cortana, and Google’s Assistant [1]; these assistants, often found in portable devices such as smartphones or tablets, can each be considered a ‘speech-based natural user interface’ (NUI) [7]: a system that can be operated by a user via intuitive, natural behaviour, i.e., voice instructions. Detailed, accurate information about the exact system and software architecture of commercial VAs is hard to come by. Given the sales numbers involved, VA providers are perhaps keen to protect their intellectual property. Figure 1 shows a high-level overview of the system architecture of Amazon’s Alexa VA.

Figure 1. Architecture of a voice assistant (Alexa) ( https://www.faststreamtech.com/blog/amazon-alexa-integrated-with-iot-ecosystem-service/ ; accessed on 10 February 2021) [ 8 ].

An example request might follow these steps:

  • The VA client—the ‘Echo Device’ in the diagram—is always listening for a spoken ‘wake word’; only when this is heard does any recording take place.
  • The recording of the user’s request is sent to Amazon’s service platform where the speech is turned into text by speech recognition, and natural language processing is used to translate that text into machine-readable instructions.
  • The recording and its text translation are sent to cloud storage, where they are kept.
  • The service platform generates a voice recording response which is played to the user via a loudspeaker in the VA client. The request might activate a ‘skill’—a software extension—to play music via streaming service Spotify, for example.
  • Further skills offer integration with IoT devices around the home; these can be controlled by messages sent from the service platform, via the Cloud.
  • A companion smartphone app can see responses sent by the service platform; some smartphones can also act like a fully-featured client.

As with any distributed computing system, several technologies are used. The endpoint of the system with which the user interacts, shown here as the Echo device, commonly takes the form of a dedicated smart speaker—a computer driven by a powerful 32-bit ARM Cortex CPU. In addition, these speakers support WiFi and Bluetooth, and have internal memory and storage [ 9 ].

The speech recognition, natural language processing (NLP), and storage of interactions are based in the Cloud. Amazon’s speech recognition and NLP services, known collectively as Amazon Voice Services (AVS), are hosted on its platform-as-a-service offering, Amazon Web Services (AWS). As well as AVS, AWS also hosts the cloud storage in which data records of voice interactions, along with their audio, are kept [ 10 ]. Data are transferred between the user endpoint and AVS as JavaScript Object Notation (JSON)-encoded messages via, in Amazon’s case, an unofficial public REST API hosted at http://pitangui.amazon.com (accessed on 22 February 2021) [ 11 ].
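To make this exchange concrete, the sketch below shows the general shape such JSON-encoded messages might take. The field names and values are illustrative assumptions, not Amazon’s actual AVS schema.

```python
import json

# Hypothetical client request sent after the wake word is detected.
# All field names here are assumptions, not the real AVS message format.
request = {
    "deviceId": "echo-kitchen-01",
    "timestamp": "2021-02-22T10:15:00Z",
    "audio": "<base64-encoded utterance>",
    "context": {"locale": "en-GB", "wakeWord": "alexa"},
}

# Hypothetical server response after speech recognition and NLP.
response = {
    "transcript": "what is the weather forecast for manchester tomorrow",
    "intent": "GetWeatherForecast",
    "slots": {"location": "Manchester", "date": "tomorrow"},
    "speech": "<base64-encoded audio reply>",
}

# Both sides serialise to and from JSON for transport over the REST API.
wire = json.dumps(request)
assert json.loads(wire)["context"]["wakeWord"] == "alexa"
```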

1.2. Prior Research and Contribution

There is a very limited number of systematic literature reviews (SLRs) written on the subject of VAs. To the best of our knowledge, none specifically addresses the security and privacy challenges associated with VAs. The nearest that could be found was an SLR by de Barcelos Silva et al. [ 12 ], which reviews all literature pertinent to VAs and posits and answers a relatively broad set of questions. Topics include a review of the state of the art, VA usage and architectures, and a taxonomy of VA classification. From the perspective of VA users who are motor or visually impaired, Siebra et al. [ 8 ] provided a literature review in 2018 that analysed VAs as an accessibility resource for mobile devices. The authors identified and analysed proposals for VAs that better enable smartphone interaction for blind, motor-impaired, dyslexic, and other users who might need assistance. The end goal of their research was to develop a VA with suitable functions to aid these users. The study concluded that the current state of the art did not provide such research and outlined a preliminary protocol as a springboard for future work.

The main aim of this paper is to answer a specific question: “Are there privacy, security, or usage challenges with virtual assistants?” through a systematic literature review. A methodology was established for selecting studies made on the broader subject of VAs, and categorising them into more specific subgroups, i.e., subject audience, security or privacy challenges, and research theme (including user behaviour, applications, exploits, snooping, authentication, and forensics). In total, 20 papers were selected as primary studies to answer the research questions posited in the following section.

1.3. Research Goals

The purpose of this research was to take suitable existing studies, analyse their findings, and summarise the research undertaken into the security and privacy implications of popular virtual assistants. Considering the lack of existing literature reviews on this subject, we aimed, in this paper, to fill the gap in the current research by linking together those studies which have addressed the privacy and security aspects of VAs in isolation, whether written with users or developers in mind. To that end, the research questions listed in Table 1 have been considered.

Table 1. Research questions.

The rest of this paper is organised as follows: the research methodology used to select the studies is outlined in Section 2 , whereas Section 3 discusses the findings for the selection of studies, and categorises those papers. In Section 4 , the research questions are answered, followed by a discussion on the future research directions in Section 5 . Section 6 concludes the paper.

2. Research Methodology

In order to answer the research questions in Table 1 , the following stages were undertaken.

2.1. Selection of Primary Studies

A search for a set of primary studies was undertaken by searching the website of particular publishers and using the Google Scholar search engine. The set of keywords used was designed to elicit results pertaining to security and privacy topics associated with popular digital assistants, such as Apple’s Siri, Google’s Assistant, and Amazon’s Alexa. To ensure that no papers were missed that might otherwise have been of interest, the search term was widened to use three further common terms for a virtual assistant. Boolean operators were limited to AND and OR. The searches were limited to the keywords, abstracts, and titles of the documents. The search term used was:

(“digital assistant” OR “virtual assistant” OR “virtual personal assistant” OR “siri” OR “google assistant” OR “alexa”) AND (“privacy” OR “security”)
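Mechanically, this term requires at least one assistant keyword and at least one topic keyword to co-occur in a document’s title, abstract, or keywords. The sketch below replays that selection rule in Python over a hypothetical record structure; it illustrates the logic of the query, not the tooling actually used.

```python
ASSISTANT_TERMS = ["digital assistant", "virtual assistant",
                   "virtual personal assistant", "siri",
                   "google assistant", "alexa"]
TOPIC_TERMS = ["privacy", "security"]

def matches_search(record: dict) -> bool:
    """True if the record satisfies the boolean search term:
    (any assistant term) AND (any topic term), checked against
    the title, abstract, and keywords only."""
    text = " ".join([record.get("title", ""),
                     record.get("abstract", ""),
                     " ".join(record.get("keywords", []))]).lower()
    return (any(t in text for t in ASSISTANT_TERMS)
            and any(t in text for t in TOPIC_TERMS))

# Hypothetical record, for illustration:
paper = {"title": "Attacks on Alexa",
         "abstract": "We study security flaws in smart speakers.",
         "keywords": ["smart speaker"]}
print(matches_search(paper))  # True
```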

Alongside Google Scholar, the following databases were searched:

  • IEEE Xplore Library
  • ScienceDirect
  • ACM Digital Library

2.2. Inclusion and Exclusion Criteria

For a study to be included in this SLR, it must present empirical findings; these could be technical research on security or more qualitative work on privacy. The study could apply to end-users, application developers, or the emerging work on VA forensics. The outcome of the study must contain data relating to tangible, technical privacy, and/or security aspects of VAs. General legal and ethical studies, although interesting, were excluded. For a paper to be selected, it had to be fully peer-reviewed research; therefore, results that were taken from blogs, industry magazines, or individual studies were excluded. Table 2 outlines the exact criteria chosen.

Table 2. Inclusion and exclusion criteria for study selection.

2.3. Results Selection

Using the initial search criteria, 381 studies were singled out. These are broken down as follows:

  • IEEE Xplore: 27
  • ScienceDirect: 43
  • ACM Digital Library: 117
  • Google Scholar: 194

The inclusion and exclusion criteria ( Table 2 ) were applied, and a checklist was assembled to assess the quality of each study:

  • Does the study clearly show the purpose of the research?
  • Does the study adequately describe the background of the research and place it in context?
  • Does the study present a research methodology?
  • Does the study show results?
  • Does the study describe a conclusion, placing the results in context?
  • Does the study recommend improvements or further works?

EX2 (grey literature) removed 310 results, the bulk of the initial hits. Only one foreign-language paper was found amongst the results, and it too was excluded. Throughout this process, eight duplicates were also found and excluded. The remaining 63 results were then read in full. A table was created in Excel and exclusion criterion EX1 (off-topic studies) was applied; following this, all three inclusion criteria were applied. Finally, 20 primary studies remained. Figure 2 shows how many studies remained after each stage of the process.

Figure 2. Attrition of papers at different processing stages.

2.4. Publications over Time

If we consider the first popular VA to be Apple’s Siri [ 13 ]—first made available with the release of the company’s iPhone model 4S in 2011—it is interesting that the earliest of the remaining primary studies reporting concrete data date back only to 2017, four years before this review. The potential reasons for this are discussed in Section 4 . Figure 3 shows the number of publications by year.

Figure 3. Number of primary studies against time.

3. Findings

From the initial searches, a large number of studies were found, perhaps surprisingly, given that VA technology is relatively young. It is only ten years since the introduction of the first popular VA, Apple’s Siri [ 13 ]. However, the attrition process described in Figure 2 reduced this number to 20.

Instead of a single set of broad topics into which each of these studies could be categorised, we decided to approach each paper on three different levels, in line with the research questions posed in Section 1.3 . The papers were divided into three categories: Subject Audience, Security and Privacy, and Research Theme. Figure 4 shows a visual representation of the breakdown of the individual categories.

Figure 4. Visual representation of study classifications.

3.1. Category 1: Subject Audience

The first categorisation is based on whether the work of the study is focussed on end-users, developers, or both.

End-users and developers are defined as follows:

  • End-user—a person who uses the VA in everyday life. This person may not have technical knowledge and may be thought of as a ‘customer’ of the company whose VA they have adopted.
  • Developer—one who writes software extensions, known as ‘skills’ (Amazon) and ‘apps’ (Google). These extensions are made available to the end-user via online marketplaces.

3.2. Category 2: Security or Privacy?

As this study covers both security (safeguarding data) and privacy (safeguarding user identity), each study was categorised as one or the other; only three papers covered both security and privacy [ 14 , 15 , 16 ].

3.3. Category 3: Research Theme

The third categorisation considers the research themes addressed in each paper as follows:

  • Behaviour—the reviewed study looks at how users perceive selected aspects of VAs, and factors influencing the adoption of VAs. All except one of the behavioural studies were carried out on a control group of users [ 11 ].
  • Apps—the paper focuses on the development of software extensions and associated security implications.
  • Exploit—the reviewed paper looks at malicious security attacks (hacking, malware) where a VA is the target of the threat actor.
  • Snooping—the study is concerned with unauthorised listening, where the uninvited listening is being carried out by the device itself, as opposed to ‘Exploit’, where said listening is performed by a malicious threat actor.
  • Authentication—the study looks at ways in which a user might authenticate to the device to ensure the VA knows whom it is interacting with.
  • Forensics—the study looks at ways in which digital forensic artefacts can be retrieved from the device and its associated cloud services, for the purposes of a criminal investigation.

A taxonomy tree showing these categories and how they relate to the studies to which they apply is shown in Figure 5 .

Figure 5. A taxonomy tree showing the categories used to classify the reviewed papers.

It is worth noting that studies focusing on the theme of exploits—malware and hacking—were categorised as such if the VA was the target of the threat actor. Further classifying these studies’ audiences as end-users or developers also considers the nature of the exploit; both developers and end-users can be at risk from these attacks. When a malicious attack exploits a VA’s existing functionality, the study is categorised as ‘end-user’; it is the user who is affected by the exploit. Where the exploit requires new software to be written—for example, the creation of a malicious ‘Skill’—the study is categorised as both ‘developer’ and ‘end-user’ [ 10 , 17 , 18 ]. There was one study [ 19 ] that examined an exploit that required software to be written that exploited a vulnerability in other third-party software. Although the exploit may ultimately have affected the end-user, the focus there was on software development and so the paper was categorised as ‘developer’.

In terms of the subject audience, end-users were overwhelmingly the focus in 79% of papers; a further 11% included end-users with developers as the main focus, and 10% of papers were focussed only on developers. There was a fairly even split between security and privacy as the main thrust of the study; security was the subject of slightly more, at 47%, versus 42% for privacy. Few papers combined the study of both: only 11%. Examining the numbers in the research theme category, exploits were the focus of the majority of the studies; and behaviour was joint third alongside authentication as the focus of the remaining studies. The remainder—snooping, apps, and forensics—were split equally, with only one study dedicated to each. The primary studies are listed in Table 3 , along with their categorisations.

Table 3. Key data reported by primary studies.

4. Discussion

A recurring theme throughout this review so far has been the relative immaturity of VA technology and the short timeframe in which it has become widely adopted. There is, however, an interesting spread of subjects amongst the primary studies. Another interesting prevalence amongst the studies was that of the particular VA used as the subject of the research; of the papers that focused only on a particular VA, Amazon’s Alexa was the most popular as a subject.

In order to answer the research questions, each paper was read and the results were analysed. Each question is restated below, with a summary of key findings and a more in-depth précis of the studies to add context to those findings.

4.1. RQ 1: What Are the Emerging Security and Privacy Concerns Surrounding the Use of VAs?

4.1.1. Key Findings

While reviewing the papers, the following main findings were deduced:

  • Successful malicious attacks have been demonstrated using VAs as the target [ 15 , 18 , 19 , 20 , 24 ]. These attacks are becoming more sophisticated, some use remote vectors, and they explore a variety of approaches rather than a single attack vector.
  • Personally identifiable information can be extracted from an unsecured VA with ease.
  • The GDPR appears to be of limited help in safeguarding users in its current form.

4.1.2. Discussion

From malicious skills designed to impersonate genuine ones, to attacks designed to bypass VA device authentication, trends have emerged in both the security of VAs and the privacy of their users. Any attack that allows a malicious actor to impersonate the user risks that user’s data falling into the wrong hands; attacks with a remote vector are of particular concern due to the comparative ease with which they could be launched without arousing the user’s suspicion. The cloud service platforms which power VAs store a lot of data and, should those data fall into the wrong hands, a serious privacy risk is exposed. The fact that two of the bigger vendors of VAs—Amazon and Google—have skill stores which have allowed the uploading of malicious applications deliberately designed to access a user’s data means that the user cannot be sure that the skill they download and use is safe—a serious security concern.

The dolphin attack, as demonstrated by Zhang et al. [ 24 ], shows how Alexa can be manipulated by voice commands that are modulated to frequencies beyond the upper range of human hearing—an attack that requires planning, sophisticated equipment, and physical proximity to the VA device and therefore realistically poses a limited threat to the user. Turner et al. [ 18 ] showed that phoneme morphing could use audio of a source voice and transform it into an audio utterance that could unlock a device that used voice authentication. The original recording need not be that of the device user, which presents a security risk, but one that still relies on physical access to the VA device.
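The modulation principle underlying the dolphin attack can be sketched in a few lines of NumPy: a baseband command signal is amplitude-modulated onto an ultrasonic carrier, and a nonlinearity of the kind found in microphone circuitry (crudely modelled here as squaring) recovers a baseband component. This is a conceptual illustration using a synthetic tone, not a reproduction of the attack.

```python
import numpy as np

fs = 192_000            # sample rate high enough to represent ultrasound
t = np.arange(0, 1.0, 1 / fs)

# Stand-in for a recorded voice command (a 300 Hz tone, for illustration).
command = 0.5 * np.sin(2 * np.pi * 300 * t)

# Amplitude-modulate the command onto a 25 kHz carrier, above human hearing.
f_carrier = 25_000
modulated = (1 + command) * np.cos(2 * np.pi * f_carrier * t)

# A nonlinearity in the microphone's amplifier (modelled crudely as squaring)
# recreates a baseband component resembling the original command, which the
# speech recogniser can then pick up even though humans heard nothing.
demodulated = modulated ** 2
```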

A man-in-the-middle attack called Lyexa was demonstrated by Mitev et al. [ 19 ], in which a remote attacker uses a compromised IoT device in the user’s home, capable of emitting ultrasound signals, to ‘talk’ to the user’s VA. Developing the idea of the dolphin attack [ 24 ] further, a malicious Alexa skill was used in tandem both to provide plausible feedback to the user from the VA, preventing the arousal of suspicion, and to make the attack remote, thus increasing its threat potential. Kumar et al. [ 15 ] demonstrated a skill attack predicated on Alexa misinterpreting speech. In testing, Alexa correctly interpreted 68.9% of 572,319 words; 24 of these words were misinterpreted consistently and, when exploited by a malicious skill, could be used to hijack requests intended for genuine skills, thus providing a reliable, repeatable remote attack vector. In [ 27 ], Kennedy et al. demonstrated a particularly advanced exploit that uses machine learning to derive patterns, or ‘fingerprints’, from the encrypted traffic between the VA and the server; certain voice commands could be inferred from the encrypted traffic alone. This is a remote attack and consequently poses a serious security concern.
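The fingerprinting approach of Kennedy et al. can be viewed as a conventional supervised-learning pipeline over side-channel features of encrypted traffic. The sketch below trains a scikit-learn classifier on synthetic ‘packet size’ traces; the features, labels, and data are invented for illustration and are not the authors’ actual feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for traffic traces: each row is a fixed-length vector
# of packet sizes observed for one encrypted VA interaction (illustrative).
n_traces, trace_len = 600, 50
X = rng.normal(500, 150, size=(n_traces, trace_len))
y = rng.integers(0, 3, size=n_traces)   # three hypothetical voice commands

# Pretend different commands produce slightly different traffic shapes.
X += y[:, None] * 40

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# The eavesdropper never sees plaintext, only these side-channel features.
print("inferred-command accuracy:", clf.score(X_test, y_test))
```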

In conclusion, it was found that the VA is becoming the target of malicious attacks just as other connected computing devices have been in the past. These attacks show an interesting pattern: they are evolving. For any malicious attack to be effective and dangerous to the end user, it must be simple enough to be carried out by someone who has not made an extensive study of the VA’s internal architecture. Furthermore, an attack is made more dangerous by the lack of the need to be proximate to the device. Finally, any attack must be repeatable—if it only works once, in laboratory conditions for example, it poses little threat to the end user. A ready-coded, malicious skill could be exploited remotely by a threat actor with limited knowledge of computer science and it surely, at this point, cannot be long before these attacks are more commonplace.

Furey et al. [ 22 ] first studied how much personally identifiable information could be extracted from an Alexa device that had no authentication set. The authors then examined this in the context of the GDPR, and how much leeway Amazon might have to offload its compliance responsibilities through carefully written user terms and conditions. Loideain et al. investigated how the female gendering of VAs might pose societal harm “insofar as they reproduce normative assumptions about the role of women as submissive and secondary to men” [ 26 ]. In both cases, the GDPR, designed expressly to protect end users and their data, was found to be only partially successful in protecting VA users. A study of the GDPR itself, or an analysis of the psychological repercussions of VA voice gendering, is beyond the scope of this document. However, any flaws in the GDPR are a particular concern, given the amount of data collected by VAs and the growing interest in exploiting vulnerabilities in VAs and their extensions to obtain these data by nefarious means.

4.2. RQ2: To What Degree Do Users’ Concerns Surrounding the Privacy and Security Aspects of VAs Affect Their Choice of VA and Their Behaviour around the Device?

4.2.1. Key Findings

The review of the selected papers led to the following main findings:

  • Rationalising of security and privacy concerns is more prevalent among those who choose to use a VA; those who do not use one cite privacy and trust issues as factors affecting their decision.
  • Conversely, amongst those who do choose to use a VA, privacy is the main factor in the acceptance of a particular model.
  • ‘Unwanted’ recordings—those made by the VA without the user uttering the wake word—occur in significant numbers.
  • Children see no difference between a connected toy and a VA designed for adult use.

4.2.2. Discussion

Lau et al. [ 17 ] found that worries differ between people who do and do not use a VA. Those who do not use an assistant, seeing no purpose for such a device, are more likely to be those for whom privacy and trust are an issue. These users were “…deeply uncomfortable with the idea of a ‘microphone-based’ device that a speaker company, or an ‘other’ with malicious intent, could ostensibly use to listen in on their homes”. Amongst those who do adopt a VA, users rationalised their lack of concern regarding privacy with the belief that the VA company could be trusted with their data, or that there was no way another user could see their history. Burbach et al. considered the acceptance factors of different VAs amongst a control group of users; a choice-based conjoint analysis was used with three attributes: natural language processing (NLP) performance, price, and privacy. Privacy was found to be the biggest concern of the three [ 14 ]. These findings appear to conflict with those presented by Lau et al. [ 21 ]; however, the surveys were constructed differently, as privacy was the primary goal of the later study. Moreover, Burbach et al. [ 11 ] wrote their study a year later; a year in which several news stories broke in the media regarding the privacy of VAs, which may account for the apparent increase in concern over privacy.

Javed et al. [ 21 ] performed an in-depth study of what Alexa was recording. Although Amazon claims that ‘she’ only listens when the wake word is uttered by the user, their research found that, among the control group of users, 91% had experienced an unwanted recording. On investigation, it was found that benign sounds, such as radio, TV, and background noise, accounted for the majority of these recordings. Alarmingly, however, 29.2% of the study group reported that some of their unwanted recordings contained sensitive information, which presents a privacy breach. McReynolds et al. studied connected toys (Hello Barbie, Jibo) in conjunction with VAs to determine, amongst other questions, whether children relate to ‘traditional’ smart assistants in the same way they do their toys [ 29 ]. A key finding, from surveys of parents and their children, was that children interacted with VAs in the same way they might interact with a connected toy. VAs, however, are not designed for children and are not examined—at least in the US—for regulatory compliance in the same way connected toys are.

Although there has been an increase in user privacy concerns, there is still a group of users who have faith that the data companies are trustworthy; interestingly, a group of those users for whom privacy is a concern are still using a VA. The fact that privacy is a worry is evidently not sufficient to dissuade the user from having a VA in the house. It might be interesting to see if studies made over the coming years show the trend of privacy awareness continuing, especially in the light of the simple fact that users find VAs recording without their knowledge. Children relate to VAs as they would a toy with similar capabilities and, again, it would be of interest to see if this fact increased privacy concerns amongst parents who use an ‘adult’ VA.

4.3. RQ3: What Are the Security and Privacy Concerns Affecting First-Party and Third-Party Application Development for VA Software?

4.3.1. Key Findings

The study of the selected papers led us to deduce the following main findings:

  • The processes that check third-party extensions submitted to the app stores of both Amazon and Google do a demonstrably poor job of ensuring that the apps properly authenticate from the third-party server to the Alexa/Google cloud.
  • Several novel methods of user authentication to the VA device have been proposed, each using a different secondary device to offer a form of two-factor authentication [ 16 , 23 , 31 ].
  • Each of the proposed user authentication methods goes some way towards mitigating the voice/replay attacks outlined in the findings of RQ1.

4.3.2. Discussion

Zhang et al. [ 14 ] presented the only study examining the security vetting processes used by VA manufacturers; these procedures are put in place to ensure that developers of third-party VA extensions (‘skills’, ‘apps’) implement proper security in their code. As their research demonstrates, vulnerable extensions—voice squatting attacks, written by the authors to specifically target a genuine skill—were approved by both Amazon and Google. Combined with the findings of RQ1, in which several VA attacks were identified that relied on malicious extensions, this represents a significant security risk. The authors went so far as to inform both Amazon and Google of their findings, and subsequently met with both companies to help them better understand these novel security risks.
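The intuition behind voice squatting can be shown with a toy router: if invocation names are matched by acoustic or textual similarity, a near-homophone registered by an attacker can capture requests intended for a genuine skill. The skill names and the similarity measure below are hypothetical simplifications, not the matching logic of any real skill store.

```python
from difflib import SequenceMatcher

# One genuine skill and one squatter with a near-homophone name (hypothetical).
registered_skills = ["capital one", "capital won"]

def route(utterance: str) -> str:
    """Crude stand-in for invocation-name matching: pick the registered
    skill whose name is most similar to what speech recognition heard."""
    return max(registered_skills,
               key=lambda s: SequenceMatcher(None, s, utterance).ratio())

print(route("capital one"))   # exact match: the genuine skill wins
print(route("capitol won"))   # a slight mistranscription routes to the squatter
```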

Moving away from extension development, three novel approaches have been proposed that suggest ways in which VA companies might improve security for end-users. Feng et al. [ 23 ] presented what they call ‘VAuth’, a method of ‘continuous’ authentication in which a wearable device collects unique body-surface vibrations emanating from the user and matches them with the voice signal heard by the VA. Wang et al. [ 31 ] proposed another wearable that might provide two-factor authentication. In this approach, termed ‘WearID’, the wearable captures unique vibration patterns not from the user’s body but from the vibration domain of the user’s voice; these are then used in tandem with existing device authentication.

Cheng et al. [ 16 ] suggested ‘acoustic tagging’, whereby a secondary device emits a unique acoustic signal, or ‘watermark’, which is heard in tandem with the user’s voice. The VA—registered to the user—may then accept or reject voice instructions accordingly. All three of these authentication methods go some way towards mitigating malicious attacks, such as the dolphin attack demonstrated by Zhang et al. [ 24 ]. They also provide an extra layer of security for those users concerned about privacy, by making it much harder for another user to access a VA without permission. However, all three can be considered a form of two-factor authentication, as each requires extra hardware. Two of the studies [ 23 , 31 ] involved wearables, which might not always be practical for multiple users, as well as adding extra expense and complication for the user.
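A common thread in VAuth and WearID is deciding whether two simultaneously captured signals, the microphone audio and the wearable’s vibration measurement, originate from the same speaker. The sketch below reduces that idea to a normalised cross-correlation test on synthetic signals; the signal model and threshold are assumptions for illustration, not the algorithms from either paper.

```python
import numpy as np

def normalised_corr(a: np.ndarray, b: np.ndarray) -> float:
    """Peak normalised cross-correlation between two 1-D signals."""
    a = (a - a.mean()) / (a.std() * len(a))
    b = (b - b.mean()) / b.std()
    return float(np.max(np.correlate(a, b, mode="full")))

rng = np.random.default_rng(1)
voice = rng.normal(size=4000)                    # audio heard by the VA
wearable = voice + 0.3 * rng.normal(size=4000)   # noisy vibration measurement
imposter = rng.normal(size=4000)                 # an unrelated speaker

THRESHOLD = 0.5  # assumed acceptance threshold, purely illustrative
print(normalised_corr(voice, wearable) > THRESHOLD)  # True: accept command
print(normalised_corr(voice, imposter) > THRESHOLD)  # False: reject command
```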

To conclude, there are worrying security considerations around VAs. Methods of two-factor authentication with an external device, although sophisticated, are cumbersome for users. Interestingly, there were no works at the time of our study on authenticating a user entirely based on their voice fingerprint. Given the lack of vetting in the major vendors’ application stores, which is itself a vulnerability open to exploitation, securing the VA is absolutely essential.

5. Open Research Challenges and Future Directions

According to the results of this study, it can be seen that VAs, like any other computing device, are vulnerable to malicious attacks. A number of vulnerabilities have been studied, and several attacks have been crafted that take advantage of flaws in the design of the VA itself and its software extensions. It has also been shown that VAs can mishear their wake words and make recordings without the user’s knowledge and, even when the user is aware, the VA vendor is recording and storing a large amount of personal information. Therefore, the security and privacy of VAs are still challenging and require further investigation. Three main future research directions are identified and discussed in the following sections.

5.1. GDPR and the Extent of Its Protections

Although an increase in users’ privacy awareness can be seen, among significant numbers of users there is still an alarming—almost blind—reliance on vendors such as Amazon and Google to ‘do the right thing’ and treat the user’s data responsibly and fairly in accordance with GDPR or other local data regulations. Future work might examine whether or not the vendors are fully complying with data law or whether they are adhering to it as little as possible in order to make their businesses more profitable. The work might also study whether or not regulations, such as GDPR, are offering as much protection to the end-user as they should and, if not, where they are failing and need improvement.

5.2. Forensics

Although studies on the forensic aspects of VAs have to date concentrated on finding as much information as possible both from the device and the cloud service platform, little work appears to have been carried out on examining exactly what is stored. Future work could look at how VAs interact with their cloud service providers, and how open the interfaces between the device and server are. Furthermore, it is not clear how much the user is (or can be) aware of what is being stored. This presents an interesting imbalance; while it is possible for the user to see certain data that are stored, the vendors’ ‘privacy dashboards’ through which this information can be gleaned are not telling the whole story. Future work might study this imbalance and find ways in which the user might become more aware of the extent of the data that are being taken from them, stored, and manipulated for the vendors’ profit.

5.3. Voice Authentication without External Device

As discussed in this paper, VA user authentication is a concern, as with any other service that collects user data. A VA collects substantial amounts of personal data, as demonstrated in the forensics-focussed works studied in this paper. Several novel methods for authenticating a user to their device were presented in the primary studies. However, each used an external device to provide a form of two-factor authentication, which makes the resultant solution cumbersome and complicated for the user. An interesting future research direction could address this challenge by focusing on biometric voice analysis as a means of authenticating the user, rather than relying on an external device.
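As a gesture toward what such an approach might look like, the sketch below implements a deliberately naive voiceprint check: mean MFCC vectors compared by cosine similarity, using librosa. Production speaker verification relies on trained speaker embeddings and anti-spoofing measures; the file names and threshold here are hypothetical.

```python
import numpy as np
import librosa

def voiceprint(path: str) -> np.ndarray:
    """Naive voiceprint: the mean MFCC vector of an utterance."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

def same_speaker(enrolled: np.ndarray, attempt: np.ndarray,
                 threshold: float = 0.95) -> bool:
    """Cosine-similarity check; the threshold is illustrative only."""
    cos = np.dot(enrolled, attempt) / (
        np.linalg.norm(enrolled) * np.linalg.norm(attempt))
    return cos > threshold

# Hypothetical usage: enrol once, then verify each subsequent utterance.
# enrolled = voiceprint("owner_enrolment.wav")
# print(same_speaker(enrolled, voiceprint("incoming_command.wav")))
```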

6. Conclusions

In this paper, based on a systematic literature review on the security and privacy challenges of virtual assistants, several gaps in the current research landscape were identified. Research has been carried out on the themes of user concerns, the threat of malicious attack, and improving authentication. However, these studies do not take an overarching view of how these themes may interact, leading to a potential disconnect between these areas. A number of studies concentrated on user behaviour, identifying privacy and security concerns; however, they did not mention how these concerns might be addressed, except [ 33 ], in which a few suggestions were provided for privacy and security design, including improvements to muting, privacy default settings, and audio log features, as well as adding security layers to voice recognition and providing offline capabilities. In addition, it was found that when one particular VA was the focus of the study, Amazon’s Alexa was the assistant that was chosen in the majority of these papers. Given Amazon’s sales dominance in the smart speaker sector, this is perhaps understandable. There are, however, many more VA systems that might be going uninvestigated as a consequence.

The results from answering research question 1 showed that increasingly sophisticated malicious attacks on VAs are being demonstrated, and yet user awareness of this specific and worrying trend appears not to have been studied in any great detail. The three research questions posited were answered as follows: (1) there are several emerging security and privacy concerns; (2) security and privacy concerns do affect users’ adoption of VAs and of a particular model of VA; and (3) there are worrying concerns and security lapses in the way third-party software is vetted by manufacturers. It would be interesting to investigate further how these areas converge, as the current research, although of great use in its own subject area, can have a narrow focus; broadening the focus areas investigated would allow knock-on effects in other areas to be researched.

Acknowledgments

The authors are grateful to the Deanship of Scientific Research, King Saud University for funding through Vice Deanship of Scientific Research Chairs, and grant of PIFI 2020 (2020VBC0002), China.

Author Contributions

T.B.; investigation, writing—original draft preparation, T.D. and S.B.; writing—review and supervision, M.S.A.-R. and A.H.S.; writing—editing. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.


AI-assisted writing is quietly booming in academic journals. Here’s why that’s OK

Julian Koplin, Lecturer in Bioethics, Monash University & Honorary Fellow, Melbourne Law School, Monash University

Disclosure statement

Julian Koplin does not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.

Monash University provides funding as a founding partner of The Conversation AU.


If you search Google Scholar for the phrase “as an AI language model”, you’ll find plenty of AI research literature and also some rather suspicious results. For example, one paper on agricultural technology says:

As an AI language model, I don’t have direct access to current research articles or studies. However, I can provide you with an overview of some recent trends and advancements …

Obvious gaffes like this aren’t the only signs that researchers are increasingly turning to generative AI tools when writing up their research. A recent study examined the frequency of certain words in academic writing (such as “commendable”, “meticulously” and “intricate”), and found they became far more common after the launch of ChatGPT – so much so that 1% of all journal articles published in 2023 may have contained AI-generated text.

(Why do AI models overuse these words? There is speculation it’s because they are more common in English as spoken in Nigeria, where key elements of model training often occur.)

The aforementioned study also looks at preliminary data from 2024, which indicates that AI writing assistance is only becoming more common. Is this a crisis for modern scholarship, or a boon for academic productivity?

Who should take credit for AI writing?

Many people are worried by the use of AI in academic papers. Indeed, the practice has been described as “contaminating” scholarly literature.

Some argue that using AI output amounts to plagiarism. If your ideas are copy-pasted from ChatGPT, it is questionable whether you really deserve credit for them.

But there are important differences between “plagiarising” text authored by humans and text authored by AI. Those who plagiarise humans’ work receive credit for ideas that ought to have gone to the original author.

By contrast, it is debatable whether AI systems like ChatGPT can have ideas, let alone deserve credit for them. An AI tool is more like your phone’s autocomplete function than a human researcher.

The question of bias

Another worry is that AI outputs might be biased in ways that could seep into the scholarly record. Infamously, older language models tended to portray people who are female, black and/or gay in distinctly unflattering ways, compared with people who are male, white and/or straight.

This kind of bias is less pronounced in the current version of ChatGPT.

However, other studies have found a different kind of bias in ChatGPT and other large language models: a tendency to reflect a left-liberal political ideology.

Any such bias could subtly distort scholarly writing produced using these tools.

The hallucination problem

The most serious worry relates to a well-known limitation of generative AI systems: that they often make serious mistakes.

For example, when I asked ChatGPT-4 to generate an ASCII image of a mushroom, it provided me with the following output.

It then confidently told me I could use this image of a “mushroom” for my own purposes.

These kinds of overconfident mistakes have been referred to as “AI hallucinations” and “AI bullshit”. While it is easy to spot that the above ASCII image looks nothing like a mushroom (and quite a bit like a snail), it may be much harder to identify any mistakes ChatGPT makes when surveying scientific literature or describing the state of a philosophical debate.

Unlike (most) humans, AI systems are fundamentally unconcerned with the truth of what they say. If used carelessly, their hallucinations could corrupt the scholarly record.

Should AI-produced text be banned?

One response to the rise of text generators has been to ban them outright. For example, Science – one of the world’s most influential academic journals – disallows any use of AI-generated text.

I see two problems with this approach.

The first problem is a practical one: current tools for detecting AI-generated text are highly unreliable. This includes the detector created by ChatGPT’s own developers, which was taken offline after it was found to have only a 26% accuracy rate (and a 9% false positive rate). Humans also make mistakes when assessing whether something was written by AI.

It is also possible to circumvent AI text detectors. Online communities are actively exploring how to prompt ChatGPT in ways that allow the user to evade detection. Human users can also superficially rewrite AI outputs, effectively scrubbing away the traces of AI (like its overuse of the words “commendable”, “meticulously” and “intricate”).

The second problem is that banning generative AI outright prevents us from realising these technologies’ benefits. Used well, generative AI can boost academic productivity by streamlining the writing process. In this way, it could help further human knowledge. Ideally, we should try to reap these benefits while avoiding the problems.

The problem is poor quality control, not AI

The most serious problem with AI is the risk of introducing unnoticed errors, leading to sloppy scholarship. Instead of banning AI, we should try to ensure that mistaken, implausible or biased claims cannot make it onto the academic record.

After all, humans can also produce writing with serious errors, and mechanisms such as peer review often fail to prevent its publication.

We need to get better at ensuring academic papers are free from serious mistakes, regardless of whether these mistakes are caused by careless use of AI or sloppy human scholarship. Not only is this more achievable than policing AI usage, it will improve the standards of academic research as a whole.

This would be (as ChatGPT might say) a commendable and meticulously intricate solution.



Web publishers brace for carnage as Google adds AI answers

The tech giant is rolling out AI-generated answers that displace links to human-written websites, threatening millions of creators

Kimber Matherne’s thriving food blog draws millions of visitors each month searching for last-minute dinner ideas.

But the mother of three says decisions made at Google, more than 2,000 miles from her home in the Florida panhandle, are threatening her business. About 40 percent of visits to her blog, Easy Family Recipes , come through the search engine, which has for more than two decades served as the clearinghouse of the internet, sending users to hundreds of millions of websites each day.


As the tech giant gears up for Google I/O, its annual developer conference, this week, creators like Matherne are worried about the expanding reach of its new search tool that incorporates artificial intelligence. The product, dubbed “Search Generative Experience,” or SGE, directly answers queries with complex, multi-paragraph replies that push links to other websites further down the page, where they’re less likely to be seen.

The shift stands to shake the very foundations of the web.

The rollout threatens the survival of the millions of creators and publishers who rely on the service for traffic. Some experts argue the addition of AI will boost the tech giant’s already tight grip on the internet, ultimately ushering in a system where information is provided by just a handful of large companies.

“Their goal is to make it as easy as possible for people to find the information they want,” Matherne said. “But if you cut out the people who are the lifeblood of creating that information — that have the real human connection to it — then that’s a disservice to the world.”

Google calls its AI answers “overviews” but they often just paraphrase directly from websites. One search for how to fix a leaky toilet provided an AI answer with several tips, including tightening tank bolts. At the bottom of the answer, Google linked to The Spruce, a home improvement and gardening website owned by web publisher Dotdash Meredith, which also owns Investopedia and Travel + Leisure. Google’s AI tips lifted a phrase from The Spruce’s article word-for-word.

A spokesperson for Dotdash Meredith declined to comment.

The links Google provides are often half-covered, requiring a user to click to expand the box to see them all. It’s unclear which of the claims made by the AI come from which link.

Tech research firm Gartner predicts traffic to the web from search engines will fall 25 percent by 2026. Ross Hudgens, CEO of search engine optimization consultancy Siege Media, said he estimates at least a 10 to 20 percent hit, and more for some publishers. “Some people are going to just get bludgeoned,” he said.

Raptive, which provides digital media, audience and advertising services to about 5,000 websites, including Easy Family Recipes, estimates changes to search could result in about $2 billion in losses to creators — with some websites losing up to two-thirds of their traffic. Raptive arrived at these figures by analyzing thousands of keywords that feed into its network, and conducting a side-by-side comparison of traditional Google search and the pilot version of Google SGE.

Michael Sanchez, the co-founder and CEO of Raptive, says that the changes coming to Google could “deliver tremendous damage” to the internet as we know it. “What was already not a level playing field … could tip its way to where the open internet starts to become in danger of surviving for the long term,” he said.

When Google’s chief executive Sundar Pichai announced the broader rollout during an earnings call last month, he said the company is making the change in a “measured” way, while “also prioritizing traffic to websites and merchants.” Company executives have long argued that Google needs a healthy web to give people a reason to use its service, and doesn’t want to hurt publishers. A Google spokesperson declined to comment further.

“I think we got to see an incredible blossoming of the internet, we got to see something that was really open and freewheeling and wild and very exciting for the whole world,” said Selena Deckelmann, the chief product and technology officer for Wikimedia, the foundation that oversees Wikipedia.

“Now, we’re just in this moment where I think that the profits are driving people in a direction that I’m not sure makes a ton of sense,” Deckelmann said. “This is a moment to take stock of that and say, ‘What is the internet we actually want?’”

People who rely on the web to make a living are worried.

Jake Boly, a strength coach based in Austin, has spent three years building up his website of workout shoe reviews. But last year, his traffic from Google dropped 96 percent. Google still seems to find value in his work, citing his page on AI-generated answers about shoes. The problem is, people read Google’s summary and don’t visit his site anymore, Boly said.

“My content is good enough to scrape and summarize,” he said. “But it’s not good enough to show in your normal search results, which is how I make money and stay afloat.”

Google first said it would begin experimenting with generative AI in search last year, several months after OpenAI released ChatGPT. At the time, tech pundits speculated that AI chatbots could replace Google search as the place to find information. Satya Nadella, the CEO of Google’s biggest competitor, Microsoft, added an AI chatbot to his company’s search engine and in February 2023 goaded Google to “come out and show that they can dance.”

The search giant started dancing. Though it had invented much of the AI technology enabling chatbots and had used it to power tools like Google Translate, it started putting generative AI tech into its other products. Google Docs, YouTube’s video-editing tools and the company’s voice assistant all got AI upgrades.

But search is Google’s most important product, accounting for about 57 percent of its $80 billion in revenue in the first quarter of this year. Over the years, search ads have been the cash cow Google needed to build its other businesses, like YouTube and cloud storage, and to stay competitive by buying up other companies.

Google has largely avoided AI answers for the moneymaking searches that host ads, said Andy Taylor, vice president of research at internet marketing firm Tinuiti.

When it does show an AI answer on “commercial” searches, it shows up below the row of advertisements. That could force websites to buy ads just to maintain their position at the top of search results.

Google has been testing the AI answers publicly for the past year, showing them to a small percentage of its billions of users as it tries to improve the technology.

Still, it routinely makes mistakes. A review by The Washington Post published in April found that Google’s AI answers were long-winded, sometimes misunderstood the question and made up fake answers.

The bar for success is high. While OpenAI’s ChatGPT is a novel product, consumers have spent years with Google and expect search results to be fast and accurate. The rush into generative AI might also run up against legal problems. The underlying tech behind OpenAI, Google, Meta and Microsoft’s AI was trained on millions of news articles, blog posts, e-books, recipes, social media comments and Wikipedia pages that were scraped from the internet without paying or asking permission of their original authors.

OpenAI and Microsoft have faced a string of lawsuits over alleged theft of copyrighted works.

“If journalists did that to each other, we’d call that plagiarism,” said Frank Pine, the executive editor of MediaNews Group, which publishes dozens of newspapers around the United States, including the Denver Post, San Jose Mercury News and the Boston Herald. Several of the company’s papers sued OpenAI and Microsoft in April, alleging the companies used its news articles to train their AI.

If news organizations let tech companies, including Google, use their content to make AI summaries without payment or permission, it would be “calamitous” for the journalism industry, Pine said. The change could have an even bigger effect on newspapers than the loss of their classifieds businesses in the mid-2000s or Meta’s more recent pivot away from promoting news to its users, he said.

The move to AI answers, and the centralization of the web into a few portals isn’t slowing down. OpenAI has signed deals with web publishers — including Dotdash Meredith — to show their content prominently in its chatbot.

Matherne, of Easy Family Recipes, says she’s bracing for the changes by investing in social media channels and email newsletters.

“The internet’s kind of a scary place right now,” Matherne said. “You don’t know what to expect.”

A previous version of this story said MediaNews Group sued OpenAI and Microsoft. In fact, it was several of the company's newspapers that sued the tech companies. This story has been corrected.


Survey on Virtual Assistant: Google Assistant, Siri, Cortana, Alexa

  • Conference paper
  • First Online: 05 January 2019


  • Amrita S. Tulshan
  • Sudhir Namdeorao Dhage

Part of the book series: Communications in Computer and Information Science (CCIS, volume 968)

Included in the following conference series:

  • International Symposium on Signal Processing and Intelligent Recognition Systems


Virtual assistants are a boon for everyone in this new era of the 21st century. They have paved the way for a technology in which we can ask questions of a machine and interact with IVAs much as people do with other humans. This technology has attracted attention worldwide through smartphones, laptops, computers, and similar devices. Some of the most significant VAs are Siri, Google Assistant, Cortana, and Alexa. Voice recognition, contextual understanding, and human interaction remain unresolved issues in these IVAs. To examine those issues, 100 users participated in a survey for this research and shared their experiences. Each user put the survey’s questions to all of the personal assistants, and the results reported in this paper are drawn from those experiences. According to the results, these assistants cover many services well, but improvements are still required in voice recognition, contextual understanding, and hands-free interaction. The main goal of this paper is to show that addressing these shortcomings will increase the use of IVAs.



Author information

Authors and Affiliations

Department of Computer Engineering, Sardar Patel Institute of Technology, Mumbai, 400058, India

Amrita S. Tulshan & Sudhir Namdeorao Dhage


Corresponding author

Correspondence to Amrita S. Tulshan.

Editor information

Editors and Affiliations

Indian Institute of Information Technology and Management, Kerala, India

Sabu M. Thampi

Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA

Oge Marques

Department of Electrical and Computer Engineering, Ryerson University, Toronto, ON, Canada

Sri Krishnan

Department of Computer Science and Information Engineering, Providence University, Taichung, Taiwan

Kuan-Ching Li

University of Naples Federico II, Naples, Italy

Domenico Ciuonzo

Electrical Engineering Department, Indian Institute of Technology Patna, Patna, India

Maheshkumar H. Kolekar


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Tulshan, A.S., Dhage, S.N. (2019). Survey on Virtual Assistant: Google Assistant, Siri, Cortana, Alexa. In: Thampi, S., Marques, O., Krishnan, S., Li, KC., Ciuonzo, D., Kolekar, M. (eds) Advances in Signal Processing and Intelligent Recognition Systems. SIRS 2018. Communications in Computer and Information Science, vol 968. Springer, Singapore. https://doi.org/10.1007/978-981-13-5758-9_17


DOI: https://doi.org/10.1007/978-981-13-5758-9_17

Published: 05 January 2019

Publisher Name: Springer, Singapore

Print ISBN: 978-981-13-5757-2

Online ISBN: 978-981-13-5758-9

eBook Packages: Computer Science, Computer Science (R0)



May 13, 2024


AI-assisted writing is quietly booming in academic journals—here's why that's OK

by Julian Koplin, The Conversation


If you search Google Scholar for the phrase "as an AI language model," you'll find plenty of AI research literature and also some rather suspicious results. For example, one paper on agricultural technology says,

"As an AI language model, I don't have direct access to current research articles or studies. However, I can provide you with an overview of some recent trends and advancements …"

Obvious gaffes like this aren't the only signs that researchers are increasingly turning to generative AI tools when writing up their research. A recent study examined the frequency of certain words in academic writing (such as "commendable," "meticulously" and "intricate"), and found they became far more common after the launch of ChatGPT—so much so that 1% of all journal articles published in 2023 may have contained AI-generated text.

(Why do AI models overuse these words? There is speculation it's because they are more common in English as spoken in Nigeria, where key elements of model training often occur.)
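To make that frequency analysis concrete, here is a minimal sketch of the counting idea, assuming a corpus of (year, text) pairs; the word list and input format are illustrative assumptions, not the cited study's code:

    from collections import defaultdict
    import re

    # Marker words reported to have surged after ChatGPT's launch.
    MARKER_WORDS = {"commendable", "meticulously", "intricate"}

    def marker_rate_by_year(abstracts):
        """abstracts: iterable of (year, text) pairs (assumed input format).
        Returns marker-word occurrences per million tokens, per year."""
        tokens = defaultdict(int)
        hits = defaultdict(int)
        for year, text in abstracts:
            words = re.findall(r"[a-z]+", text.lower())
            tokens[year] += len(words)
            hits[year] += sum(1 for w in words if w in MARKER_WORDS)
        return {year: 1e6 * hits[year] / tokens[year] for year in tokens}

    # A year-on-year jump in this rate is the kind of signal the study reports.
    sample = [(2022, "We analyse the results and report findings."),
              (2023, "This commendable study meticulously examines intricate data.")]
    print(marker_rate_by_year(sample))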

The aforementioned study also looks at preliminary data from 2024, which indicates that AI writing assistance is only becoming more common. Is this a crisis for modern scholarship, or a boon for academic productivity?

Who should take credit for AI writing?

Many people are worried by the use of AI in academic papers. Indeed, the practice has been described as "contaminating" scholarly literature.

Some argue that using AI output amounts to plagiarism. If your ideas are copy-pasted from ChatGPT, it is questionable whether you really deserve credit for them.

But there are important differences between "plagiarizing" text authored by humans and text authored by AI. Those who plagiarize humans' work receive credit for ideas that ought to have gone to the original author.

By contrast, it is debatable whether AI systems like ChatGPT can have ideas, let alone deserve credit for them. An AI tool is more like your phone's autocomplete function than a human researcher.

The question of bias

Another worry is that AI outputs might be biased in ways that could seep into the scholarly record. Infamously, older language models tended to portray people who are female, black and/or gay in distinctly unflattering ways, compared with people who are male, white and/or straight.

This kind of bias is less pronounced in the current version of ChatGPT.

However, other studies have found a different kind of bias in ChatGPT and other large language models: a tendency to reflect a left-liberal political ideology.

Any such bias could subtly distort scholarly writing produced using these tools.

The hallucination problem

The most serious worry relates to a well-known limitation of generative AI systems: that they often make serious mistakes.

For example, when I asked ChatGPT-4 to generate an ASCII image of a mushroom, it provided me with the following output.

[Image: the ASCII art ChatGPT produced, which looks more like a snail than a mushroom]

It then confidently told me I could use this image of a "mushroom" for my own purposes.

These kinds of overconfident mistakes have been referred to as "AI hallucinations" and "AI bullshit." While it is easy to spot that the above ASCII image looks nothing like a mushroom (and quite a bit like a snail), it may be much harder to identify any mistakes ChatGPT makes when surveying scientific literature or describing the state of a philosophical debate.

Unlike (most) humans, AI systems are fundamentally unconcerned with the truth of what they say. If used carelessly, their hallucinations could corrupt the scholarly record.

Should AI-produced text be banned?

One response to the rise of text generators has been to ban them outright. For example, Science—one of the world's most influential academic journals—disallows any use of AI-generated text.

I see two problems with this approach.

The first problem is a practical one: current tools for detecting AI-generated text are highly unreliable. This includes the detector created by ChatGPT's own developers, which was taken offline after it was found to have only a 26% accuracy rate (and a 9% false positive rate). Humans also make mistakes when assessing whether something was written by AI.
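To unpack those two figures, the sketch below relates them to confusion-matrix counts; the counts are invented so as to reproduce the reported numbers, reading the 26% as a true-positive rate and the 9% as a false-positive rate:

    def detector_rates(tp, fn, fp, tn):
        """True-positive and false-positive rates from confusion-matrix counts."""
        tpr = tp / (tp + fn)  # share of AI-written texts correctly flagged
        fpr = fp / (fp + tn)  # share of human-written texts wrongly flagged
        return tpr, fpr

    # Invented counts: of 100 AI-written texts, only 26 are flagged;
    # of 100 human-written texts, 9 are wrongly flagged as AI.
    tpr, fpr = detector_rates(tp=26, fn=74, fp=9, tn=91)
    print(f"true-positive rate = {tpr:.0%}, false-positive rate = {fpr:.0%}")

A detector that catches only a quarter of AI-written texts while accusing nearly one in ten human authors is of little use for enforcement, which is the practical point being made here.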

It is also possible to circumvent AI text detectors. Online communities are actively exploring how to prompt ChatGPT in ways that allow the user to evade detection. Human users can also superficially rewrite AI outputs, effectively scrubbing away the traces of AI (like its overuse of the words "commendable," "meticulously" and "intricate").

The second problem is that banning generative AI outright prevents us from realizing these technologies' benefits. Used well, generative AI can boost academic productivity by streamlining the writing process. In this way, it could help further human knowledge. Ideally, we should try to reap these benefits while avoiding the problems.

The problem is poor quality control, not AI

The most serious problem with AI is the risk of introducing unnoticed errors, leading to sloppy scholarship. Instead of banning AI, we should try to ensure that mistaken, implausible or biased claims cannot make it onto the academic record.

After all, humans can also produce writing with serious errors, and mechanisms such as peer review often fail to prevent its publication.

We need to get better at ensuring academic papers are free from serious mistakes, regardless of whether these mistakes are caused by careless use of AI or sloppy human scholarship. Not only is this more achievable than policing AI usage, it will improve the standards of academic research as a whole.

This would be (as ChatGPT might say) a commendable and meticulously intricate solution.

Provided by The Conversation



RELATED PAPERS AND RESOURCES

  1. (PDF) Google Assistant

    Intelligent Virtual Assistant (IVA) is "a software that exploits information, for example, the operator's voice… and rational data to give help by noticing inquiries in usual ...

  2. Ok Google: Using virtual assistants for data collection in

    Because of the increasing popularity of voice-controlled virtual assistants, such as Amazon's Alexa and Google Assistant, they should be considered a new medium for psychological and behavioral research. We developed Survey Mate, an extension of Google Assistant, and conducted two studies to analyze the reliability and validity of data collected through this medium. In the first study, we ...

  3. PDF An overview of Bard: an early experiment with generative AI

    An overview of Bard: an early experiment with generative AI James Manyika, SVP, Research, Technology and Society, and Sissie Hsiao, Vice President and General Manager, Google Assistant and Bard Editor's note: This is a living document and will be updated periodically as we continue to rapidly improve Bard's capabilities as well as

  4. Ok Google: Using virtual assistants for data collection in

    For instance, in many African countries, people do not own a computer but a smartphone (Pew Research Center, 2019), and surveys could be rolled out in multiple languages using the often preinstalled Google Assistant. Behavioral scientists routinely draw broad claims from Western, educated, industrialized, rich, and democratic (WEIRD) samples.

  5. Intelligent personal assistants: A systematic literature review

    1. Introduction. The communication with devices using the voice is nowadays a common task for many people. Intelligent Personal Assistants (IPA), such as Amazon Alexa, Microsoft Cortana, Google Assistant, or Apple Siri, allow people to search for various subjects, schedule a meeting, or make a call from their car or house hands-free, no longer needing to hold any mobile devices.

  6. Proceedings

    Natural user interfaces are becoming popular. One of the most common natural user interfaces nowadays are voice activated interfaces, particularly smart personal assistants such as Google Assistant, Alexa, Cortana, and Siri. This paper presents the results of an evaluation of these four smart personal assistants in two dimensions: the correctness of their answers and how natural the responses ...

  7. Forensic Investigation of Google Assistant

    In this paper, we discussed the forensic analysis of Google Assistant, a virtual assistant developed by Google and primarily available on mobile and smart home IoT devices. We showed client-centric forensic artifacts stored in the main opa_history SQLite database on Android smartphones which contain all local copies of voice conversations ...

  8. Speech Processing


  9. Humanizing voice assistant: The impact of voice assistant personality

    According to National Public Radio and Edison Research, 21% of Americans (53-million people) own smart speakers, growing quickly from the 14-million people who owned their first smart speakers in 2018. Huffman, Vice President of Google Assistant, announced that Google Assistant mobile application has been downloaded to 500-million devices.

  10. On the Security and Privacy Challenges of Virtual Assistants

    1.1. Background. The most popular VAs on the market are Apple's Siri, Amazon's Alexa, Microsoft's Cortana, and Google's Assistant; these assistants, often found in portable devices such as smartphones or tablets, can each be considered a 'speech-based natural user interface' (NUI): a system that can be operated by a user via intuitive, natural behaviour, i.e., voice instructions.

  11. IoT-Enabled Intelligent Home Using Google Assistant

    This article proposes an IoT-enabled smart home using Google Assistant. The Internet of Things (IoT) is an emerging technology used to link computing devices by transmitting and receiving data over the Internet. The smart home system ensures that individual home appliances can be controlled by voice commands without physically switching them on or off.

  12. Automation of Smart Home using Smart Phone via Google Assistant

    The paper puts the voice-controlled home automation concept into practice with the help of Google Assistant for voice recognition and control. The purpose of Google Assistant-controlled home automation is to provide voice control of household appliances. The NodeMCU (ESP32) microcontroller is utilised, and Wi-Fi is used for communication ...

  13. Google Scholar

    Google Scholar provides a simple way to broadly search for scholarly literature. Search across a wide variety of disciplines and sources: articles, theses, books, abstracts and court opinions.

  14. Short Research on Voice Control System Based on Artificial Intelligence

    This paper proposes a voice control system based on an artificial intelligence (AI) assistant. The AI assistant system was designed using Google Assistant, a representative open-API artificial intelligence service, and the conditional auto-run system IFTTT (If This, Then That). It cost-effectively implemented the system using a Raspberry Pi, a voice recognition module, and open software (a minimal sketch of this webhook pattern appears after this list). The ...

  15. (PDF) Implementation of Seamless Assistance with Google Assistant

    The elasticity and flexibility of cloud computing enables Google Assistant [18] to serve users anytime, anywhere, whether on mobile devices or smart home devices. By combining with cloud computing ...

  16. Artificial Intelligence and Virtual Assistant—Working Model

    In the twenty-first century, virtual assistants play a crucial role in humans' day-to-day activities. According to Clutch's 2019 survey report, 27% of people use AI-powered virtual assistants such as Google Assistant, Amazon Alexa, Cortana, and Apple Siri; for performing simple tasks, people use virtual assistants designed with natural language processing.

  17. PDF JARVIS

    International Journal of Science and Research (IJSR), ISSN: 2319-7064, SJIF (2022): 7.942, Volume 11, Issue 5, May 2022 ... Voice Assistant, NLP, Neural Network, Google Search. 1. Introduction. AI voice assistant, also known as a virtual or digital ... continuous in research papers since 2000, except the year 2010 (Figure ...

  18. AI-assisted writing is quietly booming in academic journals. Here's why


  19. Hello GPT-4o

    Prior to GPT-4o, you could use Voice Mode to talk to ChatGPT with latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4) on average. To achieve this, Voice Mode is a pipeline of three separate models: one simple model transcribes audio to text, GPT-3.5 or GPT-4 takes in text and outputs text, and a third simple model converts that text back to audio.

  20. As Google AI search rolls out to more people, websites brace for

    The tech giant is rolling out AI-generated answers that displace links to human-written websites, threatening millions of creators. By Gerrit De Vynck and Cat Zakrzewski. Updated May 13, 2024 at ...

  21. Welcome to the Purdue Online Writing Lab

    Mission. The Purdue On-Campus Writing Lab and Purdue Online Writing Lab assist clients in their development as writers—no matter what their skill level—with on-campus consultations, online participation, and community engagement. The Purdue Writing Lab serves the Purdue, West Lafayette, campus and coordinates with local literacy initiatives.

  22. Survey on Virtual Assistant: Google Assistant, Siri, Cortana, Alexa

    This paper presents a usability evaluation of four voice-based and contextual-text virtual assistants (Google Assistant, Cortana, Siri, Alexa). Cortana can likewise read your messages, track your location, watch your browsing history, check your contact list, keep an eye on your calendar, and put this information together to suggest useful information, on the ...

  23. AI-assisted writing is quietly booming in academic journals—here's why

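Several entries above (items 12 and 14 in particular) describe the same pattern: a phrase recognized by Google Assistant is routed through IFTTT's webhook service to a small server on a Raspberry Pi or NodeMCU, which switches an appliance. Below is a hypothetical minimal sketch of the receiving end of that pattern; Flask, the route path, the port, and the device table are assumptions for illustration, not code from the cited papers:

    from flask import Flask

    app = Flask(__name__)
    APPLIANCE_STATE = {"lamp": False}  # stands in for a real GPIO/relay driver

    @app.route("/ifttt/<device>/<action>", methods=["POST"])
    def control(device, action):
        # IFTTT's Webhooks service would POST to this URL when the linked
        # Google Assistant phrase ("turn the lamp on/off") is recognized.
        if device in APPLIANCE_STATE and action in ("on", "off"):
            APPLIANCE_STATE[device] = (action == "on")
            # On real hardware, a GPIO or relay call would replace this
            # dictionary update, e.g. GPIO.output(pin, GPIO.HIGH/LOW).
            return {"device": device, "state": action}, 200
        return {"error": "unknown device or action"}, 404

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)  # port is an arbitrary choice

Because the appliance logic sits behind a single HTTP route, the same webhook pattern carries over to microcontroller boards, though those would use a lighter HTTP server than Flask.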