Inner Speech

Inner speech is known as the “little voice in the head” or “thinking in words.” It attracts philosophical attention in part because it is a phenomenon where several topics of perennial interest intersect: language, consciousness, thought, imagery, communication, imagination, and self-knowledge all appear to connect in some way or other to the little voice in the head. Specific questions about inner speech that have exercised philosophers include its similarities to, and differences from, outer speech; its relationship to reasoning and conceptual thought; its broader cognitive roles—especially within metacognition and self-knowledge; and the role it can play in explanations of auditory verbal hallucinations and “thought insertion”.

A more formal characterization of inner speech (yet one that still aims at theoretical neutrality) is to say that inner speech is a mental phenomenon that is both keyed to a natural language and often available to introspection. To say that inner speech is “available to introspection” is to say that each person has an introspective way of knowing about their own inner speech episodes that others lack (Schwitzgebel 2010 [2019]); our access to our own inner speech is—at least often—comparable to our access to our other conscious mental episodes. To say that inner speech is “keyed to” a natural language is to say that it either occurs in a natural language (like words spoken aloud) or represents words of a natural language (like an audio recording of a speech), or that it does both. In specifying that the language in question is a “natural” language, we mean to include any language one may acquire through learning—such as English, Japanese, or American Sign Language—and to exclude any innate mental languages that may exist (such as a Fodorian [1975] Mentalese, or other innate “language of thought”). This characterization leaves open several questions of controversy including: (1) whether all inner speech is available to introspection; (2) whether inner speech is literally a form of speech, a form of thought, or both, or neither; and (3) whether inner speech occurs in a natural language, or represents items of a natural language, or both.

Inner speech is a subject of study in many distinct disciplines, including neuroscience, speech pathology, developmental psychology, psychiatry, computer science, and linguistics, as well as philosophy. For this reason, there are a variety of distinct theoretical tools and concepts one might use to describe its nature and cognitive roles, with correspondingly distinct aims, methods, and literatures. We focus here on the accounts contemporary philosophers have given of its nature and on the explanatory purposes to which inner speech is most commonly put in philosophical work. Nevertheless, much of the contemporary philosophical work on inner speech is itself interdisciplinary in nature and aims to be consistent with, and informed by, results in allied disciplines, including, especially, experimental psychology, linguistics, and neuroscience. We discuss those sources where relevant to the issues that have exercised philosophers, while directing readers most interested in the empirical work to other reviews, such as Alderson-Day & Fernyhough (2015), Langland-Hassan (2021), and Perrone-Bertolotti et al. (2014). For another philosophically-oriented review, see Vicente & Martínez-Manrique (2011). Also, we focus on inner speech linked to the auditory modality, as opposed to inner speech that may occur in a gestural or visual modality (as may be the case with gestural sign languages), because nearly all of the existing research on inner speech concerns a phenomenon that is in some way linked to audition.

Finally, the phenomenon of speaking to oneself audibly is usually referred to as “private speech”. Some parts of the discussion in this entry are easily transferable to private speech, e.g., the matter of whether we can perform speech acts in inner speech. Others are not. For example, the question of whether inner speech, as a mental phenomenon, is actually a kind of speech has no counterpart in the context of private speech, as private speech is uncontroversially a kind of speech. It is typically easy to determine whether a question about inner speech also applies to private speech, so we will not comment on this further.

1. Inner Speech as Actual Speech

The auditory-sensory character of inner speech is usually thought to be due to its involvement of auditory-verbal imagery (for an exception, see O’Brien 2013). Mental images (in any modality) are generally viewed as representations of particular things (or kinds of thing), not instances of those things. A visual image of a duck, for example, is a representation of a duck, not an actual duck. Likewise, it may seem that inner speech, insofar as it involves auditory imagery, is a representation of speech (and of its sounds, in particular), not actual speech. “Grass is green”, produced in inner speech, would then represent an utterance of the sentence, “Grass is green”, but it would not actually be an utterance of that sentence.

Notwithstanding this, many philosophers working on inner speech hold that inner speech really is a kind of speech. When we produce inner speech, we are literally speaking, albeit silently. We will call this view the “actual speech view”. Proponents include Carruthers (1996), Martínez-Manrique & Vicente (2010, 2015), Gauker (2011, 2018), O’Brien (2013), Jorba & Vicente (2014), Gerrans (2015), Gregory (2016, 2018) (though Gregory has indicated in more recent work (e.g., Gregory forthcoming) that he no longer holds the view), Machery (2018), Wilkinson & Fernyhough (2018), Wilkinson (2020), and Frankfort (2022). Martínez-Manrique & Vicente argued for the view in their 2010 paper; their 2015 paper, discussed in Section 3.3.2, sets out an updated version of their theory which incorporates some further commitments. Historically, the view can be traced at least to the Soviet psychologist, Lev Vygotsky (1934 [1986]), and it was also held by Ryle (1949 [2009]). However, Gauker develops a somewhat different version of the view—one that sharply distinguishes inner speech from the auditory-verbal imagery typically associated with it (see Section 3.2 for discussion)—which has its origins in Sellars (1956).

If inner speech is a kind of speech, instances of inner speech could aptly be called “inner speech utterances”, as producing inner speech would really amount to saying something. However, in order to be neutral on the issue, we will use the term, “inner speech episodes”, in this section and throughout the entry. An important issue for a proponent of the actual speech view is to explain how inner speech can consist of genuine linguistic tokens, given that it seems to be an imagistic phenomenon—where, as noted, the images may appear to be representations of speech sounds. For, even if inner speech consists of images of speech sounds, this only suggests that it consists of representations of linguistic items, not linguistic items themselves.

One style of answer has been offered by Sam Wilkinson (2020), who draws a distinction between imagery and imagination. He holds that sensory imagining is a “personal-level phenomenon”, which has components (2020: 16). One of the components of sensory imagining (as opposed to propositional or “attitudinal” imagining, which is typically assumed to be non-imagistic in nature) is mental imagery. For example, if one sensorily imagines a duck, then one component of this personal-level mental state may be a mental image resembling the appearance of a duck. There might also be other components, such as a stipulation that the image is an image of a duck and not another bird of similar appearance. But, Wilkinson emphasizes, mental imagery can be involved in many personal-level mental attitudes apart from the attitude of imagining, such as remembering, judging, reasoning, and others. “In a similar way”, he claims,

imagery … may be involved in an inner assertion. That does not, however, make the inner assertion simply nothing more than the imagery involved in its production, still less an act of imagination. (2020: 16).

Imagery can play many roles, Wilkinson is saying, and there is no reason that one of those roles should not be as a medium for linguistic tokens. The inner assertion is “a genuine assertion”—an instance of language consisting of imagery.

It might be replied that, although imagery can play a role in many personal-level mental states apart from imagining, it plays a very similar role in all of them, viz., representing how a concrete object (whether actual or possible) appears or sounds. Mental imagery does not tend to play a role similar to that of a linguistic token. So, even if mental imagery is involved in a range of personal-level mental states, it is not obviously well-suited, in the case of inner speech, to play the specific role of actual linguistic tokens.

This challenge might be met by connecting the actual speech view with work on the metaphysics of word tokens, as proposed by Wade Munroe (2022a, 2023). Munroe holds that

what makes something, φ, a token of a word type, w, is that the process of generating φ is explained and guided by one’s (tacit) knowledge of w (or the morphological structure of w), e.g., one’s semantic, syntactic, morphophonological/orthographic, knowledge of w stored in one’s mental lexicon. (2022a: 4)

This allows him to hold that inner speech episodes can involve word tokens, insofar as their generation is guided by the relevant kind of tacit knowledge. (Though Munroe himself does not hold the actual speech view; see Section 3.3.1 for discussion of his view.) Relatedly, J. T. M. Miller (2021) explicitly denies that word tokens are necessarily substances and holds, instead,

that particular or token words are objects, which are bundles of various sorts (most notably semantic, phonetic, orthographic, and grammatical) properties. [sic] (2021: 5737)

One might hold that inner speech episodes in fact consist in such bundles.

Although the matter of how inner speech episodes can involve genuine linguistic tokens is of great importance for the actual speech view, it is only beginning to receive attention. However, several arguments have been given in support of the theory generally. These include the following:

  • Inner speech may be a developmental descendant of a kind of external speech. Piaget (1923 [1926/1959]) observed that young children have a practice of speaking to themselves aloud. He described this kind of speech as “egocentric speech” (ibid, passim) (egocentric speech can be seen as one kind of private speech; see Introduction). Vygotsky (1934 [1986]) presented empirical evidence that inner speech develops in children as they internalize the practice of producing egocentric speech (though see Gregory (forthcoming) questioning this evidence). Vygotsky held that egocentric speech becomes silent, inner speech, but that it does not change in its fundamental nature, so it remains a kind of actual speech (see also Wilkinson & Fernyhough (2018), Wilkinson (2020)).
  • Introspectively, it seems like we can perform speech acts—e.g., make assertions and ask questions—in inner speech. But it would only be possible to perform speech acts in inner speech if inner speech is a kind of speech (Wilkinson 2020; Wilkinson & Fernyhough 2018). (This issue is addressed further in Section 4.1.)
  • On the face of it, we produce inner speech for purposes such as focusing our attention, motivating ourselves, and evaluating our actions. These correspond to purposes which instances of external speech also often serve: focusing the attention of others, motivating them, and commenting on their actions. There are also parallels in terms of how inner speech episodes and instances of external speech are constructed. Both often take the form of short, sub-sentential items when this is sufficient (e.g., “Here!”, upon finding something which was lost) and more fully elaborated sentences when this is necessary (e.g., when carefully listing the considerations relevant to a difficult decision which needs to be made, whether by oneself or by a group). Marta Jorba, Agustín Vicente, and Fernando Martínez-Manrique have taken these systematic parallels as evidence that inner speech and external speech are simply different types of one phenomenon, namely, speech (Jorba & Vicente 2014; Martínez-Manrique & Vicente 2015).
  • There seems to be a contrast between imagining speaking and engaging in inner speech, as it is ordinarily understood. This contrast, Gregory (2016) suggests, parallels the contrast between two kinds of external actions which we can perform. When an actor says the lines in their script, what they are producing is a representation of speech that someone else might produce. The actor is, of course, speaking, but they are doing so in the context of a pretense. What the actor is doing contrasts with the speech which they produce in, e.g., an ordinary conversation with someone. The contrast between imagining speaking and producing inner speech seems to map neatly onto the contrast between what the actor does on the stage and what they do in an ordinary conversation. If this is so, then a natural analysis is that the contrast between imagined speech and inner speech is a contrast between a representation of speech and actual speech—which implies that inner speech is a kind of actual speech.

A couple of philosophers who hold the actual speech view but express it in different terms, or who hold very similar positions, should be mentioned. First, Philip Gerrans (2015) describes inner speech as involving “imaginary action” (2015: 296), but he is explicit that, by this, he means only to say that producing inner speech is an action performed covertly. He takes inner speech to involve speaking, but doing so silently.

Second, Johannes Roessler (2016) holds that there are different kinds of inner speech, one of which involves imagining speaking (rather than actually speaking), but in a particular way. He points out that we can imagine things, or imagine doing things, for different purposes. An act of imagining will then be successful to the extent that it achieves the purpose for which it is performed. So, one might, for example, imagine making an assertion, but do so with the intention of imagining making an assertion which is true and relevant to context. Then the act of imagining making the assertion “incurs the same liabilities” (2016: 548) that the act of actually making the assertion would incur. If you are puzzling over some question, and you imagine asserting a possible answer, then the act of imagining will be successful only if you have imagined asserting the correct answer. Although you have only imagined performing the speech act of making an assertion, your imagined assertion will be “in some ways tantamount to an assertion” (2016: 548).

It would be an open position, though not one Roessler takes, that all inner speech episodes could be analyzed in this way. On such a view, inner speech episodes would be something very similar to actual speech, yet without quite being speech acts, and thus without the commitment that producing inner speech involves producing actual linguistic items.

2. Inner Speech and Thought

A second question about inner speech is how it relates to thought. It seems that there must be some relationship, but it is an open question what that relationship is. In general, there are three views about the nature of the relationship: (1) inner speech episodes express thoughts; (2) inner speech episodes facilitate thoughts; and (3) inner speech episodes (at least sometimes) are thoughts of a certain kind.

The views are not mutually exclusive: one can certainly hold that inner speech is related to thought in multiple ways.

2.1 Inner Speech and Thought Expression

Langland-Hassan & Vicente (2018b: 10) observe that the view that inner speech (at least often) expresses thoughts that are distinct from the inner speech episodes themselves coheres with some larger theories about thought and language. If one is attracted to these theories, then one may well also be attracted to the view that inner speech merely expresses thought.

First, there is a natural connection between the language of thought hypothesis, most closely associated with Jerry Fodor (1975), and the view that inner speech expresses thought. On the language of thought hypothesis, our thoughts do take place in a language, but not in a natural language. Rather, our thoughts take place in a kind of mental language, often referred to as “Mentalese”. If the language of thought hypothesis is true, then, insofar as inner speech is keyed to a natural language, it seems that inner speech can at most serve to express the thoughts which occur in the mental language.

Second, on Willem Levelt’s influential theory about language production, speaking involves conveying a pre-existing “message” (1989: passim). The structure of this message is conceptual but not linguistic. Via several stages of processing, natural language sentences (or sub-sentential items) are formulated which, once articulated, express the conceptually structured message with which the process started. If one thinks that inner speech is actually a kind of speech, then one might incline to think that inner speech also expresses a pre-existing message.

Thus, Peter Carruthers (2009, 2018) approaches matters from a Fodorian and Leveltian angle when he proposes that

the first metacognitive access subjects have to the fact that they have a particular belief is via its verbal expression (whether overtly or in inner speech). (2009: 125)

For Carruthers, the inner speech episode is not a belief or judgment itself, but rather the expression thereof (see Section 5.2). In a similar way, Ray Jackendoff (1996, 2007, 2011, 2012) emphasizes the distinction between thought itself and the auditory imagery by which it may be expressed, identifying only the latter with inner speech (see Section 3.1). Likewise, José Luis Bermúdez (2003) and Jesse Prinz (2011) distinguish between conceptual thought itself and inner speech, while holding that we often come to know what we are thinking by attending to inner speech sentences that we might use to express such thoughts. They stop short of explicitly claiming that such sentences actually express thoughts, however, specifying instead that the inner episodes are sentences through which such thoughts “might be expressed” (Bermúdez 2003: 164), or that we “would use” to express them (Prinz 2011: 186) (see Section 5.1).

One can, however, hold that inner speech episodes express thoughts without committing to the view that a thought must be fully-formed prior to the production of the relevant inner speech episode. José Luis Bermúdez (2018), for example, holds that producing an inner speech episode can actually play a role in forming the thought which it expresses. For Bermúdez, a thought can be refined and precisified as an external utterance is being produced and, equally, a thought can be refined and precisified while an inner speech episode is being produced. Nonetheless, by the time an inner speech episode has been produced, it will express an existing thought.

Finally, it is worth noting the following point of contact between the actual speech view, discussed in Section 1, and the question of whether inner speech expresses thought. If it is an essential feature of speech that it serves to express thought, then defenders of the actual speech view are likewise committed to the view that inner speech expresses thought. If, on the other hand, one holds that there can be (inner) speech that does not express thought, then the question arises as to what the difference between (inner) speaking and thinking in a natural language might be—and whether there is indeed a difference.

2.2 Inner Speech and Thought Facilitation

There have been several suggestions as to how inner speech might play a substantive role in facilitating thought or thought processes—a role that goes beyond merely expressing thought processes.

First, inner speech is often thought to play an important role in working memory. According to Alan Baddeley’s influential theory of working memory (e.g., Baddeley 1992), we can retain a series of words or numbers in working memory by reciting them in inner speech. A short series of items will be retained long enough to recite them again. One can iterate this process via a “phonological loop” for as long as desired.

Following Vygotsky (1934 [1986]), Clowes (2007) and Jorba & Vicente (2014) hold that inner speech can serve as a tool for directing our own attention, just as external speech can serve as a tool to direct the attention of others. In making this case, both draw on the Vygotskyan developmental account of inner speech, on which inner speech is derived from the external phenomenon. See also Martínez-Manrique & Vicente (2015), who make the same point but are less directly influenced by Vygotsky’s original (1934 [1986]) developmental account.

There is evidence that inner speech facilitates various executive function tasks, such as planning, task-switching, and inhibiting impulsive and inappropriate responses, without being essential to them. The evidence that inner speech can play a role in these tasks is primarily empirical. For reviews of the relevant literature, see Alderson-Day & Fernyhough (2015) and Petrolini, Jorba, & Vicente (2020).

Munroe (2022b; forthcoming) argues that inner speech plays a role in reasoning which goes beyond merely aiding or improving it. He notes that reasoning processes often involve preserving representations in working memory. In doing complex mental arithmetic, for example, one might recite in inner speech the word for a number which one has determined will be needed later in the process, e.g., when regrouping values (i.e., “carrying” and “borrowing”). The number word will be stored in working memory via the process described above. But, on Baddeley’s model of working memory, which Munroe is working with, only sensory representations can be stored in working memory. In the present context, this means that only auditory representations of the relevant word sounds can be stored, not the conceptual content which the word would have if spoken aloud (or, possibly, if it were produced in inner speech in a different context, depending on one’s view on the contents of inner speech—see Section 3). When one needs to use the number at a later stage in the process, one will need to interpret the sensory representation which one is producing. For example, if one is reciting a sound corresponding to the word, “six”, in inner speech, one will need to interpret that as the word referring to the number, six, so that six becomes the number that one now uses to continue the calculation. If this is so, then interpreting the inner speech that one was producing, and thus the inner speech itself, was essential to the reasoning process, not merely a dispensable aid. Munroe holds that the same will apply in many reasoning processes that require making use of an intermediate conclusion.

A number of theorists—especially those working in neo-empiricist (Barsalou 1999; Prinz 2011, 2012) and embodied cognition traditions (Borghi et al. 2017; Dove 2014)—have also proposed that inner speech plays an important role in facilitating abstract thought, i.e., thought about objects or properties that are not easily perceived. Here the idea is that language perception and production abilities—and their internalization, via inner speech—provide means for explaining the acquisition and use of abstract concepts in broadly sensorimotor terms. In particular, Guy Dove (2014, 2018, 2020, 2022) develops a view where language—often in the form of inner speech—is used as a “scaffold” or “tool” for enabling thought about abstract entities, and where the capacity for abstract concept use is closely tied to the capacity for language.

Finally, if subsystems and modules in the mind function in isolation from one another to any significant extent, then inner speech may play an important role in integrating their output. Carruthers (2002, 2006) suggests that the process of language production generally, including the production of inner speech, is especially well suited to integrate the output of multiple modules, because of the combinatorial nature of language. In producing an episode of inner speech, one can thus express complex content, which is then distributed to mental modules and subsystems for further processing. Other sources relevant to inner speech and the integration of information produced by different parts of the mind include Baars (1988) and Dennett (1991).

2.3 Inner Speech as Thought

A number of philosophers have argued that at least some inner speech episodes actually are thoughts or, at least, parts of thought processes. Gauker (2011, 2018) holds that all conceptual thought occurs in inner speech, where, as elaborated in Section 3.2, he takes inner speech to involve the tokening of items of a natural language in neural states that are distinct from the auditory-verbal representations that many identify with inner speech. In his 2011 book, he responds to arguments that conceptual thought cannot occur in natural language.

With respect to inner speech understood as a partly sensory phenomenon, Keith Frankish (2018) describes how inner speech can be used to break a complicated problem into smaller problems, which can then be addressed by lower-level, automatic thought processes. Deciding whether to accept an invitation from colleagues to attend a party, for example, one might produce the inner speech episode, “What will it be like?”. This more circumscribed question can be addressed by autonomous processes, such as recalling previous parties with colleagues. Along with other autonomous processes, this might generate the prediction that an annoying colleague, Henry, will likely be at the party. If this is significant, it could result in the inner speech episode, “Henry will probably be there”, in turn prompting a largely autonomous evaluation of the effort involved in enduring Henry’s company. The process could result, depending on the outcome of this evaluation, in producing the inner speech episode, “I can’t face that; I won’t go”. (Quotes from Frankish [2018: 234], though the example is slightly modified.) The inner speech episodes, Frankish believes, are critical to making the decision, and are thus rightly considered parts of the process of thinking itself. See Kompa (forthcoming) for a similar argument. Compare Munroe (2022b), discussed above, who also holds that an inner speech episode can be essential to a thought process but does not infer from this that an inner speech episode can actually be a part of the process (though see the discussion of Munroe (2023) below).

Frankish (2018) also holds that inner speech episodes can be thoughts in the form of conscious commitments, where these are “a distinct kind of mental attitude” (2018: 237), which cannot be analyzed in terms of other conscious mental states, such as conscious decisions, beliefs, or desires, or expressions of other mental states. They are simply commitments made to oneself to “regulat[e] our future activities, including our intentional reasoning, in line with the choice or view expressed” (2018: 237). For example, the inner speech episode, “I will go to the gym today”, is a commitment to go to the gym today, not just the expression of a decision to do so, because it also generates a kind of obligation to oneself, as it were, to do so. For Frankish, this follows from treating inner speech as an internalized version of interpersonal speech, in which commitments also generate obligations.

On Frankish’s account, an inner speech episode can be like a judgment, insofar as it may involve committing oneself to act and reason in a way which is consistent with the truth of the proposition expressed by the inner speech episode. Munroe (2023), by contrast, holds that an inner speech episode can actually function as a judgment. If an inner speech episode is accompanied by what has been called a “Feeling of Rightness” (Munroe cites Thompson et al. 2013 and Unkelbach & Greifeneder 2013), then it will play roles typically attributed to judgments such as “terminating inquiry and causing overt actions” (Munroe 2023: 309). Munroe connects his claim to a model proposed by Ackerman & Thompson (2015, 2017a, 2017b) on which the roles that mental states play are determined partly by metacognitive monitoring. The “Feeling of Rightness” is a cue to a metacognitive monitoring system that a particular mental state can appropriately play the roles of a judgment. Munroe’s claim is that inner speech episodes can function as judgments if this is deemed appropriate by the metacognitive monitoring system, on account of being accompanied by the appropriate “Feeling of Rightness” (or at least by a feeling of sufficient certainty).

Nikola Kompa (forthcoming) adds a quite different argument for the identity of (some) thoughts and (some) inner speech episodes. She operates with a broad notion of inner speech, on which any “inner episode that substantially engages the speech production system” is an instance of inner speech (forthcoming: 4, emphasis removed). On this understanding of inner speech, any thought with semantic content and syntactic structure will be an instance of inner speech, even if it does not become conscious. Kompa rejects the language of thought hypothesis, on which thoughts can have linguistic properties because they occur in a non-natural language. Accordingly, for Kompa, the only way that a thought can have semantic content and syntactic structure is if its formation substantially involves the speech production system (which she understands in Leveltian terms, citing Levelt 1989; Levelt et al. 1999; and Indefrey & Levelt 2004). Insofar as we have any thoughts that have semantic content and syntactic structure, then, these are, on her definition, instances of inner speech. If the production of such thoughts does not proceed further through the speech production process, such that they are morpho-phonologically encoded in addition to having semantic content and syntactic structure, they will occur as unconscious inner speech episodes.

Finally, it has been suggested that there is a close connection between inner speech and a phenomenon known as “unsymbolized thought”. Using the Descriptive Experience Sampling paradigm, Russell Hurlburt and Christopher Heavey (e.g., Hurlburt & Heavey 2002; Heavey & Hurlburt 2008) have gathered introspective data that they interpret as providing evidence that people sometimes have the experience of

thinking a particular, definite thought without the awareness of that thought’s being conveyed in words, images, or any other symbols. (Heavey & Hurlburt 2008: 802)

Martínez-Manrique & Vicente (2015), Vicente & Martínez-Manrique (2016), and Vicente & Jorba (2019) suggest that these “unsymbolized thoughts” occur when the production of an inner speech episode is aborted at the earliest stage of production, when only the content or message to be expressed has been formulated. Appealing to accounts on which we experience conscious representations of actions which we begin to perform but abort, they suggest that an unsymbolized thought is a representation of the message which one commenced expressing in inner speech, which becomes conscious because the process was aborted. Insofar as the process was aborted prior to the message being organized in phonetic form, the representation is entirely amodal. See also Kompa (forthcoming).

3. Content-Based Theories of Inner Speech

We have seen that there are a variety of views taken on whether inner speech is indeed a kind of speech, or a kind of thought, or both. A popular way to gain added leverage on those questions is to advance an account of the contents of inner speech. Focusing on questions concerning the contents of inner speech also helps to clarify the depth of some of the puzzles and controversies already introduced.

Most generally, the content of a representation is what the representation is of or about—it is what the representation represents. The content of the word “cat” is a certain type of animal (namely, a cat). And the content of the sentence “cats are animals” is the proposition that cats are animals. Two distinct representations can have the same content. For instance, the French word “chat” has the same content as the English word “cat”; and the French sentence “les chats sont des animaux” has the same content as the English sentence “cats are animals”. Thus—to borrow analogies from Siegel (2005 [2021])—the contents of a mental state, in the present sense, are akin to the contents of a newspaper article and not akin to the contents of a bucket. Mental contents are not things that are contained within mental states themselves (just as cats are not contained within the word “cat”) but are, instead, what the mental states are of or about.

We will distinguish three broad classes of views about the contents of inner speech and several sub-views within them, noting their main motivations and relationships to questions concerning inner speech’s proposed cognitive roles. According to what we will call the “phonological content view”, inner speech episodes always and only have phonological contents. The competing content-based theories to be discussed hold either that inner speech only has semantic contents (the “semantic content view”, as we will call it) or that inner speech has phonological contents and semantic and/or other kinds of contents (the “mixed contents view”).

As we will see, the phonological content view is a natural fit with the view, discussed at the beginning of Section 1, that inner speech is merely a representation of speech and not actually a kind of speech. This is because the phonological content view sees inner speech as consisting in imagistic representations of speech and as lacking the kinds of contents (or meanings) associated with word tokens themselves. Likewise, those who hold that inner speech is actually speech will typically hold either a mixed contents view or a semantic content view, as these views allow inner speech episodes to have the kinds of semantic contents that are typically viewed as essential to being a linguistic token.

To say that inner speech has phonological contents is to say that inner speech episodes represent phonemes (or phones), where phonemes are the most basic meaningless building-blocks from which any word of a language can be built. English has roughly 44 phonemes (the exact count varies with dialect and analysis), different combinations of which account for the distinct sound each word has in relation to all other words from which it can be aurally distinguished. The notion of a phoneme is something of an abstraction, however, as slightly different sounds (in terms of pitch, timbre, and frequency) can fall within the sonic range that constitutes a single phoneme type. These more specific, concrete sounds that can qualify as instantiations of a phoneme are known as phones. Whether inner speech episodes represent phonemes or, instead, the finer-grained property of being a phone is a matter of dispute among those who hold that inner speech episodes have phonological contents (Patel 2021; Langland-Hassan 2018; Hill 2022).

Note also that, while the phonemes of most natural languages are auditory in nature—and are thus perceived through the sense of hearing—the notion of a phoneme has also been applied to gestural languages, such as American Sign Language (Sandler 2012; Stokoe 2005). So, the concept of a phoneme is not specific to any modality. It refers to the smallest meaningless units of a language that can be arranged and recombined to form the smallest meaningful units of that language, no matter which modality the language occurs in. In spoken languages, however, the auditory modality takes precedence over the visual/written modality, insofar as the phonemes are typically held to be sounds, while the graphemes are held to be letters or groups of letters that represent phonemes. While most will not consider the visualization of graphemes and written words to be cases of inner speech, it bears noting that such visualizations satisfy the neutral characterization of inner speech provided at the outset.

There are several reasons one might hold that inner speech episodes have phonological contents. The first is phenomenological in nature. What it is like to have an inner speech episode is similar to what it is like to hear oneself saying the corresponding words aloud. One might explain this phenomenological similarity by appeal to the fact that inner speech episodes and the corresponding cases of hearing represent similar properties—either phonemes or phones of a certain sort—and, accordingly, have similar contents. A second reason appeals to the fact that we can use inner speech episodes to judge whether two visually dissimilar written words—such as “blood” and “mud”—rhyme. As rhyming is a relationship between the sounds of words, the usefulness of inner speech episodes in judging rhymes would be explained if inner speech episodes represented word sounds and thereby allowed us to compare those sounds (Langland-Hassan 2014). A third proposed reason for thinking that inner speech has phonological contents is that it is the representation of those features that allows us to discern which language we are exploiting when engaged in inner speech (Langland-Hassan 2018). (See Patel 2021 for a rebuttal.)

Jackendoff (1996, 2007, 2011) proposes that auditory contents exhaust the contents of inner speech. Jackendoff’s view is motivated in part by a prior commitment to the thesis that we do not think in a natural language. Like many in cognitive science, he sees natural language primarily as a means for communicating thoughts that themselves occur unconsciously in some other medium (such as a Fodorian “Mentalese”). According to Jackendoff, thought itself is never conscious, nor is the use of concepts. By contrast, inner speech—what he calls the “talking voice in the head” (1996: 10)—occurs consciously and does not involve the use of concepts. In having inner speech, he explains, “[w]e experience organized sounds”, whereas,

the content of our experience, our understanding of the sounds, is a different organization … called conceptual structure. (emphasis original, 1996: 12–13)

“The organization of this content”, he holds, “is completely unconscious” (1996: 13). Jackendoff identifies the inner voice with a representation of “phonological structure”, a representation having phonological content, yet no conceptual or semantic content. The mental states constituting our understanding of what the voice is saying, by contrast, are on his account distinct conceptual states that occur unconsciously:

What we experience as our inner monologue is actually the phonological structure linked to the thought,

he explains.

We are aware of our thinking because we hear the associated sounds in our head. (Jackendoff 2011: 613)

(See also Jackendoff [2007: 80–85] where he remarks on the counterintuitive nature of his view: “How can the contents of consciousness consist of just a string of sounds?” [2007: 85].)

It should be noted that Jackendoff also suggests that inner speech episodes “express” thoughts, which would seem to support the view that such episodes have the semantic contents of our thoughts (e.g., “the linguistic modality can make reasons as such available in consciousness” [1996: 19] and “only through language can such concepts form part of experience rather than just being the source of intuitive urges” [1996: 23]). On the other hand, he equally emphasizes the overlooked fact that “linguistic structure has three major departments: phonological, syntactic, and semantic/conceptual structure”, and that “the forms in awareness—the qualia—most closely mirror phonological structure” (2007: 81). Most recently, he has proposed a view where what we intuitively mark as “conscious thought” has three components: a “pronunciation” of the thought, a feeling of meaningfulness, and the meaning attached to the pronunciation. There he holds that only the first two are conscious and appears to identify inner speech with the “pronunciation” component. This is in keeping with the phonological content view, as the (semantic) meaning of the pronunciation is something separate from the pronunciation and is only represented “backstage” (i.e., unconsciously) (Jackendoff 2012: 84–5).

Langland-Hassan (2014) provides a qualified defense of a phonological content view, motivated by worries about how a single mental state—in particular an episode of inner speech—can be said to represent both word sounds and word meanings simultaneously. He notes that a word’s meaning and its sound are entirely distinct properties, related only by convention. If mental states are individuated by their contents, then it seems that distinct neural or functional states will be needed to represent these distinct properties. This has become known as the “binding problem” for inner speech (see Munroe 2023; Patel 2021; Bermúdez 2018 for different approaches to resolving it; see also Prinz 2011 for related remarks). In light of this problem, Langland-Hassan proposes that ordinary episodes of inner speech likely consist in two or more mental states triggered at roughly the same time (this would be a multiple-state version of the “mixed contents” view, discussed below). Yet he adds that, when inner speech has been divided into distinctly occurring states in this way, there are good reasons to identify inner speech solely with the component that represents word sounds. Doing so results in a phonological content view.

In contrast to the phonological content view, the semantic content view holds that inner speech episodes always and only have semantic contents. By “semantic contents”, we mean the kinds of contents had by ordinary words, phrases, and sentences of a natural language. Such contents are typically equated with the meaning of a word, phrase, or sentence.

One version of a semantic content view, defended by Christopher Gauker (2011, 2018), holds that inner speech episodes exclusively have semantic contents and entirely lack both auditory contents and auditory phenomenology. Gauker allows that episodes of auditory verbal imagery often accompany inner speech. However, on his view, this auditory imagery is not to be identified with inner speech itself. Rather, according to Gauker, inner speech is a non-sensory linguistic phenomenon occurring in the brain that is (often) represented by episodes of auditory verbal imagery. Just as we may use auditory representations to represent someone else’s speech that we are actually hearing, so too, for Gauker, our inner speech is often represented by verbal imagery—imagery that is in fact distinct from the (inner) speech itself. (Here Gauker develops related remarks of Wilfrid Sellars [1956].) Notably, Gauker (2018) grants that, in the case of inner speech, this auditory-verbal imagery misrepresents our inner speech as having sonic features (i.e., as instantiating phones or phonemes), given that the neural events that constitute inner speech episodes are themselves silent.

Gauker’s style of pure semantic content view is not widely endorsed. This may be because it clashes with the widespread view that inner speech has a sensory character similar to that of hearing speech. On the other hand, Gauker’s view can be said to have an advantage in providing a literal sense in which, when we engage in inner speech, we are thinking in words of a natural language and not merely about them. On Gauker’s (2011) view, the neural events that carry semantic content are themselves tokens of words and phrases of a natural language, and the question of how auditory-verbal images can also be linguistic tokens does not arise. His view is also motivated by an opposition to what he calls the “Lockean” view that sees conceptual thought as something prior to and separate from the speech that expresses it. One can see Gauker (2011) as trying to preserve the idea that abstract (conceptual) thought occurs in a language (and is often non-conscious), while divorcing it from the thesis that there exists an innate, Fodorian “language of thought” (and one that must be exploited in order to learn a natural language).

Bermúdez (2018) offers a different style of semantic content view that allows for inner speech to retain a characteristic auditory phenomenology. According to Bermúdez, the auditory sensory character of inner speech is a result of inner speech episodes having non-representational auditory properties. For Bermúdez, the only representational contents had by inner speech episodes are those pertaining to the meanings of words. In response to those who argue that inner speech episodes must also have phonological contents (e.g., to explain why we can use inner speech to judge whether two words rhyme), he argues that there is no entailment from the fact that inner speech episodes can be useful in judging rhyme relations to the conclusion that they represent phonemes (2018: 216–7).

A third type of theory on which inner speech exclusively has semantic content proceeds by arguing that inner speech is a genuine form of speech. This argument is typically made on either phenomenological or functional grounds. From there it is inferred that inner speech must have the same kind of contents as external speech. If episodes of external speech—i.e., the words we hear when someone speaks—have semantic content but no phonological content (because they do not represent phonemes), so too must episodes of inner speech. This approach to theorizing about inner speech is discussed in more detail in Section 1. Assuming that (unlike Gauker) proponents of such a view wish to maintain that inner speech episodes constitutively have auditory sensory character, they may concur with Bermúdez in his claim that the auditory phenomenology of inner speech does not entail the representation of auditory properties; or, alternatively, they may provide some other account of why, in many instances, inner speech seems to represent phonemes even if it does not really do so.

3.3 Mixed Contents Views

Mixed contents views hold that inner speech episodes typically have at least two kinds of content—phonological and semantic—simultaneously. On a mixed contents view, the inner speech episode “Dogs are mammals” represents both the sound of the sentence “Dogs are mammals”, as uttered aloud, and the proposition that dogs are mammals. We can distinguish two species of mixed contents view: single-state and multiple-state. Single-state views hold that what we intuitively mark as a single inner speech episode consists in a single mental state that has both auditory and semantic contents. Multiple-state views hold that the apparent unity of a single inner speech episode is in some sense illusory, as such episodes typically consist in the contemporaneous occurrence of two or more mental states, where one of the states represents phones or phonemes and another has semantic contents. (Some multiple-state views hold that inner speech episodes involve additional distinct states with articulatory and syntactic contents as well.) As earlier noted, some phonological content views hold that mental states with corresponding semantic contents occur contemporaneously with the representations of phonemes that are identified with inner speech. These phonological content views differ from multiple-state mixed contents views in that the former identify inner speech solely with the state that has phonological content, perhaps on the grounds that it is the only sort of state of which one is consciously aware (this appears to be Jackendoff’s motivation).

Carruthers (2011, 2018) defends a single-state mixed contents view, proposing that inner speech involves the generation of a representation of word sounds (i.e., phonemes) which—in a process akin to what occurs in outer speech perception—is then interpreted by one’s speech comprehension mechanisms so that a semantic content can be assigned to the represented utterance. (He notes that a representation of the semantic content of the represented phrase—referred to as the “message” on Levelt’s [1989] speech-production framework—sometimes precedes the representation of the word sounds, albeit non-consciously.) Once the represented word sounds are interpreted, Carruthers suggests, the information that the represented utterance has a certain semantic content is “bound into” a single “event-file” that contains information both about the sound and the meaning of the represented utterance (2018: 41–42). (See Frankish [2004: 57; 2018] for a similar view.) Carruthers analogizes such binding to the way in which the color, shape, and category properties of a visually perceived object are said to be “bound into” a single object-file that accumulates multiple forms of information about a single object, despite those properties being represented in temporally distinct stages and in distinct neural regions. These event-files, when activated and globally broadcast, are said to constitute a single conscious inner speech episode that has both auditory and semantic contents.

Munroe (2023) develops a similar style of single-state mixed contents view, arguing that, in addition to representing phonemic and semantic features, inner speech episodes also represent the likelihood that the content of the represented utterance is true. The latter is necessary, he holds, for inner speech episodes to qualify as judgments (see Section 2.3). These three distinct features are, for Munroe, bound into a single mental state in the sense that a single mental state predicates these three distinct properties of a single represented utterance (Munroe 2023: 304).

Other mixed contents views of inner speech—inspired by Levelt’s (1989) multi-stage model of speech production—attribute the different representational contents entertained during an inner speech episode to multiple distinct states that tend to co-occur. Martínez-Manrique & Vicente (2015) defend a multiple-state view under the moniker of the “activity view” of inner speech, highlighting the multi-component processes of both inner and outer speech. “It is quite natural”, they explain,

to try to understand inner speech in terms of all the representations that are mobilized in speech, i.e., semantic, syntactic, maybe articulatory …. The representations involved—from conceptual to phonological—form an integrated system. (2015: 8)

The view which Martínez-Manrique & Vicente set out in their 2015 paper bears clear similarities to the actual speech view, insofar as they hold that inner speech is functionally similar to external speech. What separates it from the actual speech view, however, is that they do not hold that inner speech consists of actual words and sentences which express semantic content, but of distinct representations of phonological and semantic (and other) content. (For complementary multiple-state mixed contents views in cognitive neuroscience, see Grandchamp et al. 2019 and Lœvenbruck et al. 2018.) While these representations are unified in the sense of occurring within a single system for language production, they remain distinct mental states—distinguished, in part, by their distinct contents, and their ability to occur in isolation from each other. (Note, however, that this way of categorizing the view assumes that each mental state is composed of exactly one mental representation. It may be possible to articulate a view where one mental state is composed of multiple mental representations. The question then becomes: in virtue of what do the multiple representations qualify as a single mental state, as opposed to components or stages of a single cognitive system?)

Christopher Hill (2022: 136–139) develops a similar multiple-state mixed contents view, emphasizing that the representations of semantic content lack any associated phenomenology. The phenomenology of inner speech is, for Hill, entirely a function of its auditory-phonological contents. He further specifies that these phonological contents are (the more abstract) phonemes, and not phones, to account for the relatively impoverished sensory character of inner speech in comparison with speech perception. Patel (2021) also defends a multiple-state mixed contents view, on which, in addition to having some combination of semantic, syntactic, auditory, and articulatory contents, inner speech episodes have vocal contents. To have vocal contents is to represent some particular person’s voice as communicating some combination of semantic, syntactic, auditory, or articulatory information. According to Patel, whether we are representing the semantic, auditory, or articulatory contents, these mental events involve one’s representing a certain person’s voice as attempting to convey such information. This common representation of a voice, he argues, provides a kind of unity to the class of mental events that can be considered inner speech.

Because multiple-state views allow that the distinct components of inner speech can potentially occur in isolation, they face a question of which components need to occur for the episode to be properly counted as an instance of inner speech. Vicente & Jorba (2019), Martínez-Manrique & Vicente (2015), and Vicente & Martínez-Manrique (2016) see this as an advantage, insofar as it allows them to place different phenomena related to inner speech on a single continuum (see also Kompa & Mueller forthcoming and McCarthy-Jones & Fernyhough 2011). For instance, when the semantic and syntactic contents of ordinary inner speech are represented in the absence of any auditory-phonological contents, they propose, this can be understood as a case of so-called “unsymbolized thought” (Heavey & Hurlburt 2008; Heavey, Moynihan, et al. 2019). See Section 2.3 for further detail.

A notable feature of the surveyed mixed contents views (as well as the phonological content view) is that they need not (and often do not) hold that inner speech episodes occur in a natural language. Rather, on these views, inner speech episodes represent natural language utterances (in virtue of their phonological contents), without necessarily being instances of such utterances themselves. This is because, on mixed contents views, the semantic content of an inner speech episode may not be represented by tokens of a natural language. For instance, for Carruthers, the semantic contents of an inner speech episode are represented via symbols of an amodal language of thought (e.g., a Fodorian [1975] Mentalese), which are coupled with sensory representations of the sound of the corresponding sentence as spoken aloud. One language (Mentalese) is used to represent the meaning of an expression in another (e.g., English). In this way, Carruthers (2010, 2018) deviates from Carruthers (1996), with the latter defending the idea that inner speech episodes literally occur in—and are expressions of—a natural language. Carruthers now emphasizes the point, raised also by Machery (2005), that introspection does not provide grounds for claims about the representational format of our inner speech episodes.

4. Inner Speech and Pragmatics

In general, the philosophy of language has focused primarily on language used interpersonally. It is natural to wonder to what extent this material is applicable to inner speech. This question can be asked whether or not one thinks that inner speech is actually a kind of speech, as no one denies that there is some interesting relationship between inner speech and interpersonal speech.

As mentioned in Section 1, the intuition that we can perform speech acts in inner speech is the basis of an argument that inner speech is a kind of speech. There are different ways, however, that we might understand the claim, depending on how one thinks of speech acts.

On the traditional analysis of Austin (1962) and Searle (1969), performing a speech act is inherently something one does in accordance with conventions tacitly understood by both speaker and listener. For example, for Searle, asserting p involves (approximately) undertaking to someone that p is true, where the speaker does not know that the listener already knows that p is true. The reason that an assertion can be effective is precisely that both speaker and hearer understand that this is the nature of the transaction. It is hard to see how this kind of analysis could apply to inner speech. One would need to explain how one individual can have two distinct roles, as speaker and listener, such that the conventions that make interpersonal language-use possible can have any relevance (see Gregory 2017, 2020a for related discussion).

Not every version of speech act theory, however, emphasizes conventions. Drawing on some ideas from Strawson (1964) and Bach & Harnish (1979), though not adopting their theories in whole, Wilkinson (2020) holds that what is essential to speech acts is that they express particular mental states. An assertion, for example, is simply an utterance which expresses a belief; a question is an utterance which expresses a desire to acquire certain information; etc. On this view, understanding someone else’s utterance is simply a matter of grasping its content and knowing what kind of mental state the relevant type of utterance expresses. Setting aside the question of whether one needs to interpret one’s own inner speech, it may be that inner speech episodes can be speech acts if one thinks of speech acts merely as expressions of particular mental states, rather than as actions which depend on conventions in the way that Austin and Searle suggest. For another analysis of inner speech in terms of speech act theory, see Geurts (2018), who emphasizes that inner speech episodes can operate to generate commitments in a way characteristic of speech acts; see also Frankish (2018) and Fernández Castro (2019).

An issue which sits just behind the question of whether inner speech episodes are speech acts is whether they are actions at all. Gregory (2020b) argues that, in the vast majority of cases, inner speech episodes are not actions, because we cannot give reasons for them (which is the criterion for actionhood on Davidson’s [1963] causal theory); they are not subject to our control (the criterion on Harry Frankfurt’s [1978] guidance theory); and we do not try to produce them (the criterion on O’Shaughnessy’s [1973] theory and Hornsby’s [1980] theory). If inner speech episodes are not actions, then they cannot be speech acts.

Tom Frankfort (2022) takes the opposite view. He observes that a great deal of inner speech is involved in deliberation, where this is an expansive category including “reflecting, reasoning, considering, evaluating” (2022: 52). He then applies Mele’s (2009) distinction between actions which involve “trying to bring it about that one x-s” (Mele 2009: 18) and actions which are done in order to bring it about that one x-s. Frankfort suggests that deliberating is an action in the first sense, insofar as it involves (for example) trying to make a decision, and inner speech episodes are actions in the second sense, insofar as they are produced in order to bring it about that one deliberates successfully and (for example) comes to a decision.

Jorba (forthcoming) also holds that inner speech episodes are typically actions, applying affordance theory. Affordances are opportunities for actions suggested by things in one’s environment. For example, an apple has the affordance of being edible; a cup has the affordance of being graspable. Some hold that affordances can also be things which suggest mental actions (Jorba cites McClelland 2020 and Jorba 2020). Jorba’s suggestion is that some mental states afford the production of inner speech episodes. For example, an inchoate thought affords being articulated clearly in inner speech, and an emotion can afford being labeled. Insofar as inner speech episodes are produced in response to affordances, they are actions and, specifically, speech acts. See Bar-On & Ochs (2018) for another account on which inner speech episodes can be “acts of innerly speaking our mind” (2018: 19, emphasis removed).

Closely related to the question of whether there can be speech acts in inner speech is the question of whether inner speech can involve a kind of dialogue or conversation. A theory which characterizes inner speech this way has been developed at length in psychology, primarily by Charles Fernyhough (e.g., 1996, 2008, 2009). However, the suggestion has been made in a variety of ways by philosophers as well, including by Machery (2018), Frankish (2018), Gauker (2018), and Wilkinson, in collaboration with Fernyhough (Wilkinson & Fernyhough 2018).

The idea that inner speech involves an internal dialogue or conversation clearly has intuitive appeal for some. One often finds inner speech described outside the philosophical context as the “inner dialogue”. But, if inner speech involves a kind of internal dialogue or conversation on more than a metaphorical level, then it is natural to wonder who the interlocutors are (Gregory 2020a). Machery (2018) and Frankish (2018) suggest that different parts of the brain communicate with one another via inner speech. Gauker (2018) suggests that inner speech involves conversing with oneself (see also the discussion of inner speech as a means for interaction between subsystems or modules in the mind in Section 2.2). One difficulty with both of these suggestions, however, is that philosophers of language generally (though not universally) think of conversation as fundamentally involving distinct human agents.

Gregory (2017) appeals to Grice’s (1975 [2013]) account of conversation to make this point. Grice argued that conversations are “characteristically … cooperative efforts” (p. 314). But cooperation requires multiple agents and there is only one agent in inner speech. That said, Gauker (2011, 2018) is working with an explicitly non-Gricean picture of conversation, motivated by an opposition to the doctrine that speech acts serve to express thoughts that are distinct from and precede the expressive utterance. He holds that speaking is,

in the first instance, something we do whenever there is no reason not to, because of the good it tends to do. (2018: 71)

In certain circumstances, where multiple individuals are present,

[a] conversation can be the occasion for each interlocutor to reflect on what he or she has experienced, … and on that basis to elicit a statement that is useful from the other. (2018: 72)

Insofar as we can generate inner speech episodes which cause us to reflect on some matter and then produce further inner speech episodes which are useful for us in the context, inner speech will be conversational. Gauker’s analysis here obviously contrasts with the expression-oriented approach to the question of whether there can be speech acts in inner speech.

In contrast to Gauker, Deamer (2021) argues that inner speech can be seen as being communicative in a Gricean sense. She holds that, to at least some extent, humans are “self-blind”: mental states such as our intentions are not always transparent to us. When we produce inner speech, we reveal our communicative intentions to ourselves, just as we reveal our communicative intentions to others when we converse with them.

While there is disagreement as to whether a series of inner speech episodes can be a dialogue in a literal sense, most agree that inner speech often closely resembles dialogue. As Gauker notes, one episode of inner speech will often prompt another, as happens in interpersonal dialogue. We can produce episodes of inner speech corresponding to different points of view, e.g., when thinking about the considerations for and against some course of action, in a way similar to two people with different opinions. Some participants in studies report that some of their inner speech episodes take place in the voices of others (McCarthy-Jones & Fernyhough 2011; Alderson-Day, Mitrenga, et al. 2018). This last consideration raises an important issue. We can certainly imagine conversing with others and we can certainly imagine others conversing. Such cases are usually taken to be distinct from inner speech (see Section 1). However, if inner speech can involve the voices of others, possibly expressing viewpoints other than our own, it becomes difficult to say how instances of inner speech with these characteristics differ from cases of imagining others speaking. How to delineate the extension of “inner speech” in a way that distinguishes inner speech acts from cases of (merely) imagining speech remains an underexplored issue.

5. Self-Knowledge and Metacognition

Inner speech plays an important role in a number of philosophical accounts of self-knowledge and metacognition. By “self-knowledge” we will mean knowledge of one’s own mental states, including both dispositional states—like beliefs, desires, and intentions—and occurrent states, such as thoughts, imaginings, decisions, and judgments. The notion of metacognition is somewhat broader, also encompassing judgments and non-cognitive assessments (e.g., “feelings of knowing”) concerning the validity of one’s own reasoning, the quality of one’s evidence, one’s degree of certainty, and so on (Proust 2013). While some theorists implicate inner speech in their accounts of both self-knowledge and metacognition (Jackendoff 1996; Clark 1998; Bermúdez 2003, 2018), others focus more narrowly on the question of how inner speech might facilitate self-knowledge (Byrne 2018; Carruthers 2011; Roessler 2016). A common thread among theorists who invoke inner speech in their accounts of metacognition or self-knowledge is the idea that certain others of our mental states—namely, those that our inner speech helps us to know about—are either less readily available to introspection or less well suited to serve a metacognitive role. Thus, these views all appear against a backdrop of broader commitments about the nature of mental states and our introspective access to them.

One approach sees inner speech as especially well suited to aid in metacognition due to its linguistic structure, or its link to public language more generally. According to Andy Clark (1998), the fact that inner speech occurs in a language—where such language is seen as abstracting away from the particularities of perception—allows it to play a special role in “second-order cognitive dynamics” (see also Prinz 2011, 2012). This, he holds, is because the natural language sentences featured in inner speech are “context resistant” and “modality transcending” in ways that facilitate a more objective and reliable assessment of the soundness of one’s own thought processes (Clark 1998: 178). Bermúdez (2003, 2018) builds on Clark’s proposal, specifying that awareness of inner speech is essential for enabling humans to become conscious of their own propositional thought processes, which are otherwise amodal and inaccessible to introspection. According to Bermúdez,

all the propositional thoughts that we consciously introspect … take the form of sentences in a public language. (emphasis original, 2003: 159–160)

While he does not identify these public language sentences with our core thought processes themselves—these, he holds, occur in a subconscious language of thought—Bermúdez argues that the linguistic structure of inner speech is needed to adequately represent the relationships of entailment and rational support that may (or may not) exist among the subconscious thoughts the inner speech episodes serve to express. As he puts it,

we think about thoughts through thinking about the sentences through which those thoughts might be expressed. (2003: 164)

Jackendoff (1996, 2007, 2011, 2012) and Prinz (2011, 2012) likewise hold that there is a level of conceptual thought that is not directly available to introspection and that inner speech is well suited for making us aware of such thoughts. Yet, for Jackendoff and Prinz, inner speech is able to play this role primarily because, like other imagistic mental states, inner speech occurs at an “intermediate” level of representation, which, on their theories, is the only level of representation at which mental states are consciously available to the subject. Thus Jackendoff’s comment that “we are aware of our thinking because we hear the associated sounds in our heads” (2011: 613). Echoing Bermúdez and Clark, Prinz finds it

likely that we often come to know what we are thinking by hearing inner statements of the sentences that we would use to express our thoughts (2011: 186)

and judges inner speech to be “a way of registering complex thoughts in consciousness” (2011: 186). (See also Machery 2005, 2018.)

Several theorists, whom we will term “inferentialists”, follow Ryle (1949 [2009]) in his claim that we often come to know what we are thinking by “overhear[ing]”, or “eavesdrop[ping] on … our own silent monologues” (1949 [2009: 165]). On these views, we come to know what we are thinking, or what we believe or desire, by drawing a kind of inference (the nature of which differs, depending on the theorist) from the fact that we “hear” ourselves say something in inner speech. The views of Clark, Bermúdez, Jackendoff, and Prinz, already reviewed, are inferentialist in nature. Yet there are other approaches that incorporate inner speech into a process that is even more explicitly inferential.

Carruthers (2009, 2010, 2011, 2018) is an inferentialist of this latter sort. While, in earlier work, he argued that thought itself occurs in inner speech (Carruthers 2002), Carruthers later abandoned that idea to hold that thoughts (including one’s beliefs) are always unconscious. On this view, inner speech episodes remain more or less directly available to introspection, yet only provide a kind of indirect evidence for what we are in fact judging or deciding (or believing, desiring, or intending) unconsciously. He emphasizes the fallible nature of such inferences, arguing on the basis of various empirical studies that many of the inferences people arrive at about their own beliefs and desires are in fact incorrect. Similarly to Jackendoff and Prinz, Carruthers holds that only sensory states are able to serve as inputs to the mental mechanism responsible for self- (and other-) directed mindreading. These inputs include visual and other forms of sensory imagery in addition to inner speech. However, in cases where we are having thoughts about abstract matters that are difficult to unambiguously represent with other forms of imagery—such as the thought that philosophy is a challenging subject—episodes of inner speech are held to provide an especially important source of information that one is having such thoughts. Carruthers emphasizes that the process becomes especially inferential in nature where other contextual information—such as that one sees oneself lingering over a choice of cereal box—combines with one’s inner speech to generate an all-things-considered appraisal of what one is currently judging or deciding. Cassam (2011, 2014) likewise implicates inner speech in a multi-faceted inferentialist account of self-knowledge, though not pitched in terms of “mindreading” mechanisms or other constructs from cognitive science.

Alex Byrne (2011, 2018) puts inner speech to somewhat different ends in his inferentialist account of how we know what we are thinking. For Byrne, there is no such thing as inner speech, strictly speaking, because there are no sounds (or voices) in the head. However, there are such things as auditory-phonological representations of voices. These give rise to an apparent perception of what we come to think of as the “inner voice”. By trying to attend to what the inner voice says, Byrne proposes, we can reliably form judgments about what we are thinking. The epistemic rule he proposes for doing so is:

THINK: If the inner voice speaks about x, believe that you are thinking about x.

As with Carruthers, a key motivation for Byrne’s account of how we know what we are thinking is a background view, motivated by the work of Shoemaker (1994), Dretske (2003), and others, that we have no other, more direct introspective method for knowing our own thoughts (i.e., we lack something like an “inner sense”). Note that Byrne’s approach is inferentialist in that he takes inner speech to be implicated in inferences that lead to knowledge of one’s own occurrent thoughts. Yet the sort of inference involved is quite different from that envisioned by Carruthers and Cassam, who both hold that inner speech episodes are just one kind of information among many that may be brought to bear in inferences about one’s standing and occurrent mental states. Importantly, the form of inference envisioned by Carruthers and Cassam is essentially the same in its first- and third-person applications, whereas Byrne’s THINK rule describes an inferential procedure that can only be used to reliably generate true beliefs about one’s own mental states. In Byrne’s view, this helps to explain the “peculiar” nature of introspection, where this peculiarity lies in the fact that our methods for knowing our own mental states are (intuitively) different from those we use to know others’ (Byrne 2011, 2018). Further, on Byrne’s version of inferentialism, the beliefs we form by trying to follow THINK are extremely likely to amount to knowledge, thereby cohering with the intuition that knowledge of one’s own current thoughts is epistemically privileged. By contrast, the kinds of metacognitive inferences that Carruthers and Cassam take to rely on inner speech are (by their own telling) epistemically on a par with our inferences about the mental states of others and far more susceptible to error.

Several philosophers object that inferentialist proposals leave us at too great an epistemic distance from our own thoughts (Bar-On & Ochs 2018; Roessler 2016) or have other unworkable features (Langland-Hassan 2014; Martínez-Manrique & Vicente 2010; Roessler 2016). Roessler (2016) pursues a non-observational account of the role of inner speech in generating self-knowledge. Rejecting the idea that we need to “eavesdrop on ourselves” by attending to our inner speech, Roessler suggests we follow remarks of Ryle (1949 [2009]) and Anscombe (1957) in understanding the knowledge gained through inner speech as a kind of “practical knowledge” (or, for Ryle, “serial knowledge”), where knowing what one is thinking is understood as a special case of knowing what one is doing.

Bar-On & Ochs (2018) likewise take aim at what they term “Neo-Rylean” invocations of inner speech, arguing that Byrne’s THINK rule fails to identify a special role for inner speech in facilitating self-knowledge. Drawing on Bar-On’s (2004) broader expressivist approach to self-knowledge, Bar-On and Ochs hold that a proper account of inner speech’s role in self-knowledge should show how such knowledge is “distinctive and uniquely first-personal” in that it is

knowledge that one can be said to have in virtue of being in a privileged position to give direct voice to one’s thoughts. (2018: 20)

They do not, however, develop a positive account in detail.

Vicente & Martínez-Manrique (2005, 2008; Martínez-Manrique & Vicente 2010) have criticized Bermúdez’s and related inferentialist views on the grounds that the semantics of natural language sentences (and of inner speech episodes in particular) are underdetermined in ways incompatible with providing knowledge of one’s thoughts. For instance, the sentence “Jane’s cup is full” is ambiguous in several ways, including the sense in which it is Jane’s cup (does she own it? is she just using it? is it the one she merely wants?) and the sense in which it is full (is it full of air? of liquid? of coins?). If the explicit meaning of a sentence is only extracted (and disambiguated) at the level of thought itself, they argue, it is unclear how awareness of semantically indeterminate inner speech utterances could suffice for awareness of one’s own (presumably explicit and unambiguous) propositional thoughts. Bermúdez replies in his 2018 paper.

Jorba & Vicente (2014) and Martínez-Manrique & Vicente (2015) criticize what they call the “format view” of inner speech (which they attribute to Jackendoff and others), according to which we are conscious of our inner speech episodes only because of their sensory format (see also Fernández Castro 2016). If these criticisms succeed, they cast doubt on views, such as those of Carruthers (2010), Jackendoff (1996), and Prinz (2012), which link the metacognitive or introspective value of inner speech to its occurrence in a sensory format.

Langland-Hassan (2014) raises a different sort of challenge for inferentialist views. Recall that it is a common assumption of those views that propositional thought itself is amodal (i.e., non-sensory) and non-conscious. For theorists such as Carruthers, Prinz, Jackendoff, and Bermúdez, inner speech is a conscious mental process just because it has sensory features that render it the sort of state that is apt to be conscious. Langland-Hassan argues that there is a conflict in holding that an episode of inner speech is a single mental state with both sensory features (relating to the representation of phonemes) and semantic features (relating to the meanings of the corresponding words). If this criticism is correct, it creates problems for the proposal that inner speech is especially well suited (due to its sensory character) to serve as input to inferences about one’s non-conscious mental states. Bermúdez (2018), Carruthers (2018), and Munroe (2023) have articulated different ways of responding to this challenge (see also Prinz 2011 for relevant remarks).

Inner speech features prominently in philosophical and cognitive scientific discussions of auditory verbal hallucinations (AVHs) and thought insertion. Both are common symptoms of schizophrenia but can occur in other contexts (e.g., brain injury, drug use) as well. AVHs are hallucinatory experiences of another’s speech, while thought insertion is understood either as a non-veridical experience of having someone else’s thoughts in one’s mind (Wing et al. 1990), or simply as the delusional belief that someone else’s thoughts are in one’s mind (Andreasen 1984). Two central questions explored by theorists are, first, whether (abnormal) inner speech is indeed the basis of AVHs or thought insertion, and, second, what might lead an episode of inner speech to be experienced as an AVH or inserted thought.

On the first question, an initially plausible approach to AVHs is to hold that they are more a matter of hallucinatory speech perception than of (unwitting) speech production, and thus not well conceived as episodes of inner speech. Wu (2012) and Cho & Wu (2013, 2014) advance a theory of this kind, holding that AVHs result from the spontaneous activation of speech perception areas in the brain. On their account, inner speech is not implicated in AVHs, and nor, in particular, are the neural regions involved in speech production. Despite the attractive simplicity of this account, most researchers have pursued options that explicitly involve inner speech, for several reasons. First, in formal surveys, patients often report that the phenomenological characteristics of their AVHs are different from those of hearing speech, insofar as their AVHs are not as subjectively “loud” as cases of hearing speech, are not equally rich in sensory features, and do not always seem to emanate from outside the head (Stephens & Graham 2000; Hoffman et al. 2008; Laroi et al. 2012; Nayani & David 1996; Stephane 2019). It appears that an explanation of the seemingly “alien” nature of these episodes, as well as of thought insertion, will require some other apparatus than an appeal to perception-like phenomenology. Given the need for such an alternative, one may hope to extend it also to cases of AVHs that are reported as having rich, perception-like phenomenological features (Langland-Hassan 2008; Moseley & Wilkinson 2014).

Second, neuroimaging has shown activation in both language perception and language production areas when patients are experiencing AVHs (Allen, Aleman, & McGuire 2007; Allen, Modinos, et al. 2012; Bohlken, Hugdahl, & Sommer 2017). Here, as in other areas of the study of inner speech, it is important to recognize that the neural regions underlying speech production (such as Broca’s area, in the left inferior frontal gyrus) are distinct from those governing speech perception (such as Wernicke’s area, in the superior temporal gyrus). This is why damage to one area but not the other (as in some cases of stroke) can result in markedly different language impairments. The fact that the mechanisms governing speech production and perception are dissociable in these ways provides an important means for assessing whether AVHs are best viewed as productive or perceptual (or both) in nature.

Nevertheless, those who see abnormal inner speech episodes as the basis for AVHs or thought insertion face a difficult task in explaining what would lead a person to not identify their own inner speech as their own, or to not feel in control of their own inner speech. Some have offered content-based explanations, where it is some feature of the semantic content of the inner speech that leads a person to disown it. For instance, Stephens and Graham (2000) argue that a patient may disown inner speech episodes with contents that are “intentionally inexplicable”, in the sense that they are not easily accommodated within a coherent self-narrative (see also Roessler (2013), Sollberger (2014), Bortolotti & Broome (2009), and Fernández (2010) on the idea that AVHs or inserted thoughts are episodes with contents the patient is unwilling to endorse). A challenge for this approach is posed by patient reports of voices that are helpful or encouraging, which the patient would have little reason to disown on the basis of content. As the Swiss psychiatrist Eugen Bleuler notes in early work on people with schizophrenia, “besides their persecutors, the patients often hear the voice of some protector”, and, occasionally, the hallucinatory voices “represent sound criticism of the [patient’s] delusional thoughts and pathological drives” (1911 [1950: 98]).

A popular alternative approach—sometimes known as the “comparator” or “sensory feedback” approach—builds on work in cognitive neuroscience concerning the mechanisms by which bodily movements are determined to be one’s own (Feinberg 1978; Frith 1992; Miall et al. 1993; Wolpert, Miall & Kawato 1998). The basic idea behind these approaches is that, below the level of consciousness, the brain is continually generating predictions about the likely sensory consequences of planned actions, which are then compared with actual sensory feedback. When there is a mismatch between the prediction and sensory feedback, one may have the phenomenological sense of not being in control of one’s actions (Frith 2012). A number of authors have proposed that the generation of both inner and outer speech may be attended by the same kind of prediction and comparison mechanisms, and that the malfunctioning of these mechanisms could lead to one’s own inner speech seeming not to be in one’s control (Blakemore, Smith, et al. 2000; Campbell 1999; Langland-Hassan 2016; Proust 2006). These proposals derive some support from the fact that people with schizophrenia have been shown to have broader deficits in automatically anticipating and adjusting for the sensory consequences of their own actions (Blakemore, Smith et al. 2000; Blakemore, Wolpert, & Frith 1998).
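The predict-then-compare idea described above can be illustrated with a toy sketch. It is offered only as an aid to understanding, not as a model drawn from any of the cited works: the linear forward model, the `gain` parameter, the `tolerance` threshold, and all names are hypothetical simplifications introduced here.

```python
from dataclasses import dataclass


@dataclass
class ComparatorResult:
    prediction_error: float      # size of the mismatch between prediction and feedback
    feels_self_generated: bool   # True when the mismatch is within tolerance


def forward_model(motor_command: float, gain: float = 1.0) -> float:
    """Hypothetical forward model: predicts sensory feedback as a scaled copy of the command."""
    return gain * motor_command


def compare(motor_command: float, actual_feedback: float,
            gain: float = 1.0, tolerance: float = 0.1) -> ComparatorResult:
    """Compare predicted with actual feedback; a large mismatch is treated
    (by assumption) as removing the sense that the action is one's own."""
    predicted = forward_model(motor_command, gain)
    error = abs(actual_feedback - predicted)
    return ComparatorResult(error, error <= tolerance)


# An accurate prediction: the feedback closely matches what was expected.
intact = compare(motor_command=1.0, actual_feedback=1.02)

# A miscalibrated forward model (gain=0.5): the same feedback now produces a large mismatch.
impaired = compare(motor_command=1.0, actual_feedback=1.02, gain=0.5)
```

On this simplified picture, the same self-produced feedback is classed as self-generated when the prediction is accurate and as not self-generated when the prediction mechanism is miscalibrated, which is the kind of malfunction the proposals above appeal to.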

Nevertheless, the comparator approach to AVHs and thought insertion has come in for criticism on several grounds (Synofzik, Vosgerau & Newen 2008; Vicente 2014; Vosgerau & Newen 2007). One complaint has been that the lack of sensory features associated with inserted thoughts, in particular, makes sensory-feedback approaches ill-suited to their explanation (Vosgerau & Newen 2007). In response, some defenders have shifted to pitching the thesis in terms of predictive processing models of perception and action (Gerrans 2015; Swiney 2018; Swiney & Sousa 2014; Wilkinson 2014; Wilkinson & Fernyhough 2017), while others have developed other alternatives (Langland-Hassan forthcoming). How best to characterize the phenomenology and underlying etiology of AVHs and thought insertion, and the relation of each to inner speech, remains an active area of research, as does the precise relationship between predictive processing models and the comparator approach. See Wilkinson & Alderson-Day (2016) for an introduction to an edited special issue on the topic aimed at philosophers; see López-Silva & McClelland (forthcoming) for a philosophically oriented anthology on thought insertion. (Note: Parts of this section draw on a more in-depth overview in Langland-Hassan 2021.)

Bibliography

  • Ackerman, Rakefet and Valerie A. Thompson, 2015, “Meta-Reasoning: What Can We Learn from Meta-Memory?”, in Reasoning as Memory, Aidan Feeney and Valerie A. Thompson (eds.), London: Psychology Press, 164–182.
  • –––, 2017a, “Meta-Reasoning: Monitoring and Control of Thinking and Reasoning”, Trends in Cognitive Sciences, 21(8): 607–617. doi:10.1016/j.tics.2017.05.004
  • –––, 2017b, “Meta-Reasoning: Shedding Metacognitive Light on Reasoning Research”, in Routledge International Handbook of Thinking and Reasoning, Linden J. Ball and Valerie A. Thompson (eds.), Oxford: Routledge, 1–15.
  • Alderson-Day, Ben and Charles Fernyhough, 2015, “Inner Speech: Development, Cognitive Functions, Phenomenology, and Neurobiology”, Psychological Bulletin, 141(5): 931–965. doi:10.1037/bul0000021
  • Alderson-Day, Ben, Kaja Mitrenga, Sam Wilkinson, Simon McCarthy-Jones, and Charles Fernyhough, 2018, “The Varieties of Inner Speech Questionnaire – Revised (VISQ-R): Replicating and Refining Links between Inner Speech and Psychopathology”, Consciousness and Cognition, 65: 48–58. doi:10.1016/j.concog.2018.07.001
  • Allen, Paul, Andre Aleman, and Philip K. McGuire, 2007, “Inner Speech Models of Auditory Verbal Hallucinations: Evidence from Behavioural and Neuroimaging Studies”, International Review of Psychiatry, 19(4): 407–415. doi:10.1080/09540260701486498
  • Allen, Paul, Gemma Modinos, Daniela Hubl, Gregory Shields, Arnaud Cachia, Renaud Jardri, Pierre Thomas, Todd Woodward, Paul Shotbolt, Marion Plaze, and Ralph Hoffman, 2012, “Neuroimaging Auditory Hallucinations in Schizophrenia: From Neuroanatomy to Neurochemistry and Beyond”, Schizophrenia Bulletin, 38(4): 695–703. doi:10.1093/schbul/sbs066
  • Andreasen, Nancy C., 1984, Scale for the Assessment of Positive Symptoms (SAPS), Iowa City: University of Iowa.
  • Anscombe, G. E. M., 1957, “Intention”, Proceedings of the Aristotelian Society, 57: 321–332. doi:10.1093/aristotelian/57.1.321
  • Austin, J. L., 1962, How To Do Things with Words, Cambridge, MA: Harvard University Press.
  • Baars, Bernard J., 1988, A Cognitive Theory of Consciousness, Cambridge/New York: Cambridge University Press.
  • Bach, Kent and Robert M. Harnish, 1979, Linguistic Communication and Speech Acts, Cambridge, MA: MIT Press.
  • Baddeley, Alan, 1992, “Working Memory”, Science, 255(5044): 556–559. doi:10.1126/science.1736359
  • Bar-On, Dorit, 2004, Speaking My Mind, New York: Oxford University Press.
  • Bar-On, Dorit and Jordan Ochs, 2018, “The Role of Inner Speech in Self-Knowledge: Against Neo-Rylean Views”, Teorema: Revista Internacional de Filosofía, 37(1): 5–22.
  • Barsalou, Lawrence W., 1999, “Perceptual Symbol Systems”, Behavioral and Brain Sciences, 22(4): 577–660. doi:10.1017/S0140525X99002149
  • Bermúdez, José Luis, 2003, Thinking without Words, (Philosophy of Mind Series), Oxford/New York: Oxford University Press.
  • –––, 2018, “Inner Speech, Determinacy, and Thinking Consciously About Thoughts”, in Langland-Hassan and Vicente 2018a: 199–220 (ch. 7).
  • Blakemore, Sarah-Jayne, J. Smith, R. Steel, E. C. Johnstone, and C. D. Frith, 2000, “The Perception of Self-Produced Sensory Stimuli in Patients with Auditory Hallucinations and Passivity Experiences: Evidence for a Breakdown in Self-Monitoring”, Psychological Medicine, 30(5): 1131–1139. doi:10.1017/S0033291799002676
  • Blakemore, Sarah-Jayne, Daniel M. Wolpert, and Chris D. Frith, 1998, “Central Cancellation of Self-Produced Tickle Sensation”, Nature Neuroscience, 1(7): 635–640. doi:10.1038/2870
  • Bleuler, Eugen, 1911 [1950], Dementia praecox: oder Gruppe der Schizophrenien, (Handbuch der Psychiatrie. Spezieller Teil, 4. Abt., 1. Hälfte), Leipzig: F. Deuticke. Translated as Dementia Praecox, or, The Group of Schizophrenias, Joseph Zinkin (trans.), (Monograph Series on Schizophrenia 1), New York: International Universities Press, 1950.
  • Bohlken, M. M., K. Hugdahl, and I. E. C. Sommer, 2017, “Auditory Verbal Hallucinations: Neuroimaging and Treatment”, Psychological Medicine, 47(2): 199–208. doi:10.1017/S003329171600115X
  • Borghi, Anna M., Ferdinand Binkofski, Cristiano Castelfranchi, Felice Cimatti, Claudia Scorolli, and Luca Tummolini, 2017, “The Challenge of Abstract Concepts”, Psychological Bulletin, 143(3): 263–292. doi:10.1037/bul0000089
  • Bortolotti, Lisa and Matthew Broome, 2009, “A Role for Ownership and Authorship in the Analysis of Thought Insertion”, Phenomenology and the Cognitive Sciences, 8(2): 205–224. doi:10.1007/s11097-008-9109-z
  • Byrne, Alex, 2011, “Knowing That I Am Thinking”, in Self-Knowledge, Anthony Hatzimoysis (ed.), Oxford/New York: Oxford University Press, 105–124.
  • –––, 2018, Transparency and Self-Knowledge, Oxford: Oxford University Press. doi:10.1093/oso/9780198821618.001.0001
  • Campbell, John, 1999, “Schizophrenia, the Space of Reasons, and Thinking as a Motor Process”, Monist, 82(4): 609–625. doi:10.5840/monist199982426
  • Carruthers, Peter, 1996, Language, Thought and Consciousness: An Essay in Philosophical Psychology, Cambridge/New York: Cambridge University Press. doi:10.1017/CBO9780511583360
  • –––, 2002, “The Cognitive Functions of Language”, Behavioral and Brain Sciences, 25(6): 657–674. doi:10.1017/S0140525X02000122
  • –––, 2006, The Architecture of the Mind: Massive Modularity and the Flexibility of Thought, Oxford: Clarendon Press. doi:10.1093/acprof:oso/9780199207077.001.0001
  • –––, 2009, “How We Know Our Own Minds: The Relationship between Mindreading and Metacognition”, Behavioral and Brain Sciences, 32(2): 121–138. doi:10.1017/S0140525X09000545
  • –––, 2010, “Introspection: Divided and Partly Eliminated”, Philosophy and Phenomenological Research, 80(1): 76–111. doi:10.1111/j.1933-1592.2009.00311.x
  • –––, 2011, The Opacity of Mind: An Integrative Theory of Self-Knowledge, Oxford/New York: Oxford University Press. doi:10.1093/acprof:oso/9780199596195.001.0001
  • –––, 2018, “The Causes and Contents of Inner Speech”, in Langland-Hassan and Vicente 2018a: 31–52 (ch. 1).
  • Cassam, Quassim, 2011, “Knowing What You Believe”, Proceedings of the Aristotelian Society, 111(1): 1–23. doi:10.1111/j.1467-9264.2011.00296.x
  • –––, 2014, Self-Knowledge for Humans, Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780199657575.001.0001
  • Cho, Raymond and Wayne Wu, 2013, “Mechanisms of Auditory Verbal Hallucination in Schizophrenia”, Frontiers in Psychiatry, 4: article 155. doi:10.3389/fpsyt.2013.00155
  • –––, 2014, “Is Inner Speech the Basis of Auditory Verbal Hallucination in Schizophrenia?”, Frontiers in Psychiatry, 5: article 75. doi:10.3389/fpsyt.2014.00075
  • Clark, Andy, 1998, “Magic Words: How Language Augments Human Computation”, in Language and Thought: Interdisciplinary Themes, Peter Carruthers and Jill Boucher (eds.), Cambridge/New York: Cambridge University Press, 162–183. doi:10.1017/CBO9780511597909.011
  • Clowes, Robert, 2007, “A Self-Regulation Model of Inner Speech and Its Role in the Organisation Of Human Conscious Experience”, Journal of Consciousness Studies, 14(7): 59–71.
  • Davidson, Donald, 1963, “Actions, Reasons, and Causes”, Journal of Philosophy, 60(23): 685–700. doi:10.2307/2023177
  • Deamer, Felicity, 2021, “Why Do We Talk To Ourselves?”, Review of Philosophy and Psychology, 12(2): 425–433. doi:10.1007/s13164-020-00487-5
  • Dennett, Daniel C., 1991, Consciousness Explained, Boston, MA: Little, Brown.
  • Dove, Guy, 2014, “Thinking in Words: Language as an Embodied Medium of Thought”, Topics in Cognitive Science, 6(3): 371–389. doi:10.1111/tops.12102
  • –––, 2018, “Language as a Disruptive Technology: Abstract Concepts, Embodiment and the Flexible Mind”, Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1752): 20170135. doi:10.1098/rstb.2017.0135
  • –––, 2020, “More than a Scaffold: Language Is a Neuroenhancement”, Cognitive Neuropsychology, 37(5–6): 288–311. doi:10.1080/02643294.2019.1637338
  • –––, 2022, Abstract Concepts and the Embodied Mind: Rethinking Grounded Cognition, New York: Oxford University Press. doi:10.1093/oso/9780190061975.001.0001
  • Dretske, Fred, 2003, “How Do You Know You Are Not a Zombie?”, in Privileged Access: Philosophical Accounts of Self-Knowledge, Brie Gertler (ed.), Aldershot/Burlington, VT: Ashgate, 1–14.
  • Feinberg, Irwin, 1978, “Efference Copy and Corollary Discharge: Implications for Thinking and Its Disorders”, Schizophrenia Bulletin, 4(4): 636–640. doi:10.1093/schbul/4.4.636
  • Fernández, Jordi, 2010, “Thought Insertion and Self-Knowledge”, Mind & Language, 25(1): 66–88. doi:10.1111/j.1468-0017.2009.01381.x
  • Fernández Castro, Víctor, 2016, “Inner Speech in Action”, Pragmatics & Cognition, 23(2): 238–258. doi:10.1075/pc.23.2.02cas
  • –––, 2019, “Inner Speech and Metacognition: A Defense of the Commitment-Based Approach”, Logos & Episteme, 10(3): 245–261. doi:10.5840/logos-episteme201910324
  • Fernyhough, Charles, 1996, “The Dialogic Mind: A Dialogic Approach to the Higher Mental Functions”, New Ideas in Psychology, 14(1): 47–62. doi:10.1016/0732-118X(95)00024-B
  • –––, 2008, “Getting Vygotskian about Theory of Mind: Mediation, Dialogue, and the Development of Social Understanding”, Developmental Review, 28(2): 225–262. doi:10.1016/j.dr.2007.03.001
  • –––, 2009, “Dialogic Thinking”, in Private Speech, Executive Functioning, and the Development of Verbal Self-Regulation, Adam Winsler, Charles Fernyhough, and Ignacio Montero (eds.), Cambridge/New York: Cambridge University Press, 42–52. doi:10.1017/CBO9780511581533.004
  • Fodor, Jerry A., 1975, The Language of Thought, New York: Crowell.
  • Frankfort, Tom, 2022, “Action and Reaction: The Two Voices of Inner Speech”, Teorema: Revista Internacional de Filosofía, 41(1): 51–69.
  • Frankfurt, Harry G., 1978, “The Problem of Action”, American Philosophical Quarterly, 15(2): 157–162.
  • Frankish, Keith, 2004, Mind and Supermind, Cambridge: Cambridge University Press. doi:10.1017/CBO9780511487507
  • –––, 2018, “Inner Speech and Outer Thought”, in Langland-Hassan and Vicente 2018a: 221–243 (ch. 8).
  • Frith, Chris D., 1992, The Cognitive Neuropsychology of Schizophrenia , Hove, UK: Lawrence Erlbaum.
  • –––, 2012, “Explaining Delusions of Control: The Comparator Model 20years On”, Consciousness and Cognition , 21(1): 52–54. doi:10.1016/j.concog.2011.06.010
  • Gauker, Christopher, 2011, Words and Images: An Essay on the Origin of Ideas , Oxford/New York: Oxford University Press. doi:10.1093/acprof:oso/9780199599462.001.0001
  • –––, 2018, “Inner Speech as the Internalization of Outer Speech”, in Langland-Hassan and Vicente 2018a: 53–77 (ch. 2).
  • Gerrans, Philip, 2015, “The Feeling of Thinking: Sense of Agency in Delusions of Thought Insertion”, Psychology of Consciousness: Theory, Research, and Practice, 2(3): 291–300. doi:10.1037/cns0000060
  • Geurts, Bart, 2018, “Making Sense of Self Talk”, Review of Philosophy and Psychology , 9(2): 271–285. doi:10.1007/s13164-017-0375-y
  • Grandchamp, Romain, Lucile Rapin, Marcela Perrone-Bertolotti, Cédric Pichat, Célise Haldin, Emilie Cousin, Jean-Philippe Lachaux, Marion Dohen, Pascal Perrier, Maëva Garnier, Monica Baciu, and Hélène Lœvenbruck, 2019, “The ConDialInt Model: Condensation, Dialogality, and Intentionality Dimensions of Inner Speech Within a Hierarchical Predictive Control Framework”, Frontiers in Psychology , 10: article 2019. doi:10.3389/fpsyg.2019.02019
  • Gregory, Daniel, 2016, “Inner Speech, Imagined Speech, and Auditory Verbal Hallucinations”, Review of Philosophy and Psychology , 7(3): 653–673. doi:10.1007/s13164-015-0274-z
  • –––, 2017, “Is Inner Speech Dialogic?”, Journal of Consciousness Studies, 24(1–2): 111–137.
  • –––, 2018, “The Feeling of Sincerity: Inner Speech and the Phenomenology of Assertion”, Thought: A Journal of Philosophy , 7(4): 225–236. doi:10.1002/tht3.391
  • –––, 2020a, “Review of Inner Speech: New Voices , edited by Peter Langland-Hassan and Agustín Vicente”, Analysis , 80(1): 164–173. doi:10.1093/analys/anz096
  • –––, 2020b, “Are Inner Speech Utterances Actions?”, Teorema: Revista Internacional de Filosofía , 39(3): 55–78.
  • –––, forthcoming, “How Not to Decide Whether Inner Speech Is Speech: Two Common Mistakes”, Phenomenology and the Cognitive Sciences , first online: 25 April 2022. doi:10.1007/s11097-022-09814-w
  • Grice, H. P., 1975 [2013], “Logic and Conversation”, in Syntax and Semantics, Volume 3 , Peter Cole and Jerry L. Morgan (eds.), New York: Academic Press, 41–58. Reprinted in The Philosophy of Language , A. P. Martinich and David Sosa (eds.), sixth edition, New York: Oxford University Press, 2013, 312–322.
  • Heavey, Christopher L. and Russell T. Hurlburt, 2008, “The Phenomena of Inner Experience”, Consciousness and Cognition , 17(3): 798–810. doi:10.1016/j.concog.2007.12.006
  • Heavey, Christopher L., Stefanie A. Moynihan, Vincent P. Brouwers, Leiszle Lapping-Carr, Alek E. Krumm, Jason M. Kelsey, Dio K. Turner, and Russell T. Hurlburt, 2019, “Measuring the Frequency of Inner-Experience Characteristics by Self-Report: The Nevada Inner Experience Questionnaire”, Frontiers in Psychology , 9: article 2615. doi:10.3389/fpsyg.2018.02615
  • Hill, Christopher S., 2022, Perceptual Experience , Oxford/New York: Oxford University Press. doi:10.1093/oso/9780192867766.001.0001
  • Hoffman, R. E., M. Varanko, J. Gilmore, and A. L. Mishara, 2008, “Experiential Features Used by Patients with Schizophrenia to Differentiate ‘Voices’ from Ordinary Verbal Thought”, Psychological Medicine , 38(8): 1167–1176. doi:10.1017/S0033291707002395
  • Hornsby, Jennifer, 1980, Actions , London: Routledge & Kegan Paul.
  • Hurlburt, Russell T. and Christopher L. Heavey, 2002, “Interobserver Reliability of Descriptive Experience Sampling”, Cognitive Therapy and Research , 26(1): 135–142. doi:10.1023/A:1013802006827
  • Indefrey, P. and W. J. M. Levelt, 2004, “The Spatial and Temporal Signatures of Word Production Components”, Cognition, 92(1–2): 101–144. doi:10.1016/j.cognition.2002.06.001
  • Jackendoff, Ray, 1996, “How Language Helps Us Think”, Pragmatics & Cognition , 4(1): 1–34. doi:10.1075/pc.4.1.03jac
  • –––, 2007, Language, Consciousness, Culture: Essays on Mental Structure , Cambridge, MA: MIT Press.
  • –––, 2011, “What is the Human Language Faculty? Two Views”, Language , 87(3): 586–624.
  • –––, 2012, A User’s Guide to Thought and Meaning , Oxford/New York: Oxford University Press.
  • Jorba, Marta, 2020, “Husserlian Horizons, Cognitive Affordances and Motivating Reasons for Action”, Phenomenology and the Cognitive Sciences , 19(5): 847–868. doi:10.1007/s11097-019-09648-z
  • –––, forthcoming, “El Habla Interna en el Marco de las Affordances”, in Hacia una Concepción Integral de la Mente: Aportaciones en Filosofía de la Mente y en Ciencia Cognitiva , D. P. Chico (ed.), Zaragoza: Prensas de la Universidad de Zaragoza.
  • Jorba, Marta and Agustín Vicente, 2014, “Cognitive Phenomenology, Access to Contents, and Inner Speech”, Journal of Consciousness Studies , 21(9–10): 74–99.
  • Kompa, Nikola A., forthcoming, “Inner Speech and ‘Pure’ Thought—Do We Think in Language?”, Review of Philosophy and Psychology , first online: 31 January 2023. doi:10.1007/s13164-023-00678-w
  • Kompa, Nikola A. and Jutta L. Mueller, forthcoming, “Inner Speech as a Cognitive Tool—or What Is the Point of Talking to Oneself?”, Philosophical Psychology , first online: 5 August 2022. doi:10.1080/09515089.2022.2112164
  • Langland-Hassan, Peter, 2008, “Fractured Phenomenologies: Thought Insertion, Inner Speech, and the Puzzle of Extraneity”, Mind & Language , 23(4): 369–401. doi:10.1111/j.1468-0017.2008.00348.x
  • –––, 2014, “Inner Speech and Metacognition: In Search of a Connection”, Mind & Language, 29(5): 511–533. doi:10.1111/mila.12064
  • –––, 2015, “Imaginative Attitudes”, Philosophy and Phenomenological Research , 90(3): 664–686. doi:10.1111/phpr.12115
  • –––, 2016, “Hearing a Voice as One’s Own: Two Views of Inner Speech Self-Monitoring Deficits in Schizophrenia”, Review of Philosophy and Psychology , 7(3): 675–699. doi:10.1007/s13164-015-0250-7
  • –––, 2018, “From Introspection to Essence: The Auditory Nature of Inner Speech”, in Langland-Hassan and Vicente 2018a: 78–104 (ch. 3).
  • –––, 2021, “Inner Speech”, WIREs Cognitive Science , 12(2): e1544. doi:10.1002/wcs.1544
  • –––, forthcoming, “Thought Insertion as a Persecutory Delusion”, in López-Silva and McClelland forthcoming.
  • Langland-Hassan, Peter and Agustín Vicente (eds.), 2018a, Inner Speech: New Voices , Oxford/New York: Oxford University Press. doi:10.1093/oso/9780198796640.001.0001
  • –––, 2018b, “Introduction”, in Langland-Hassan and Vicente 2018a: 1–28.
  • Larøi, Frank, Iris E. Sommer, Jan Dirk Blom, Charles Fernyhough, Dominic H. ffytche, Kenneth Hugdahl, Louise C. Johns, Simon McCarthy-Jones, Antonio Preti, Andrea Raballo, et al., 2012, “The Characteristic Features of Auditory Verbal Hallucinations in Clinical and Nonclinical Groups: State-of-the-Art Overview and Future Directions”, Schizophrenia Bulletin , 38(4): 724–733. doi:10.1093/schbul/sbs061
  • Levelt, Willem J. M., 1989, Speaking: From Intention to Articulation, (ACL-MIT Press Series in Natural-Language Processing), Cambridge, MA: MIT Press. doi:10.7551/mitpress/6393.001.0001
  • Levelt, Willem J. M., Ardi Roelofs, and Antje S. Meyer, 1999, “A Theory of Lexical Access in Speech Production”, Behavioral and Brain Sciences , 22(1): 1–38. doi:10.1017/S0140525X99001776
  • Lœvenbruck, H., R. Grandchamp, L. Rapin, L. Nalborczyk, M. Dohen, P. Perrier, M. Baciu, and M. Perrone-Bertolotti, 2018, “A Cognitive Neuroscience View of Inner Language: To Predict and to Hear, See, Feel”, in Langland-Hassan and Vicente 2018a: 131–167 (ch. 5).
  • López-Silva, Pablo and Tom McClelland (eds.), forthcoming, Intruders in the Mind: Interdisciplinary Perspectives on Thought Insertion, Oxford: Oxford University Press.
  • Machery, Edouard, 2005, “You Don’t Know How You Think: Introspection and Language of Thought”, The British Journal for the Philosophy of Science , 56(3): 469–485. doi:10.1093/bjps/axi130
  • –––, 2018, “Know Thyself: Beliefs vs. Desires in Inner Speech”, in Langland-Hassan and Vicente 2018a: 261–275 (ch. 10).
  • Martínez-Manrique, Fernando and Agustín Vicente, 2010, “‘What The…!’ The Role of Inner Speech in Conscious Thought”, Journal of Consciousness Studies , 17(9–10): 141–167.
  • –––, 2015, “The Activity View of Inner Speech”, Frontiers in Psychology , 6: article 232. doi:10.3389/fpsyg.2015.00232
  • McCarthy-Jones, Simon and Charles Fernyhough, 2011, “The Varieties of Inner Speech: Links between Quality of Inner Speech and Psychopathological Variables in a Sample of Young Adults”, Consciousness and Cognition , 20(4): 1586–1593. doi:10.1016/j.concog.2011.08.005
  • McClelland, Tom, 2020, “The Mental Affordance Hypothesis”, Mind , 129(514): 401–427. doi:10.1093/mind/fzz036
  • Mele, Alfred, 2009, “Mental Action: A Case Study”, in Mental Actions , Lucy O’Brien and Matthew Soteriou (eds.), Oxford/New York: Oxford University Press, 17–37 (ch. 2). doi:10.1093/acprof:oso/9780199225989.003.0002
  • Miall, R. C., D. J. Weir, D. M. Wolpert, and J. F. Stein, 1993, “Is the Cerebellum a Smith Predictor?”, Journal of Motor Behavior , 25(3): 203–216. doi:10.1080/00222895.1993.9942050
  • Miller, J. T. M., 2021, “A Bundle Theory of Words”, Synthese , 198(6): 5731–5748. doi:10.1007/s11229-019-02430-3
  • Moseley, Peter and Sam Wilkinson, 2014, “Inner Speech Is Not so Simple: A Commentary on Cho and Wu (2013)”, Frontiers in Psychiatry , 5: article 42. doi:10.3389/fpsyt.2014.00042
  • Munroe, Wade, 2022a, “What It Takes to Make a Word (Token)”, Synthese , 200(4): article 287. doi:10.1007/s11229-022-03751-6
  • –––, 2022b, “Why Are You Talking to Yourself? The Epistemic Role of Inner Speech in Reasoning”, Noûs , 56(4): 841–866. doi:10.1111/nous.12385
  • –––, 2023, “Thinking through Talking to Yourself: Inner Speech as a Vehicle of Conscious Reasoning”, Philosophical Psychology , 36(2): 292–318. doi:10.1080/09515089.2022.2042505
  • –––, forthcoming, “Semiotics in the Head: Thinking about and Thinking through Symbols”, Philosophy and Phenomenological Research , first online: 8 November 2022. doi:10.1111/phpr.12923
  • Nayani, Tony H. and Anthony S. David, 1996, “The Auditory Hallucination: A Phenomenological Survey”, Psychological Medicine , 26(1): 177–189. doi:10.1017/S003329170003381X
  • O’Brien, Lucy, 2013, “Obsessive Thoughts and Inner Voices”, Philosophical Issues , 23: 93–108. doi:10.1111/phis.12005
  • O’Shaughnessy, Brian, 1973, “Trying (As the Mental ‘Pineal Gland’)”, Journal of Philosophy, 70(13): 365–386. doi:10.2307/2024676
  • Patel, Shivam, 2021, “From Speech to Voice: On the Content of Inner Speech”, Synthese , 199(3–4): 10929–10952. doi:10.1007/s11229-021-03274-6
  • Perrone-Bertolotti, M., L. Rapin, J.-P. Lachaux, M. Baciu, and H. Lœvenbruck, 2014, “What Is That Little Voice inside My Head? Inner Speech Phenomenology, Its Role in Cognitive Performance, and Its Relation to Self-Monitoring”, Behavioural Brain Research , 261: 220–239. doi:10.1016/j.bbr.2013.12.034
  • Petrolini, Valentina, Marta Jorba, and Agustín Vicente, 2020, “The Role of Inner Speech in Executive Functioning Tasks: Schizophrenia With Auditory Verbal Hallucinations and Autistic Spectrum Conditions as Case Studies”, Frontiers in Psychology , 11: article 572035. doi:10.3389/fpsyg.2020.572035
  • Piaget, Jean, 1923 [1926/1959], Le Langage et la pensée chez l’enfant, Paris: Delachaux and Niestlé. Translated as The Language and Thought of the Child, Marjorie Warden (trans.), (International Library of Psychology, Philosophy and Scientific Method), London/New York: K. Paul, Trench, Trubner/Harcourt Brace, 1926. Third edition, revised and enlarged, 1959, Marjorie and Ruth Gabain (trans.), London/New York: Routledge & Kegan Paul/Humanities Press.
  • Prinz, Jesse J., 2011, “The Sensory Basis of Cognitive Phenomenology”, in Cognitive Phenomenology, Tim Bayne and Michelle Montague (eds.), Oxford/New York: Oxford University Press, 174–196. doi:10.1093/acprof:oso/9780199579938.003.0008
  • –––, 2012, The Conscious Brain: How Attention Engenders Experience , (Philosophy of Mind), New York: Oxford University Press. doi:10.1093/acprof:oso/9780195314595.001.0001
  • Proust, Joëlle, 2006, “Agency in Schizophrenia from a Control Theory Viewpoint”, in Disorders of Volition, Natalie Sebanz and Wolfgang Prinz (eds.), Cambridge, MA: MIT Press, 87–118 (ch. 5).
  • –––, 2013, The Philosophy of Metacognition: Mental Agency and Self-Awareness , Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780199602162.001.0001
  • Roessler, Johannes, 2013, “Thought Insertion, Self-Awareness, and Rationality”, in The Oxford Handbook of Philosophy and Psychiatry , K. W. M. Fulford, Martin Davies, Richard Gipps, George Graham, John Z. Sadler, Giovanni Stanghellini, and Tim Thornton (eds.), Oxford: Oxford University Press, 658–672.
  • –––, 2016, “Thinking, Inner Speech, and Self-Awareness”, Review of Philosophy and Psychology , 7(3): 541–557. doi:10.1007/s13164-015-0267-y
  • Ryle, Gilbert, 1949 [2009], The Concept of Mind , London: Hutchinson’s University Library. 60th anniversary edition, London/New York: Routledge, 2009.
  • Sandler, Wendy, 2012, “The Phonological Organization of Sign Languages”, Language and Linguistics Compass, 6(3): 162–182. doi:10.1002/lnc3.326
  • Schwitzgebel, Eric, 2010 [2019], “Introspection”, The Stanford Encyclopedia of Philosophy (Winter 2019 Edition), Edward N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/win2019/entries/introspection/ >.
  • Searle, John R., 1969, Speech Acts: An Essay in the Philosophy of Language , London: Cambridge University Press.
  • Sellars, Wilfrid, 1956, “Empiricism and the Philosophy of Mind”, in Minnesota Studies in the Philosophy of Science, Volume 1: Foundations of Science and the Concepts of Psychology and Psychoanalysis, Herbert Feigl and Michael Scriven (eds.), Minneapolis, MN: University of Minnesota Press, 253–329.
  • Shoemaker, Sydney, 1994, “Self-Knowledge and ‘Inner Sense’: Lecture I: The Object Perception Model”, Philosophy and Phenomenological Research , 54(2): 249–269. doi:10.2307/2108488
  • Siegel, Susanna, 2005 [2021], “The Contents of Perception”, The Stanford Encyclopedia of Philosophy (Fall 2021 Edition), Edward N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/fall2021/entries/perception-contents/ >.
  • Sollberger, Michael, 2014, “Making Sense of an Endorsement Model of Thought-Insertion”, Mind & Language , 29(5): 590–612. doi:10.1111/mila.12067
  • Stephane, Massoud, 2019, “The Self, Agency and Spatial Externalizations of Inner Verbal Thoughts, and Auditory Verbal Hallucinations”, Frontiers in Psychiatry , 10: article 668. doi:10.3389/fpsyt.2019.00668
  • Stephens, G. Lynn and George Graham, 2000, When Self-Consciousness Breaks: Alien Voices and Inserted Thoughts , (Philosophical Psychopathology. Disorders in Mind), Cambridge, MA: The MIT Press. doi:10.7551/mitpress/7218.001.0001
  • Stokoe, William C., 2005, “Sign Language Structure: An Outline of the Visual Communication Systems of the American Deaf”, Journal of Deaf Studies and Deaf Education , 10(1): 3–37. doi:10.1093/deafed/eni001
  • Strawson, P. F., 1964, “Intention and Convention in Speech Acts”, The Philosophical Review , 73(4): 439–460. doi:10.2307/2183301
  • Swiney, Lauren, 2018, “Activity, Agency, and Inner Speech Pathology”, in Langland-Hassan and Vicente 2018a: 299–331 (ch. 12).
  • Swiney, Lauren and Paulo Sousa, 2014, “A New Comparator Account of Auditory Verbal Hallucinations: How Motor Prediction Can Plausibly Contribute to the Sense of Agency for Inner Speech”, Frontiers in Human Neuroscience , 8: article 675. doi:10.3389/fnhum.2014.00675
  • Synofzik, Matthis, Gottfried Vosgerau, and Albert Newen, 2008, “Beyond the Comparator Model: A Multifactorial Two-Step Account of Agency”, Consciousness and Cognition , 17(1): 219–239. doi:10.1016/j.concog.2007.03.010
  • Thompson, Valerie A., Jamie A. Prowse Turner, Gordon Pennycook, Linden J. Ball, Hannah Brack, Yael Ophir, and Rakefet Ackerman, 2013, “The Role of Answer Fluency and Perceptual Fluency as Metacognitive Cues for Initiating Analytic Thinking”, Cognition , 128(2): 237–251. doi:10.1016/j.cognition.2012.09.012
  • Unkelbach, Christian and Rainer Greifeneder, 2013, “A General Model of Fluency Effects in Judgment and Decision Making”, in The Experience of Thinking: How the Fluency of Mental Processes Influences Cognition and Behaviour, Christian Unkelbach and Rainer Greifeneder (eds.), Hove, UK: Psychology Press, 11–32.
  • Vicente, Agustín, 2014, “The Comparator Account on Thought Insertion, Alien Voices and Inner Speech: Some Open Questions”, Phenomenology and the Cognitive Sciences , 13(2): 335–353. doi:10.1007/s11097-013-9303-5
  • Vicente, Agustín and Marta Jorba, 2019, “The Linguistic Determination of Conscious Thought Contents”, Noûs , 53(3): 737–759. doi:10.1111/nous.12239
  • Vicente, Agustín and Fernando Martínez-Manrique, 2005, “Semantic Underdetermination and the Cognitive Uses of Language”, Mind & Language, 20(5): 537–558. doi:10.1111/j.0268-1064.2005.00299.x
  • –––, 2008, “Thought, Language, and the Argument from Explicitness”, Metaphilosophy , 39(3): 381–401. doi:10.1111/j.1467-9973.2008.00545.x
  • –––, 2011, “Inner Speech: Nature and Functions”, Philosophy Compass, 6(3): 209–219. doi:10.1111/j.1747-9991.2010.00369.x
  • –––, 2016, “The Nature of Unsymbolized Thinking”, Philosophical Explorations , 19(2): 173–187. doi:10.1080/13869795.2016.1176234
  • Vosgerau, Gottfried and Albert Newen, 2007, “Thoughts, Motor Actions, and the Self”, Mind & Language , 22(1): 22–43. doi:10.1111/j.1468-0017.2006.00298.x
  • Vygotsky, Lev S., 1934 [1986], Myshlenie i rechʹ . Translated as Thought and Language , Alex Kozulin (trans.), Cambridge, MA: MIT Press, 1986.
  • Wilkinson, Sam, 2014, “Accounting for the Phenomenology and Varieties of Auditory Verbal Hallucination within a Predictive Processing Framework”, Consciousness and Cognition , 30: 142–155. doi:10.1016/j.concog.2014.09.002
  • –––, 2020, “The Agentive Role of Inner Speech in Self-Knowledge”, Teorema: Revista Internacional de Filosofía , 39(2): 7–26.
  • Wilkinson, Sam and Ben Alderson-Day, 2016, “Voices and Thoughts in Psychosis: An Introduction”, Review of Philosophy and Psychology , 7(3): 529–540. doi:10.1007/s13164-015-0288-6
  • Wilkinson, Sam and Charles Fernyhough, 2017, “Auditory Verbal Hallucinations and Inner Speech: A Predictive Processing Perspective”, in Before Consciousness: In Search of the Fundamentals of Mind , Zdravko Radman (ed.), Exeter, UK: Imprint Academic, 285–304. [ Wilkinson and Fernyhough 2017 available online ]
  • –––, 2018, “When Inner Speech Misleads”, in Langland-Hassan and Vicente 2018a: 244–260 (ch. 9).
  • Wing, J. K., T. Babor, T. Brugha, J. Burke, J. E. Cooper, R. Giel, A. Jablenski, D. Regier, and N. Sartorius, 1990, “SCAN. Schedules for Clinical Assessment in Neuropsychiatry”, Archives of General Psychiatry , 47(6): 589–593. doi:10.1001/archpsyc.1990.01810180089012
  • Wolpert, Daniel M., R. Chris Miall, and Mitsuo Kawato, 1998, “Internal Models in the Cerebellum”, Trends in Cognitive Sciences, 2(9): 338–347. doi:10.1016/S1364-6613(98)01221-2
  • Wu, Wayne, 2012, “Explaining Schizophrenia: Auditory Verbal Hallucination and Self-Monitoring”, Mind & Language , 27(1): 86–107. doi:10.1111/j.1468-0017.2011.01436.x

cognition: embodied | introspection | language of thought hypothesis | mental imagery | mind: modularity of | perception: auditory | perception: the contents of | self-consciousness | self-knowledge | speech acts

Acknowledgments

We thank Peter Carruthers, Christopher Gauker, Christopher Hill, Marta Jorba, Fernando Martínez-Manrique, Lucy O’Brien, Stephen Mann, Wade Munroe, Shivam Patel, Agustín Vicente, and Sam Wilkinson for written feedback and responses to queries. We also thank the audience at the Inner Speech Colloquium in February 2023 and the participants in the INACT Work in Progress Seminar (Daphne Bernués, Mariela Destéfano, and Víctor Verdejo, as well as Marta Jorba) for feedback. Work on this entry was supported in part by the State Research Agency and the Spanish Ministry of Science and Innovation (grant number PID2020-115052GA-I00).

Copyright © 2023 by Daniel Gregory <daniel.gregory@ub.edu> and Peter Langland-Hassan <langlapr@ucmail.uc.edu>


The Stanford Encyclopedia of Philosophy is copyright © 2024 by The Metaphysics Research Lab , Department of Philosophy, Stanford University

Library of Congress Catalog Data: ISSN 1095-5054

Kyle D. Killian Ph.D., LMFT

How Inner Monologues Work, and Who Has Them

... and what happens when people can't turn them off.

Posted April 25, 2023 | Reviewed by Michelle Quirk

  • About 30 to 50 percent of people regularly think to themselves in internal monologues.
  • Inner monologues have a function in language development and in information and memory processing.
  • This phenomenon demonstrates a rich diversity of experience in what we deem to be "normal" thought lives.

Flashback to college: I’m sitting in the dining hall with my best friend Dave, and a student enters in a firefighter outfit. Assuming (probably correctly) that this man is part of the arts/creative “Western campus program,” a woman at our table exclaims, “Those Western people…! They just think all of the time!” My friend and I give each other the side-eye, and he later asks incredulously, “Really? Is there any other option?”

It turns out there is, at least in the sense of having an inner conversation in unspoken words. About 30 to 50 percent of people, according to psychologist Russell Hurlburt’s research, regularly think to themselves in internal monologues. Inner or "private" speech is something most of us likely did as very young people seeking to develop our language skills, and later as a way of rehearsing information to successfully encode it and retain it in working memory. So it’s clearly functional and not a sign of mental disorder.

To get a read on whether you have inner monologues, try listening in and noting an internal voice or intrusive thoughts during meditation. Mindful practice provides tremendous insight into whether, and how often, you have inner monologues. But what about the 50 to 70 percent of folks who don’t have, or only infrequently have, words in their heads?

Visual Imagery

There are different theories, but the simplest (and least condescending) view of folks who don’t regularly have inner monologues is that many of them process information and prep for tasks using visual imagery rather than words. That is, they see images, such as a to-do list, rather than thinking about or hearing the words for the items on the list. I found this explanation helpful because I think in terms of words, visual images, and music all day long, so it’s easy to take the perspective of folks who are a bit “quiet” in there by relating to their use of images or their playing back a song in their heads, things I do, too. It’s helpful to think about their inner experience in visual terms: while it might be as quiet as the occasional cricket when it comes to words, it’s not a total void or vacuum in there.

Too Much of a Good Thing

And before those of us with rich inner monologues start to feel wonderful about them, it’s important to remember that too much of a good thing is possible. Inner monologues can become a little like King Midas’s golden touch when we can’t turn them off. For instance, anxious minds continuously scan for and entertain intrusive thoughts; rumination on these can lead to brooding, and brooding can give way to highly critical talk about self and others. For someone with a relentless inner critic, a lack of inner monologue sounds like a quiet, peaceful reprieve from constant chatter and potentially corrosive self-talk.

I know people who have internal dialogues and “think in words all of the time.” I know a person who has internal monologues and external dialogues with himself, engaging in rich conversations that can lead in unexpected, even challenging, directions. I can imagine that sounds weird to folks who have only inner monologues or those who have no inner monologues at all. But as Vulcan philosophy states, “Infinite diversity in infinite combinations,” which is another way of saying “Different strokes for different folks.” Empathy, perspective-taking, and imagining what others’ experiences are like, and what it’s like inside their heads, are worthwhile, growth-enhancing, and humanizing exercises.

Inner monologues can represent a rich, deep, “pristine” experience (Hurlburt et al., 2016) for some, as long as they don’t get out of hand, and as long as external monologues/dialogues don’t freak out your friends, family members, and co-workers.


Hurlburt, R. T., Alderson-Day, B., Kuhn, S., & Fernyhough, C. (2016). Exploring the ecological validity of thinking on demand: Neural correlates of elicited vs. spontaneously occurring inner speech. PLoS One, 11(2), e0147932. DOI: 10.1371/journal.pone.0147932

Kyle D. Killian, Ph.D., LMFT is the author of Interracial Couples, Intimacy and Therapy: Crossing Racial Borders.

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Langland-Hassan P, Vicente A, editors. Inner Speech: New Voices. Oxford (UK): Oxford University Press; 2018.

This chapter is an author manuscript version first made accessible on the NCBI Bookshelf website March 21, 2019.

Inner Speech: New Voices.

Chapter 9. When Inner Speech Misleads

Sam Wilkinson and Charles Fernyhough.

This chapter examines whether and when the experience of inner speech can be inaccurate and thereby mislead the subject. It presents a view about the representational content of speech experience generally and then applies it to inner speech in particular. On such a view, speech experience typically presents us with far more than simply the low-level acoustic properties of speech: it conveys the relevant mental states of the (actual or hypothetical) speaker. Similarly, inner speech presents inner speakers with their own mental states. In light of this, inner speech can mislead either by presenting the subject with mental states they do not in fact have, or by presenting these mental states as belonging to another agent. The chapter reflects on the sorts of contexts in which either of these could occur.

9.1. Introduction

Most philosophers think that at least some experiences have representational content: they represent the world as being a certain way. 1 Representational content dictates accuracy conditions, namely, what would need to be the case in order for the experience to be accurate. Inner speech, that “interior monologue” or familiar voice inside your head, is something that we experience, and that experience of inner speech seems to have representational content: it seems to “tell” the subject that something is going on in the world. Our central question is: What is it that the experience of inner speech is telling the subject is going on in the world, and could it, in some circumstances, be telling the subject something inaccurate? In other words: When, if ever, does the experience of inner speech mislead?

This may seem like a strange question to ask, and its importance may not be immediately obvious, but answering it has a number of significant implications. To start with, the question about whether the experience of inner speech can mislead requires us to answer a more basic question first: what sorts of things enter into the representational content of an experience of inner speech? This question is of tremendous importance since it tells us what the epistemic weight of an experience of inner speech is, namely, the content that it carries. In particular, if we view the experience of inner speech as important to self-knowledge, the content of the experience will tell us more precisely what the route to that self-knowledge is.

An answer to this question also has a more specific implication. There are unusual experiences (often arising in the context of psychiatric diagnoses), such as auditory verbal hallucinations (AVHs), which a number of theorists take to involve inner speech (Frith 1992, Seal et al. 2004, Jones & Fernyhough 2007). If we think of AVHs as experiences of inner speech, we can usefully ask ourselves: is this experience of inner speech telling the subject something inaccurate? And if it is, which aspects of the world are, and which are not, the way they are represented as being?

At this point it is important to clarify two things. First, there is the question of what exactly we mean by an “experience of inner speech”. Some might want to say that inner speech simply is an experience. Others might want to say that inner speech is something that we do, and which we have an experience of. At this stage we remain neutral between these two, but it will become clear later on that our position is more in line with the latter. Second, it is important to clarify that we are talking about the experiential content of an experience of inner speech and not its linguistic content. We are not talking about utterances of inner speech linguistically expressing inaccuracies. Thus, to draw an analogy with outer speech experience, if someone says “Madrid is the capital of France”, although they have said something inaccurate, my experience is accurate to the extent that it has accurately represented various features of the utterance, for example, the speech sounds produced, and perhaps more besides (a central part of this chapter is the controversy surrounding this). Now, the extent to which this analogy with outer speech holds is itself up for dispute and will depend upon how we think of inner speech.

We proceed as follows. We start by presenting an intuitively appealing view according to which an episode of inner speech is an imaginative episode, and therefore cannot mislead (at least not in the relevant sense). We criticize this view and reject it in favour of the view that inner speech is actually a kind of speech, rather than merely imagined speech. We then present a view about the representational content of speech experience generally, and then apply it to inner speech in particular. We end, in light of this, by presenting the different ways in which inner speech could potentially mislead.

9.2. Content without Commitment: Inner Speech as Imagination

It is important to distinguish representational content from psychological force. Perceiving and believing have representational content, but they also have a certain psychological force: they don’t merely represent a content, they represent that content as accurate. In other words, they involve, by their very nature, a certain commitment to the world being the way represented. 2 Other psychological states or events (such as suppositions or certain imaginings), on the other hand, may represent something but lack that kind of commitment to what is going on in the world. If you voluntarily imagine a pink unicorn, it cannot be regarded as an inaccurate experience just because there is no such thing in front of you (or no such thing in existence at all). 3 The experience is not even in the running for accuracy. That said, the experience has representational content: it is of (or represents) a pink unicorn. What you have in this case of imagining is content (something is represented, it is about something) without commitment to accuracy. Another way of thinking about this lack of commitment to accuracy is that the imaginative episode is not presenting an aspect of the world over and above the experience itself (and so, trivially, it cannot do so inaccurately).

Some might think that inner speech is like that. Inner speech, on this view, is like imagining yourself speaking (and hearing yourself speak). The experience does not inform you about something going on in the world, and, as such, it cannot be wrong since the world cannot act as a benchmark against which the experience can fall short. There simply is the experience, presenting itself pure and simple. At most, if this is true, an inner speech experience tells you the immediate and infallible fact that you are having that very experience. On such a view, inner speech may represent certain things, which would be reflected in the phenomenology of inner speech, much in the same way as imagining a pink unicorn represents certain things (like pinkness and unicorns), and this too is reflected in the phenomenology of the experience, but neither experience purports to tell you anything about the world beyond the experience itself. On such a view, inner speech, as a variety of imagination, cannot be inaccurate: it just is what it is. 4

But is inner speech an instance of imagination? We think that the answer is no. A crucial step to seeing why this is the case involves an appreciation of the distinction between imagination and imagery. Imagination is a whole psychological event in its own right. People are engaged in acts of imagination. These acts of imagination enable them to appreciate, in potentially many different ways, non-actual scenarios, and, when they are engaged in such acts, they may be motivated to do so by a number of different things. They may be trying to judge whether they could have jumped over that river, reason about a social situation, or simply engage in imagination for the pleasure of it. Furthermore it is in the nature of imagination to have content without commitment (which is not to say that it cannot serve, and fail to serve, a given function). These acts of imagination often will recruit or make use of imagery in many modalities, but there will also be aspects to the imaginative experience that aren’t purely imagistic. Imagery, in contrast, is not in itself a complete psychological event. It features as a component of such events. Whereas people imagine things, people don’t “imagize” or “do imagery”. When people imagine things, imagery may be involved, but it is not all that is involved. And, crucially, imagery is also involved in many psychological events that aren’t imaginings. For example, imagery may be involved in episodic recollections. It may even be involved in certain judgements (see, e.g. Langland-Hassan 2015 ). In other words it may be involved in psychological events that, unlike imagining (in the sense that we are using the term), have an inbuilt commitment to how things are (or were , in the case of memory) in the world.

In light of this, it is too quick to move from the (accurate) observation that inner speech involves imagery to the conclusion that an episode of inner speech is a case of imagination. And if it is not a case of imagination then it seems, at least in principle, that, as an experience, it can be committed to telling you something about the world. 5

9.3. Inner Speech as Speech

If inner speech is not imagination, then what is it? In line with a number of other theorists (Vygotsky 1987/1934, Fernyhough 1996, Martinez-Manrique & Vicente 2010), our answer is: it is speech. It is speech in two important senses. First, it is a productive rather than re-creative activity. Second, its primary use is in making speech acts: asserting, questioning, insulting, etc. We take these points in turn.

9.3.1. Inner speech as productive rather than re-creative

To see the productive rather than re-creative nature of inner speech we need to ask ourselves not just, “What is inner speech?” or “What does it look like once developed?” but also: “How and why did it develop?” One attractive theory, which originates in Vygotsky (1987/1934) and carries both evolutionary and developmental plausibility, states that inner speech starts off as speech (namely, outer or “overt” speech). That is to say, whatever function inner speech plays, once it has developed, is played by outer speech in children who have not yet developed the capacity to engage in inner speech. This capacity to engage in inner speech is usually seen as partly constituted by the capacity to inhibit the overt production of speech.

According to this story, inner speech is the end product of a developmental trajectory that begins with private speech. “Private speech” refers to outer speech that is not produced for the benefit of anyone other than the speaker. Young children will first, under the guidance of a caregiver, learn to reason verbally, but out loud, for the benefit of guiding their thinking and attention. Over time, they learn to “internalize” this speech, to inhibit its overt production. However, as with many cases of motoric inhibition, vestiges of the motor processes remain. Motoric involvement in inner speech is supported by several electromyographic (EMG) studies measuring muscular activity during inner speech, some of which date as far back as the early 1930s (e.g. Jacobsen 1931). In short, these studies found that, when you engage in inner speech, muscles in the face and throat associated with speaking are activated (see also Rapin et al. 2013).

There have been brain-imaging studies (fMRI) presenting results that are very much in keeping with the distinction between a productive phenomenon, namely, inner speech proper, and a re-creative imaginative phenomenon, imagined speech. In particular, Tian & Poeppel (2012) and Tian, Zarate, and Poeppel (2016) have shown that there are two very different ways of generating auditory-verbal imagery, namely, of activating relevant areas of auditory sensory cortices in the absence of external sensory stimulation. One, which corresponds to inner speech (which they call “articulation imagery”) is induced through “motor simulation”, i.e., is initiated “top-down” by activation in areas of prefrontal and motor cortex associated with speaking. The other, which corresponds to inner hearing/imagined speech, is induced, in line with more standard accounts of imagery (including in other modalities, such as vision), via a memory-based mechanism (e.g. Kosslyn 1994 ), i.e., by the re-creation of a sensory event (derived, to some extent, from past sensory events). While the former mechanism involves trying to produce something directly (and its inhibition results in imagery being activated as part of the sensory predictions of the completed action), the latter involves trying to re-create the sensory effects of a past or constructed scenario. There is a sense in which imagining hearing something entirely new (i.e., not previously experienced) is “producing something”, but not in the same sense that inner speaking is productive. Unlike the latter, it involves the recreation of the sensory effect of an event, in this case an event that has never happened.

This distinction between a productive and re-creative phenomenon may map onto a phenomenological distinction between two different forms that auditory-verbal imagery can take. Using descriptive experience sampling (DES), Hurlburt and colleagues (Hurlburt, Heavey, and Kelsey 2013) isolated two differently reported phenomena: “inner speaking” on the one hand, and “inner hearing” on the other. The former may correspond to the top-down mechanism of generating imagery that Tian and colleagues isolated; the latter, to the more bottom-up mechanism. Nevertheless, equating Hurlburt’s “inner speaking” with “inner speech” does not suffice to show that “inner speech” is not a case of imagination. The reason for this is that it seems plausible that inner speaking can take part in imaginative episodes as well as in more authentic or ecologically valid instances of inner speech. If you imagine yourself going up to someone and speaking to them, nothing prevents this from engaging the sort of top-down imagery that Tian and colleagues isolate, or from having phenomenological features more akin to inner speaking than to inner hearing. What we actually need is a three-way distinction among the phenomena that make use of auditory-verbal imagery: (i) a genuinely productive phenomenon (which we are about to introduce, and which constitutes ecologically valid inner speech); (ii) a re-creative productive phenomenon (like the case of imagining yourself speak to someone, which involves inner speaking); and (iii) a re-creative sensory phenomenon (inner hearing). Whereas (ii) involves the same (or much of the same) apparatus as (i), it is used in a different context and for a different purpose (i.e., for the re-creation of a counterfactual scenario). On the other hand, (iii) recruits sensory imagery for a similar re-creative activity as (ii). The genuinely productive phenomenon, namely, (i), is what we examine now.

9.3.2. Inner speech acts as the main form of inner speech

Following Roessler (2016) we can distinguish between a “mere act of inner speech” and an “inner speech act”, in a way that perfectly mirrors the distinction between a “mere act of speech” and a “speech act”. Although there are different accounts of speech acts (see Austin 1962 , Searle 1969 , Bach & Harnish 1979 for some classic formulations) everyone agrees that speech acts are closely tied to the speaker’s mental state in a way that mere acts of speech are not. If you change the mental state in relevant ways, then you change the speech act in relevant ways. Indeed, if you remove the mental state, then you thereby remove the speech act altogether. Examples will make things clearer. Reciting a poem, or repeating an address so as to remember it, is an act of speech, but it is not a speech act. This is, in part, because the speaker, in reciting, or repeating, does not mean what is being said, and any potential variations in the subject’s mental states are compatible with the same act being performed (and variations in what is repeated or recited do not thereby signal similar variations in the subject’s mental states). In stark contrast, sincerely asserting, requesting, demanding, questioning are speech acts. These require the person performing them to be in certain states of mind. For example, an assertion (if sincere) requires the asserter to believe what they are asserting, a question (if sincere) requires the questioner to have the desire to know the answer to the question, and so on.

  1. Jane asserted that p
  2. Jane imagined asserting that p
  3. Jane asserted in inner speech that p

Whereas 3 implies 1, 2 does not. In fact, if anything, 2 implies that 1 is false: merely imagining asserting rules out actually asserting (just like imagining raising your right hand rules out you actually doing so). On the other hand, an assertion in inner speech is a perfectly good instance of assertion. 6 And insofar as 1 and 3 are both assertions, they both, if sincere, require that Jane be in a certain mental state (i.e., believing that p). In a related manner, assertions that p are treated as evidence for the attribution of the mental states that they (if sincere) require (or express), in this case, believing that p. Thus if someone asserts, “Paris is the capital of France”, you will (other things being equal) think that they believe that Paris is the capital of France. The same applies to other kinds of speech acts, and other kinds of speech acts are intimately tied to other kinds of mental state. Orders and requests are tied to goals, questions are tied to desires to know, compliments are tied to positive evaluations, insults to negative evaluations, etc. And when people request, question, compliment, or insult, if we take them to be sincere, we thereby take them to be in those mental states.

Of course, there is one rather perplexing feature of inner speech, construed as an inner speech act, which is: why do we engage in it at all? Usually when we assert, question, or insult in outer speech, we have an addressee. We are speaking our minds to someone else. When we assert, question, or insult in inner speech, who are we doing it for? Who are we speaking our minds to? The answer is: ourselves.

Organisms that live in groups, that cooperate and communicate, can do so very successfully without inner speech, and also without the need to directly introspect. They simply need to express themselves to their conspecifics. These communicative acts do not require the organism to have reflected on, or even have prior access to, its own mental state: the expression can be spontaneous and unreflective. However, once produced, these communicative acts can be perceived and interpreted by the agent who produced them. But of course, this cannot be regularly used as a way of accessing your mental states, since that would involve making your beliefs, desires, plans, and evaluations entirely public. That would often be, at best, socially unacceptable, and, at worst, downright dangerous. Inner speech can be understood in part as a solution to this problem of indiscretion: it is a way of expressing, and hence accessing and reflecting upon, your own state of mind without thereby having to risk giving that information away to others. 7

Many theorists would be in general agreement with this picture (e.g. Jackendoff 1996, Clark 1996, Carruthers 2011). One interesting feature of positing this role for inner speech is that it suggests that we (at least sometimes, perhaps always) lack other more direct means of reflecting on our mental states. Our view is that inner speech helps a great deal with reflection on our minds, but there are certainly ways of so reflecting that don’t make use of inner speech.

9.4. The Experiential Content of Speech Experience

If inner speech is, in an important sense, speech, it is reasonable to assume that we may learn about its content by examining the content of speech experience per se. As it happens, there is currently a lively philosophical debate about the content of the auditory experience of speech (see O’Callaghan 2011 and Brogaard forthcoming). This debate is a specific version of a more general debate about the content of perceptual experience generally. There are those, sometimes called “liberals”, who want to allow that “high-level properties” can enter into the contents of perceptual experience (e.g. Siegel 2006, Bayne 2009), and there are those, sometimes called “conservatives” (e.g. Dretske 1995, Tye 1995), who claim that only “low-level properties” can.

  (i) a certain shape and colour
  (ii) the property of being an apple
  (iii) the property of being a Granny Smith apple
  (iv) the property of having been grown in Chile

The further down this list you go, in terms of accepting that it could enter into the content of perceptual experience, the more “liberal” you are about the admissible content of perceptual experience. Even the most liberal of liberals will admit that (iv) just isn’t the right kind of property for your perceptual experience to convey. You may come to know that the apple was grown in Chile, but you can’t have known that solely on the basis of your perceptual experience. Liberals, however, may claim that (iii) can enter into the content of perceptual experience for, say, someone familiar with Granny Smiths. And they will certainly say that (ii) enters into the content of perceptual experience for those of us familiar with apples. The conservative, on the other hand, wants to say that only (i) is the purview of perceptual experience: (ii) and (iii) go beyond what perceptual experience can represent.

This debate came to prominence in the light of a classic argument in favour of liberalism that proceeded by presenting what might be called “contrast cases” (see Siegel 2006 for perhaps the classic example of a contrast case). In contrast cases, you compare two cases where the “low-level” properties represented (e.g. colour and shape) remain constant, but the “high-level” properties represented are different because, in one of the cases, the high-level properties cannot be represented due to lack of knowledge or expertise. For example, looking at one and the same oak tree will be phenomenologically different depending on whether you know nothing of tree species, or whether you are an expert. The idea is that the two cases differ in specifically perceptual phenomenology, and that this should be attributed to the representation of high-level properties in perceptual experience. The expert, in automatically recognizing the oak, has represented in her perceptual experience the property of being an oak, whereas the novice hasn’t.

When applied to speech perception, the very same phenomenal contrast arguments can be used, and are perhaps even more convincing since language is an area where expertise has especially powerful effects on experience. If you think about the phenomenological difference between hearing a language that you understand and one that you don’t, it seems plausible that understanding a spoken language makes it sound different (see, e.g. Strawson 1994). This has led some people to attribute the difference to the representation of meaning in auditory speech experience. Thus, you don’t merely get loudness, pitch, and timbre represented: you also get “high-level” properties like meaning (in a way that is akin to how you get the high-level property “oak tree” in the visual case).

O’Callaghan (2011) has recently criticized the view that meanings are represented in the auditory experience of speech. He does, however, accept that there is a phenomenological contrast between hearing speech when you understand the language and when you don’t, and he accepts that the contrast is one of perceptual (rather than emotional or cognitive) phenomenology. What he thinks explains the difference is the representation, in one instance, of, not the standard low-level properties of loudness, pitch, and timbre, but properties a bit “higher” (we might call them mid-level properties), namely language-specific phonological properties (“language-specific” in the sense of specific to, say, French as opposed to German).

O’Callaghan’s reasons for adopting such a view stem from another contrast case that compares homophones. He claims that there is no phenomenological difference between hearing homophones, even if we perceive them as having different meanings. So, to take an example, if we hear an utterance of “bank” (the financial institution) and “bank” (the edge of a river), they sound the same. As a result, O’Callaghan claims that it isn’t meaning that explains the phenomenological difference, since here we have different meanings but the same phenomenology. Rather, what better explains the difference between hearing languages you do and don’t understand is familiarity or expertise with the phonology of the known language, which affects the temporal and qualitative features the relevant speech sounds are experienced as having.

As Brogaard (forthcoming) rightly points out, this argument from homophones has the weakness that it arguably isn’t words that are the relevant vehicles of meaning, but entire utterances, namely, sentences used in context. We would go a step further and say that, whatever “meaning” is taken to be (it refers to different abstract entities for different purposes) the relevant sense in which meaning is represented in speech experience is in the sense of “speaker meaning”, namely, the underlying mental state of the speaking agent that is expressed by the speech act. What makes it the case that two assertions of “I’m going down to the bank” are experienced differently based on attributing different meanings to the word “bank” is that in one case you take the speaker to be expressing (their belief in) their intention to go to a financial institution, while in the other you take the speaker to be expressing (their belief in) their intention to go to the edge of the river. That said, the phenomenological difference between the two uses of “I’m going down to the bank” is very subtle, and some people may deny its existence. Clearer examples are cases of syntactic ambiguity (“I’m glad I’m a man and so is Lola”), or cases of sarcasm (e.g. saying “Well done” in a berating, rather than congratulatory manner). Of course, in such cases (especially sarcasm) the acoustic properties of the utterance are often altered by the speaker in order to promote one interpretation over the other. This, however, doesn’t mean that two identical speech sounds won’t be experienced as phenomenologically different if interpreted differently.

However, the conservative can say that, although there is a phenomenological difference, it is attributable to phenomenological differences associated with judgements about the speaker’s mental state, rather than experiences of these mental states. Thus when I hear “I’m glad I’m a man and so is Lola”, it is phenomenologically different to judge that the speaker is expressing gladness that he and Lola are both men, than to judge that the speaker is expressing that both Lola and he are glad that he is a man. In other words, the phenomenology is different, but it is not experiential phenomenology. One problem with this suggestion is that the relevant phenomenology remains even when we know that the speaker doesn’t have that mental state (e.g., is acting on stage), or when the speaker is just a vague, hypothetical construct (e.g., as when abstractly considering different utterances, or hearing announcements at the train station). It doesn’t therefore seem that it can be something to do with judgement. Granted, it could be a phenomenology associated with something less committal than judgement, but that remains non-experiential. Whatever this state may be, the phenomenology seems stimulus-bound, bound to the experience, so why not view it as part of the experience? 8

The other thing that the conservative might say, which is very much in line with what O’Callaghan says, is that judging, or even merely hypothesizing that a speech sound expresses a certain (even hypothetical) mental state has a top-down influence on how the low-level stuff is experienced. That doesn’t mean that perceptual experience represents anything over and above those low-level properties. The difference in phenomenology is indeed a difference in properly perceptual content, but this difference just is a difference in those low-level properties. In other words, a premise of the liberal’s contrast case doesn’t hold, since the low-level properties aren’t being kept constant after all.

This seems like a plausible response, but then the debate becomes one of conceptual cartography. What do you mean by “perceptual experience”? In particular, the liberal could just say that these and similar top-down influences are so rife in even the most basic forms of perception, that that just is what perceptual experience is. If even the experience of low-level properties is enabled by top-down influences, where do you draw the line? One might say that you draw the line at that which remains the same when the sensory inputs are kept the same. But arguably, even at the very front line of sensation, top-down effects have influence (see e.g. Lee 2002 for vision, Davis and Johnsrude 2007 for audition). And if top-down influences enable us to hear sounds a certain way, why not allow that top-down influences enable us to experience meanings? Granted, such a response on the part of the liberal raises problems about sensory modality. If the speaker meaning (the mental state, real or hypothetical) behind an utterance is represented in perceptual experience, then surely it must be represented in a sensory modality, namely, audition? But isn’t it implausible to say that mental states are auditorily represented? I surely cannot literally hear your beliefs.

It doesn’t matter for our purposes whether the phenomenological changes in the experience are down to properly “perceptual” features of the experience, or to some kind of non-perceptual experiential accompaniment (e.g., some kind of stimulus-bound cognitive or affective phenomenology). What matters to us here is that the overall experience of speech is extremely representationally rich, regardless of whether all of the features can be thought of as perceptual or non-perceptual. In particular, we think that, as well as the low-level features of loudness, pitch, and timbre, the states of mind underlying utterances can at times also be part of what is represented in the experience of those utterances.

9.5. The Experiential Content of Inner Speech

Although we think that the lessons are transferrable, we cannot assume that the content of your experience of someone else’s outer speech is similar to the content of your own inner speech. To argue for this step by step, it is helpful to move from experience of someone else’s speech to a qualitatively intermediate stage on the way to inner speech: the experience of your own speech spoken out loud.

So, what is the difference between hearing someone else’s speech and experiencing your own speech? First of all, you don’t only experience your own speech by hearing it. You are proprioceptively and sensorially aware of your speech production apparatus. But that’s not the only thing: you have a sense of agency, both in the sense that you tend to be aware that it is you who is speaking, and in the more specific sense that what you say tends not to come as a surprise. In spite of this difference, you also tend to know who is speaking when you hear others speak, and what comes out of the mouths of people you know really well tends not to come as a surprise either (and conversely you can sometimes surprise yourself). Another difference between your speech (and your action generally) and the speech of others is that it is embedded in a rich and pervasive context that you (normally) have unparalleled access to (not least because you are always with yourself). You tend to speak as part of an overall complex serial process (namely, your life) in the service of your plans, goals, habits, machinations. And you are there to witness it all, effortlessly taking in the past and projecting into the future.

We agree that there are major asymmetries between the epistemology of your own speech and the speech of others. These are asymmetries that parallel the difference between experiencing yourself act, and perceiving others act. That said, these are epistemological differences, rather than differences in experiential content. Your access to the experiential content of your own speech may be more direct, more secure, and aided by a pervasive context, but that doesn’t mean that it doesn’t have a similar kind of content to that of someone else’s speech. Thus, we suggest, the experience of your own outer speech, like the experience of someone else’s speech, doesn’t only represent the low-level acoustic properties of your speech (as well as low-level features that are lacking in the experience of someone else speaking, such as tactile and proprioceptive information about your speech production) but also mental state information.

It is a small step from the experience of our own outer speech to the experience of our inner speech. Of course, the experiences are qualitatively rather different, but they achieve, or at least can achieve, the same thing. For example, you can (in situations when you are alone, or social norms allow it), replace your inner speech with outer speech with very much the same effect. Encouraging yourself with “Come on!” during a game of tennis, or asking yourself “What did I come upstairs for?” can be done in either inner or outer speech with similar effect (although there may be an added motivational effect to saying the former out loud).

In outer speech, there is lots of fine-grained auditory information in the content of the experience. If you were to mishear the pitch at which you were speaking, that would be relatively unimportant in most cases, but it would still be an inaccurate experience. You can imagine someone who hears pitch distortions, but still manages to pick up on the content and nuances of utterances. Another interesting case to reflect on in this instance is when congenitally deaf people learn to speak. In these cases they are producing speech sounds that they themselves cannot hear, for the benefit of a hearing interlocutor. But in what sense is their own experience, with its proprioceptive and tactile feedback elements, failing to adequately represent their speech? Sure, it is not representing the sounds they are producing, but does that mean that it is not still representing what is by far the most valuable aspect of the speech, namely, what is being conveyed? Clearly the deaf speaker has an experiential appreciation of what they are saying in the absence of hearing what they are saying. In short, even in cases where sounds are being produced, what is more significant are the mental states—the speaker meanings—expressed in speech.

How does all of this apply to inner speech? Given that, in inner speech, there is no real-world auditory information to accurately represent, the information about mental states seems to be even more at the heart of what is carried by the vehicle of inner speech. An experience of inner speech, insofar as it is of an inner speech act, typically represents the state of mind expressed by that speech act, and, however minimally, the individual whose state of mind it is, namely, you. Thus when you assert something in inner speech, the conscious experience of that represents your belief in that which you have asserted, and, somewhat trivially, represents it as belonging to you. This much can also be said about hearing someone (yourself or someone else) sincerely assert something in outer speech. However, in contrast to this, it is hard to see how things like loudness, pitch, and timbre (or even phonology) can be represented in a relevantly committal way in inner speech. They may be (and often probably are) represented insofar as they contribute to the phenomenology of the experience (just like an imagining of a pink unicorn represents a pink unicorn in a way that is reflected in the episode’s phenomenology), but it seems that there is no feature of the world that would make that aspect of the experience accurate or not. 9 To make matters clearer, consider the fact that in hearing your own outer speech, you might mishear, e.g., the pitch at which you spoke, and this is determined by an objective feature of the world (namely, the actual pitch of the sounds you produced). It is not clear how something like this would work for inner speech. Although you could imagine someone complaining to a doctor and saying “I’m hearing my own voice two tones lower than it actually is”, such a complaint would make no sense in inner speech.
There is no epistemic distance between how inner speech really is in terms of pitch and how it is experienced as being (just like there is no epistemic distance between my imagining of a pink unicorn and how it is experienced as being: how it is experienced as being constitutes the imaginative episode). However, in contrast, the mental and agentive aspects are the same in both inner and outer speech. And whether your experience of those aspects is accurate is determined by objective features of the world, namely, your actual state of mind. Such objective features are precisely what enables us to draw the boundary between sincerity and insincerity. “Great drawing!” you might say, in response to your friend’s woeful attempt at a sketch, with a feigned air of sincerity in your voice so as not to offend them. The actual insincerity of that speech act is an objective fact about the world, largely determined by the fact that you don’t really positively evaluate their drawing. Similarly, saying, “I’m such an idiot!” in inner speech is accurate to the extent that you are genuinely reprimanding yourself, which, like any speech act, requires you to be in a very particular mental state.

9.6. The Ways in Which Inner Speech Can (and Can’t) Mislead

As we’ve already mentioned, it is not clear that auditory or even phonological properties, which may well be represented in inner speech (as reflected in the phenomenology of some inner speech), are subject to inaccuracy. They are cases of “content without commitment”, since it really isn’t clear what objective feature of the world, over and above the experience of those auditory or phonological properties themselves, might act as a benchmark against which they could fall short. In contrast, the agent and their mental state are precisely such an objective part of the world, and one that, crucially, an inner speech act may well, in principle, misrepresent. Now let’s examine the ways in which they can be misrepresented.

In producing an inner speech act and becoming aware of it, you become aware, in the good case, that (i) the mental state expressed is such and such (i.e., the speech act has a certain meaning), and (ii) you are the agent of the speech act (i.e., it is you, and not someone else, who has the mental state the speech act expresses). As a result, it seems to us that you can in principle have two errors, which might sometimes occur together.

  • A. Misrepresenting the state of mind you actually have.
  • B. Misrepresenting the agent of the speech act (namely, whose speech act it is).

We take it that, on reflection, A is relatively common. Both culpable insincerities and innocent inaccuracies regularly creep into our inner speech, and do so with negative impact upon our self-knowledge (although they may have positive impact in non-epistemic ways, e.g., on psychological comfort or well-being).

B, on the other hand, seems much less common. However, if the model according to which (at least some) auditory verbal hallucinations (AVHs) are misattributed episodes of inner speech is correct, then it might be something that one sees in those instances. And if that is the case, one interesting question, which to our knowledge has never been addressed in this way, is whether in these cases you get misrepresentations of just B, or of both A and B. In other words, does the speech act the voice-hearer experiences in some cases express a mental state that she herself “really has deep down”? In which case, is it something that is protectively disowned in a failure to recognize it as her own mental state? Some cases of AVH in the context of very strong feelings of shame and self-loathing may be like this. 10 Or is it that the voice-hearer has produced an episode of inner speech that is somehow expressively inaccurate, namely, doesn’t express a mental state that the voice-hearer actually has, and so is attributed to another agent (as in, for example, Stephens and Graham’s (2000) model of ego-dystonic thoughts being misattributed as alien “voices”)? This latter option in a sense does not involve the same degree of lack of self-knowledge. Although there is failure to detect self-production, the voice-hearer has in fact accurately detected that she doesn’t have the relevant mental state. This then raises a number of interesting further questions. For example, if the episode of inner speech (the inner speech act) that constitutes the AVH is not an expression of the voice-hearer’s own mental state, then whose mental state is it? Where does the voice (in the sense of the agent producing the speech act) come from? This may then lead us to hypothesize that voice-hearers countenance rich and relatively autonomous representations of communicative agents (see Deamer & Wilkinson 2015, Wilkinson & Bell 2016).
Then, of course, the question arises as to where this agent representation comes from. Why is it voice-hearers who have a propensity to represent agents in this way? Perhaps at this point it makes sense to suggest that it isn’t only voice-hearers who have this propensity. Indeed it can be argued (see McCarthy-Jones & Fernyhough 2011) that normal inner speech can be dialogic and is shot through with representations of agents other than ourselves making speech acts. For example, reasonably large proportions of respondents endorse statements about hearing the voices of other people in inner speech (McCarthy-Jones & Fernyhough 2011, Alderson-Day et al. 2014). Hence our inner speech sometimes expresses mental states that we don’t in fact have but that we hypothesize someone else might have. This would clearly be helpful, and may play a crucial role in underpinning social, and perhaps even normative, reasoning.

If the inner speech model of AVHs is accurate, these reflections on inner speech acts add a dimension of complexity to the phenomenon in question. These experiences are not just straightforward hallucinations of sounds that aren’t there (although in many cases they are partly this). They are experiences of mental states, had by people with mental states, and so admit of the different dimensions of inaccuracy explained above.

9.7. Conclusion

In this chapter, we have combined a particular view about the nature of inner speech with a liberal view about the representational content of speech experience. Our view treats inner speech as speech in a number of important ways. It is a productive rather than re-creative phenomenon, and, as with speech, its normal ecological use is in the performing of speech acts. Our liberal view about the content of speech experience allows that, in addition to the speech sounds that you hear when someone speaks to you, what their speech means, where this is seen as speaker meaning (e.g. their communicative intentions), also enters into the content of the experience. Applying this to inner speech, although there appears to be no constraint of accuracy on the experience of speech sounds (since there are no objective speech sounds produced that could be accurately or inaccurately represented), there is a constraint for the agentive elements of the experience. The mental state that you happen to be in when you engage in inner speech is an objective fact, and your episode of inner speech could mislead you about it. Within this framework there is a great deal of work that could be done in ascertaining when and how people mislead themselves in inner speech, and as a result develop flaws in their self-knowledge. In more extreme cases, this approach could be used to explore cases of AVH.

1. We say “some experiences” because, although it is uncontroversial among representationalists (those philosophers who buy into the notion that experiences can have representational content) that, e.g., perceptual experiences have representational content, it is contentious whether other experiences that are less clearly about the world, e.g., pains, orgasms, etc. have such content.

2. In perception, it is a commitment that you can override: you don’t have to take your perceptual experience at face value.

3. The term “imagination” gets used in lots of different ways for different purposes, for example, there are imagistic and propositional forms of imagining. What we mean by imagining is simply a mental state (or, better, episode) that represents something and has no commitment to its reality or actuality (it is hence to be contrasted with judgement and perception, which do have such commitments). Thus imagination may or may not recruit imagery, and is certainly not synonymous with imagery.

4. To put it another way there is no appearance/reality distinction. Since the phenomenon is an appearance, the appearance is the reality.

5. One natural question at this point is whether inner speech can ever be an instance of imagining. This is a tricky question. In the first instance we want to say that paradigmatic inner speech isn’t imagination. But the question of whether inner speech can sometimes be an instance of imagining seems to get things the wrong way round: imaginative episodes may be enabled by inner speech, but inner speech is not constructed out of imaginative episodes.

6. Things are somewhat complicated by the fact that some theorists (e.g. Searle 1969) make it a requirement that an assertion have an interlocutor. It seems to us that we regularly make private assertions (and that these carry the same features as “normal” assertions, e.g., have the same sincerity conditions). This can either be accommodated by (contra Searle) removing the dialogic requirement, or by claiming that human inner speech is in some important sense dialogic. As will become clearer, we would opt for the latter.

7. We say “in part” because although inner speech, like outer speech, offers us improved self-knowledge, it doesn’t always, or even often, serve that purpose. Much of the time it regulates our behaviour and focuses our attention.

8. When you hear the Kinks song, Lola, you don’t literally attribute different mental states to Ray Davies depending on how you disambiguate “I’m glad I’m a man and so is Lola”. But you do experience it differently depending on how you disambiguate it, and this comes down to hypothetical mental state attribution (namely, the mental state that you would attribute if you took it seriously).

9. Note that this will apply even in the case of having a dialogue with another person in inner speech. You might be talking to your mother in your head, for example, and be getting her voice all wrong, but you would still be talking to your mother.

10. See Woods et al. (2015) for a recent phenomenological survey exploring, among other things, the varied emotional states that surround the experience of hearing voices (depression is reported in 29 per cent and shame in 14 per cent of their participants).

  • Alderson-Day, B., McCarthy-Jones, S., Bedford, S., Collins, H., Dunne, H., Rooke, C., & Fernyhough, C. (2014). Shot through with voices: Dissociation mediates the relationship between varieties of inner speech and auditory hallucination proneness. Consciousness and Cognition 27: 288–96. [ PMC free article : PMC4111865 ] [ PubMed : 24980910 ]
  • Austin, J. L. (1962). How to Do Things with Words . Clarendon Press.
  • Bach, K. & Harnish, R. (1979). Linguistic Communication and Speech Acts . MIT Press.
  • Bayne, T. (2009). Perception and the reach of phenomenal content. Philosophical Quarterly 59 (236): 385–404.
  • Brogaard, B. (forthcoming). In Defense of Hearing Meanings. Synthese : 1–17.
  • Davis, M. H. & Johnsrude, I. S. (2007). Hearing speech sounds: top-down influences on the interface between audition and speech perception. Hearing Research 229: 132–47. [ PubMed : 17317056 ]
  • Deamer, F. & Wilkinson, S. (2015). The speaker behind the voice: therapeutic practice from the perspective of pragmatic theory. Frontiers in Psychology 6: 817. [ PMC free article : PMC4463863 ] [ PubMed : 26124738 ]
  • Carruthers, P. (2011). The Opacity of Mind: An Integrative Theory of Self-Knowledge . Oxford University Press.
  • Clark, A. (1996). Linguistic anchors in the sea of thought? Pragmatics and Cognition 4 (1): 93–103.
  • Fernyhough, C. (1996). The dialogic mind: A dialogic approach to the higher mental functions. New Ideas in Psychology 14: 47–62.
  • Frith, C. (1992). The cognitive neuropsychology of schizophrenia . Hove: Lawrence Erlbaum.
  • Hurlburt, R. T., Heavey, C. L., & Kelsey, J. M. (2013). Toward a phenomenology of inner speaking. Consciousness and Cognition 22: 1477–94. [ PubMed : 24184987 ]
  • Jackendoff, R. S. (1996). How language helps us think. Pragmatics and Cognition 4 (1): 1–34.
  • Jacobsen, E. (1931). Electrical measurements of neuromuscular states during mental activities, VII: Imagination, recollection, and abstract thinking involving the speech musculature. American Journal of Physiology 97: 200–9.
  • Jones, S. R. & Fernyhough, C. (2007). Thought as action: Inner speech, self-monitoring, and auditory verbal hallucinations. Consciousness and Cognition 16 (2): 391–9. [ PubMed : 16464616 ]
  • Kosslyn, S. (1994). Image and Brain: The Resolution of the Imagery Debate . Cambridge, MA: MIT Press.
  • Langland-Hassan, P. (2015). Imaginative Attitudes. Philosophy and Phenomenological Research 90 (3): 664–86.
  • Lee, T. S. (2002). Top-down influence in early visual processing: A Bayesian perspective. Physiology & Behavior 77 (4–5): 645–50. [ PubMed : 12527013 ]
  • Macpherson, F. (ed.) (2011). The Senses: Classic and Contemporary Philosophical Perspectives . New York: Oxford University Press.
  • Martínez-Manrique, F. & Vicente, A. (2010). What the …! The role of inner speech in conscious thought. Journal of Consciousness Studies 17 (9–10): 141–67.
  • McCarthy-Jones, S. & Fernyhough, C. (2011). The varieties of inner speech: Links between quality of inner speech and psychopathological variables in a sample of young adults. Consciousness and Cognition 20: 1586–93. [ PubMed : 21880511 ]
  • O’Callaghan, C. (2011). Against hearing meanings. Philosophical Quarterly 61 (245): 783–807.
  • Rapin, L., Dohen, M., Polosan, M., Perrier, P., & Loevenbruck, H. (2013). An EMG study of the lip muscles during covert auditory verbal hallucinations in schizophrenia. Journal of Speech, Language and Hearing Research 56: S1882–S1893. [ PubMed : 24687444 ]
  • Roessler, J. (2016). Thinking, Inner Speech, and Self-Awareness. Review of Philosophy and Psychology 7 (3): 541–57.
  • Seal, M. L., Aleman, A., & McGuire, P. K. (2004). Compelling imagery, unanticipated speech and deceptive memory: Neurocognitive models of auditory verbal hallucinations in schizophrenia. Cognitive Neuropsychiatry 9(1–2): 43–72. [ PubMed : 16571574 ]
  • Searle, J. (1969). Speech Acts: An Essay in the Philosophy of Language . Cambridge: Cambridge University Press.
  • Siegel, S. (2006). Which properties are represented in perception? In Tamar S. Gendler & John Hawthorne (eds.), Perceptual Experience (pp. 481–503). Oxford University Press.
  • Stephens, G. L. & Graham, G. (2000). When Self-Consciousness Breaks: Alien Voices and Inserted Thoughts . Cambridge, MA: MIT Press.
  • Strawson, G. (1994). Mental Reality . Cambridge, MA: MIT Press.
  • Tian, X. & Poeppel, D. (2012). Mental imagery of speech: linking motor and perceptual systems through internal simulation and estimation. Frontiers in Human Neuroscience 6: 314. [ PMC free article : PMC3508402 ] [ PubMed : 23226121 ]
  • Tian, X., Zarate, J. M., & Poeppel, D. (2016). Mental imagery of speech implicates two mechanisms of perceptual reactivation. Cortex 77: 1–12. [ PMC free article : PMC5357080 ] [ PubMed : 26889603 ]
  • Tye, M. (1995). Ten Problems of Consciousness . Cambridge, MA: MIT Press.
  • Vygotsky, L. S. (1987). Thinking and speech. In R.W. Rieber & A.S. Carton (eds.), The collected works of L.S. Vygotsky, Volume 1: Problems of general psychology (pp. 39–285). New York: Plenum Press. (Original work published 1934.)
  • Wilkinson, S. & Bell, V. (2016). The Representation of Agents in Auditory Verbal Hallucinations. Mind and Language 31 (1): 104–26. [ PMC free article : PMC4744949 ] [ PubMed : 26900201 ]
  • Woods, A., Jones, N., Alderson-Day, B., Callard, F., & Fernyhough, C. (2015). Experiences of hearing voices: analysis of a novel phenomenological survey. The Lancet Psychiatry 2 (4): 323–31. [ PMC free article : PMC4580735 ] [ PubMed : 26360085 ]

This chapter is open access under a CC-BY license.

Monographs, or book chapters, which are outputs of Wellcome Trust funding have been made freely available as part of the Wellcome Trust's open access policy

  • Cite this Page Wilkinson S, Fernyhough C. When Inner Speech Misleads. In: Langland-Hassan P, Vicente A, editors. Inner Speech: New Voices. Oxford (UK): Oxford University Press; 2018. Chapter 9.

Inner Speech


Inner speech is a form of internalized, self-directed dialogue: talking to oneself. The phrase inner speech was used by Russian psychologist Lev Vygotsky to describe a stage in language acquisition and the process of thought. In Vygotsky's conception, "Speech began as a social medium and became internalized as inner speech, that is, verbalized thought," (Katherine Nelson, Narratives From the Crib, 2006).

Inner Speech and Identity

"Dialogue launches language, the mind, but once it is launched we develop a new power, 'inner speech,' and it is this that is indispensable for our further development, our thinking. ... 'We are our language,' it is often said; but our real language, our real identity, lies in inner speech, in that ceaseless stream and generation of meaning that constitutes the individual mind. It is through inner speech that the child develops his own concepts and meanings; it is through inner speech that he achieves his own identity; it is through inner speech, finally, that he constructs his own world," (Oliver Sacks, Seeing Voices. University of California Press, 1989).

Is Inner Speech a Form of Speech or Thought?

"Difficult as it is to study inner speech, there have been attempts to describe it: it's said to be a shorthand version of real speech (as one researcher put it, a word in inner speech is 'the mere skin of a thought'), and it's very egocentric, not surprisingly, given that it's a monologue, with the speaker and the audience being the same person," (Jay Ingram, Talk Talk Talk: Decoding the Mysteries of Speech. Doubleday, 1992).

"Inner speech comprises both the inner voice we hear when reading and the muscle movements of the speech organs that often accompany reading and that are called subvocalizations," (Markus Bader, "Prosody and Reanalysis." Reanalysis in Sentence Processing, ed. by Janet Dean Fodor and Fernanda Ferreira. Kluwer Academic Publishers, 1998).

Vygotsky on Inner Speech

"Inner speech is not the interior aspect of external speech—it is a function in itself. It still remains speech, i.e., thought connected with words. But while in external speech thought is embodied in words, in inner speech words die as they bring forth thought. Inner speech is to a large extent thinking in pure meanings. It is a dynamic, shifting, unstable thing, fluttering between word and thought, the two more or less stable, more or less firmly delineated components of verbal thought," (Lev Vygotsky, Thought and Language, 1934. MIT Press, 1962).

Linguistic Characteristics of Inner Speech

"Vygotsky identified a number of lexicogrammatical features which are foregrounded in both egocentric speech and inner speech. These features include omission of the subject, the foregrounding of predication, and a highly elliptical relationship between these forms and the speech situation (Vygotsky 1986 [1934]: 236)," (Paul Thibault, Agency and Consciousness in Discourse: Self-Other Dynamics as a Complex System. Continuum, 2006).

"In inner speech the only grammatical rule at play is association through juxtaposition. Like inner speech, film uses a concrete language in which sense comes not from deduction but from the fullness of the individual attractions as qualified by the image which they help to develop," (J. Dudley Andrew, The Major Film Theories: An Introduction. Oxford University Press, 1976).

Inner Speech and Writing

"Writing is part of the process of finding, developing, and articulating inner speech, that reservoir of internalized thought and language on which we depend for communication," (Gloria Gannaway, Transforming Mind: A Critical Cognitive Activity. Greenwood, 1994).

"Because it is a more deliberate act, writing engenders a different awareness of language use. Rivers (1987) related Vygotsky's discussion of inner speech and language production to writing as discovery: 'As the writer expands his inner speech, he becomes conscious of things [of] which he was not previously aware. In this way, he can write more than he realizes' (p. 104).

"Zebroski (1994) noted that Luria looked at the reciprocal nature of writing and inner speech and described the functional and structural features of written speech, which 'inevitably lead to a significant development of inner speech. Because it delays the direct appearance of speech connections, inhibits them, and increases requirements for the preliminary, internal preparation for the speech act, written speech produces a rich development for inner speech' (p. 166)," (William M. Reynolds and Gloria Miller, eds., Handbook of Psychology: Educational Psychology. John Wiley, 2003).


  • Open access
  • Published: 13 May 2024

Representation of internal speech by single neurons in human supramarginal gyrus

  • Sarah K. Wandelt   ORCID: orcid.org/0000-0001-9551-8491 1 , 2 ,
  • David A. Bjånes 1 , 2 , 3 ,
  • Kelsie Pejsa 1 , 2 ,
  • Brian Lee 1 , 4 , 5 ,
  • Charles Liu   ORCID: orcid.org/0000-0001-6423-8577 1 , 3 , 4 , 5 &
  • Richard A. Andersen 1 , 2  

Nature Human Behaviour (2024)


  • Brain–machine interface
  • Neural decoding

Speech brain–machine interfaces (BMIs) translate brain signals into words or audio outputs, enabling communication for people having lost their speech abilities due to diseases or injury. While important advances in vocalized, attempted and mimed speech decoding have been achieved, results for internal speech decoding are sparse and have yet to achieve high functionality. Notably, it is still unclear from which brain areas internal speech can be decoded. Here two participants with tetraplegia with implanted microelectrode arrays located in the supramarginal gyrus (SMG) and primary somatosensory cortex (S1) performed internal and vocalized speech of six words and two pseudowords. In both participants, we found significant neural representation of internal and vocalized speech, at the single neuron and population level in the SMG. From recorded population activity in the SMG, the internally spoken and vocalized words were significantly decodable. In an offline analysis, we achieved average decoding accuracies of 55% and 24% for each participant, respectively (chance level 12.5%), and during an online internal speech BMI task, we averaged 79% and 23% accuracy, respectively. Evidence of shared neural representations between internal speech, word reading and vocalized speech processes was found in participant 1. SMG represented words as well as pseudowords, providing evidence for phonetic encoding. Furthermore, our decoder achieved high classification with multiple internal speech strategies (auditory imagination/visual imagination). Activity in S1 was modulated by vocalized but not internal speech in both participants, suggesting no articulator movements of the vocal tract occurred during internal speech production. This work represents a proof-of-concept for a high-performance internal speech BMI.


Speech is one of the most basic forms of human communication, a natural and intuitive way for humans to express their thoughts and desires. Neurological diseases like amyotrophic lateral sclerosis (ALS) and brain lesions can lead to the loss of this ability. In the most severe cases, patients who experience full-body paralysis might be left without any means of communication. Patients with ALS self-report loss of speech as their most serious concern 1 . Brain–machine interfaces (BMIs) are devices offering a promising technological path to bypass neurological impairment by recording neural activity directly from the cortex. Cognitive BMIs have demonstrated potential to restore independence to participants with tetraplegia by reading out movement intent directly from the brain 2 , 3 , 4 , 5 . Similarly, reading out internal (also reported as inner, imagined or covert) speech signals could allow the restoration of communication to people who have lost it.

Decoding speech signals directly from the brain presents its own unique challenges. While non-invasive recording methods such as functional magnetic resonance imaging (fMRI), electroencephalography (EEG) or magnetoencephalography 6 are important tools to locate speech and internal speech production, they lack the necessary temporal and spatial resolution, adequate signal-to-noise ratio or portability for building an online speech BMI 7 , 8 , 9 . For example, state-of-the-art EEG-based imagined speech decoding performances in 2022 ranged from approximately 60% to 80% binary classification 10 . Intracortical electrophysiological recordings have higher signal-to-noise ratios and excellent temporal resolution 11 and are a more suitable choice for an internal speech decoding device.

Invasive speech decoding has predominantly been attempted with electrocorticography (ECoG) 9 or stereo-electroencephalographic depth arrays 12 , as they allow sampling neural activity from different parts of the brain simultaneously. Impressive results in vocalized and attempted speech decoding and reconstruction have been achieved using these techniques 13 , 14 , 15 , 16 , 17 , 18 . However, vocalized speech has also been decoded from localized regions of the cortex. In 2009, the use of a neurotrophic electrode 19 demonstrated real-time speech synthesis from the motor cortex. More recently, speech neuroprosthetics were built from small-scale microelectrode arrays located in the motor cortex 20 , 21 , premotor cortex 22 and supramarginal gyrus (SMG) 23 , demonstrating that vocalized speech BMIs can be built using neural signals from localized regions of cortex.

While important advances in vocalized speech 16 , attempted speech 18 and mimed speech 17 , 22 , 24 , 25 , 26 decoding have been made, highly accurate internal speech decoding has not been achieved. Lack of behavioural output, lower signal-to-noise ratio and differences in cortical activations compared with vocalized speech are speculated to contribute to lower classification accuracies of internal speech 7 , 8 , 13 , 27 , 28 . In ref. 29 , patients implanted with ECoG grids over frontal, parietal and temporal regions silently read or vocalized written words from a screen. They significantly decoded vowels (37.5%) and consonants (36.3%) from internal speech (chance level 25%). Ikeda et al. 30 decoded three internally spoken vowels using ECoG arrays using frequencies in the beta band, with up to 55.6% accuracy from the Broca area (chance level 33%). Using the same recording technology, ref. 31 investigated the decoding of six words during internal speech. The authors demonstrated an average pair-wise classification accuracy of 58%, reaching 88% for the highest pair (chance level 50%). These studies were so-called open-loop experiments, in which the data were analysed offline after acquisition. A recent paper demonstrated real-time (closed-loop) speech decoding using stereotactic depth electrodes 32 . The results were encouraging as internal speech could be detected; however, the reconstructed audio was not discernable and required audible speech to train the decoding model.
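The accuracies quoted above are easier to compare once each is set against its own chance level, since the studies differ in how many classes they decode. The following is a minimal sketch of that arithmetic, using only the figures reported in the preceding paragraph. The helper names (`chance_level`, `margin_over_chance`, `chance_corrected`) are hypothetical, and the chance-corrected score is an illustrative rescaling that assumes balanced classes, not a metric reported by the cited studies.

```python
# Illustrative arithmetic only: compares reported decoding accuracies
# with their chance levels. All helper names are hypothetical.

def chance_level(n_classes: int) -> float:
    """Chance accuracy (in %) for a balanced n-way classification."""
    if n_classes < 2:
        raise ValueError("need at least two classes")
    return 100.0 / n_classes


def margin_over_chance(accuracy: float, chance: float) -> float:
    """Percentage points by which accuracy exceeds chance."""
    return accuracy - chance


def chance_corrected(accuracy: float, chance: float) -> float:
    """Rescale accuracy so that chance maps to 0 and perfect decoding to 1.

    Assumption: balanced classes; this is not a metric from the cited work.
    """
    return (accuracy - chance) / (100.0 - chance)


# (description, accuracy %, chance %) as quoted in the text above.
REPORTED = [
    ("vowels, ECoG (ref. 29)", 37.5, 25.0),
    ("consonants, ECoG (ref. 29)", 36.3, 25.0),
    ("three vowels, ECoG (Ikeda et al.)", 55.6, 33.0),
    ("word pairs, average (ref. 31)", 58.0, 50.0),
    ("word pairs, best pair (ref. 31)", 88.0, 50.0),
]

if __name__ == "__main__":
    for label, acc, chance in REPORTED:
        margin = margin_over_chance(acc, chance)
        score = chance_corrected(acc, chance)
        print(f"{label}: +{margin:.1f} points over chance, "
              f"chance-corrected {score:.2f}")
```

Under the same assumption, an eight-item task such as the one used in this Article (six words and two pseudowords) has a chance level of `chance_level(8)`, that is, 12.5%.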

While, to our knowledge, internal speech has not previously been decoded from SMG, evidence for internal speech representation in the SMG exists. A review of 100 fMRI studies 33 not only described SMG activity during speech production but also suggested its involvement in subvocal speech 34 , 35 . Similarly, an ECoG study identified high-frequency SMG modulation during vocalized and internal speech 36 . Additionally, fMRI studies have demonstrated SMG involvement in phonologic processing, for instance, during tasks while participants reported whether two words rhyme 37 . Performing such tasks requires the participant to internally ‘hear’ the word, indicating potential internal speech representation 38 . Furthermore, a study performed in people suffering from aphasia found that lesions in the SMG and its adjacent white matter affected inner speech rhyming tasks 39 . Recently, ref. 16 showed that electrode grids over SMG contributed to vocalized speech decoding. Finally, vocalized grasps and colour words were decodable from SMG from one of the same participants involved in this work 23 . These studies provide evidence for the possibility of an internal speech decoder from neural activity in the SMG.

The relationship between inner speech and vocalized speech is still debated. The general consensus posits similarities between internal and vocalized speech processes 36 , but the degree of overlap is not well understood 8 , 35 , 40 , 41 , 42 . Characterizing similarities between vocalized and internal speech could provide evidence that results found with vocalized speech could translate to internal speech. However, such a relationship may not be guaranteed. For instance, some brain areas involved in vocalized speech might be poor candidates for internal speech decoding.

In this Article, two participants with tetraplegia performed internal and vocalized speech of eight words while neurophysiological responses were captured from two implant sites. To investigate neural semantic and phonetic representation, the words were composed of six lexical words and two pseudowords (words that mimic real words without semantic meaning). We examined representations of various language processes at the single-neuron level using recording microelectrode arrays from the SMG located in the posterior parietal cortex (PPC) and the arm and/or hand regions of the primary somatosensory cortex (S1). S1 served as a control for movement, due to emerging evidence of its activation beyond defined regions of interest 43 , 44 . Words were presented with an auditory or a written cue and were produced internally as well as orally. We hypothesized that SMG and S1 activity would modulate during vocalized speech and that SMG activity would modulate during internal speech. Shared representation between internal speech, vocalized speech, auditory comprehension and word reading processes was investigated.

Task design

We characterized neural representations of four different language processes within a population of SMG and S1 neurons: auditory comprehension, word reading, internal speech and vocalized speech production. In this manuscript, internal speech refers to engaging a prompted word internally (‘inner monologue’), without correlated motor output, while vocalized speech refers to audibly vocalizing a prompted word. Participants were implanted in the SMG and S1 on the basis of grasp localization fMRI tasks (Fig. 1 ).

figure 1

a , b , SMG implant locations in participant 1 (1 × 96 multielectrode array) ( a ) and participant 2 (1 × 64 multielectrode array) ( b ). c , d , S1 implant locations in participant 1 (2 × 96 multielectrode arrays) ( c ) and participant 2 (2 × 64 multielectrode arrays) ( d ).

The task contained six phases: an inter-trial interval (ITI), a cue phase (cue), a first delay (D1), an internal speech phase (internal), a second delay (D2) and a vocalized speech phase (speech). Words were cued with either an auditory or a written version of the word (Fig. 2a ). Six of the words were informed by ref. 31 (battlefield, cowboy, python, spoon, swimming and telephone). Two pseudowords (nifzig and bindip) were added to explore phonetic representation in the SMG. The first participant completed ten session days, composed of both the auditory and the written cue tasks. The second participant completed nine sessions, focusing only on the written cue task. The participants were instructed to internally say the cued word during the internal speech phase and to vocalize the same word during the speech phase.

figure 2

a , Written words and sounds were used to cue six words and two pseudowords in a participant with tetraplegia. The ‘audio cue’ task was composed of an ITI, a cue phase during which the sound of one of the words was emitted from a speaker (between 842 and 1,130 ms), a first delay (D1), an internal speech phase, a second delay (D2) and a vocalized speech phase. The ‘written cue’ task was identical to the ‘audio cue’ task, except that written words appeared on the screen for 1.5 s. Eight repetitions of eight words were performed per session day and per task for the first participant. For the second participant, 16 repetitions of eight words were performed for the written cue task. b – e , Example smoothed firing rates of neurons tuned to four words in the SMG for participant 1 (auditory cue, python ( b ), and written cue, telephone ( c )) and participant 2 (written cue, nifzig ( d ), and written cue, spoon ( e )). Top: the average firing rate over 8 or 16 trials (solid line, mean; shaded area, 95% bootstrapped confidence interval). Bottom: one example trial with associated audio amplitude (grey). Vertically dashed lines indicate the beginning of each phase. Single neurons modulate firing rate during internal speech in the SMG.

For each of the four language processes, we observed selective modulation of individual neurons’ firing rates (Fig. 2b–e ). In general, the firing rates of neurons increased during the active phases (cue, internal and speech) and decreased during the rest phases (ITI, D1 and D2). A variety of activation patterns were present in the neural population. Example neurons were selected to demonstrate increases in firing rates during internal speech, cue and vocalized speech. Both the auditory (Fig. 2b ) and the written cue (Fig. 2c–e ) evoked highly modulated firing rates of individual neurons during internal speech.

These stereotypical activation patterns were evident at the single-trial level (Fig. 2b–e , bottom). When the auditory recording was overlaid with firing rates from a single trial, a heterogeneous neural response was observed (Supplementary Fig. 1a ), with some SMG neurons preceding or lagging peak auditory levels during vocalized speech. In contrast, neural activity from primary sensory cortex (S1) only modulated during vocalized speech and produced similar firing patterns regardless of the vocalized word (Supplementary Fig. 1b ).

Population activity represented selective tuning for individual words

Population analysis in the SMG mirrored single-neuron patterns of activation, showing increases in tuning during the active task phases (Fig. 3a,d ). Tuning of a neuron to a word was determined by fitting a linear regression model to the firing rate in 50-ms time bins ( Methods ). Distinctions between participant 1 and participant 2 were observed. Specifically, participant 1 exhibited strong tuning, whereas the number of tuned units was notably lower in participant 2. Based on these findings, we ran only the written cue task with participant 2. In participant 1, representation of the auditory cue was lower compared with the written cue (Fig. 3b , cue). However, this difference was not observed for other task phases. In both participants, the tuned population activity in S1 increased during vocalized speech but not during the cue and internal speech phases (Supplementary Fig. 3a,b ).
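
The tuning analysis can be illustrated with a minimal sketch. The Methods are not part of this section, so the details here are assumptions: spike times are binned into 50-ms firing rates, the firing rate is regressed on one-hot word indicators (for which the least-squares fit reduces to per-word means), and significance is estimated with a label-permutation test that stands in for whatever test the regression model actually used. The example unit and all function names are hypothetical.

```python
import math
import random
from statistics import mean

BIN_S = 0.050  # 50-ms bins, as stated in the text


def binned_rates(spike_times, t_start, t_stop, bin_s=BIN_S):
    """Convert spike times (in seconds) to firing rates (Hz) per time bin."""
    n_bins = int(round((t_stop - t_start) / bin_s))
    counts = [0] * n_bins
    for t in spike_times:
        idx = math.floor((t - t_start) / bin_s)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return [c / bin_s for c in counts]


def r_squared(rates, words):
    """R^2 of regressing firing rate on one-hot word indicators.

    With indicator regressors only, the least-squares prediction for a
    trial is the mean rate of its word, so no matrix algebra is needed.
    """
    grand = mean(rates)
    ss_total = sum((r - grand) ** 2 for r in rates)
    if ss_total == 0:
        return 0.0
    groups = {}
    for r, w in zip(rates, words):
        groups.setdefault(w, []).append(r)
    fitted = {w: mean(v) for w, v in groups.items()}
    ss_resid = sum((r - fitted[w]) ** 2 for r, w in zip(rates, words))
    return 1.0 - ss_resid / ss_total


def permutation_p(rates, words, n_perm=1000, seed=0):
    """P value for tuning: how often shuffled labels fit at least as well.

    Assumption: a permutation test replaces the (unspecified) regression test.
    """
    rng = random.Random(seed)
    observed = r_squared(rates, words)
    shuffled = list(words)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if r_squared(rates, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical unit: fires near 20 Hz for 'spoon', near 5 Hz otherwise.
WORDS = ["battlefield", "cowboy", "python", "spoon",
         "swimming", "telephone", "nifzig", "bindip"]
rng = random.Random(1)
trial_words = [w for w in WORDS for _ in range(8)]  # 8 repetitions per word
trial_rates = [(20.0 if w == "spoon" else 5.0) + rng.gauss(0, 2)
               for w in trial_words]
p_value = permutation_p(trial_rates, trial_words)
print(f"R^2 = {r_squared(trial_rates, trial_words):.2f}, p = {p_value:.4f}")
```

Repeating this for every unit and every time bin, then taking the fraction of units below a significance threshold, would give a percentage-of-tuned-units curve of the kind plotted in Fig. 3a,d.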

figure 3

a , The average percentage of tuned neurons to words in 50-ms time bins in the SMG over the trial duration for ‘auditory cue’ (blue) and ‘written cue’ (green) tasks for participant 1 (solid line, mean over ten sessions; shaded area, 95% confidence interval of the mean). During the cue phase of auditory trials, neural data were aligned to audio onset, which occurred within 200–650 ms following initiation of the cue phase. b , The average percentage of tuned neurons computed on firing rates per task phase, with 95% confidence interval over ten sessions. Tuning during action phases (cue, internal and speech) following rest phases (ITI, D1 and D2) was significantly higher (paired two-tailed t -test, d.f. 9, P ITI_CueWritten  < 0.001, Cohen’s d  = 2.31; P ITI_CueAuditory  = 0.003, Cohen’s d  = 1.25; P D1_InternalWritten  = 0.008, Cohen’s d  = 1.08; P D1_InternalAuditory  < 0.001, Cohen’s d  = 1.71; P D2_SpeechWritten  < 0.001, Cohen’s d  = 2.34; P D2_SpeechAuditory  < 0.001, Cohen’s d  = 3.23). c , The number of neurons tuned to each individual word in each phase for the ‘auditory cue’ and ‘written cue’ tasks. d , The average percentage of tuned neurons to words in 50-ms time bins in the SMG over the trial duration for ‘written cue’ (green) tasks for participant 2 (solid line, mean over nine sessions; shaded area, 95% confidence interval of the mean). Due to a reduced number of tuned units, only the ‘written cue’ task variation was performed. e , The average percentage of tuned neurons computed on firing rates per task phase, with 95% confidence interval over nine sessions. Tuning during cue and internal phases following rest phases ITI and D1 was significantly higher (paired two-tailed t -test, d.f. 8, P ITI_CueWritten  = 0.003, Cohen’s d  = 1.38; P D1_Internal  = 0.001, Cohen’s d  = 1.67). f , The number of neurons tuned to each individual word in each phase for the ‘written cue’ task.

To quantitatively compare activity between phases, we assessed the differential response patterns for individual words by examining the variations in average firing rate across different task phases (Fig. 3b,e ). In both participants, tuning during the cue and internal speech phases was significantly higher compared with their preceding rest phases ITI and D1 (paired t -test between phases. Participant 1: d.f. 9, P ITI_CueWritten < 0.001, Cohen’s d = 2.31; P ITI_CueAuditory = 0.003, Cohen’s d = 1.25; P D1_InternalWritten = 0.008, Cohen’s d = 1.08; P D1_InternalAuditory < 0.001, Cohen’s d = 1.71. Participant 2: d.f. 8, P ITI_CueWritten = 0.003, Cohen’s d = 1.38; P D1_Internal = 0.001, Cohen’s d = 1.67). For participant 1, we also observed significantly higher tuning during vocalized speech than during D2 (d.f. 9, P D2_SpeechWritten < 0.001, Cohen’s d = 2.34; P D2_SpeechAuditory < 0.001, Cohen’s d = 3.23). Representation for all words was observed in each phase, including pseudowords (bindip and nifzig) (Fig. 3c,f ). To identify neurons with selective activity for unique words, we performed a Kruskal–Wallis test (Supplementary Fig. 3c,d ). The results mirrored findings of the regression analysis in both participants, albeit weaker in participant 2. These findings suggest that, while neural activity during active phases differed from activity during the ITI phase, neural responses of only a few neurons varied across different words for participant 2.
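
The paired comparisons reported above can be illustrated with a short sketch. The session values below are hypothetical, and the effect-size convention (mean paired difference divided by the standard deviation of the differences) is an assumption, since the formula used for Cohen’s d is not stated in this section.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_and_d(rest, active):
    """Return (t statistic, degrees of freedom, Cohen's d) for paired samples."""
    if len(rest) != len(active):
        raise ValueError("paired samples must have equal length")
    diffs = [a - r for r, a in zip(rest, active)]
    n = len(diffs)
    sd = stdev(diffs)                      # sample SD (n - 1 denominator)
    t_stat = mean(diffs) / (sd / sqrt(n))  # paired t statistic
    cohens_d = mean(diffs) / sd            # assumed convention (d_z)
    return t_stat, n - 1, cohens_d


# Hypothetical percentages of tuned units in ten sessions (rest vs active phase).
rest_pct = [4.0, 5.5, 3.0, 6.0, 4.5, 5.0, 3.5, 4.0, 6.5, 5.0]
active_pct = [12.0, 15.5, 9.0, 14.0, 13.5, 11.0, 10.5, 13.0, 16.5, 12.0]
t_stat, dof, d = paired_t_and_d(rest_pct, active_pct)
print(f"t({dof}) = {t_stat:.2f}, Cohen's d = {d:.2f}")
```

Ten sessions give d.f. 9 and nine sessions give d.f. 8, matching the degrees of freedom reported for participants 1 and 2. Under this convention the t statistic equals d multiplied by the square root of the number of sessions.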

The neural population in the SMG simultaneously represented several distinct aspects of language processing: temporal changes, input modality (auditory, written for participant 1) and unique words from our vocabulary list. We used demixed principal component analysis (dPCA) to decompose and analyse contributions of each individual component: timing, cue modality and word. In Fig. 4 , demixed principal components (PCs) explaining the highest amount of variance were plotted by projecting data onto their respective dPCA decoder axis.

figure 4

a – e , dPCA was performed to investigate variance within three marginalizations: ‘timing’, ‘cue modality’ and ‘word’ for participant 1 ( a – c ) and ‘timing’ and ‘word’ for participant 2 ( d and e ). Demixed PCs explaining the highest variance within each marginalization were plotted over time, by projecting the data onto their respective dPCA decoder axis. In a , the ‘timing’ marginalization demonstrates SMG modulation during cue, internal speech and vocalized speech, while S1 only represents vocalized speech. The solid blue lines (8) represent the auditory cue trials, and dashed green lines (8) represent written cue trials. In b , the ‘cue modality’ marginalization suggests that internal and vocalized speech representation in the SMG are not affected by the cue modality. The solid blue lines (8) represent the auditory cue trials, and dashed green lines (8) represent written cue trials. In c , the ‘word’ marginalization shows high variability for different words in the SMG, but near zero for S1. The colours (8) represent individual words. For each colour, solid lines represent auditory trials and dashed lines represent written cue trials. d is the same as a , but for participant 2. The dashed green lines (8) represent written cue trials. e is the same as c , but for participant 2. The colours (8) represent individual words during written cue trials. The variance for different words in the SMG (left) was higher than in S1 (right), but lower in comparison with SMG in participant 1 ( c ).

For participant 1, the ‘timing’ component revealed that temporal dynamics in the SMG peaked during all active phases (Fig. 4a ). In contrast, temporal S1 modulation peaked only during vocalized speech production, indicating a lack of synchronized lip and face movement of the participant during the other task phases. While ‘cue modality’ components were separable during the cue phase (Fig. 4b ), they overlapped during subsequent phases. Thus, internal and vocalized speech representation may not be influenced by the cue modality. Pseudowords had similar separability to lexical words (Fig. 4c ). The explained variance between words was high in the SMG and was close to zero in S1. In participant 2, temporal dynamics of the task were preserved (‘timing’ component). However, variance to words was reduced, suggesting lower neuronal ability to represent individual words in participant 2. In S1, the results mirrored findings from S1 in participant 1 (Fig. 4d,e , right).

Internal speech is decodable in the SMG

Separable neural representations of both internal and vocalized speech processes implicate SMG as a rich source of neural activity for real-time speech BMI devices. The decodability of words correlated with the percentage of tuned neurons (Fig. 3a–f ) as well as the explained dPCA variance (Fig. 4c,e ) observed in the participants. In participant 1, all words in our vocabulary list were highly decodable, averaging 55% offline decoding and 79% (16–20 training trials) online decoding from neurons during internal speech (Fig. 5a,b ). Words spoken during the vocalized phase were also highly discriminable, averaging 74% offline (Fig. 5a ). In participant 2, offline internal speech decoding averaged 24% (Supplementary Fig. 4b ) and online decoding averaged 23% (Fig. 5a ), with preferential representation of words ‘spoon’ and ‘swimming’.

figure 5

a , Offline decoding accuracies: ‘audio cue’ and ‘written cue’ task data were combined for each individual session day, and leave-one-out CV was performed (black dots). PCA was performed on the training data, an LDA model was constructed, and classification accuracies were plotted with 95% confidence intervals, over the session means. The significance of classification accuracies was evaluated by comparing results with a shuffled distribution (averaged shuffle results over 100 repetitions indicated by red dots; P < 0.01 indicates that the average mean is >99.5th percentile of shuffle distribution, n = 10). In participant 1, classification accuracies during action phases (cue, internal and speech) following rest phases (ITI, D1 and D2) were significantly higher (paired two-tailed t -test: n = 10, d.f. 9, for all P < 0.001, Cohen’s d = 6.81, 2.29 and 5.75). b , Online decoding accuracies: classification accuracies for internal speech were evaluated in a closed-loop internal speech BMI application on three different session days for both participants. In participant 1, decoding accuracies were significantly above chance (averaged shuffle results over 1,000 repetitions indicated by red dots; P < 0.001 indicates that the average mean is >99.95th percentile of shuffle distribution) and improved when 16–20 trials per word were used to train the model (two-sample two-tailed t -test, n (8–14) = 8, d.f. 11, n (16–20) = 5, P = 0.029), averaging 79% classification accuracy. In participant 2, online decoding accuracies were significant (averaged shuffle results over 1,000 repetitions indicated by red dots; P < 0.05 indicates that average mean is >97.5th percentile of shuffle distribution, n = 7) and averaged 23%. c , An offline confusion matrix for participant 1: confusion matrices for each of the different task phases were computed on the tested data and averaged over all session days.
d , An online confusion matrix: a confusion matrix was computed combining all online runs, leading to a total of 304 trials (38 trials per word) for participant 1 and 448 online trials for participant 2. Participant 1 displayed comparable online decoding accuracies for all words, while participant 2 had preferential decoding for the words ‘swimming’ and ‘spoon’.

In participant 1, trial data from both types of cue (auditory and written) were concatenated for offline analysis, since SMG activity was only differentiable between the types of cue during the cue phase (Figs. 3a and 4b ). This resulted in 16 trials per condition. Features were selected via principal component analysis (PCA) on the training dataset, and PCs that explained 95% of the variance were kept. A linear discriminant analysis (LDA) model was evaluated with leave-one-out cross-validation (CV). Significance was computed by comparing results with a null distribution ( Methods ).
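
The evaluation logic (leave-one-out CV compared against a shuffled-label null distribution) can be sketched as follows. To stay self-contained, this sketch swaps in a nearest-centroid classifier for the PCA and LDA pipeline described above, and it runs on synthetic features. The data, the classifier and all function names are therefore assumptions for illustration, not the analysis code of the study.

```python
import math
import random


def centroids(xs, ys):
    """Mean feature vector per label."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(cents, x):
    """Label of the closest centroid (squared Euclidean distance)."""
    return min(cents,
               key=lambda y: sum((c - v) ** 2 for c, v in zip(cents[y], x)))


def loo_accuracy(xs, ys):
    """Leave-one-out CV: hold out each trial once and train on the rest."""
    correct = 0
    for i in range(len(xs)):
        cents = centroids(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        correct += predict(cents, xs[i]) == ys[i]
    return correct / len(xs)


def shuffle_null(xs, ys, n_shuffles=100, seed=0):
    """Accuracies obtained after randomly permuting the word labels."""
    rng = random.Random(seed)
    labels = list(ys)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(labels)
        null.append(loo_accuracy(xs, labels))
    return null


def percentile(values, pct):
    """Nearest-rank percentile of a list of values."""
    ordered = sorted(values)
    rank = min(len(ordered), max(1, math.ceil(pct / 100 * len(ordered))))
    return ordered[rank - 1]


# Synthetic data: 8 words x 16 trials, 3 features per trial (hypothetical).
rng = random.Random(0)
xs, ys = [], []
for word in range(8):
    code = [5.0 * ((word >> bit) & 1) for bit in range(3)]
    for _ in range(16):
        xs.append([c + rng.gauss(0, 1) for c in code])
        ys.append(word)

observed = loo_accuracy(xs, ys)
null = shuffle_null(xs, ys, n_shuffles=100)
significant = observed > percentile(null, 99.5)
print(f"accuracy {observed:.2f}, chance {1 / 8:.3f}, P < 0.01: {significant}")
```

With eight equally frequent words, the chance level is 1/8 = 12.5%, and the shuffled accuracies should scatter around that value.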

Significant word decoding was observed during all phases, except during the ITI (Fig. 5a , n  = 10, mean decoding value above 99.5th percentile of shuffle distribution is P  < 0.01, per phase, Cohen’s d  = 0.64, 6.17, 3.04, 6.59, 3.93 and 8.26, confidence interval of the mean ± 1.73, 4.46, 5.21, 5.67, 4.63 and 6.49). Decoding accuracies were significantly higher in the cue, internal speech and speech condition, compared with rest phases ITI, D1 and D2 (Fig. 5a , paired t -test, n  = 10, d.f. 9, for all P  < 0.001, Cohen’s d  = 6.81, 2.29 and 5.75). Significant cue phase decoding suggested that modality-independent linguistic representations were present early within the task 45 . Internal speech decoding averaged 55% offline, with the highest session at 72% and a chance level of ~12.5% (Fig. 5a , red line). Vocalized speech averaged even higher, at 74%. All words were highly decodable (Fig. 5c ). As suggested from our dPCA results, individual words were not significantly decodable from neural activity in S1 (Supplementary Fig. 4a ), indicating generalized activity for vocalized speech in the S1 arm region (Fig. 4c ).

For participant 2, SMG significant word decoding was observed during the cue, internal and vocalized speech phases (Supplementary Fig. 4b , n  = 9, mean decoding value above 97.5th/99.5th percentile of shuffle distribution is P  < 0.05/ P  < 0.01, per phase Cohen’s d  = 0.35, 1.15, 1.09, 1.44, 0.99 and 1.49, confidence interval of the mean ± 3.09, 5.02, 6.91, 8.14, 5.45 and 4.15). Decoding accuracies were significantly higher in the cue and internal speech condition, compared with rest phases ITI and D1 (Supplementary Fig. 4b , paired t -test, n  = 9, d.f. 8, P ITI_Cue  = 0.013, Cohen’s d  = 1.07, P D1_Internal  = 0.01, Cohen’s d  = 1.11). S1 decoding mirrored results in participant 1, suggesting that no synchronized face movements occurred during the cue phase or internal speech phase (Supplementary Fig. 4c ).

High-accuracy online speech decoder

We developed an online, closed-loop internal speech BMI using an eight-word vocabulary (Fig. 5b ). On three separate session days, training datasets were generated using the written cue task, with eight repetitions of each word for each participant. An LDA model was trained on the internal speech data of the training set, corresponding to only 1.5 s of neural data per repetition for each class. The trained decoder predicted internal speech during the online task. During the online task, the vocalized speech phase was replaced with a feedback phase. The decoded word was shown in green if correctly decoded, and in red if wrongly decoded (Supplementary Video 1 ). The classifier was retrained after each run of the online task, adding the newly recorded data. Several online runs were performed on each session day, corresponding to different datapoints on Fig. 5b . When using between 8 and 14 repetitions per word to train the decoding model, an average of 59% classification accuracy was obtained for participant 1. Accuracies increased significantly (two-sample two-tailed t -test, n (8–14) = 8, n (16–20) = 5, d.f. 11, P = 0.029) as more data were added to train the model, reaching an average of 79% classification accuracy with 16–20 repetitions per word. The highest single run accuracy was 91%. All words were well represented, illustrated by a confusion matrix of 304 trials (Fig. 5d ). In participant 2, decoding was statistically significant, but lower compared with participant 1. The lower number of tuned units (Fig. 3a–f ) and reduced explained variance between words (Fig. 4e , left) could account for these findings. Additionally, preferential representation of words ‘spoon’ and ‘swimming’ was observed.
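
As an illustration of how a confusion matrix such as the one in Fig. 5d summarizes online runs, the following sketch tallies cued versus decoded words. The short trial log is hypothetical, and the function names are assumptions.

```python
WORDS = ["battlefield", "cowboy", "python", "spoon",
         "swimming", "telephone", "nifzig", "bindip"]


def confusion_matrix(cued_words, decoded_words, labels=WORDS):
    """Rows: cued word; columns: decoded word; entries: trial counts."""
    index = {w: i for i, w in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for cued, decoded in zip(cued_words, decoded_words):
        matrix[index[cued]][index[decoded]] += 1
    return matrix


def per_word_accuracy(matrix):
    """Diagonal count divided by the row total, for each cued word."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]


# Hypothetical log of four online trials (cued word, decoded word).
cued = ["spoon", "spoon", "cowboy", "cowboy"]
decoded = ["spoon", "cowboy", "cowboy", "cowboy"]
cm = confusion_matrix(cued, decoded)
acc = per_word_accuracy(cm)

# Participant 1: 38 online trials for each of the 8 words.
total_trials = len(WORDS) * 38  # 304
print(acc[WORDS.index("spoon")], acc[WORDS.index("cowboy")], total_trials)
```

A roughly uniform diagonal corresponds to comparable decoding of all words, whereas a diagonal dominated by a few entries corresponds to preferential decoding of particular words.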

Shared representations between internal speech, written words and vocalized speech

Different language processes are engaged during the task: auditory comprehension or visual word recognition during the cue phase, and internal speech and vocalized speech production during the speech phases. It has been widely assumed that each of these processes is part of a highly distributed network, involving multiple cortical areas 46 . In this work, we observed significant representation of different language processes in a common cortical region, SMG, in our participants. To explore the relationships between each of these processes, for participant 1 we used cross-phase classification to identify the distinct and common neural codes separately in the auditory and written cue datasets. By training our classifier on the representation found in one phase (for example, the cue phase) and testing the classifier on another phase (for example, internal speech), we quantified generalizability of our models across neural activity of different language processes (Fig. 6 ). The generalizability of a model to different task phases was evaluated through paired t -tests. No significant difference between classification accuracies indicates good generalization of the model, while significantly lower classification accuracies suggest poor generalization of the model.
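
A minimal sketch of cross-phase classification is given below, under stated assumptions: a nearest-centroid classifier replaces the PCA and LDA model, and the phases are simulated so that two of them share a word code while a third does not. The simulated data and all names are hypothetical and serve only to show the train-on-one-phase, test-on-another logic.

```python
import random


def centroids(xs, ys):
    """Mean feature vector per word label."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(cents, x):
    """Label of the closest centroid (squared Euclidean distance)."""
    return min(cents,
               key=lambda y: sum((c - v) ** 2 for c, v in zip(cents[y], x)))


def cross_phase_accuracy(train_phase, test_phase):
    """Fit on (features, labels) from one phase, score on another phase."""
    cents = centroids(*train_phase)
    test_x, test_y = test_phase
    hits = sum(predict(cents, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


def simulate_phase(patterns, rng, n_trials=8, noise=1.0):
    """Synthetic trials: each word's pattern plus Gaussian noise."""
    xs, ys = [], []
    for word, pattern in patterns.items():
        for _ in range(n_trials):
            xs.append([p + rng.gauss(0, noise) for p in pattern])
            ys.append(word)
    return xs, ys


rng = random.Random(0)
shared = {w: [rng.gauss(0, 3) for _ in range(6)] for w in range(8)}
unrelated = {w: [rng.gauss(0, 3) for _ in range(6)] for w in range(8)}

internal = simulate_phase(shared, rng)
speech = simulate_phase(shared, rng)    # same word code as 'internal'
other = simulate_phase(unrelated, rng)  # different word code

acc_shared = cross_phase_accuracy(internal, speech)
acc_other = cross_phase_accuracy(internal, other)
print(f"internal -> speech: {acc_shared:.2f}; internal -> other: {acc_other:.2f}")
```

When two phases share a neural code, a model trained on one transfers to the other with little loss. When they do not, accuracy on the second phase drops, which is the pattern the paired t -tests are designed to detect.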

figure 6

a , Evaluating the overlap of shared information between different task phases in the ‘auditory cue’ task. For each of the ten session days, cross-phase classification was performed. This consisted of training a model on a subset of data from one phase (for example, cue) and applying it to a subset of data from the ITI, cue, internal and speech phases. This analysis was performed separately for each task phase. PCA was performed on the training data, an LDA model was constructed and classification accuracies were plotted with a 95% confidence interval over session means. Significant differences in performance between phases were evaluated between the ten sessions (paired two-tailed t -test, FDR corrected, d.f. 9, P < 0.001 for all, Cohen’s d ≥ 1.89). For easier visibility, significant differences between ITI and other phases were not plotted. b , Same as a for the ‘written cue’ task (paired two-tailed t -test, FDR corrected, d.f. 9, P Cue_Internal = 0.028, Cohen’s d > 0.86; P Cue_Speech = 0.022, Cohen’s d = 0.95; all others P < 0.001 and Cohen’s d ≥ 1.65). c , The percentage of neurons tuned during the internal speech phase that are also tuned during the vocalized speech phase. Neurons tuned during the internal speech phase were computed as in Fig. 3b separately for each session day. From these, the percentage of neurons that were also tuned during vocalized speech was calculated. More than 80% of neurons tuned during internal speech were also tuned during vocalized speech (82% in the ‘auditory cue’ task, 85% in the ‘written cue’ task). In total, 71% of ‘auditory cue’ and 79% of ‘written cue’ neurons also preserved tuning to at least one identical word during internal speech and vocalized speech phases. d , The percentage of neurons tuned during the internal speech phase that were also tuned during the cue phase. Right: 78% of neurons tuned during internal speech were also tuned during the written cue phase.
Left: a smaller 47% of neurons tuned during the internal speech phase were also tuned during the auditory cue phase. In total, 71% of neurons preserved tuning between the written cue phase and the internal speech phase, while 42% of neurons preserved tuning between the auditory cue and the internal speech phase.

The strongest shared neural representations were found between visual word recognition, internal speech and vocalized speech (Fig. 6b ). A model trained on internal speech was highly generalizable to both vocalized speech and written cued words, evidence for a possible shared neural code (Fig. 6b , internal). In contrast, the model’s performance was significantly lower when tested on data recorded in the auditory cue phase (Fig. 6a , training phase internal: paired t -test, d.f. 9, P Cue_Internal  < 0.001, Cohen’s d  = 2.16; P Cue_Speech  < 0.001, Cohen’s d  = 3.34). These differences could stem from the inherent challenges in comparing visual and auditory language stimuli, which differ in processing time: instantaneous for text versus several hundred milliseconds for auditory stimuli.

We evaluated how well a classification model, initially trained to distinguish words during vocalized speech, generalized to the internal and cue phases (Fig. 6a,b , training phase speech). The model demonstrated similar levels of generalization during internal speech and in response to written cues, as indicated by the lack of a significant difference in decoding accuracy between the internal and written cue phases (Fig. 6b , training phase speech, cue–internal). However, the model generalized significantly better to internal speech than to representations observed during the auditory cue phase (Fig. 6a , training phase speech, d.f. 9, P Cue_Internal < 0.001, Cohen’s d = 2.85).

Neuronal representation of words at the single-neuron level was highly consistent between internal speech, vocalized speech and written cue phases. A high percentage of neurons were not only active during the same task phases but also preserved identical tuning to at least one word (Fig. 6c,d ). In total, 82–85% of neurons active during internal speech were also active during vocalized speech. In 71–79% of neurons, tuning was preserved between the internal speech and vocalized speech phases (Fig. 6c ). During the cue phase, 78% of neurons active during internal speech were also active during the written cue (Fig. 6d , right). However, a lower percentage of neurons (47%) were active during the auditory cue phase (Fig. 6d , left). Similarly, 71% of neurons preserved tuning between the written cue phase and the internal speech phase, while 42% of neurons preserved tuning between the auditory cue phase and the internal speech phase.
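
The two overlap measures can be made concrete with a small sketch. It assumes that both percentages are expressed relative to the neurons tuned during internal speech, which this section does not state explicitly. The neuron identifiers and tuning sets are hypothetical.

```python
def overlap_stats(tuned_internal, tuned_other):
    """Percentages relative to neurons tuned during internal speech.

    tuned_*: dict mapping neuron id -> set of words the neuron is tuned to.
    Returns (% also tuned in the other phase,
             % tuned to at least one identical word in both phases).
    """
    internal_units = [n for n, words in tuned_internal.items() if words]
    if not internal_units:
        return 0.0, 0.0
    also_tuned = [n for n in internal_units if tuned_other.get(n)]
    same_word = [n for n in also_tuned
                 if tuned_internal[n] & tuned_other[n]]
    total = len(internal_units)
    return 100 * len(also_tuned) / total, 100 * len(same_word) / total


# Hypothetical tuning of four units in two phases.
internal = {
    "unit1": {"spoon"},
    "unit2": {"python", "cowboy"},
    "unit3": {"nifzig"},
    "unit4": {"swimming"},
}
speech = {
    "unit1": {"spoon", "telephone"},  # tuned in both, shares 'spoon'
    "unit2": {"battlefield"},         # tuned in both, no shared word
    "unit3": {"nifzig"},              # tuned in both, shares 'nifzig'
    "unit4": set(),                   # tuned only during internal speech
}
pct_active, pct_same_word = overlap_stats(internal, speech)
print(pct_active, pct_same_word)
```

The second percentage can never exceed the first, consistent with the reported pairs (82–85% versus 71–79% for vocalized speech, 78% versus 71% for the written cue and 47% versus 42% for the auditory cue).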

Together with the cross-phase analysis, these results suggest strong shared neural representations between internal speech, vocalized speech and the written cue, both at the single-neuron and at the population level.

Robust decoding of multiple internal speech strategies within the SMG

Strong shared neural representations in participant 1 between written, inner and vocalized speech suggest that all three partly represent the same cognitive process or all cognitive processes share common neural features. While internal and vocalized speech have been shown to share common neural features 36 , similarities between internal speech and the written cue could have occurred through several different cognitive processes. For instance, the participant’s observation of the written cue could have activated silent reading. This process has been self-reported as activating internal speech, which can involve ‘hearing’ a voice, thus having an auditory component 42 , 47 . However, the participant could also have mentally pictured an image of the written word while performing internal speech, involving visual imagination in addition to language processes. Both hypotheses could explain the high amount of shared neural representation between the written cue and the internal speech phases (Fig. 6b ).

We therefore compared two possible internal sensory strategies in participant 1: a ‘sound imagination’ strategy in which the participant imagined hearing the word, and a ‘visual imagination’ strategy in which the participant visualized the word’s image (Supplementary Fig. 5a ). Each strategy was cued by the modalities we had previously tested (auditory and written words) (Table 1 ). To assess the similarity of these internal speech processes to other task phases, we conducted a cross-phase decoding analysis (as performed in Fig. 6 ). We hypothesized that, if the high cross-decoding results between internal and written cue phases primarily stemmed from the participant engaging in visual word imagination, we would observe lower decoding accuracies during the auditory imagination phase.

Both strategies demonstrated high representation of the four-word dataset (Supplementary Fig. 5b , highest 94%, chance level 25%). These results suggest our speech BMI decoder is robust to multiple types of internal speech strategy.

The participant described the ‘sound imagination’ strategy as being easier and more similar to the internal speech condition of the first experiment. The participant’s self-reported strategy suggests that no visual imagination was performed during internal speech. Correspondingly, similarities between written cue and internal speech phases may stem from internal speech activation during the silent reading of the cue.

In this work, we demonstrated a decoder for internal and vocalized speech, using single-neuron activity from the SMG. Two chronically implanted, speech-abled participants with tetraplegia were able to use an online, closed-loop internal speech BMI to achieve on average 79% and 23% classification accuracy with 16–32 training trials for an eight-word vocabulary. Furthermore, high decoding was achievable with only 24 s of training data per word, corresponding to 16 trials each with 1.5 s of data. Firing rates recorded from S1 showed generalized activation only during vocalized speech activity, but individual words were not classifiable. In the SMG, shared neural representations between internal speech, the written cue and vocalized speech suggest the occurrence of common processes. Robust control could be achieved using visual and auditory internal speech strategies. Representation of pseudowords provided evidence for a phonetic word encoding component in the SMG.

Single neurons in the SMG encode internal speech

We demonstrated internal speech decoding of six different words and two pseudowords in the SMG. Single neurons increased their firing rates during internal speech (Fig. 2 , S1 and S2), which was also reflected at the population level (Fig. 3a,b,d,e ). Each word was represented in the neuronal population (Fig. 3c,f ). Classification accuracy and tuning during the internal speech phase were significantly higher than during the previous delay phase (Figs. 3b,e and 5a , and Supplementary Figs. 3c,d and 4b ). This evidence suggests that we did not simply decode sustained activity from the cue phase but activity generated by the participant performing internal speech. We obtained significant offline and online internal speech decoding results in two participants (Fig. 5a and Supplementary Fig. 4b ). These findings provide strong evidence for internal speech processing at the single-neuron level in the SMG.

Neurons in S1 are modulated by vocalized but not internal speech

Neural activity recorded from S1 served as a control for synchronized face and lip movements during internal speech. While vocalized speech robustly activated sensory neurons, no increase of baseline activity was observed during the internal speech phase or the auditory and written cue phases in either participant (Fig. 4 , S1). These results underline that no synchronized movement inflated our decoding accuracy of internal speech (Supplementary Fig. 4a,c ).

A previous imaging study achieved significant offline decoding of several different internal speech sentences performed by patients with mild ALS 6 . Together with our findings, these results suggest that a BMI speech decoder that does not rely on any movement may translate to communication opportunities for patients suffering from ALS and locked-in syndrome.

Different face activities are observable but not decodable in arm area of S1

The topographic representation of body parts in S1 has recently been found to be less rigid than previously thought. Generalized finger representation was found in a presumably S1 arm region of interest (ROI) 44 . Furthermore, an fMRI paper found observable face and lip activity in S1 leg and hand ROIs. However, differentiation between two lip actions was restricted to the face ROI 43 . Correspondingly, we observed generalized face and lip activity in a predominantly S1 arm region for participant 1 (see ref. 48 for implant location) and a predominantly S1 hand region for participant 2 during vocalized speech (Fig. 4a,d and Supplementary Figs. 1 and 4a,b ). Recorded neural activity contained similar representations for different spoken words (Fig. 4c,e ) and was not significantly decodable (Supplementary Fig. 4a,c ).

Shared neural representations between internal and vocalized speech

The extent to which internal and vocalized speech generalize is still debated 35 , 42 , 49 and depends on the investigated brain area 36 , 50 . In this work, we found on average stronger representation for vocalized (74%) than internal speech (Fig. 5a , 55%) in participant 1 but the opposite effect in participant 2 (Supplementary Fig. 4b , 24% internal, 21% vocalized speech). Additionally, cross-phase decoding of vocalized speech from models trained on data during internal speech resulted in comparable classification accuracies to those of internal speech (Fig. 6a,b , internal). Most neurons tuned during internal speech were also tuned to at least one of the same words during vocalized speech (71–79%; Fig. 6c ). However, some neurons were only tuned during internal speech, or to different words. These observations also applied to firing rates of individual neurons. Here, we observed neurons that had higher peak rates during the internal speech phase than the vocalized speech phase (Supplementary Fig. 1 : swimming and cowboy). Together, these results further suggest neural signatures during internal and vocalized speech are similar but distinct from one another, emphasizing the need for developing speech models from data recorded directly on internal speech production 51 .

Similar observations were made when comparing internal speech processes with visual word processes. In total, 79% of neurons were active both in the internal speech phase and the written cue phase, and 79% preserved the same tuning (Fig. 6d , written cue). Additionally, high cross-decoding between both phases was observed (Fig. 6b , internal).

Shared representation between speech and written cue presentation

Observation of a written cue may engage a variety of cognitive processes, such as visual feature recognition, semantic understanding and/or related language processes, many of which modulate similar cortical regions as speech 45 . Studies have found that silent reading can evoke internal speech; it can be modulated by a presumed author’s speaking speed, voice familiarity or regional accents 35 , 42 , 47 , 52 , 53 . During silent reading of a cued sentence with a neutral versus increased prosody (madeleine brought me versus MADELEINE brought me), one study in particular found that increased left SMG activation correlated with the intensity of the produced inner speech 54 .

Our data demonstrated high cross-phase decoding accuracies between both written cue and speech phases in our first participant (Fig. 6b ). Due to substantial shared neural representation, we hypothesize that the participant’s silent reading during the presentation of the written cue may have engaged internal speech processes. However, this same shared representation could have occurred if visual processes were activated in the internal speech phase. For instance, the participant could have performed mental visualization of the written word instead of generating an internal monologue, as the subjective perception of internal speech may vary between individuals.

Investigating internal speech strategies

In a separate experiment, participant 1 was prompted to execute different mental strategies during the internal speech phase, consisting of ‘sound imagination’ or ‘visual word imagination’ (Supplementary Fig. 5a ). We found robust decoding during the internal strategy phase, regardless of which mental strategy was performed (Supplementary Fig. 5b ). This participant reported the sound strategy was easier to execute than the visual strategy. Furthermore, this participant reported that the sound strategy was more similar to the internal speech strategy employed in prior experiments. This self-report suggests that the patient did not perform visual imagination during the internal speech task. Therefore, shared neural representation between internal and written word phases during the internal speech task may stem from silent reading of the written cue. Since multiple internal mental strategies are decodable from SMG, future patients could have flexibility with their preferred strategy. For instance, people with a strong visual imagination may prefer performing visual word imagination.

Audio contamination in decoding result

Prior studies examining neural representation of attempted or vocalized speech must potentially mitigate acoustic contamination of electrophysiological brain signals during speech production 55 . During internal speech production, no detectable audio was captured by the audio equipment or noticed by the researchers in the room. In the rare cases the participant spoke during internal speech (three trials), the trials were removed. Furthermore, if audio had contaminated the neural data during the auditory cue or vocalized speech, we would have probably observed significant decoding in all channels. However, no significant classification was detected in S1 channels during either the auditory cue phase or the vocalized speech phase (Supplementary Fig. 2b ). We therefore conclude that acoustic contamination did not artificially inflate observed classification accuracies during vocalized speech in the SMG.

Single-neuron modulation during internal speech with a second participant

We found single-neuron modulation to speech processes in a second participant (Figs. 2d,e and 3f , and Supplementary Fig. 2d ), as well as significant offline and online classification accuracies (Fig. 5a and Supplementary Fig. 4b ), confirming neural representation of language processes in the SMG. The number of neurons distinctly active for different words was lower compared with the first participant (Fig. 2e and Supplementary Fig. 3d ), limiting our ability to decode with high accuracy between words in the different task phases (Fig. 5a and Supplementary Fig. 4b ).

Previous work found that single neurons in the PPC exhibited a common neural substrate for written action verbs and observed actions 56 . Another study found that single neurons in the PPC also encoded spoken numbers 57 . These recordings were made in the superior parietal lobule whereas the SMG is in the inferior parietal lobule. Thus, it would appear that language-related activity is highly distributed across the PPC. However, the difference in strength of language representation between each participant in the SMG suggests that there is a degree of functional segregation within the SMG 37 .

Different anatomical geometries of the SMG between participants mean that precise comparisons of implanted array locations become difficult (Fig. 1 ). Implant locations for both participants were informed from pre-surgical anatomical/vasculature scans and fMRI tasks designed to evoke activity related to grasp and dexterous hand movements 48 . Furthermore, the number of electrodes of the implanted array was higher in the first participant (96) than in the second participant (64). A pre-surgical assessment of functional activity related to language and speech may be required to determine the best candidate implant locations within the SMG for online speech decoding applications.

Impact on BMI applications

In this work, an online internal speech BMI achieved significant decoding from single-neuron activity in the SMG in two participants with tetraplegia. The online decoders were trained on as few as eight repetitions of 1.5 s per word, demonstrating that meaningful classification accuracies can be obtained with only a few minutes’ worth of training data per day. This proof-of-concept suggests that the SMG may be able to represent a much larger internal vocabulary. By building models on internal speech directly, our results may translate to people who cannot vocalize speech or are completely locked in. Recently, ref. 26 demonstrated a BMI speller that decoded attempted speech of the letters of the NATO alphabet and used those to construct sentences. Scaling our vocabulary to that size could allow for an unrestricted internal speech speller.

To summarize, we demonstrate the SMG as a promising candidate to build an internal brain–machine speech device. Different internal speech strategies were decodable from the SMG, allowing patients to use the methods and languages with which they are most comfortable. We found evidence for a phonetic component during internal and vocalized speech. Adding to previous findings indicating grasp decoding in the SMG 23 , we propose the SMG as a multipurpose BMI area.

Experimental model and participant details

Two male participants with tetraplegia (33 and 39 years) were recruited for an institutional review board- and Food and Drug Administration-approved clinical trial of a BMI and gave informed consent to participate (Institutional Review Board of Rancho Los Amigos National Rehabilitation Center, Institutional Review Board of California Institute of Technology, clinical trial registration NCT01964261 ). This clinical trial evaluated BMIs in the PPC and the somatosensory cortex for grasp rehabilitation. One of the primary effectiveness objectives of the study is to evaluate the effectiveness of the neuroport in controlling virtual or physical end effectors. Signals from the PPC will allow the subjects to control the end effector with accuracy greater than chance. Participants were compensated for their participation in the study and reimbursed for any travel expenses related to participation in study activities. The authors affirm that the human research participant provided written informed consent for publication of Supplementary Video 1 . The first participant suffered a spinal cord injury at cervical level C5 1.5 years before participating in the study. The second participant suffered a C5–C6 spinal cord injury 3 years before implantation.

Method details

Data were collected from implants located in the left SMG and the left S1 (for anatomical locations, see Fig. 1 ). For description of pre-surgical planning, localization fMRI tasks, surgical techniques and methodologies, see ref. 48 . Placement of electrodes was based on fMRI tasks involving grasp and dexterous hand movements.

The first participant underwent surgery in November 2016 to implant two 96-channel platinum-tipped multi-electrode arrays (NeuroPort Array, Blackrock Microsystems) in the SMG and in the ventral premotor cortex and two 7 × 7 sputtered iridium oxide film (SIROF)-tipped microelectrode arrays with 48 channels each in the hand and arm area of S1. Data were collected between July 2021 and August 2022. The second participant underwent surgery in October 2022 and was implanted with SIROF-tipped 64-channel microelectrode arrays in S1 (two arrays), SMG, ventral premotor cortex and primary motor cortex. Data were collected in January 2023.

Data collection

Recording began 2 weeks after surgery and continued one to three times per week. Data for this work were collected between 2021 and 2023. Broadband electrical activity was recorded from the NeuroPort Arrays using Neural Signal Processors (Blackrock Microsystems). Analogue signals were amplified, bandpass filtered (0.3–7,500 Hz) and digitized at 30,000 samples s⁻¹. To identify putative action potentials, these broadband data were bandpass filtered (250–5,000 Hz) and thresholded at −4.5 times the estimated root-mean-square voltage of the noise. For some of the analyses, waveforms captured at these threshold crossings were then spike sorted by manually assigning each observation to a putative single neuron; for others, multiunit activity was considered. For participant 1, an average of 33 sorted SMG units (between 22 and 56) and 83 sorted S1 units (between 59 and 96) were recorded per session. For participant 2, an average of 80 sorted SMG units (between 69 and 92) and 81 sorted S1 units (between 61 and 101) were recorded per session. Auditory data were recorded at 30,000 Hz simultaneously with the neural data. Background noise was reduced post-recording by using the noise reduction function of the program ‘Audacity’.
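The thresholding step can be illustrated with a minimal sketch. It assumes that the noise level is estimated as the root-mean-square of the whole filtered trace, which is a simplification and may differ from the estimate used by the recording system. The function name and the toy trace are hypothetical.

```python
import math

def detect_threshold_crossings(samples, multiplier=-4.5):
    """Return the threshold and the indices where the trace first drops below it."""
    # Assumption: noise RMS is estimated from the full trace.
    rms = math.sqrt(sum(v * v for v in samples) / len(samples))
    threshold = multiplier * rms  # negative, for negative-going deflections
    crossings = []
    below = False
    for i, v in enumerate(samples):
        if v < threshold and not below:
            crossings.append(i)  # downward crossing marks a putative action potential
        below = v < threshold
    return threshold, crossings

# Toy band-pass-filtered trace (arbitrary units): small noise plus three deflections.
trace = [1.0, -1.0] * 50
trace[20] = -12.0
trace[21] = -9.0
trace[70] = -15.0
threshold, crossings = detect_threshold_crossings(trace)
```

In this toy trace only the two deflections that exceed the threshold are kept; the waveforms around such crossings would then be passed to spike sorting.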

Experimental tasks

We implemented different tasks to study language processes in the SMG. The tasks cued six words informed by ref. 31 (spoon, python, battlefield, cowboy, swimming and telephone) as well as two pseudowords (bindip and nifzig). The participants were situated 1 m in front of a light-emitting diode screen (1,190 mm screen diagonal), where the task was visualized. The task was implemented using the Psychophysics Toolbox 58 , 59 , 60 extension for MATLAB. Only the written cue task was used for participant 2.

Auditory cue task

Each trial consisted of six phases, referred to in this paper as ITI, cue, D1, internal, D2 and speech. The trial began with a brief ITI (2 s), followed by a 1.5-s-long cue phase. During the cue phase, a speaker emitted the sound of one of the eight words (for example, python). Word duration varied between 842 and 1,130 ms. Then, after a delay period (grey circle on screen; 0.5 s), the participant was instructed to internally say the cued word (orange circle on screen; 1.5 s). After a second delay (grey circle on screen; 0.5 s), the participant vocalized the word (green circle on screen, 1.5 s).
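As a small illustration, the trial structure described here can be written as a list of phases from which analysis windows follow. The phase names and durations are taken from the text; the variable and function names are hypothetical.

```python
# Phase names and durations (seconds) as described for the auditory cue task.
PHASES = [
    ("ITI", 2.0),
    ("cue", 1.5),
    ("D1", 0.5),
    ("internal", 1.5),
    ("D2", 0.5),
    ("speech", 1.5),
]

def phase_windows(phases):
    """Map each phase name to its (onset, offset) in seconds from trial start."""
    windows = {}
    onset = 0.0
    for name, duration in phases:
        windows[name] = (onset, onset + duration)
        onset += duration
    return windows

windows = phase_windows(PHASES)
trial_duration = windows["speech"][1]
```

Under these durations the internal speech phase spans 4.0 to 5.5 s of a 7.5-s trial.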

Written cue task

The task was identical to the auditory cue task, except words were cued in writing instead of sound. The written word appeared on the screen for 1.5 s during the cue phase. The auditory cue was played between 200 ms and 650 ms later than the written cue appeared on the screen, due to the utilization of varied sound outputs (direct computer audio versus Bluetooth speaker).

One auditory cue task and one written cue task were recorded on ten individual session days in participant 1. The written cue task was recorded on seven individual session days in participant 2.

Control experiments

Three experiments were run to investigate internal strategies and phonetic versus semantic processing.

Internal strategy task

The task was designed to vary the internal strategy employed by the participant during the internal speech phase. Two internal strategies were tested: sound imagination and visual imagination. For the ‘sound imagination’ strategy, the participant was instructed to imagine what the word sounded like. For the ‘visual imagination’ strategy, the participant was instructed to mentally visualize the written word. We also tested whether the cue modality (auditory or written) influenced the internal strategy. Crossing the two strategies with the two cue modalities led to four different variations of the task. A subset of four words was used for this experiment.

The internal strategy task was run on one session day with participant 1.

Online task

The ‘written cue task’ was used for the closed-loop experiments. To obtain training data for the online task, a written cue task was run. Then, a classification model was trained only on the internal speech data of the task (see ‘Classification’ section). The closed-loop task was nearly identical to the ‘written cue task’ but replaced the vocalized speech phase with a feedback phase. Feedback was provided by showing the word on the screen either in green if correctly classified or in red if wrongly classified. See Supplementary Video 1 for an example of the participant performing the online task. The online task was run on three individual session days.

Error trials

Trials in which participants accidentally spoke during the internal speech part (3 trials) or said the wrong word during the vocalized speech part (20 trials) were removed from all analyses.

Total number of recording trials

For participant 1, we collected offline datasets composed of eight trials per word across ten sessions. Trials during which participant errors occurred were excluded. In total, between 156 and 159 trials per word were included, with a total of 1,257 trials for offline analysis. On four non-consecutive session days, the auditory cue task was run first, and on six non-consecutive days, the written cue task was run first. For online analysis, datasets were recorded on three different session days, for a total of 304 trials. Participant 2 underwent a similar data collection process, with offline datasets comprising 16 trials per word using the written cue modality over nine sessions. Error trials were excluded. In total, between 142 and 144 trials per word were kept, with a total of 1,145 trials for offline analysis. For online analysis, datasets were recorded on three session days, leading to a total of 448 online trials.

Quantification and statistical analysis

Analyses were performed using MATLAB R2020b and Python, version 3.8.11.

Neural firing rates

Firing rates of sorted units were computed as the number of spikes occurring in 50-ms bins, divided by the bin width and smoothed using a Gaussian filter with kernel width of 50 ms to form an estimate of the instantaneous firing rates (spikes s⁻¹).
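A minimal sketch of this computation follows. It assumes that the 50-ms kernel width refers to the standard deviation of the Gaussian (one bin) and that the kernel is renormalized at the trial edges; neither detail is specified in the text. Function names and spike times are hypothetical.

```python
import math

BIN_WIDTH_S = 0.050  # 50-ms bins, as in the text

def binned_rates(spike_times, t_start, t_stop, bin_width=BIN_WIDTH_S):
    """Count spikes per bin and divide by the bin width (spikes per second)."""
    n_bins = int(round((t_stop - t_start) / bin_width))
    counts = [0] * n_bins
    for t in spike_times:
        idx = int((t - t_start) // bin_width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return [c / bin_width for c in counts]

def gaussian_smooth(rates, sigma_bins=1.0, truncate=3.0):
    """Smooth with a Gaussian kernel; sigma_bins=1.0 is an assumed 50-ms s.d."""
    radius = int(truncate * sigma_bins + 0.5)
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma_bins) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(rates)):
        num, den = 0.0, 0.0
        for k, weight in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(rates):
                num += weight * rates[j]
                den += weight
        smoothed.append(num / den)  # renormalize where the kernel is cut off
    return smoothed

# Toy spike times (seconds) within a 0.5-s window.
spikes = [0.012, 0.031, 0.120, 0.260, 0.270, 0.290]
rates = binned_rates(spikes, 0.0, 0.5)
smoothed = gaussian_smooth(rates)
```

Because each smoothed value is a weighted average of neighbouring bins, the smoothed trace stays within the range of the binned rates.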

Linear regression tuning analysis

To identify units exhibiting selective firing rate patterns (or tuning) for each of the eight words, linear regression analysis was performed in two different ways: (1) step by step in 50-ms time bins to allow assessing changes in neuronal tuning over the entire trial duration; (2) averaging the firing rate in each task phase to compare tuning between phases. The model returns a fit that estimates the firing rate of a unit on the basis of the following variables:

\[\mathrm{FR} = \beta_{0} + \sum_{w=1}^{W} \beta_{w} X_{w}\]

where FR corresponds to the firing rate of the unit, \(\beta_{0}\) to the offset term equal to the average ITI firing rate of the unit, \(X_{w}\) is the vector indicator variable for word \(w\), and \(\beta_{w}\) corresponds to the estimated regression coefficient for word \(w\). \(W\) was equal to 8 (battlefield, cowboy, python, spoon, swimming, telephone, bindip and nifzig) 23 .
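The regression with one indicator variable per word can be sketched in closed form. The sketch assumes that ITI observations enter the design matrix as rows with all indicators set to zero, so that the fitted offset equals the mean ITI firing rate; under ordinary least squares each word coefficient is then the difference between the mean rate for that word and the ITI mean. Function names and firing rates are hypothetical.

```python
import math

def fit_word_tuning(iti_rates, rates_by_word):
    """OLS fit of FR = b0 + sum_w b_w * X_w with one-hot word indicators.

    Returns b0 and a dict mapping each word to (beta, t_statistic).
    """
    n0 = len(iti_rates)
    b0 = sum(iti_rates) / n0  # offset: mean ITI firing rate
    rss = sum((y - b0) ** 2 for y in iti_rates)
    n_total = n0
    means = {}
    for word, ys in rates_by_word.items():
        means[word] = sum(ys) / len(ys)
        rss += sum((y - means[word]) ** 2 for y in ys)
        n_total += len(ys)
    dof = n_total - (len(rates_by_word) + 1)  # observations minus fitted parameters
    sigma2 = rss / dof
    results = {}
    for word, ys in rates_by_word.items():
        beta = means[word] - b0  # change from baseline
        se = math.sqrt(sigma2 * (1.0 / len(ys) + 1.0 / n0))
        results[word] = (beta, beta / se)  # t-statistic = beta / standard error
    return b0, results

# Toy firing rates (spikes per second) for one unit.
iti = [4.0, 6.0, 5.0, 5.0]
by_word = {"spoon": [11.0, 9.0, 10.0, 10.0], "python": [5.0, 4.0, 6.0, 5.0]}
b0, results = fit_word_tuning(iti, by_word)
```

Converting each t-statistic to a P value requires the cumulative distribution function of Student's t distribution with `dof` degrees of freedom (for example, `scipy.stats.t.sf`), which is omitted here to keep the sketch dependency-free.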

In this model, β symbolizes the change of firing rate from baseline for each word. A t-statistic was calculated by dividing each β coefficient by its standard error. Tuning was based on the P value of the t-statistic for each β coefficient. A follow-up analysis was performed to adjust for false discovery rate (FDR) between the P values 61 , 62 . A unit was defined as tuned if the adjusted P value was <0.05 for at least one word. This definition allowed for tuning of a unit to zero, one or multiple words during different timepoints of the trial. Linear regression was performed for each session day individually. A 95% confidence interval of the mean was computed across the ten sessions using the inverse cumulative distribution function of Student’s t distribution.
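The confidence interval across sessions can be sketched as the mean plus or minus the critical t value times the standard error of the mean. The critical value below is hard-coded for ten sessions (nine degrees of freedom); a general implementation would obtain it from an inverse t distribution such as `scipy.stats.t.ppf`. The per-session values are made up for illustration and are not results from this study.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 9 degrees of freedom
# (valid only for ten sessions).
T_CRIT_DF9 = 2.262

def mean_ci95(session_values, t_crit=T_CRIT_DF9):
    """Return (lower, upper) bounds of the 95% confidence interval of the mean."""
    m = statistics.mean(session_values)
    sem = statistics.stdev(session_values) / math.sqrt(len(session_values))
    return m - t_crit * sem, m + t_crit * sem

# Hypothetical per-session percentages of tuned units (illustrative only).
per_session = [0.50, 0.55, 0.60, 0.52, 0.58, 0.54, 0.56, 0.53, 0.57, 0.55]
lower, upper = mean_ci95(per_session)
```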

Kruskal–Wallis tuning analysis

As an alternative tuning definition, differences in firing rates between words were tested using the Kruskal–Wallis test, the non-parametric analogue to the one-way analysis of variance (ANOVA). For each neuron, the analysis was performed to evaluate the null hypothesis that data from each word come from the same distribution. A follow-up analysis was performed to adjust for FDR between the P values for each task phase 61 , 62 . A unit was defined as tuned during a phase if the adjusted P value was smaller than α  = 0.05.
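A minimal sketch of this tuning definition follows. It computes the tie-corrected Kruskal–Wallis H statistic, converts it to a P value with the chi-square approximation, and adjusts the P values across units. The Benjamini–Hochberg step-up procedure is assumed for the FDR adjustment, and adjusting across units within a phase is also an assumption; the text does not name the procedure. All names and data are hypothetical, and three words are used instead of eight for brevity.

```python
import math
from collections import Counter

def chi2_sf(x, df):
    """Survival function of the chi-square distribution for integer df >= 1."""
    y = max(x, 0.0) / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    total = math.erfc(math.sqrt(y))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (k - 0.5) / math.gamma(k + 0.5)
    return total

def average_ranks(values):
    """Rank values 1..N in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def kruskal_wallis(groups):
    """Return (H, P) for a list of samples, one list of firing rates per word."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        total += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - ties / (n ** 3 - n)
    if correction == 0.0:  # all observations identical
        return 0.0, 1.0
    h = max(h / correction, 0.0)
    return h, chi2_sf(h, len(groups) - 1)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from largest to smallest P value
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy firing rates for three units, each with three words and four trials per word.
unit_a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
unit_b = [[1, 6, 8, 11], [2, 5, 7, 12], [3, 4, 9, 10]]
unit_c = [[1, 2, 3, 5], [4, 6, 7, 9], [8, 10, 11, 12]]
p_values = [kruskal_wallis(u)[1] for u in (unit_a, unit_b, unit_c)]
tuned = [p < 0.05 for p in benjamini_hochberg(p_values)]
```

With a library available, `scipy.stats.kruskal` provides the same test directly.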

Classification

Using the neuronal firing rates recorded during the tasks, a classifier was used to evaluate how well the set of words could be differentiated during each phase. Classifiers were trained using averaged firing rates over each task phase, resulting in six matrices of size n × m, where n corresponds to the number of trials and m corresponds to the number of recorded units. A model for each phase was built using LDA, assuming an identical covariance matrix for each word, which resulted in the best classification accuracies. Leave-one-out CV was performed to estimate decoding performance, leaving out a different trial (across all units) at each iteration. PCA was applied on the training data, and the leading PCs that together explained more than 95% of the variance were selected as features and applied to the single testing trial. A 95% confidence interval of the mean was computed as described above.
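The leave-one-out procedure can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier stands in for LDA (LDA with equal priors and an identity shared covariance reduces to this rule), and the PCA step is omitted. This is therefore an illustration of the cross-validation loop and not the decoder used in this work. All names and data are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, test_x):
    """Assign test_x to the class whose mean training vector is closest."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    best_label, best_dist = None, float("inf")
    for label in sums:
        centroid = [s / counts[label] for s in sums[label]]
        dist = sum((a - b) ** 2 for a, b in zip(centroid, test_x))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def leave_one_out_accuracy(features, labels):
    """Hold out each trial once, train on the rest and score the held-out trial."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)

# Toy n x m matrix: nine trials (rows) by two units (columns), three words.
features = [[10, 1], [11, 2], [9, 1],
            [1, 10], [2, 11], [1, 9],
            [5, 5], [6, 5], [5, 6]]
labels = ["spoon"] * 3 + ["python"] * 3 + ["cowboy"] * 3
accuracy = leave_one_out_accuracy(features, labels)
```

A fuller version could fit the PCA and LDA steps inside each fold, for example with scikit-learn's `PCA`, `LinearDiscriminantAnalysis` and `LeaveOneOut`, so that the held-out trial never influences the selected components.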

Cross-phase classification

To estimate shared neural representations between different task phases, we performed cross-phase classification. The process consisted of training a classification model (as described above) on one of the task phases (for example, ITI) and testing it on the ITI, cue, internal speech and vocalized speech phases. The method was repeated for each of the ten sessions individually, and a 95% confidence interval of the mean was computed. Significant differences in classification accuracies between phases decoded with the same model were evaluated using a paired two-tailed t-test. FDR correction of the P values was performed (see ‘Linear regression tuning analysis’) 61 , 62 .
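Cross-phase classification can be sketched by fitting a model on the trials of one phase and scoring it on the trials of another. As a dependency-free stand-in for the LDA model, the sketch uses class means with a nearest-mean decision rule. All names and data are hypothetical.

```python
def class_means(trials):
    """trials: list of (label, feature_vector). Returns {label: mean vector}."""
    grouped = {}
    for label, x in trials:
        grouped.setdefault(label, []).append(x)
    return {
        label: [sum(col) / len(xs) for col in zip(*xs)]
        for label, xs in grouped.items()
    }

def classify(means, x):
    """Return the label whose mean vector is closest to x."""
    return min(
        means,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(means[label], x)),
    )

def cross_phase_accuracy(train_trials, test_trials):
    """Train on one phase and report accuracy on another phase."""
    means = class_means(train_trials)
    hits = sum(1 for label, x in test_trials if classify(means, x) == label)
    return hits / len(test_trials)

# Toy data: two units, two words. Internal and vocalized speech share a
# representation; the ITI carries no word information.
internal = [("spoon", [10, 1]), ("spoon", [9, 2]), ("python", [1, 10]), ("python", [2, 9])]
speech = [("spoon", [12, 2]), ("spoon", [11, 1]), ("python", [2, 12]), ("python", [1, 11])]
iti = [("spoon", [5, 6]), ("spoon", [6, 5]), ("python", [6, 5]), ("python", [5, 6])]

internal_to_speech = cross_phase_accuracy(internal, speech)
internal_to_iti = cross_phase_accuracy(internal, iti)
```

When the training and testing phases are the same, held-out trials (for example, leave-one-out CV) are needed to avoid scoring the model on its own training data.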

Classification performance significance testing

To assess the significance of classification performance, a null dataset was created by repeating classification 100 times with shuffled labels. Then, different percentile levels of this null distribution were computed and compared to the mean of the actual data. Mean classification performances higher than the 97.5th percentile were denoted with P < 0.05 and higher than 99.5th percentile were denoted with P < 0.01.
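The shuffle test can be sketched as follows. A one-dimensional leave-one-out nearest-mean classifier stands in for the decoder, the percentile uses linear interpolation, and the random seed is fixed for repeatability; these are all assumptions made for illustration. All names and data are hypothetical.

```python
import random

def loo_nearest_mean_accuracy(values, labels):
    """Leave-one-out accuracy of a 1-D nearest-class-mean classifier (stand-in decoder)."""
    hits = 0
    for i, (v, y) in enumerate(zip(values, labels)):
        groups = {}
        for j, (u, z) in enumerate(zip(values, labels)):
            if j != i:
                groups.setdefault(z, []).append(u)
        pred = min(groups, key=lambda z: abs(sum(groups[z]) / len(groups[z]) - v))
        hits += pred == y
    return hits / len(values)

def shuffle_null(score_fn, values, labels, n_shuffles=100, seed=0):
    """Repeat classification with shuffled labels; return the sorted null scores."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_shuffles):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        null.append(score_fn(values, shuffled))
    return sorted(null)

def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of pre-sorted values."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)

def significance_label(actual, null_sorted):
    """Compare the actual score with the 97.5th and 99.5th null percentiles."""
    if actual > percentile(null_sorted, 99.5):
        return "P < 0.01"
    if actual > percentile(null_sorted, 97.5):
        return "P < 0.05"
    return "n.s."

# Toy data: one feature, two well-separated classes with ten trials each.
values = [1.0 + 0.1 * i for i in range(10)] + [5.0 + 0.1 * i for i in range(10)]
labels = ["a"] * 10 + ["b"] * 10
actual = loo_nearest_mean_accuracy(values, labels)
null = shuffle_null(loo_nearest_mean_accuracy, values, labels)
label = significance_label(actual, null)
```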

dPCA analysis

dPCA was performed on the session data to study the activity of the neuronal population in relation to the external task parameters: cue modality and word. Kobak et al. 63 introduced dPCA as a refinement of their earlier dimensionality reduction technique (of the same name) that attempts to combine the explanatory strengths of LDA and PCA. It deconstructs neuronal population activity into individual components, each of which relates to a single task parameter 64 .

This text follows the methodology outlined by Kobak et al. 63 . Briefly, this involved the following steps for N neurons:

First, unlike in PCA, we focused not on the matrix, \(X\), of the original data, but on the matrices of marginalizations, \(X_{\phi}\). The marginalizations were computed as neural activity averaged over trials, \(k\), and some task parameters, in analogy to the covariance decomposition done in multivariate analysis of variance. Since our dataset has three parameters, namely timing, \(t\), cue modality, \(c\) (for example, auditory or visual), and word, \(w\) (eight different words), we obtained the total activity as the sum of the average activity with the marginalizations and a final noise term:

\[X_{tcwk} = \bar{X} + \bar{X}_{t} + \underbrace{\bar{X}_{c} + \bar{X}_{tc}}_{\text{cue modality}} + \underbrace{\bar{X}_{w} + \bar{X}_{tw}}_{\text{word}} + \underbrace{\bar{X}_{cw} + \bar{X}_{tcw}}_{\text{interaction}} + \epsilon_{tcwk}\]

The above notation of Kobak et al. is the same as used in factorial ANOVA, that is, \(X_{tcwk}\) is the matrix of firing rates for all neurons, \(\langle \cdot \rangle_{ab}\) is the average over a set of parameters \(a, b, \ldots\), \(\bar{X} = \langle X_{tcwk} \rangle_{tcwk}\), \(\bar{X}_{t} = \langle X_{tcwk} - \bar{X} \rangle_{cwk}\), \(\bar{X}_{tc} = \langle X_{tcwk} - \bar{X} - \bar{X}_{t} - \bar{X}_{c} - \bar{X}_{w} \rangle_{wk}\) and so on. Finally, \(\epsilon_{tcwk} = X_{tcwk} - \langle X_{tcwk} \rangle_{k}\).
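The marginalization averages defined in this notation can be sketched for a single neuron. The sketch assumes a balanced dataset and covers only the decomposition into marginalization terms, not the subsequent fitting of encoder and decoder matrices. All names and the simulated firing rates are hypothetical.

```python
import itertools
import random

def dpca_marginalize(X, shape):
    """X maps (t, c, w, k) -> firing rate of one neuron; shape = (T, C, W, K)."""
    cells = list(itertools.product(*(range(n) for n in shape)))

    def avg(fn, keep):
        """Average fn(cell) over every axis whose position is not listed in keep."""
        sums, counts = {}, {}
        for cell in cells:
            key = tuple(cell[a] for a in keep)
            sums[key] = sums.get(key, 0.0) + fn(cell)
            counts[key] = counts.get(key, 0) + 1
        return {key: sums[key] / counts[key] for key in sums}

    grand = avg(lambda cell: X[cell], ())[()]
    x_t = avg(lambda cell: X[cell] - grand, (0,))
    x_c = avg(lambda cell: X[cell] - grand, (1,))
    x_w = avg(lambda cell: X[cell] - grand, (2,))

    def minus_main(cell):
        t, c, w, _ = cell
        return X[cell] - grand - x_t[(t,)] - x_c[(c,)] - x_w[(w,)]

    x_tc = avg(minus_main, (0, 1))
    x_tw = avg(minus_main, (0, 2))
    x_cw = avg(minus_main, (1, 2))

    def minus_pairs(cell):
        t, c, w, _ = cell
        return minus_main(cell) - x_tc[(t, c)] - x_tw[(t, w)] - x_cw[(c, w)]

    x_tcw = avg(minus_pairs, (0, 1, 2))
    trial_mean = avg(lambda cell: X[cell], (0, 1, 2))
    noise = {cell: X[cell] - trial_mean[cell[:3]] for cell in cells}
    return {"mean": grand, "t": x_t, "c": x_c, "w": x_w, "tc": x_tc,
            "tw": x_tw, "cw": x_cw, "tcw": x_tcw, "noise": noise}

# Simulated rates: 3 time bins, 2 cue modalities, 4 words, 2 trials.
rng = random.Random(0)
shape = (3, 2, 4, 2)
X = {cell: rng.gauss(10.0, 2.0) for cell in itertools.product(*(range(n) for n in shape))}
m = dpca_marginalize(X, shape)

def reconstruct(cell):
    """Sum all marginalization terms and the noise term for one cell."""
    t, c, w, _ = cell
    return (m["mean"] + m["t"][(t,)]
            + m["c"][(c,)] + m["tc"][(t, c)]
            + m["w"][(w,)] + m["tw"][(t, w)]
            + m["cw"][(c, w)] + m["tcw"][(t, c, w)]
            + m["noise"][cell])

max_error = max(abs(reconstruct(cell) - X[cell]) for cell in X)
```

The terms sum back to the original activity up to floating-point error, which is a useful check before grouping them into the cue modality, word and interaction marginalizations.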

Participant 1 datasets were composed of N  = 333 (SMG), N  = 828 (S1) and k  = 8. Participant 2 datasets were composed of N  = 547 (SMG), N  = 522 (S1) and k  = 16. To create balanced datasets, error trials were replaced by the average firing rate of k  − 1 trials.

Our second step reduced the number of terms by grouping them as seen by the braces in the equation above. There is no benefit in demixing a time-independent pure task term \(\bar{X}_{a}\) (for a task parameter \(a\)) from the time–task interaction term \(\bar{X}_{ta}\), since all components are expected to change with time. The above grouping reduced the parametrization down to just five marginalization terms and the noise term (reading in order): the mean firing rate, the task-independent term, the cue modality term, the word term, the cue modality–word interaction term and the trial-to-trial noise.

Finally, we gained extra flexibility by having two separate linear mappings \(F_{\varphi}\) for encoding and \(D_{\varphi}\) for decoding (unlike in PCA, they are not assumed to be transposes of each other). These matrices were chosen to minimize the loss function (with a quadratic penalty added to avoid overfitting):

\[L_{\varphi} = \Vert X_{\varphi} - F_{\varphi} D_{\varphi} X \Vert^{2} + \mu \Vert F_{\varphi} D_{\varphi} \Vert^{2}\]

Here, \(\mu = (\lambda \Vert X \Vert)^{2}\), where λ was optimally selected through tenfold CV in each dataset.

We refer the reader to Kobak et al. for a description of the full analytic solution.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The data supporting the findings of this study are openly available via Zenodo at https://doi.org/10.5281/zenodo.10697024 (ref. 65 ). Source data are provided with this paper.

Code availability

The custom code developed for this study is openly available via Zenodo at https://doi.org/10.5281/zenodo.10697024 (ref. 65 ).

Hecht, M. et al. Subjective experience and coping in ALS. Amyotroph. Lateral Scler. Other Mot. Neuron Disord. 3 , 225–231 (2002).

Aflalo, T. et al. Decoding motor imagery from the posterior parietal cortex of a tetraplegic human. Science 348 , 906–910 (2015).

Andersen, R. A. Machines that translate wants into actions. Scientific American https://www.scientificamerican.com/article/machines-that-translate-wants-into-actions/ (2019).

Andersen, R. A., Aflalo, T. & Kellis, S. From thought to action: the brain–machine interface in posterior parietal cortex. Proc. Natl Acad. Sci. USA 116 , 26274–26279 (2019).

Andersen, R. A., Kellis, S., Klaes, C. & Aflalo, T. Toward more versatile and intuitive cortical brain machine interfaces. Curr. Biol. 24 , R885–R897 (2014).

Dash, D., Ferrari, P. & Wang, J. Decoding imagined and spoken phrases from non-invasive neural (MEG) signals. Front. Neurosci. 14 , 290 (2020).

Luo, S., Rabbani, Q. & Crone, N. E. Brain–computer interface: applications to speech decoding and synthesis to augment communication. Neurotherapeutics https://doi.org/10.1007/s13311-022-01190-2 (2022).

Martin, S., Iturrate, I., Millán, J. D. R., Knight, R. T. & Pasley, B. N. Decoding inner speech using electrocorticography: progress and challenges toward a speech prosthesis. Front. Neurosci. 12 , 422 (2018).

Rabbani, Q., Milsap, G. & Crone, N. E. The potential for a speech brain–computer interface using chronic electrocorticography. Neurotherapeutics 16 , 144–165 (2019).

Lopez-Bernal, D., Balderas, D., Ponce, P. & Molina, A. A state-of-the-art review of EEG-based imagined speech decoding. Front. Hum. Neurosci. 16 , 867281 (2022).

Nicolas-Alonso, L. F. & Gomez-Gil, J. Brain computer interfaces, a review. Sensors 12 , 1211–1279 (2012).

Herff, C., Krusienski, D. J. & Kubben, P. The potential of stereotactic-EEG for brain–computer interfaces: current progress and future directions. Front. Neurosci. 14 , 123 (2020).

Angrick, M. et al. Speech synthesis from ECoG using densely connected 3D convolutional neural networks. J. Neural Eng. https://doi.org/10.1088/1741-2552/ab0c59 (2019).

Herff, C. et al. Generating natural, intelligible speech from brain activity in motor, premotor, and inferior frontal cortices. Front. Neurosci. 13 , 1267 (2019).

Kellis, S. et al. Decoding spoken words using local field potentials recorded from the cortical surface. J. Neural Eng. 7 , 056007 (2010).

Makin, J. G., Moses, D. A. & Chang, E. F. Machine translation of cortical activity to text with an encoder–decoder framework. Nat. Neurosci. 23 , 575–582 (2020).

Metzger, S. L. et al. A high-performance neuroprosthesis for speech decoding and avatar control. Nature 620 , 1037–1046 (2023).

Moses, D. A. et al. Neuroprosthesis for decoding speech in a paralyzed person with anarthria. N. Engl. J. Med. 385 , 217–227 (2021).

Guenther, F. H. et al. A wireless brain–machine interface for real-time speech synthesis. PLoS ONE 4 , e8218 (2009).

Stavisky, S. D. et al. Neural ensemble dynamics in dorsal motor cortex during speech in people with paralysis. eLife 8 , e46015 (2019).

Wilson, G. H. et al. Decoding spoken English from intracortical electrode arrays in dorsal precentral gyrus. J. Neural Eng. 17 , 066007 (2020).

Willett, F. R. et al. A high-performance speech neuroprosthesis. Nature 620 , 1031–1036 (2023).

Wandelt, S. K. et al. Decoding grasp and speech signals from the cortical grasp circuit in a tetraplegic human. Neuron https://doi.org/10.1016/j.neuron.2022.03.009 (2022).

Anumanchipalli, G. K., Chartier, J. & Chang, E. F. Speech synthesis from neural decoding of spoken sentences. Nature 568 , 493–498 (2019).

Bocquelet, F., Hueber, T., Girin, L., Savariaux, C. & Yvert, B. Real-time control of an articulatory-based speech synthesizer for brain computer interfaces. PLoS Comput. Biol. 12 , e1005119 (2016).

Metzger, S. L. et al. Generalizable spelling using a speech neuroprosthesis in an individual with severe limb and vocal paralysis. Nat. Commun. 13 , 6510 (2022).

Meng, K. et al. Continuous synthesis of artificial speech sounds from human cortical surface recordings during silent speech production. J. Neural Eng. https://doi.org/10.1088/1741-2552/ace7f6 (2023).

Proix, T. et al. Imagined speech can be decoded from low- and cross-frequency intracranial EEG features. Nat. Commun. 13 , 48 (2022).

Pei, X., Barbour, D. L., Leuthardt, E. C. & Schalk, G. Decoding vowels and consonants in spoken and imagined words using electrocorticographic signals in humans. J. Neural Eng. 8 , 046028 (2011).

Ikeda, S. et al. Neural decoding of single vowels during covert articulation using electrocorticography. Front. Hum. Neurosci. 8 , 125 (2014).

Martin, S. et al. Word pair classification during imagined speech using direct brain recordings. Sci. Rep. 6 , 25803 (2016).

Angrick, M. et al. Real-time synthesis of imagined speech processes from minimally invasive recordings of neural activity. Commun. Biol. 4 , 1055 (2021).

Price, C. J. The anatomy of language: a review of 100 fMRI studies published in 2009. Ann. N. Y. Acad. Sci. 1191 , 62–88 (2010).

Langland-Hassan, P. & Vicente, A. Inner Speech: New Voices (Oxford Univ. Press, 2018).

Perrone-Bertolotti, M., Rapin, L., Lachaux, J.-P., Baciu, M. & Lœvenbruck, H. What is that little voice inside my head? Inner speech phenomenology, its role in cognitive performance, and its relation to self-monitoring. Behav. Brain Res. 261 , 220–239 (2014).

Pei, X. et al. Spatiotemporal dynamics of electrocorticographic high gamma activity during overt and covert word repetition. NeuroImage 54 , 2960–2972 (2011).

Oberhuber, M. et al. Four functionally distinct regions in the left supramarginal gyrus support word processing. Cereb. Cortex 26 , 4212–4226 (2016).

Binder, J. R. Current controversies on Wernicke’s area and its role in language. Curr. Neurol. Neurosci. Rep. 17 , 58 (2017).

Geva, S. et al. The neural correlates of inner speech defined by voxel-based lesion–symptom mapping. Brain 134 , 3071–3082 (2011).

Cooney, C., Folli, R. & Coyle, D. Opportunities, pitfalls and trade-offs in designing protocols for measuring the neural correlates of speech. Neurosci. Biobehav. Rev. 140 , 104783 (2022).

Dash, D. et al. Interspeech (International Speech Communication Association, 2020).

Alderson-Day, B. & Fernyhough, C. Inner speech: development, cognitive functions, phenomenology, and neurobiology. Psychol. Bull. 141 , 931–965 (2015).

Muret, D., Root, V., Kieliba, P., Clode, D. & Makin, T. R. Beyond body maps: information content of specific body parts is distributed across the somatosensory homunculus. Cell Rep. 38 , 110523 (2022).

Rosenthal, I. A. et al. S1 represents multisensory contexts and somatotopic locations within and outside the bounds of the cortical homunculus. Cell Rep. 42 , 112312 (2023).

Leuthardt, E. et al. Temporal evolution of gamma activity in human cortex during an overt and covert word repetition task. Front. Hum. Neurosci. 6 , 99 (2012).

Indefrey, P. & Levelt, W. J. M. The spatial and temporal signatures of word production components. Cognition 92 , 101–144 (2004).

Alderson-Day, B., Bernini, M. & Fernyhough, C. Uncharted features and dynamics of reading: voices, characters, and crossing of experiences. Conscious. Cogn. 49 , 98–109 (2017).

Armenta Salas, M. et al. Proprioceptive and cutaneous sensations in humans elicited by intracortical microstimulation. eLife 7 , e32904 (2018).

Cooney, C., Folli, R. & Coyle, D. Neurolinguistics research advancing development of a direct-speech brain–computer interface. iScience 8 , 103–125 (2018).

Soroush, P. Z. et al. The nested hierarchy of overt, mouthed, and imagined speech activity evident in intracranial recordings. NeuroImage 269 , 119913 (2023).

Alexander, J. D. & Nygaard, L. C. Reading voices and hearing text: talker-specific auditory imagery in reading. J. Exp. Psychol. Hum. Percept. Perform. 34 , 446–459 (2008).

Filik, R. & Barber, E. Inner speech during silent reading reflects the reader’s regional accent. PLoS ONE 6 , e25782 (2011).

Lœvenbruck, H., Baciu, M., Segebarth, C. & Abry, C. The left inferior frontal gyrus under focus: an fMRI study of the production of deixis via syntactic extraction and prosodic focus. J. Neurolinguist. 18 , 237–258 (2005).

Roussel, P. et al. Observation and assessment of acoustic contamination of electrophysiological brain signals during speech production and sound perception. J. Neural Eng. 17 , 056028 (2020).

Aflalo, T. et al. A shared neural substrate for action verbs and observed actions in human posterior parietal cortex. Sci. Adv. 6 , eabb3984 (2020).

Rutishauser, U., Aflalo, T., Rosario, E. R., Pouratian, N. & Andersen, R. A. Single-neuron representation of memory strength and recognition confidence in left human posterior parietal cortex. Neuron 97 , 209–220.e3 (2018).

Brainard, D. H. The psychophysics toolbox. Spat. Vis. 10 , 433–436 (1997).

Pelli, D. G. The VideoToolbox software for visual psychophysics: transforming numbers into movies. Spat. Vis. 10 , 437–442 (1997).

Kleiner, M. et al. What’s new in psychtoolbox-3. Perception 36 , 1–16 (2007).

Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57 , 289–300 (1995).

Benjamini, Y. & Yekutieli, D. The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 29 , 1165–1188 (2001).

Kobak, D. et al. Demixed principal component analysis of neural population data. eLife 5 , e10989 (2016).

Kobak, D. dPCA. GitHub https://github.com/machenslab/dPCA (2020).

Wandelt, S. K. Data associated to manuscript “Representation of internal speech by single neurons in human supramarginal gyrus”. Zenodo https://doi.org/10.5281/zenodo.10697024 (2024).

Acknowledgements

We thank L. Bashford and I. Rosenthal for helpful discussions and data collection. We thank our study participants for their dedication to the study that made this work possible. This research was supported by the NIH National Institute of Neurological Disorders and Stroke Grant U01: U01NS098975 and U01: U01NS123127 (S.K.W., D.A.B., K.P., C.L. and R.A.A.) and by the T&C Chen Brain-Machine Interface Center (S.K.W., D.A.B. and R.A.A.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the paper.

Author information

Authors and affiliations.

Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA

Sarah K. Wandelt, David A. Bjånes, Kelsie Pejsa, Brian Lee, Charles Liu & Richard A. Andersen

T&C Chen Brain-Machine Interface Center, California Institute of Technology, Pasadena, CA, USA

Sarah K. Wandelt, David A. Bjånes, Kelsie Pejsa & Richard A. Andersen

Rancho Los Amigos National Rehabilitation Center, Downey, CA, USA

David A. Bjånes & Charles Liu

Department of Neurological Surgery, Keck School of Medicine of USC, Los Angeles, CA, USA

Brian Lee & Charles Liu

USC Neurorestoration Center, Keck School of Medicine of USC, Los Angeles, CA, USA

Contributions

S.K.W., D.A.B. and R.A.A. designed the study. S.K.W. and D.A.B. developed the experimental tasks and collected the data. S.K.W. analysed the results and generated the figures. S.K.W., D.A.B. and R.A.A. interpreted the results and wrote the paper. K.P. coordinated regulatory requirements of clinical trials. C.L. and B.L. performed the surgery to implant the recording arrays.

Corresponding author

Correspondence to Sarah K. Wandelt.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Nature Human Behaviour thanks Abbas Babajani-Feremi, Matthew Nelson and Blaise Yvert for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information.

Supplementary Figs. 1–5.

Reporting Summary

Peer review file

Supplementary Video 1

The video shows the participant performing the internal speech task in real time. The participant is cued with a word on the screen. After a delay, an orange dot appears, during which the participant performs internal speech. Then, the decoded word appears on the screen, in green if it is correctly decoded and in red if it is wrongly decoded.

Supplementary Data

Source data for Fig. 3.

Source data for Fig. 4.

Source data for Fig. 5.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 5

Source Data Fig. 6

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Wandelt, S.K., Bjånes, D.A., Pejsa, K. et al. Representation of internal speech by single neurons in human supramarginal gyrus. Nat Hum Behav (2024). https://doi.org/10.1038/s41562-024-01867-y

Received: 15 May 2023

Accepted: 16 March 2024

Published: 13 May 2024

DOI: https://doi.org/10.1038/s41562-024-01867-y

This article is cited by

Brain-reading device is best yet at decoding ‘internal speech’.

  • Miryam Naddaf

Nature (2024)

University of Notre Dame

Notre Dame Philosophical Reviews

Inner Speech: New Voices

Peter Langland-Hassan and Agustín Vicente (eds.), Inner Speech: New Voices, Oxford University Press, 2018, 336pp., $70.00 (hbk), ISBN 9780198796640.

Reviewed by Marta Jorba, University of the Basque Country (UPV/EHU)

This collection offers a comprehensive, diverse and timely treatment of inner speech, the phenomenon of "the little voice in the head". It is the first volume entirely dedicated to inner speech from an interdisciplinary perspective and includes contributions by leading experts in philosophy, psychology and neuroscience. Inner speech is a pervasive feature of our minds that is introspectively salient and empirically tractable, a feature whose nature and functions are a matter of debate. It is a language-like phenomenon that has been seen as a form of thought or intrinsically related to it, a conscious phenomenon that involves a perceptual hearing component, and is also involved in several cognitive functions. The book offers an in-depth treatment of the phenomenon from both theoretical and empirical approaches and thus provides a unique platform to contrast and evaluate the various approaches. Moreover, by directly engaging with inner speech, its contributors provide insights into the nature of thinking, consciousness, perception, action, self-knowledge and the self, thus presenting a network of interrelated topics for the study of the mind.

The book is divided into two parts; the first six chapters are devoted to the nature of inner speech and the second six to the self-reflection and self-knowledge functions attributed to inner speech. The chapters can be read quite independently. However, it should be noted that the interdisciplinary value of the book might become an obstacle for readers not familiar with technical terms and methods in philosophy, psychology or neuroscience.

The editors, Peter Langland-Hassan and Agustín Vicente, provide an instructive introduction, presenting the complexity of the phenomenon, the motivation for focusing on it, its intrinsic interest as well as its connection to a wide range of other perennial questions in philosophy and psychology. They offer a general and very useful map to navigate the landscape. They first provide a broad-brush description of the history of the study of inner speech, mainly highlighting the still influential work of the Russian psychologist Lev S. Vygotsky (1987) together with experimental psychology research on working memory (Baddeley 1992, 2007). In analytic philosophy of mind, the topic was of peripheral interest until the nineties, when several thinkers started to focus on inner speech for their theories of consciousness, explanations of auditory verbal hallucinations (AVHs) and inserted thoughts, self-knowledge, or the relation of language and thought. In the rest of the introduction, Langland-Hassan and Vicente summarize the book's main contributions, making use of several guiding questions.

The book's central questions are, first, what is the nature of inner speech and, second, what is the function (or what are the functions) of inner speech in cognitive processes. All the contributors present their views on one or both questions, and discuss other, more specific related issues.

Regarding the nature of inner speech: while the "little voice in the head" is a pre-theoretically good enough expression to localize the phenomenon, more technical definitions reveal the discrepancies as to what should count as proper inner speech. It is difficult to find a unified way of referring to the phenomenon in the book, as different definitions are found depending on the focus of inquiry and the level of description or explanation: (i) the experience of inner speech, (ii) the causes and mechanisms that underlie inner speech and (iii) the neurological evidence associated with inner speech-involving tasks. Examples of all these perspectives can be found in the volume. The book leaves quite open the specific ways in which such characterizations are (in)compatible or rather complement each other.

Russell T. Hurlburt and Christopher L. Heavey (Chapter 6) are the main proponents of a detailed description of (i), the experience of inner speech. They focus on what they call the "pristine experience of inner speech", meaning the phenomenon that occurs and is directly apprehended by people in their everyday environment. Using the Descriptive Experience Sampling (DES) method, designed to explore pristine inner experience in high fidelity (p. 179) -- described and discussed in several other works (Hurlburt and Akhter 2006; Hurlburt and Schwitzgebel 2007) -- they present different varieties of inner experience: unworded speech, inner speech with complete sentences, and unsymbolized thinking. Hurlburt and Heavey are reluctant to base definitions (and subsequent theories) of inner speech on casual introspection, questionnaires and sampling that does not provide training in bracketing presuppositions or inferences from experimental settings.

As examples of (ii), the causes and mechanisms that underlie inner speech, we find the complex discussion on computational models of inner speech, in particular motor control models. Seven contributions mention or discuss the issue of whether inner speech can be regarded as the product of a forward model. A forward model is an internal representation of the system (body, limb, organ) that captures the forward or causal relationship between the inputs to the system (motor commands) and the outputs (Lœvenbruck et al., p. 147). The details of the various existing proposals are presented by Lauren Swiney and by Hélène Lœvenbruck et al. Swiney (Chapter 12) presents the conceptions of inner speech implied in the different accounts of schizophrenia symptoms such as AVHs and inserted thoughts. She describes in detail two competing models: (i) inner speech as a prediction in the absence of sensory input (which parallels the literature in motor imagery), or (ii) inner speech as an act with sensory consequences that are themselves predicted (which parallels the literature on language production). Swiney's chapter could have appeared in Part I next to the one on computational models by Lœvenbruck et al., which would have also maintained the homogeneity of Part II's contents (self-knowledge and self-reflection functions of inner speech).

A development of (i) is given by Lœvenbruck et al. (Chapter 5) with their clear exposition of the various notions and processes involved in the predictive control model (forward model, efference copy, sensory attenuation, etc.). Lœvenbruck et al. also defend the idea -- in contrast to the other approaches -- of talking about inner language, conceived of as multimodal acts with multisensory percepts (auditory, somatosensory, visual), stemming from coarse multisensory goals. Accepting definitions that rely on the causes and mechanisms underlying inner speech, Peter Carruthers argues that inner speech is an attended sensory forward model of a rehearsed speech action, where the action has been selected over the others by unconscious appraisal and decision-making processes.

Focusing on (iii), the neurological evidence associated with inner speech-involving tasks, Sharon Geva (Chapter 4) offers a detailed and exhaustive review of the main findings of the neural basis of mental imagery and inner speech. The bulk of her chapter is dedicated to functional imaging studies, summarizing the main results related to a variety of inner speech-involving tasks (pp. 108-117): word repetition, verb generation, stem completion, rhyme judgment, homophone judgment, fluency, and verbal transformation. After reviewing the studies of inner speech in aphasia (pp. 117-119), she discusses the principles of mental imagery through its common mechanisms: auditory, visual, and olfactory imagery activate primary sensory areas, whereas inner speech and motor imagery are higher brain functions that require multiple steps and processors. In the case of inner speech, linguistic processing, perception and execution are involved.

One interesting question regarding the nature of inner speech treated by several contributors is whether inner speech has an auditory-phonological nature and whether this is an essential property or an associated episode. Langland-Hassan (Chapter 3) answers affirmatively, claiming that inner speech has an auditory-phonological component (or represents it). His argument, in summary, is that: (1) inner speech is keyed to a specific natural language, (2) the only feature that inner speech episodes plausibly have that will allow us to swiftly and reliably determine which language they are keyed to is their auditory-phonological component (semantics, syntax, phonology, graphology and articulation are discarded), and (3) therefore, inner speech must have an auditory-phonological component. From this introspective argument, he moves to the essential claim that all inner speech involves an auditory-phonological component by arguing that unconscious inner speech has it as well. Langland-Hassan's article is a good example of an empirically-informed philosophical argument on the topic.

Sam Wilkinson and Charles Fernyhough (Chapter 9) take a different position regarding the auditory-phonological nature of inner speech. They claim that inner speech represents both the sound of an utterance and a state of affairs with semantic content, although just the latter is assessable for accuracy. The auditory-phonological representations are cases of "content without commitment" (p. 256). They further argue that we can be misled about two specific aspects of the representation, the kind of mental state one is in when engaging in inner speech and the agent of the episode, whose speech act it is (thus leading to episodes of AVHs). Even if inner speech episodes represent sounds, Wilkinson and Fernyhough maintain that inner speaking and inner hearing are two distinct but related phenomena. For them, inner speech is a productive rather than a re-creative phenomenon of imagining or inner hearing -- even if inner hearing and inner speaking are related. Hurlburt and Heavey also defend a sharp distinction between these two phenomena.

Christopher Gauker (Chapter 2), in contrast, states that inner speech is a kind of thought that consists in internal tokening of words and sentences of a natural language and, crucially, the auditory-phonological component is not a proper part of inner speech but rather an associated episode by which we become aware of inner speech. This view characterizes inner speech in analogy with outer speech, where we can distinguish between outer speech per se and the perception or comprehension of outer speech. Inner speech per se would be the result of production mechanisms and the perception of inner speech would be a related but separated phenomenon. Interestingly for the discussion on the auditory-phonological component, Geva also concludes her contribution by stating that the activation of brain auditory areas and the presence of auditory percepts of inner speech is still a matter of debate.

One way to learn about the nature of inner speech is by exploring its pathologies and the cases in which one or several features of the processes function differently. The most examined cases are the conditions of AVHs and the delusion of thought insertion. Swiney explains that failures of inner speech that have to do with AVHs and inserted thoughts have been posited to affect the sense of agency, resulting in inner speech that is not felt as one's own. This model, she argues, still presents open questions about the way in which the approach specifies inner speech in relation to overt actions such as hand movement or talking. She then discusses the different views on the mechanisms that might underpin both symptoms. Langland-Hassan proposes a new way of conceiving them in which a unified diagnostic might be available. Inserted thoughts would be a subset of AVHs insofar as reports of inserted thoughts seem to the patient to occur in natural language and, thus, following Langland-Hassan's argument, can be said to possess auditory-phonological properties.

Moving to the second issue, the function(s) of inner speech, we again find several views. Carruthers (Chapter 1) suggests that inner speech functions to enable the mental rehearsal and evaluation of overt speech actions. The proper function of inner speech is, according to him, conscious planning, although it has evolved to play other roles. For Wilkinson and Fernyhough, we talk to ourselves as a way of expressing and reflecting on our own minds without having to risk giving that information away. Relying on this reflection function, Alain Morin (Chapter 11) reviews a comprehensive body of empirical work to show the self-reflective role of inner speech, understood as involving different forms of what he (perhaps too) broadly calls 'self-awareness': self-ascription, self-concept formation, self-knowledge, self-evaluation, self-esteem, sense of agency, self-regulation, mental time travel, and self-efficiency. He argues that the kind of information about the self that inner speech can provide is mostly conceptual, and so inner speech is not necessary to achieve lower, more perceptual forms of self-referential activities.

Picking out a specific reflection function, Edouard Machery (Chapter 10) argues that inner speech allows us to transparently know our beliefs, in contrast to our desires, which are opaque: beliefs are transparently communicated by assertions (although not all), while desires are not. The listener does not need anything more than the speech act to be justified in believing that the speaker believes so and so (this is why they are transparent). According to him, inner speech is a form of communication and beliefs are social mental states: they exist to be communicated.

Still within the question of the function of inner speech, José Luis Bermúdez (Chapter 7) defends the view that inner speech is required for intentional ascent, i.e., thinking about our thinking. The idea is that we can only think about our thoughts when they are linguistically formulated. The two cases presented by Bermúdez are reflective evaluation and monitoring of one's own thinking and propositional attitude mindreading. He presents a nine-step argument (p. 202) to defend his view. He then responds to two main objections to the view: the problem that inner speech is too semantically indeterminate to present a thought as the object of reflective awareness (Martínez-Manrique and Vicente 2010) and the problematic implication that inner speech would carry two different types of content (auditory and propositional) at the same time (Langland-Hassan 2014). Regarding the first, Bermúdez provides an alternative way of thinking about semantic indeterminacy and linguistic understanding (pp. 207-211) and, with respect to the second, he argues that the problem with the two represented contents disappears when one denies that inner speech, besides having auditory-phonological properties, also represents those properties (pp. 213-217).

Bermúdez, together with other authors such as Clark (1998) and Prinz (2007), has been a representative of what have been called 'format views', according to which the proper function of inner speech is to enable thinking, and other functions are derivative of it. In contrast, activity views (Fernyhough 2009, Martínez-Manrique and Vicente 2015) of a Vygotskyan inspiration defend the position that inner speech is the activity of outer speech internalized, thus inheriting the main functions of speech acts: motivating, reminding, aiding reasoning, etc. Relevant to this is the view Keith Frankish (Chapter 8) presents according to which, with certain modifications, the format and the activity view are compatible. He holds that one of the functions of inner speech is to provide a format (a representational medium) for conscious thinking, but that inner speech is typically an activity, which has many functions continuous with those of outer speech. Frankish pairs inner speech with Type-2 thinking -- a slow, serial, conscious form of reasoning linked to language, conceived as a form of intentional action -- in contrast with Type-1 thinking -- fast, non-conscious and automatic. He defends the claim that intentional reasoning is a cyclical process in which inner speech is used in particular steps. This process is an internalization of linguistic exchanges in problem-solving settings.

The recurring and common topics that provide a diversity of approaches to a single issue are one of the most interesting aspects of the book, as we have already seen. Perhaps inevitably, however, some questions are treated only in one chapter or are just briefly mentioned in others. A central issue to inner speech research that could have been included in a separate chapter concerns methodological questions on empirical investigations. Aside from Hurlburt and Heavey's brief discussion on methods, we do not find anything on the usefulness and reliability of questionnaires or sampling methods, on their possibilities and limitations, or on suggestions about how to integrate (if possible) phenomenological reports with indirect measures. Phenomenological reports track subjective qualities of inner speech while indirect measures, relying mainly on articulatory suppression or phonological judgments, purport to show the presence or absence of inner speech in the realization of cognitive tasks. A general methodological discussion could have enriched this already very complete treatment of inner speech.

Another missing element is the treatment and development of the role of inner speech in pathologies other than schizophrenia. Beyond the specific differential functioning of inner speech in AVHs and inserted thoughts, the volume only incorporates brief discussions of the role of inner speech in aphasia (mainly in Geva's chapter, pp. 117-119), but doesn't cover research on the role of inner speech in other linguistic impairments or in other conditions such as autistic spectrum conditions (ASC). Although research on inner speech and autism is not abundant, there are a few studies that have dealt with this issue (see Williams et al. 2016) and that could have helped expand the horizons of research on inner speech and psychopathologies.

Overall, the volume succeeds in its attempt to provide common general research questions beyond the more or less isolated contributions on the topic that could be found in the literature so far. But as a first comprehensive compilation of approaches to inner speech, it is understandable that "the work of extracting the key points of agreement and dispute among different research programs remains to be done" (p. 4), as the editors point out. In my opinion, this very same fact makes the book an interesting and useful platform for further discussion and development of the topic. All in all, the book is an excellent collection of cutting-edge research on the philosophy, psychology and neuroscience of inner speech, a phenomenon that is key to many different cognitive processes. Carefully engaging with it will prove useful to students, professors, researchers and anyone interested in the nature of the mind.

Baddeley, A. D. (1992). "Working memory", Science, 255(5044), 556-9.

Baddeley, A. D. (2007). Working memory, thought and action. Oxford: Oxford University Press.

Clark, A. (1998). "Magic words: how language augments human computation". In P. Carruthers and J. Boucher (eds.), Language and Thought: Interdisciplinary Themes (pp. 162-83). Cambridge: Cambridge University Press.

Fernyhough, C. (2009). "Dialogic thinking". In A. Winsler, C. Fernyhough, and I. Montero (eds.), Private Speech, Executive Functioning, and the Development of Verbal Self-Regulation, Cambridge: Cambridge University Press, 42-52.

Hurlburt, R. T., and Akhter, S. A. (2006). "The Descriptive Experience Sampling method", Phenomenology and the Cognitive Sciences, 5, 271-301.

Hurlburt, R. T., and Schwitzgebel, E. (2007). Describing inner experience? Proponent meets skeptic. Cambridge, MA: MIT Press.

Langland-Hassan, P. (2014). "Inner speech and metacognition: In search of a connection", Mind and Language, 29, 511-33.

Martínez-Manrique, F., and Vicente, A. (2010). "What the . . . ! The role of inner speech in conscious thought", Journal of Consciousness Studies, 17, 141-67.

Martínez-Manrique, F., and Vicente, A. (2015). "The activity view of inner speech", Frontiers in Psychology, 6(232), 1-13.

Prinz, J. J. (2007). "All consciousness is perceptual". In B. P. McLaughlin and J. D. Cohen (eds.), Contemporary Debates in Philosophy of Mind, Oxford, UK: Blackwell Publishing, 335-57.

Vygotsky, L. S. (1987). Thinking and speech. In The collected works of L. S. Vygotsky (Vol. 1). New York: Plenum (Original work published 1934).

Williams, D. M., Peng, C., and Wallace, G. L. (2016). "Verbal Thinking and Inner Speech in Autism Spectrum Disorder", Neuropsychology Review, 26, 394-419.

Inner Speech: New Voices

Daniel Gregory, Inner Speech: New Voices, Analysis, Volume 80, Issue 1, January 2020, Pages 164–173, https://doi.org/10.1093/analys/anz096

In the last 10 years, inner speech – the little voice in the head – has started to become established as a topic in the philosophy of psychology. The two philosophers who have contributed most to this development are Agustín Vicente and Peter Langland-Hassan. Together, they have now edited the first largely philosophical anthology on the topic, Inner Speech: New Voices.

In the first sentence of their introduction, Vicente and Langland-Hassan write that, ‘[i]n another possible world, not far from our own, inner speech occupies center stage in contemporary philosophical psychology’ (1). After all, the phenomenon lies at the meeting point of ‘intersecting “big ticket” questions about thought, language and consciousness’ (2). An example they offer of one such question: does inner speech merely express thoughts, or is inner speech itself actually a kind of thought? Readers who work through all of the contributions to the volume might well start to think that the philosophers and psychologists in this nearby world have fixed on something that we have neglected.

  • Online ISSN 1467-8284
  • Print ISSN 0003-2638
  • Copyright © 2024 The Analysis Trust
Why the White House Won’t Retreat on Commencement Speeches

This article is part of The D.C. Brief, TIME’s politics newsletter. Sign up here to get stories like this sent to your inbox.

Even before the White House announced last month that President Joe Biden would deliver one of his two commencement speeches this graduation season at Morehouse College in Atlanta, the prestigious historically Black college was seeing signs of the same political unrest that was bubbling up at dozens of colleges and universities across the country.

But Morehouse is no Columbia. There have been no massive pro-Palestinian encampments, no clashes between protesters and police in riot gear. The tension is there all the same, most of it a bare inch below the surface, and Democrats fear it may erupt this Sunday as Biden dons a graduation robe to address its student body.

Since the school and the White House announced on April 23 that Biden would be the school’s commencement speaker, faculty and students have engaged in an intense debate over blemishes in Biden’s record: not just his handling of the conflict in Gaza and his critical posture toward pro-Palestinian student protestors, but also his policies related to mass incarceration and policing. Put plainly: All of the reasons Biden’s poll numbers with Black voters (and their intersectional allies) are softening have come into focus in Atlanta, as Biden arrives for a marquee speech at a major HBCU. It’s a microcosm of the fracturing coalition that put Biden in power in 2020, a trend that even White House apologists acknowledge threatens to make their boss a one-term President.

White House allies and partners—and there is a difference between legitimate support and mutual utility—say Biden would never consider retreating from the college speeches. From a pure optics standpoint, it would be a gift to the Trump campaign. “How can we say he’s up for another four years confronting Putin, defending NATO, countering China, protecting voting rights, and working to make sure reproductive health is a right when he can’t handle a campus protest?” one outside Biden adviser says. “Joe Biden is not going to write the other side’s scripts for them.”

But deeper still, Biden personally finds something deeply gratifying about delivering commencement speeches. Even as a first-term senator, he would happily deliver commencement addresses to students barely younger than himself on almost any campus that would have him. Over the years, Biden has come to relish the format and circumstances of speaking to new graduates on a day many would long remember. 

“He loves this [stuff],” says a former Biden aide who is no longer actively working in his orbit. “He feels like this is how he changes every life in that room, makes a difference for a generation, leaves his mark on a fulcrum day. He can’t do retail as much as he used to, so this is as close as he’s going to get.”

Biden insiders say his delivering of speeches to audiences of mixed political allegiances reflects a larger truth: he remains unshakable in his belief that when a good person makes a moral argument to an honest audience, persuasion is possible and even skeptics will bend to his will. Absent that, Biden believes he at least owes his opponents an honest airing of their viewpoint. While many of his more jaded hired guns find such thinking naive, they also concede a begrudging respect that the old-school pol hasn’t been corrupted by the cynicism that pervades so much of Washington. No one believes in Biden more than Biden himself, for better or worse.

Biden’s defenders are quick to point out that his belief in compromise and cajoling, coupled with his track record on thorny negotiations, gives him rational grounds to think he’s correct. When skeptics said his ambitious agendas for infrastructure, green energy, and, more recently, sending more weapons to Ukraine faced forceful opposition, Biden kept pushing until there were signing ceremonies at the White House. The whole of his reelection raison d'etre is embedded in those laws, and win or lose this November, his successors will find it difficult to unspool them. Nor is Biden even remotely interested in hearing that he can’t do something, whether it’s win over a room of noisy critics or dislodge months of hardened political opposition that arrives on his West Wing doorstep.

So, it’s with this situational awareness that Biden’s team is sending him to Morehouse for his speech on Sunday. Ahead of that, though, the White House last Friday dispatched Steve Benjamin, the former mayor of Columbia, S.C., who now heads the White House Office of Public Engagement, to hear students’ concerns. Chief among them: Biden would turn his visit to a legendary HBCU into a campaign stop; Benjamin assured them Biden would be there to celebrate the graduates, not to look for votes in Georgia. 

But before the speech comes a faculty vote scheduled for Thursday to sign off on Biden’s honorary degree. Such a vote is typically a formality. When Barack Obama visited campus in 2013 and picked up his honorary degree, the vote was completely unremarkable. Yet Team Biden will be watching that vote closely, as no one can say if he’ll draw the same rubber stamp in this charged moment.

Morehouse officially invited Biden back in September to become the second U.S. President in history to speak to graduates. He didn’t accept the invite until April, perhaps not coincidentally around the time ex-President Donald Trump stopped for lunch not far from campus.

Morehouse administrators have made clear that rescinding the President’s invitation is not an option: Morehouse President David A. Thomas has told students, alumni, and activists alike that there was zero chance the college would “reverse course.”

It’s a similar resolve in the White House, which is linking Biden’s approach to the Morehouse visit to his winning over wary voters for re-election.

As deputy White House press secretary Andrew Bates put it on April 23: “When people speak out at our events, he shows empathy. He shows compassion. He respects their right to make their voice heard. And I think that that says a lot about how he is approaching what is a very complex situation.”

The White House is trying to convey an empathy for what Morehouse students are saying without conceding that they are right, added press secretary Karine Jean-Pierre a day later. “We get it. It’s painful,” she said.

But no one knows if all those efforts in recognizing Biden’s critics will convince Morehouse’s faculty and students gathering this weekend to receive the President politely and with few interruptions. Former Rep. Cedric Richmond, a Morehouse alumnus and a co-chairman of the Biden re-election campaign, is monitoring the situation and quietly working his network to remind leaders that a rebuke of Biden at such a prominent venue heading into an admittedly difficult re-election bid may feel good for activists in the moment but ultimately only helps Trump’s odds of returning to power. It’s not elegant, but it’s the reality facing the students on the campus that educated the Rev. Dr. Martin Luther King Jr.—and may have a lesson or two for Biden this weekend, too.

Write to Philip Elliott at [email protected]

Breaking Down Harrison Butker's Speech: Read the Chiefs Player's Most Controversial Comments

Fans are calling for Harrison Butker to be removed from the Chiefs' roster for the upcoming NFL season after his 20-minute commencement speech at Benedictine College

Kansas City Chiefs kicker Harrison Butker's commencement speech at Benedictine College, a Catholic school, continues to cause outrage.

The NFL player's 20-minute address included attacks on working women, the LGBTQ+ community and families who utilize surrogacy and in vitro fertilization (IVF).

Butker and the Chiefs did not immediately return PEOPLE's requests for comment. In a statement, the NFL's Chief Diversity Officer Jonathan Beane says, "Harrison Butker gave a speech in his personal capacity. His views are not those of the NFL as an organization. The NFL is steadfast in our commitment to inclusion, which only makes our league stronger."

Former Kansas City commissioner Justice Horn went further, slamming Butker in a post on X (formerly known as Twitter). "Harrison Butker doesn't represent Kansas City nor has he ever," he wrote. "Kansas City has always been a place that welcomes, affirms, and embraces our LGBTQ+ community members."

Rapper Flavor Flav chimed in, "Sounds like some players 'need to stay in their lanes' and shouldn’t be giving commencement speeches."

Writer Cyd Zeigler wrote, "Pretty awful to hear an NFL player so proudly tell women to 'stay in their lane,' serve their man and make babies. Not to mention comparing Pride month to a 'deadly sin' and lobbing bombs at the trans community. Not a fan."

Amid the backlash, below is a breakdown of Butker's most controversial comments from the speech.

On Taylor Swift's Lyric

A portion of the outrage against Butker came from supporters of Taylor Swift after he shared one of her lyrics, which fans noticed he misinterpreted.

Butker quoted a lyric from Swift's 2022 song "Bejeweled," and referred to her as his teammate Travis Kelce's girlfriend.

"As my teammate's girlfriend would say, familiarity breeds contempt," Butker said when discussing the values of the Catholic Church.

What seemed to have been lost on Butker, however, is that the song's message is about Swift embracing her independence away from a former boyfriend.

On the Covid-19 Pandemic and President Joe Biden

At the start of his speech, Butker addressed how the class of 2024 was affected by the COVID-19 pandemic at the start of their college careers. The subject quickly transitioned into attacks on President Joe Biden and topics such as abortion, IVF, surrogacy and euthanasia.

"I'm sure your high school graduation was not what you had imagined and most likely neither was your first couple years of college. By making it to this moment through all the adversity thrown your way from COVID, I hope you learned the important lesson that suffering in this life is only temporary," he said. "As a group, you witnessed firsthand how bad leaders who don't stay in their lane can have a negative impact on society."

Butker went on: "Bad policies and poor leadership have negatively impacted major life issues. Things like abortion, IVF, surrogacy, euthanasia, as well as a growing support for degenerate cultural values in media, all stem from the pervasiveness of disorder. Our own nation is led by a man who publicly and proudly proclaims his Catholic faith, but at the same time is delusional enough to make the sign of the cross."

The Chiefs kicker continued his verbal attack on Biden, saying, "During a pro-abortion rally, he has been so vocal in his support for the murder of innocent babies that I'm sure to many people it appears that you can be both Catholic and pro-choice."

"This is an important reminder that being Catholic alone doesn't cut it. These are the sorts of things we're told in polite society to not bring up. The difficult and unpleasant things. But if we are going to be men and women for this time in history, we need to stop pretending that the 'Church of Nice' is a winning proposition. We must always speak and act in charity, but never mistake charity for cowardice."

He continued: "As members of the church founded by Jesus Christ, it is our duty and ultimately privilege to be authentically and unapologetically Catholic."

On LGBTQ+ Pride

Butker then turned the focus of his speech to the Catholic values at Benedictine College. However, his sentiments attacked the LGBTQ+ and trans communities.

Butker said, "Benedictine has gone from just another liberal arts school with nothing to set it apart to a thriving beacon of light ... I'm certain the reporters at the AP could not have imagined that their attempt to rebuke and embarrass places and people like those here at Benedictine wouldn't be met with anger, but instead met with excitement and pride, not the deadly sin sort of pride that has an entire month dedicated to it, but the true God-centered pride that is cooperating with the Holy Ghost to glorify him."

On IVF and Surrogacy

Butker later attacked families who utilize IVF and surrogacy to have children, saying, "It is imperative that this class, this generation, in this time in our society must stop pretending that the things we see around us are normal."

"Heterodox ideas abound, even within Catholic circles. Let's be honest, there is nothing good about playing God with having children, whether that be your ideal number or the perfect time to conceive. No matter how you spin it, there is nothing natural about Catholic birth control. It is only in the past few years that I have grown encouraged to speak more boldly and directly because as I mentioned earlier, I have leaned into my vocation as a husband and father and as a man."

On Working Women

In a direct address to the female graduates at Benedictine, Butker said, "For the ladies present today, congratulations on an amazing accomplishment. I want to speak directly to you briefly because I think it is you, the women, who have had the most diabolical lies told to you."

"Some of you may go on to lead successful careers in the world, but I would venture to guess that the majority of you are most excited about your marriage and the children you will bring into this world."

Of his spouse, Butker said, "I can tell you that my beautiful wife, Isabelle, would be the first to say that her life truly started when she began living her vocation as a wife and as a mother. I'm on this stage today and able to be the man I am because I have a wife who leans into her vocation. I'm beyond blessed with the many talents God has given me, but it cannot be overstated that all of my success is made possible because a girl I met in band class back in middle school, who would convert to the faith, become my wife and embrace one of the most important titles of all: homemaker."

He continued, "I say all of this to you because I have seen firsthand how much happier someone can be when they disregard the outside noise and move closer and closer to God's will in their life. Isabelle's dream of having a career might not have come true, but if you asked her today if she has any regrets on her decision, she would laugh out loud without hesitation and say, heck no."

Several users on social media have pointed out, however, that Butker's mother, Elizabeth Butker, is a successful physicist at Emory University's Department of Radiation Oncology.

On Masculinity

Moving his focus to the male graduates in the room, Butker said, "To the gentlemen here today, part of what plagues our society is this lie that has been told to you that men are not necessary in the home or in our communities. As men, we set the tone of the culture. And when that is absent, disorder, dysfunction and chaos set in ..."

"Other countries do not have nearly the same absentee father rates as we find here in the U.S., and a correlation could be made in their drastically lower violence rates as well. Be unapologetic in your masculinity. Fight against the cultural emasculation of men. Do hard things. Never settle for what is easy. You might have a talent that you don't necessarily enjoy, but if it glorifies God, maybe you should lean into that over something that you might think suits you better."

Concluding his speech, Butker said, "I know that my message today had a little less fluff than is expected for these speeches, but I believe that this audience and this venue is the best place to speak openly and honestly about who we are and where we all want to go, which is heaven. I thank God for Benedictine College and for the example it provides to the world."

"Make no mistake, you are entering into mission territory in a post God world, but you are made for this and with God by your side and a constant striving for virtue within your vocation, you too can be a saint. Christ is king to the heights."

Benedictine Sisters of Mount St. Scholastica: Butker graduation speech divisive, doesn’t represent college’s values

WICHITA, Kan. (KWCH) - Providing a Catholic organization’s perspective on a college graduation speech that stirred controversy over the weekend, the Benedictine Sisters of Mount St. Scholastica weighed in on what they heard from the Kansas City Chiefs All-Pro kicker Harrison Butker Sunday at Benedictine College.

“As a founding institution and sponsor of Benedictine College, the sisters of Mount St. Scholastica find it necessary to respond to the controversial remarks of [Butker] as commencement speaker,” the group said, establishing their motive for weighing in on the speech.

The sisters of Mount St. Scholastica established that they “do not believe that [Butker’s] comments in his 2024 Benedictine College commencement address represent the Catholic, Benedictine, liberal arts college that [their] founders envisioned and in which [they] have been so invested.”

“Instead of promoting unity in our church, our nation, and the world, his comments seem to have fostered division,” the Benedictine Sisters of Mount St. Scholastica said. “One of our concerns was the assertion that being a homemaker is the highest calling for a woman. We sisters have dedicated our lives to God and God’s people, including the many women whom we have taught and influenced during the past 160 years. These women have made a tremendous difference in the world in their roles as wives and mothers and through their God-given gifts in leadership, scholarship, and their careers.”

The sisters expanded on the Benedictine community’s view of the role of “homemaker.”

“Our community has taught young women and men not just how to be “homemakers” in a limited sense, but rather how to make a Gospel-centered, compassionate home within themselves where they can welcome others as Christ, empowering them to be the best versions of themselves,” the sisters said. “We reject a narrow definition of what it means to be Catholic. We are faithful members of the Catholic Church who embrace and promote the values of the Gospel, St. Benedict, and Vatican II and the teachings of Pope Francis.”

Speaking on the wider faith community, the sisters touched on the importance of inclusivity.

“We want to be known as an inclusive, welcoming community, embracing Benedictine values that have endured for more than 1500 years and have spread through every continent and nation. We believe those values are the core of Benedictine College,” the Benedictine Sisters of Mount St. Scholastica said.

Copyright 2024 KWCH. All rights reserved. To report a correction or typo, please email [email protected]

NFL Responds to Kansas City Chiefs Player Harrison Butker's Controversial Graduation Speech

The NFL commented on Kansas City Chiefs kicker Harrison Butker's polarizing Benedictine College commencement speech, saying that "his views are not those of the NFL as an organization."

The NFL is making it clear that Harrison Butker does not speak for them.

The Kansas City Chiefs kicker faced criticism for a May 11 commencement speech he gave at Benedictine College in Atchison, Kan., in which he touched on a number of topics from abortion to the role of women and LGBTQ+ rights.

Following the graduation address, the NFL clarified that Butker's comments do not represent the league as a whole.

"Harrison Butker gave a speech in his personal capacity," the NFL's senior vice president and chief diversity and inclusion officer Jonathan Beane said in a statement to People. "His views are not those of the NFL as an organization. The NFL is steadfast in our commitment to inclusion, which only makes our league stronger."

During his speech, Butker discussed various political and religious topics, and even quoted the song "Bejeweled" by Taylor Swift, the girlfriend of his teammate Travis Kelce.

"Tragically, so many priests revolve much of their happiness from the adulation they receive from their parishioners. And in searching for this, they let their guard down and become overly familiar," he said. "This undue familiarity will prove to be problematic every time. Because as my teammate's girlfriend says, 'familiarity breeds contempt.'"

The 28-year-old also touched on the role he thinks women should play, saying that while many female graduates might "go on to lead successful careers in the world," he believes more of them are "most excited about your marriage and the children you will bring into this world." According to the athlete, his wife Isabelle Butker "would be the first to say her life truly started when she started living her vocation as a wife and as a mother."

Butker, who shares two children with Isabelle, additionally took aim at the LGBTQ+ community, saying that Pride Month is "the deadly sin sort of pride," and that the community promotes "dangerous gender ideologies."

He also added that while the COVID-19 pandemic "might've played a large role throughout your formative years, it is not unique."

"Bad policies and poor leadership have negatively impacted major life issues," he continued. "Things like abortion, IVF, surrogacy, euthanasia, as well as a growing support for degenerate cultural values in media all stem from the pervasiveness of disorder."

E! News reached out to reps for Swift, Butker and the Chiefs for comment but has yet to hear back.

Efficient Representation Learning for Inner Speech Domain Generalization

  • Conference paper
  • First Online: 20 September 2023
  • Han Wei Ng   ORCID: orcid.org/0000-0002-4764-4765
  • Cuntai Guan   ORCID: orcid.org/0000-0002-0872-3276

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14184)

Included in the following conference series:

  • International Conference on Computer Analysis of Images and Patterns

Brain-computer interfaces (BCIs) enable users to interact with computers via the decoding of their neural activity. In this work, we seek to show the efficacy of “Inner Speech” as an additional communication paradigm. In BCIs, electroencephalography (EEG) signals are the most commonly used because they can be collected non-invasively. However, a frequent problem plaguing EEG-based systems is the high noise-to-signal ratio, which often results in poorly performing decoding models. This is further compounded by both intra- and inter-subject variations in the brain-signal domain. In this work, we propose a novel Siamese variational autoencoder (VAE) network which allows for unsupervised representation learning to be performed on EEG data. We further implement a selective framework whereby a contrastive loss can be used to selectively reject training data which may not match the target subject’s domain. Finally, by leveraging the lossy compression of the VAE network, the model may be used as a signal pre-processing step towards domain generalisation of the training data. Our results show classification accuracy significantly above previous benchmarks, while selective learning reduces the amount of training time needed.
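The abstract gives no equations, so the following is only a minimal sketch of the two ideas it names: a pairwise contrastive loss, and rejection of training trials that do not match the target subject's domain. Everything here is an assumption made for illustration and not the authors' implementation: the function names, the margin-based form of the loss, the standard Gaussian KL term for the VAE, and the centroid-distance rule with a fixed `threshold` are all hypothetical choices, and the embeddings are toy vectors standing in for encoder outputs.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(z1, z2, same_domain, margin=1.0):
    """Margin-based pairwise contrastive loss (assumed form, not from the paper).

    Same-domain pairs are penalised by their squared distance; pairs from
    different domains are penalised only if closer than `margin`.
    """
    d = euclidean(z1, z2)
    if same_domain:
        return d ** 2
    return max(0.0, margin - d) ** 2


def gaussian_kl(mu, log_var):
    """KL divergence from N(mu, exp(log_var)) to N(0, I), the usual VAE regulariser."""
    return 0.5 * sum(math.exp(lv) + m ** 2 - 1.0 - lv for m, lv in zip(mu, log_var))


def select_trials(source_embeddings, target_embeddings, threshold):
    """Return indices of source trials within `threshold` of the target centroid.

    Hypothetical selection rule: trials far from the target subject's mean
    embedding are treated as out-of-domain and rejected.
    """
    dim = len(target_embeddings[0])
    n = len(target_embeddings)
    centroid = [sum(z[i] for z in target_embeddings) / n for i in range(dim)]
    return [i for i, z in enumerate(source_embeddings)
            if euclidean(z, centroid) <= threshold]


# Toy 2-D embeddings standing in for encoder outputs.
target = [[0.0, 0.0], [0.2, 0.0]]
source = [[0.1, 0.1], [3.0, 4.0], [0.0, -0.2]]
kept = select_trials(source, target, threshold=0.5)
print(kept)
```

In this toy run the second source trial lies far from the target centroid and is dropped, which is the kind of filtering the abstract credits with reducing training time. A real system would compute the embeddings with the trained encoder and would have to choose the threshold empirically.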

This research/project is supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No: AISG2-PhD-2021-08-021) and the RIE2020 AME Programmatic Fund, Singapore (No. A20G8b0102).

Author information

Authors and affiliations.

School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Ave, Singapore, 639798, Singapore

Han Wei Ng & Cuntai Guan

AI Singapore PhD Fellowship Programme, Singapore, Singapore

Corresponding author

Correspondence to Han Wei Ng.

Editor information

Editors and affiliations.

Cyprus University of Technology, Limassol, Cyprus

Nicolas Tsapatsoulis

Cyprus University of Technology/CYENS Center of Excellence, Limassol, Cyprus

Andreas Lanitis

The University of New Mexico, Albuquerque, NM, USA

Marios Pattichis

University of Cyprus/CYENS Center of Excellence, Nicosia, Cyprus

Constantinos Pattichis

University of Cyprus/KIOS Center of Excellence, Nicosia, Cyprus

Christos Kyrkou

Efthyvoulos Kyriacou

Zenonas Theodosiou

CYENS Center of Excellence, Nicosia, Cyprus

Andreas Panayides

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper.

Ng, H.W., Guan, C. (2023). Efficient Representation Learning for Inner Speech Domain Generalization. In: Tsapatsoulis, N., et al. Computer Analysis of Images and Patterns. CAIP 2023. Lecture Notes in Computer Science, vol 14184. Springer, Cham. https://doi.org/10.1007/978-3-031-44237-7_13

DOI: https://doi.org/10.1007/978-3-031-44237-7_13

Published: 20 September 2023

Publisher Name: Springer, Cham

Print ISBN: 978-3-031-44236-0

Online ISBN: 978-3-031-44237-7

eBook Packages: Computer Science, Computer Science (R0)

COMMENTS

  1. Inner Speech

    Inner speech is known as the "little voice in the head" or "thinking in words." It attracts philosophical attention in part because it is a phenomenon where several topics of perennial interest intersect: language, consciousness, thought, imagery, communication, imagination, and self-knowledge all appear to connect in some way or other to the little voice in the head.

  2. Inner speech: Development, cognitive functions, phenomenology, and

    Inner speech—also known as covert speech or verbal thinking—has been implicated in theories of cognitive development, speech monitoring, executive function, and psychopathology. ... according to which there is only one kind of inner speech, represented at the level of phonemic selection, but where that representation can be modulated by ...

  3. How Inner Monologues Work, and Who Has Them

    Exploring the ecological validity of thinking on demand: Neural correlates of elicited vs. spontaneously occurring inner speech. PLoS One, 11(2), e0147932. DOI: 10.1371/journal.pone.0147932

  4. Concepts, abstractness and inner speech

    Inner speech might play a role in this searching process and be differentially involved in concept learning compared with use of known concepts. Importantly, inner speech comes in different varieties—e.g. it can be expanded or condensed (with the latter involving syntactic and semantic forms of abbreviation).

  5. When Inner Speech Misleads

    9.1. Introduction. Most philosophers think that at least some experiences have representational content: they represent the world as being a certain way. Representational content dictates accuracy conditions, namely, what would need to be the case in order for the experience to be accurate. Inner speech, that "interior monologue" or familiar voice inside your head, is something that we ...

  6. Inner Speech: Definition and Uses

    Inner speech is a form of internalized, self-directed dialogue: talking to oneself. The phrase inner speech was used by Russian psychologist Lev Vygotsky to describe a stage in language acquisition and the process of thought. In Vygotsky's conception, "Speech began as a social medium and became internalized as inner speech, that is, verbalized thought," (Katherine Nelson, Narratives From the ...

  7. Representation of internal speech by single neurons in human

    SMG represented words as well as pseudowords, providing evidence for phonetic encoding. ... Inner speech phenomenology, its role in cognitive performance, and its relation to self-monitoring ...

  8. Inner Speech: New Voices

    Even if inner speech episodes represent sounds, Wilkinson and Fernyhough maintain that inner speaking and inner hearing are two distinct but related phenomena. For them, inner speech is a productive rather than a re-creative phenomenon of imagining or inner hearing -- even if inner hearing and inner speaking are related.

  9. Inner Speech and Mental Imagery: A Neuroscientific Perspective

    Inner speech has been investigated using neuroscientific techniques since the beginning of the twentieth century. One of the most important findings is that inner and overt speech differ in many respects, not only in the absence/presence of articulatory movements. ... Here participants are asked to judge whether the words represented in the ...

  10. Inner Speech: New Voices

    When we produce inner speech, we represent our underlying mental states. For example, if you produce an assertion in inner speech, you represent your underlying belief in the proposition expressed. I want to focus on the idea that inner speech is a kind of actual, silent speech, rather than imagined speech. Wilkinson and Fernyhough provide two ...

  11. Inner Speech Brain Mapping. Is It Possible to Map What We Cannot

    Inner speech (IS), or the ability to speak silently in our heads, is an important function in our rich mental life that lies at the crossroad of other cognitive domains such as language, thinking, and working memory as well as self-reflective and self-regulatory functions. An outstanding question is how to measure IS, as this phenomenon is not ...

  12. Inner Speech: Nature and Functions

    of inner speech, as opposed to the role of the language system itself. 3. The Phenomenology of Inner Speech. The nature of inner speech may be studied by purely psycholinguistic means, in terms of the representational constraints the theories need to posit so as to account for the empirical findings, as represented by MacKay (1992).

  13. The emotional component of inner speech: A pilot ...

    These data also further support the hypothesis that phonological characteristics can be represented in inner speech. Recently, the special cognitive architecture of inner speech has been developed (Chella & Pipitone, 2020). This cognitive model is based on the integration of the Standard Model of Mind and Baddeley's working memory model.

  14. From speech to voice: on the content of inner speech

    However, as we shall see in Sect. 6, inner speech violates this downward dependency: in inner speech we can represent voices directly communicating propositions and words. Central to vocal content is the voice slot. As I noted, voices are channels of communication, not sources or signals. But there are multiple ways a channel can be characterized.

  15. From 'external speech' to 'inner speech' in Vygotsky: A critical

    1. Introduction. This paper continues the critical exploration, begun in Jones (2007), of language theory in Lev Vygotsky's 'cultural-historical' psychology (e.g., Vygotsky, 1986). I will focus here on certain linguistic notions that are crucial to Vygotsky's approach, namely those of 'external speech', 'egocentric speech' and 'inner speech' along with the process of ...

  16. Inner Speech: The Private Area to Remember, Play, and Dream

    The main function attributed to inner speech in psychological research is problem-solving. However, other relevant functions for psychological development have been less considered. For example, inner speech is also a place for remembering, playing or dreaming, during everyday activities of daily life. In these forms of inner speech expression ...

  17. The representational and the expressive: Two functions of the inner speech

    Vygotsky, inner speech is the use of the external word internalized by the subject, ... account for the relationship between representation and what is represented. On ...

  18. PDF The representational and the expressive: Two functions of the inner speech

    Vygotsky, inner speech is the use of the external word internalized by the subject, which in turn allows the expression of thought. ... account for the relationship between representation and what ...

  19. The expressive dimension of inner speech

    2017, volume 28, number 3, pp. 318-326. Abstract: This article is a theoretical proposal about the expressive dimension of inner speech, a phenomenon that draws on Karl Bühler's proposal concerning the expressiveness of language; Heinz Werner's studies of a physiognomic-organismic dimension of human language; and the theoretical and empirical approach of Lev Vygotsky ...

  20. Represented Speech

    Represented speech is that form of utterance which conveys the actual words of the speaker through the mouth of the writer but retains the peculiarities of the speaker's mode of expression. Represented speech exists in two varieties: 1) uttered represented speech and 2) unuttered or inner represented speech. a) Uttered Represented Speech.

  21. Inner represented speech

    Represented speech is that form of utterance which conveys the actual words of the speaker through the mouth of the writer but retains the peculiarities of the speaker's mode of expression. Represented speech exists in two varieties: 1) uttered represented speech and 2) unuttered or inner represented speech. As has often been pointed out, language ...

  22. Not Everybody Has an Inner Voice: Behavioral Consequences of

    We found that adults who reported low levels of inner speech (N = 46) had lower performance on a verbal working memory task and more difficulty performing rhyme judgments compared with adults who reported high levels of inner speech (N = 47). Task-switching performance—previously linked to endogenous verbal cueing—and categorical effects on ...

  23. The Role of Inner Speech in Educational Processes

    At this initial stage, pointing is represented by the child's movement, which seems to be pointing to an object—that and nothing more. When the mother comes to the child's aid and realizes his movement indicates something, the situation changes fundamentally. ... Inner speech is built on the deepest motivations of consciousness. Thus, the ...

  24. Represented Speech

    Represented speech is that form of utterance which conveys the actual words of the speaker through the mouth of the writer but retains the peculiarities of the speaker's mode of expression. Represented speech exists in two varieties: 1) uttered represented speech and 2) unuttered or inner represented speech.

  25. Efficient Representation Learning for Inner Speech Domain

    We use the Thinking Out Loud EEG dataset prepared by Nieto et al. to evaluate the proposed framework for low-resource multi-class inner speech classification. Inner speech as defined by the authors is the internalized process in which an individual thinks in pure meanings, generally associated with an auditory imagery of own inner "voice".