Computer Architecture: Data Representation

Digital computers store and process information in binary form, because digital logic has only two values, "1" and "0" (equivalently, True/False or ON/OFF). This system is called radix 2. We humans generally deal with radix 10, i.e. decimal. For convenience there are many other representations as well, such as octal (radix 8), hexadecimal (radix 16), and binary coded decimal (BCD).

Every computer's CPU has a width measured in bits, such as an 8-bit CPU, 16-bit CPU, or 32-bit CPU. Similarly, each memory location can store a fixed number of bits, called the memory width. Given the widths of the CPU and memory, it is up to the programmer to handle data representation. Most readers will know that 4 bits form a nibble and 8 bits form a byte. The word length is defined by the Instruction Set Architecture of the CPU; it may be equal to the width of the CPU.

Memory simply stores information as a binary pattern of 1's and 0's; what the content of a memory location means is a matter of interpretation. If the CPU is in the fetch cycle, it interprets the fetched memory content as an instruction and decodes it based on the instruction format. In the execute cycle, information read from memory is treated as data. A casual computer user thinks of computers as handling alphabets, special characters, and numbers. A programmer considers memory content to be the data types of the programming language he uses. Recall figures 1.2 and 1.3 of chapter 1 to reinforce the idea that conversion happens from the computer's user interface to internal representation and storage.

  • Data Representation in Computers

Information handled by a computer is classified as instructions and data. A broad overview of the internal representation of information is illustrated in figure 3.1. Whether the data is numeric or non-numeric, everything is internally represented in binary. It is up to the programmer to handle the interpretation of the binary pattern, and this interpretation is called data representation. These data representation schemes are standardized by international organizations.

The choice of data representation used in a computer is decided by:

  • The number types to be represented (integer, real, signed, unsigned, etc.)
  • The range of values likely to be represented (the maximum and minimum to be represented)
  • The precision of the numbers, i.e. the maximum accuracy of representation (floating point single precision, double precision, etc.)
  • For non-numeric data, i.e. characters, the character representation standard to be chosen; ASCII, EBCDIC, and UTF are examples of character representation standards.
  • The hardware support available, in terms of word width and instruction set.

Before we go into the details, let us take an example of interpretation. Say a byte in memory has the value "0011 0001". Although many interpretations are possible, as in figure 3.2, the program has only one interpretation, as decided by the programmer and declared in the program.

  • Fixed point Number Representation

Fixed point numbers are also known as whole numbers or integers. The number of bits used in representing an integer also determines the maximum value that can be represented in the system hardware. For efficiency of storage and operations, one may choose to represent an integer with one byte, two bytes, four bytes or more. This space allocation follows from the programmer's declaration of a variable as a short or long integer, and from the Instruction Set Architecture.

In addition to the bit length definition for integers, we also have a choice to represent them as below:

  • Unsigned Integer : A positive number, including zero, can be represented in this format. All the allotted bits are used for the value itself. So if one is using 8 bits to represent an unsigned integer, 2^8 = 256 values can be represented, i.e. the range "0" to "255". If 16 bits are used, there are 2^16 values, i.e. the range "0 to 65535".
  • Signed Integer : In this format, negative numbers, zero, and positive numbers can be represented. A sign bit indicates whether the magnitude is positive or negative. There are three possible representations for signed integers: Sign Magnitude format, 1's Complement format and 2's Complement format.

Signed Integer – Sign Magnitude format: The Most Significant Bit (MSB) is reserved for indicating the sign of the magnitude (value). A "0" in the MSB means a positive number and a "1" in the MSB means a negative number. If n bits are used for representation, n-1 bits hold the absolute value of the number.

Examples for n=8:

0010 1111 = + 47 Decimal (Positive number)

1010 1111 = - 47 Decimal (Negative Number)

0111 1110 = +126 (Positive number)

1111 1110 = -126 (Negative Number)

0000 0000 = + 0 (Positive Number)

1000 0000 = - 0 (Negative Number)

Although this method is easy to understand, Sign Magnitude representation has several shortcomings:

  • Zero can be represented in two ways, causing redundancy and confusion.
  • The magnitude is limited to 2^(n-1) − 1, although n bits are available.
  • The separate sign bit makes addition and subtraction more complicated. Also, comparing two numbers is not straightforward.

Signed Integer – 1’s Complement format: In this format too, the MSB is reserved as the sign bit. The difference is in the magnitude part: for negative numbers, the magnitude bits are inverted (complemented), hence the name 1’s Complement form. Positive numbers are represented as in plain binary. Let us see some examples to better our understanding.

1101 0000 = - 47 Decimal (Negative Number)

1000 0001 = -126 (Negative Number)

1111 1111 = - 0 (Negative Number)

  • Converting a given binary number to its 2's complement form

Step 1. Take the 1's complement of x (invert every bit) and add 1: -x = x' + 1, where x' is the 1's complement of x.

Step 2. To extend the number to a wider data width, fill the new high-order bits by sign extension, i.e. the MSB is copied into the added bit positions.

Example: -47 decimal in 8-bit representation: +47 = 0010 1111; its 1's complement is 1101 0000; adding 1 gives 1101 0001, which is -47 in 2's complement.

As you can see, zero is not represented redundantly: there is only one way of representing zero. The complexity of arithmetic operations is also eliminated in 2’s complement representation, as subtraction is done as addition.
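As a quick check, here is a short C++ sketch (illustrative, not from the original text) confirming that negation by "invert and add 1" produces the expected 8-bit pattern, and that subtraction works as addition of a 2's complement:

```cpp
#include <cassert>
#include <cstdint>

int main() {
    uint8_t x = 47;                 // 0b00101111
    uint8_t neg = ~x + 1;           // 1's complement, then add 1
    assert(neg == 0b11010001);      // -47 in 8-bit 2's complement
    // Subtraction as addition: 100 - 47 == 100 + (-47), modulo 256.
    assert(uint8_t(100 + neg) == 53);
}
```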

Further exercises on number conversion are left to the interested reader.

  • Floating Point Number system

The largest value representable as an n-bit whole number is 2^n − 1. In the scientific world, we come across numbers like the mass of an electron, 9.10939 × 10^-31 kg, or the velocity of light, 2.99792458 × 10^8 m/s. Imagine writing such a number on paper without an exponent and converting it into binary for computer representation; it makes no sense to write a number in a form that can be neither read nor processed conveniently. Hence we write such large or small numbers using an exponent and a mantissa. This is called floating point representation, or real number representation. The real number system has infinitely many values between 0 and 1.

Representation in computer

Unlike the two's complement representation used for integers, floating point numbers use a sign-and-magnitude representation for the mantissa (the exponent is encoded separately; in IEEE 754 it is stored in biased form). In the decimal number 9.10939 × 10^-31, the exponent is -31 and 9.10939 is the fraction. Mantissa, significand and fraction are synonymously used terms. In the computer, the representation is binary and the binary point is not fixed. For example, the number 23.345 can be written as 2.3345 × 10^1, 0.23345 × 10^2 or 2334.5 × 10^-2. The representation 2.3345 × 10^1 is said to be in normalised form.

Floating-point numbers usually occupy multiple words in memory, as we need to allot a sign bit, a few bits for the exponent and many bits for the mantissa. There are standards for this allocation, which we will see shortly.

  • IEEE 754 Floating Point Representation

We have two standards from IEEE, known as Single Precision and Double Precision. These standards enable portability among different computers. Figure 3.3 shows single precision, while figure 3.4 shows double precision. Single precision uses a 32-bit format, while double precision uses a 64-bit word length; as the name suggests, double precision can represent fractions with greater accuracy. In both cases, the MSB is the sign bit of the mantissa, followed by the exponent and then the mantissa. The exponent does not carry its own sign bit; it is stored in biased form (bias 127 for single precision, 1023 for double precision).

It is to be noted that in single precision, normalized exponents range from -126 to +127. It is possible that, as a result of arithmetic operations, the resulting exponent does not fit in this range. This situation is called overflow in the case of a too-large positive exponent and underflow in the case of a too-small negative exponent. The double precision format has 11 bits for the exponent, allowing exponents from -1022 to +1023. The programmer has to choose between single precision and double precision declarations using his knowledge of the data being handled.
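A hedged sketch (not from the original chapter) that pulls apart the three fields of an IEEE 754 single-precision number; the value -6.25 is chosen for illustration:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    float f = -6.25f;                        // -1.5625 x 2^2
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);     // reinterpret the 32 bits
    uint32_t sign     = bits >> 31;          // 1 bit
    uint32_t exponent = (bits >> 23) & 0xFF; // 8 bits, biased by 127
    uint32_t mantissa = bits & 0x7FFFFF;     // 23 fraction bits
    std::printf("sign=%u exponent=%d mantissa=0x%06X\n",
                (unsigned) sign, (int) exponent - 127, (unsigned) mantissa);
    // prints: sign=1 exponent=2 mantissa=0x480000
}
```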

Floating point operations executed in software on a regular CPU are very slow, so a special-purpose unit known as a co-processor is generally used; this co-processor works in tandem with the main CPU. The programmer should use a float declaration only if the data really is in real-number form; float declarations are not to be used generously.

  • Decimal Numbers Representation

Decimal numbers (radix 10) are represented and processed in the system with the support of additional hardware. We deal with numbers in decimal format in everyday life. Some machines implement decimal arithmetic in hardware, much like floating-point arithmetic hardware. In such a case, the CPU holds decimal numbers in BCD (binary coded decimal) form and performs BCD arithmetic operations directly, without conversion to pure binary. Each decimal digit is encoded in a nibble (4 bits), and packed BCD stores two digits per byte. BCD operations require not only special hardware but also a decimal instruction set.
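A minimal sketch (the helper name pack_bcd is an assumption, not from the text) of packing two decimal digits into one byte of packed BCD:

```cpp
#include <cassert>

// Pack two decimal digits (0-9 each) into one packed-BCD byte.
unsigned char pack_bcd(unsigned tens, unsigned ones) {
    return static_cast<unsigned char>((tens << 4) | ones);
}

int main() {
    assert(pack_bcd(4, 7) == 0x47);  // decimal 47 -> BCD 0100 0111
}
```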

  • Exceptions and Error Detection

All of us know that arithmetic operations can produce answers with more digits than the operands (e.g. 8 × 2 = 16). This happens in computer arithmetic too. When the result size exceeds the allotted size of the variable or register, it becomes an error and raises an exception. The exception conditions associated with numbers and number operations are Overflow, Underflow, Truncation, Rounding and Multiple Precision. These are detected by the associated hardware in the arithmetic unit. They apply to both fixed point and floating point operations. Each of these exceptional conditions has a flag bit assigned in the Processor Status Word (PSW). These are discussed in more detail in later chapters.
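A small illustrative sketch (not from the chapter) of how an overflow condition can be detected when adding two signed 8-bit values, by computing the true result in a wider type:

```cpp
#include <cassert>
#include <cstdint>

// Detect overflow of signed 8-bit addition.
bool add_overflows(int8_t a, int8_t b) {
    int sum = int(a) + int(b);               // exact result, no wraparound
    return sum < INT8_MIN || sum > INT8_MAX; // outside [-128, 127]?
}

int main() {
    assert(!add_overflows(100, 27));  // 127 still fits
    assert(add_overflows(100, 28));   // 128 does not
}
```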

  • Character Representation

Another class of data is non-numeric, largely character sets. We use a human-readable character set to communicate with the computer, for both input and output. Standard character sets like EBCDIC and ASCII are chosen to represent alphabets, numbers and special characters. Nowadays the Unicode standard is also in use, covering non-English languages such as Chinese, Hindi and Spanish. These codes are published and available on the internet; interested readers may access and learn more.


Computer Organization and Architecture Tutorial


Computer Organization and Architecture together describe how computer systems are designed. Computer architecture refers to those attributes of a system that are visible to the programmer, such as addressing techniques, instruction sets, and the number of bits used for data, which have a direct impact on program execution. It defines the system in an abstract manner; it deals with what the system does.

Computer organization, by contrast, is the way a system is structured: the operational units and the interconnections between them that realize the architectural specification. It is the realization of the abstract model; it deals with how the system is implemented.

This tutorial covers basic to advanced concepts such as pipelining, microprogrammed control, computer architecture, and instruction design and format.


Essentials of Computer Organization and Architecture, 5th Edition by Linda Null, Julia Lobur


Data Representation in Computer Systems

There are 10 kinds of people in the world—those who understand binary and those who don’t. —Anonymous

2.1   INTRODUCTION

The organization of any computer depends considerably on how it represents numbers, characters, and control information. The converse is also true: standards and conventions established over the years have determined certain aspects of computer organization. This chapter describes the various ways in which computers can store and manipulate numbers and characters. The ideas presented in the following sections form the basis for understanding the organization and function of all types of digital systems.

The most basic unit of information in a digital computer is called ...


Data representation in computer architecture notes

This course is about learning how computers work, from the perspective of systems software: what makes programs work fast or slow, and how properties of the machines we program impact the programs we write. We want to communicate ideas, tools, and an experimental approach.

The course divides into six units:

  • Data representation
  • Assembly & machine programming
  • Storage & caching
  • Kernel programming
  • Process management
  • Concurrency

The first unit, data representation, is all about how different forms of data can be represented in terms the computer can understand.

Computer memory is kind of like a Lite Brite.


A Lite Brite is a big black backlit pegboard coupled with a supply of colored pegs in a limited set of colors. You can plug in the pegs to make all kinds of designs. A computer’s memory is like a vast pegboard where each slot holds one of 256 different colors. The colors are numbered 0 through 255, so each slot holds one byte. (A byte is a number between 0 and 255, inclusive.)

A slot of computer memory is identified by its address. On a computer with M bytes of memory, and therefore M slots, you can think of the address as a number between 0 and M−1. My laptop has 16 gibibytes of memory, so M = 16×2^30 = 2^34 = 17,179,869,184 = 0x4'0000'0000, a very large number!

The problem of data representation is the problem of representing all the concepts we might want to use in programming—integers, fractions, real numbers, sets, pictures, texts, buildings, animal species, relationships—using the limited medium of addresses and bytes.

Powers of ten and powers of two. Digital computers love the number two and all powers of two. The electronics of digital computers are based on the bit, the smallest unit of storage, which is a base-two digit: either 0 or 1. More complicated objects are represented by collections of bits. This choice has many scale and error-correction advantages. It also refracts upwards to larger choices, and even into terminology. Memory chips, for example, have capacities based on large powers of two, such as 2^30 bytes. Since 2^10 = 1024 is pretty close to 1,000, 2^20 = 1,048,576 is pretty close to a million, and 2^30 = 1,073,741,824 is pretty close to a billion, it’s common to refer to 2^30 bytes of memory as “a gigabyte,” even though that prefix properly means 10^9 = 1,000,000,000 bytes. For greater precision, there are terms that explicitly signal the use of powers of two: 2^30 is a gibibyte, where the “-bi-” component means “binary.”
Virtual memory. Modern computers actually abstract their memory spaces using a technique called virtual memory. The lowest-level kind of address, called a physical address, really does take on values between 0 and M−1. However, even on a 16GiB machine like my laptop, the addresses we see in programs can take on values like 0x7ffe'ea2c'aa67 that are much larger than M−1 = 0x3'ffff'ffff. The addresses used in programs are called virtual addresses. They’re incredibly useful for protection: since different running programs have logically independent address spaces, it’s much less likely that a bug in one program will crash the whole machine. We’ll learn about virtual memory in much more depth in the kernel unit; the distinction between virtual and physical addresses is not as critical for data representation.

Most programming languages prevent their users from directly accessing memory. But not C and C++! These languages let you access any byte of memory with a valid address. This is powerful; it is also very dangerous. But it lets us get a hands-on view of how computers really work.

C++ programs accomplish their work by constructing, examining, and modifying objects . An object is a region of data storage that contains a value, such as the integer 12. (The standard specifically says “a region of data storage in the execution environment, the contents of which can represent values”.) Memory is called “memory” because it remembers object values.

In this unit, we often use functions called hexdump to examine memory. These functions are defined in hexdump.cc. hexdump_object(x) prints out the bytes of memory that comprise an object named x, while hexdump(ptr, size) prints out the size bytes of memory starting at a pointer ptr.

For example, in datarep1/add.cc , we might use hexdump_object to examine the memory used to represent some integers:
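The original listing is not reproduced here; a sketch of what such a program might look like (the header name is an assumption for the course's hexdump helpers):

```cpp
#include "hexdump.hh"   // assumed header declaring hexdump_object()

int main() {
    int a = 1;
    int b = 2;
    int c = 3;
    hexdump_object(a);  // prints address and bytes of a
    hexdump_object(b);
    hexdump_object(c);
}
```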

This display reports that a, b, and c are each four bytes long; that a, b, and c are located at different, nonoverlapping addresses (the long hex number in the first column); and shows us how the numbers 1, 2, and 3 are represented in terms of bytes. (More on that later.)

The compiler, hardware, and standard together define how objects of different types map to bytes. Each object uses a contiguous range of addresses (and thus bytes), and objects never overlap (objects that are active simultaneously are always stored in distinct address ranges).

Since C and C++ are designed to help software interface with hardware devices, their standards are transparent about how objects are stored. A C++ program can ask how big an object is using the sizeof keyword. sizeof(T) returns the number of bytes in the representation of an object of type T, and sizeof(x) returns the size of object x. The result of sizeof is a value of type size_t, which is an unsigned integer type large enough to hold any representable size. On 64-bit architectures, such as x86-64 (our focus in this course), size_t can hold numbers between 0 and 2^64 − 1.

Qualitatively different objects may have the same data representation. For example, the following three objects have the same data representation on x86-64, which you can verify using hexdump :
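The original examples aren't reproduced here; a plausible stand-in is three differently-typed objects whose four bytes are identical on x86-64:

```cpp
#include <cassert>
#include <cstring>

int main() {
    int i = -1;                                     // bytes FF FF FF FF
    unsigned u = 4294967295U;                       // bytes FF FF FF FF
    unsigned char a[4] = {0xFF, 0xFF, 0xFF, 0xFF};  // bytes FF FF FF FF
    assert(std::memcmp(&i, &u, 4) == 0);
    assert(std::memcmp(&i, a, 4) == 0);
}
```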

In C and C++, you can’t reliably tell the type of an object by looking at the contents of its memory. That’s why tricks like our different addf*.cc functions work.

An object can have many names. For example, here, local and *ptr refer to the same object:
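A minimal sketch of the idea (names taken from the text):

```cpp
int f() {
    int local = 1;
    int* ptr = &local;   // local and *ptr are two names for one object
    *ptr = 2;            // modifies the object named local
    return local;        // returns 2
}
```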

The different names for an object are sometimes called aliases .
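The next example (a reconstructed sketch; the original listing is omitted from these notes, and the variable names come from the list below) declares several objects with different lifetimes:

```cpp
char ch1 = 'a';                 // global variable
const char ch2 = 'b';           // constant (non-modifiable) global variable

int main() {
    char ch3 = 'c';             // local variable
    char* ch4 = new char{'d'};  // ch4 itself is a local variable...
    // ...while *ch4 names anonymous storage allocated by new char
    delete ch4;
}
```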

There are five objects here:

  • ch1 , a global variable
  • ch2 , a constant (non-modifiable) global variable
  • ch3 , a local variable
  • ch4 , a local variable
  • the anonymous storage allocated by new char and accessed by *ch4

Each object has a lifetime , which is called storage duration by the standard. There are three different kinds of lifetime.

  • static lifetime: The object lasts as long as the program runs. ( ch1 , ch2 )
  • automatic lifetime: The compiler allocates and destroys the object automatically as the program runs, based on the object’s scope (the region of the program in which it is meaningful). ( ch3 , ch4 )
  • dynamic lifetime: The programmer allocates and destroys the object explicitly. ( *ch4 )

Objects with dynamic lifetime aren’t easy to use correctly. Dynamic lifetime causes many serious problems in C programs, including memory leaks, use-after-free, double-free, and so forth. Those serious problems cause undefined behavior and play a “disastrously central role” in “our ongoing computer security nightmare” . But dynamic lifetime is critically important. Only with dynamic lifetime can you construct an object whose size isn’t known at compile time, or construct an object that outlives the function that created it.

The compiler and operating system work together to put objects at different addresses. A program’s address space (which is the range of addresses accessible to a program) divides into regions called segments . Objects with different lifetimes are placed into different segments. The most important segments are:

  • Code (also known as text or read-only data ). Contains instructions and constant global objects. Unmodifiable; static lifetime.
  • Data . Contains non-constant global objects. Modifiable; static lifetime.
  • Heap . Modifiable; dynamic lifetime.
  • Stack . Modifiable; automatic lifetime.

The compiler decides on a segment for each object based on its lifetime. The final compiler phase, which is called the linker , then groups all the program’s objects by segment (so, for instance, global variables from different compiler runs are grouped together into a single segment). Finally, when a program runs, the operating system loads the segments into memory. (The stack and heap segments grow on demand.)

We can use a program to investigate where objects with different lifetimes are stored (see cs61-lectures/datarep2/mexplore0.cc); running it prints the address ranges of the objects in each segment.

Constant global data and global data have the same lifetime, but are stored in different segments. The operating system uses different segments so it can prevent the program from modifying constants. It marks the code segment, which contains functions (instructions) and constant global data, as read-only, and any attempt to modify code-segment memory causes a crash (a “Segmentation violation”).

An executable is normally at least as big as the static-lifetime data (the code and data segments together). Since all that data must be in memory for the entire lifetime of the program, it’s written to disk and then loaded by the OS before the program starts running. There is an exception, however: the “bss” segment is used to hold modifiable static-lifetime data with initial value zero. Such data is common, since all static-lifetime data is initialized to zero unless otherwise specified in the program text. Rather than storing a bunch of zeros in the object files and executable, the compiler and linker simply track the location and size of all zero-initialized global data. The operating system sets this memory to zero during the program load process. Clearing memory is faster than loading data from disk, so this optimization saves both time (the program loads faster) and space (the executable is smaller).

Abstract machine and hardware

Programming involves turning an idea into hardware instructions. This transformation happens in multiple steps, some you control and some controlled by other programs.

First you have an idea , like “I want to make a flappy bird iPhone game.” The computer can’t (yet) understand that idea. So you transform the idea into a program , written in some programming language . This process is called programming.

A C++ program actually runs on an abstract machine . The behavior of this machine is defined by the C++ standard , a technical document. This document is supposed to be so precisely written as to have an exact mathematical meaning, defining exactly how every C++ program behaves. But the document can’t run programs!

C++ programs run on hardware (mostly), and the hardware determines what behavior we see. Mapping abstract machine behavior to instructions on real hardware is the task of the C++ compiler (and the standard library and operating system). A C++ compiler is correct if and only if it translates each correct program to instructions that simulate the expected behavior of the abstract machine.

This same rough series of transformations happens for any programming language, although some languages use interpreters rather than compilers.

A bit is the fundamental unit of digital information: it’s either 0 or 1.

C++ manages memory in units of bytes —8 contiguous bits that together can represent numbers between 0 and 255. C’s unit for a byte is char : the abstract machine says a byte is stored in char . That means an unsigned char holds values in the inclusive range [0, 255].

The C++ standard actually doesn’t require that a byte hold 8 bits, and on some crazy machines from decades ago , bytes could hold nine bits! (!?)

But larger numbers, such as 258, don’t fit in a single byte. To represent such numbers, we must use multiple bytes. The abstract machine doesn’t specify exactly how this is done—it’s the compiler and hardware’s job to implement a choice. But modern computers always use place–value notation , just like in decimal numbers. In decimal, the number 258 is written with three digits, the meanings of which are determined both by the digit and by their place in the overall number:

\[ 258 = 2\times10^2 + 5\times10^1 + 8\times10^0 \]

The computer uses base 256 instead of base 10. Two adjacent bytes can represent numbers between 0 and \(255\times256+255 = 65535 = 2^{16}-1\) , inclusive. A number larger than this would take three or more bytes.

\[ 258 = 1\times256^1 + 2\times256^0 \]

On x86-64, the ones place, the least significant byte, is on the left, at the lowest address in the contiguous two-byte range used to represent the integer. This is the opposite of how decimal numbers are written: decimal numbers put the most significant digit on the left. The representation choice of putting the least-significant byte in the lowest address is called little-endian representation. x86-64 uses little-endian representation.

Some computers actually store multi-byte integers the other way, with the most significant byte stored in the lowest address; that’s called big-endian representation. The Internet’s fundamental protocols, such as IP and TCP, also use big-endian order for multi-byte integers, so big-endian is also called “network” byte order.

The C++ standard defines five fundamental unsigned integer types, along with relationships among their sizes. Here they are, along with their actual sizes and ranges on x86-64:
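The table itself is not reproduced here; on x86-64 Linux the sizes are 1, 2, 4, 8, and 8 bytes (so the ranges run from 0–255 for unsigned char up to 0–2^64−1 for unsigned long and unsigned long long). A quick sketch that verifies the sizes:

```cpp
#include <cstdio>

int main() {
    std::printf("%zu %zu %zu %zu %zu\n",
                sizeof(unsigned char), sizeof(unsigned short),
                sizeof(unsigned int), sizeof(unsigned long),
                sizeof(unsigned long long));
    // on x86-64 Linux prints: 1 2 4 8 8
}
```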

Other architectures and operating systems implement different ranges for these types. For instance, on IA32 machines like Intel’s Pentium (the 32-bit processors that predated x86-64), sizeof(long) was 4, not 8.

Note that all values of a smaller unsigned integer type can fit in any larger unsigned integer type. When a value of a larger unsigned integer type is placed in a smaller unsigned integer object, however, not every value fits; for instance, the unsigned short value 258 doesn’t fit in an unsigned char x. When this occurs, the C++ abstract machine requires that the smaller object’s value equals the least-significant bits of the larger value (so x will equal 2).
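A one-line check of that rule (a sketch):

```cpp
#include <cassert>

int main() {
    unsigned short s = 258;  // bit pattern 0x0102
    unsigned char x = s;     // keeps the least-significant byte
    assert(x == 2);
}
```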

In addition to these types, whose sizes can vary, C++ has integer types whose sizes are fixed. uint8_t , uint16_t , uint32_t , and uint64_t define 8-bit, 16-bit, 32-bit, and 64-bit unsigned integers, respectively; on x86-64, these correspond to unsigned char , unsigned short , unsigned int , and unsigned long .

This general procedure is used to represent a multi-byte integer in memory.

  • Write the large integer in hexadecimal format, including all leading zeros required by the type size. For example, the unsigned value 65534 would be written 0x0000FFFE . There will be twice as many hexadecimal digits as sizeof(TYPE) .
  • Divide the integer into its component bytes, which are its digits in base 256. In our example, they are, from most to least significant, 0x00, 0x00, 0xFF, and 0xFE.

In little-endian representation, the bytes are stored in memory from least to most significant. If our example was stored at address 0x30, we would have 0xFE at address 0x30, 0xFF at 0x31, 0x00 at 0x32, and 0x00 at 0x33.

In big-endian representation, the bytes are stored in the reverse order.
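A sketch that makes the byte order visible on a little-endian machine (x86-64 assumed):

```cpp
#include <cstdio>
#include <cstring>

int main() {
    unsigned int x = 65534;                  // 0x0000FFFE
    unsigned char bytes[sizeof(x)];
    std::memcpy(bytes, &x, sizeof(x));       // copy out the raw bytes
    for (unsigned i = 0; i != sizeof(x); ++i)
        std::printf("%02X ", bytes[i]);      // x86-64 prints: FE FF 00 00
    std::printf("\n");
}
```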

Computers are often fastest at dealing with fixed-length numbers, rather than variable-length numbers, and processor internals are organized around a fixed word size. A word is the natural unit of data used by a processor design. In most modern processors, this natural unit is 8 bytes or 64 bits, because this is the power-of-two number of bytes big enough to hold those processors’ memory addresses. Many older processors could access less memory and had correspondingly smaller word sizes, such as 4 bytes (32 bits).

The best representation for signed integers—and the choice made by x86-64, and by the C++20 abstract machine—is two’s complement . Two’s complement representation is based on this principle: Addition and subtraction of signed integers shall use the same instructions as addition and subtraction of unsigned integers.

To see what this means, let’s think about what -x should mean when x is an unsigned integer. Wait, negative unsigned?! This isn’t an oxymoron, because C++ uses modular arithmetic for unsigned integers: the result of an arithmetic operation on unsigned values is always taken modulo 2^B, where B is the number of bits in the unsigned value type. Thus, on x86-64, -x is simply the number that, when added to x, yields 0 (mod 2^B). For example, when unsigned x = 0xFFFFFFFFU, then -x == 1U, since x + -x equals zero (mod 2^32).

To obtain -x, we flip all the bits in x (an operation written ~x) and then add 1. To see why, consider the bit representations. What is x + (~x + 1)? Well, (~x)_i (the i-th bit of ~x) is 1 whenever x_i is 0, and vice versa. That means that every bit of x + ~x is 1 (there are no carries), and x + ~x is the largest unsigned integer, with value 2^B − 1. If we add 1 to this, we get 2^B, which is 0 (mod 2^B)! The highest “carry” bit is dropped, leaving zero.
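The claim is easy to check in code (a sketch, assuming 32-bit unsigned as on x86-64):

```cpp
#include <cassert>

int main() {
    unsigned x = 0xFFFFFFFFU;
    assert(-x == 1U);        // unsigned arithmetic is modulo 2^32
    assert(~x + 1 == -x);    // negation = flip all bits, then add 1
}
```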

Two’s complement arithmetic uses half of the unsigned integer representations for negative numbers. A two’s-complement signed integer with B bits has the following values:

  • If the most-significant bit is 1, the represented number is negative. Specifically, the represented number is – (~x + 1) , where the outer negative sign is mathematical negation (not computer arithmetic).
  • If every bit is 0, the represented number is 0.
  • If the most-significant bit is 0 but some other bit is 1, the represented number is positive.

The most significant bit is also called the sign bit , because if it is 1, then the represented value depends on the signedness of the type (and that value is negative for signed types).

Another way to think about two’s-complement is that, for B-bit integers, the most-significant bit has place value 2^(B−1) in unsigned arithmetic and −2^(B−1) in signed arithmetic. All other bits have the same place values in both kinds of arithmetic.

The two’s-complement bit pattern for x + y is the same whether x and y are considered as signed or unsigned values. For example, in 4-bit arithmetic, 5 has representation 0b0101, while the representation 0b1100 represents 12 if unsigned and –4 if signed (~0b1100 + 1 = 0b0011 + 1 = 0b0100 = 4). Adding those bit patterns gives 0b0101 + 0b1100 = 0b10001; the carry bit that doesn’t fit in 4 bits is dropped, leaving 0b0001.

Note that this is the right answer for both signed and unsigned arithmetic : 5 + 12 = 17 = 1 (mod 16), and 5 + -4 = 1.

Subtraction and multiplication also produce the same results for unsigned arithmetic and signed two’s-complement arithmetic. (For instance, 5 * 12 = 60 = 12 (mod 16), and 5 * -4 = -20 = -4 (mod 16).) This is not true of division. (Consider dividing the 4-bit representation 0b1110 by 2. In signed arithmetic, 0b1110 represents -2, so 0b1110/2 == 0b1111 (-1); but in unsigned arithmetic, 0b1110 is 14, so 0b1110/2 == 0b0111 (7).) And, of course, it is not true of comparison. In signed 4-bit arithmetic, 0b1110 < 0 , but in unsigned 4-bit arithmetic, 0b1110 > 0 . This means that a C compiler for a two’s-complement machine can use a single add instruction for either signed or unsigned numbers, but it must generate different instruction patterns for signed and unsigned division (or less-than, or greater-than).

There are a couple quirks with C signed arithmetic. First, in two’s complement, there are more negative numbers than positive numbers. The representation whose sign bit is 1 but whose other bits are all 0 has no positive counterpart at the same bit width: for this number, -x == x. (In 4-bit arithmetic, -0b1000 == ~0b1000 + 1 == 0b0111 + 1 == 0b1000.) Second, and far worse, arithmetic overflow on signed integers is undefined behavior.

The C++ abstract machine requires that signed integers have the same sizes as their unsigned counterparts.

We distinguish pointers , which are concepts in the C abstract machine, from addresses , which are hardware concepts. A pointer combines an address and a type.

The memory representation of a pointer is the same as the representation of its address value. The size of that integer is the machine’s word size; for example, on x86-64, a pointer occupies 8 bytes, and a pointer to an object located at address 0x400abc would be stored, little-endian, as the byte sequence BC 0A 40 00 00 00 00 00.

The C++ abstract machine defines an unsigned integer type uintptr_t that can hold any address. (You have to #include <inttypes.h> or <cinttypes> to get the definition.) On most machines, including x86-64, uintptr_t is the same as unsigned long . Cast a pointer to an integer address value with syntax like (uintptr_t) ptr ; cast back to a pointer with syntax like (T*) addr . Casts between pointer types and uintptr_t are information preserving, so this assertion will never fail:
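A sketch of such an assertion, with int chosen as a concrete type for illustration:

```cpp
#include <cassert>
#include <cinttypes>

int main() {
    int obj = 61;
    int* ptr = &obj;
    uintptr_t addr = (uintptr_t) ptr;  // pointer -> integer
    assert((int*) addr == ptr);        // the round trip preserves the pointer
}
```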

Since it is a 64-bit architecture, the size of an x86-64 address is 64 bits (8 bytes). That’s also the size of x86-64 pointers.

To represent an array of integers, C++ and C allocate the integers next to each other in memory, in sequential addresses, with no gaps or overlaps. Here, we put the integers 0, 1, and 258 next to each other, starting at address 1008:
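A sketch (the address 1008 is the text's example; actual addresses vary from run to run):

```cpp
#include <cstdio>

int main() {
    int a[3] = {0, 1, 258};
    // Sequential, gap-free layout: if a began at address 1008, a[0] would
    // occupy bytes 1008-1011, a[1] 1012-1015, and a[2] 1016-1019.
    for (int i = 0; i != 3; ++i)
        std::printf("a[%d] at %p\n", i, (void*) &a[i]);
}
```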

Say that you have an array of N integers, and you access each of those integers in order, accessing each integer exactly once. Does the order matter?

Computer memory is random-access memory (RAM), which means that a program can access any bytes of memory in any order—it’s not, for example, required to read memory in ascending order by address. But if we run experiments, we can see that even in RAM, different access orders have very different performance characteristics.

Our arraysum program sums up all the integers in an array of N integers, using an access order based on its arguments, and prints the resulting delay. Here’s the result of a couple experiments on accessing 10,000,000 items in three orders, “up” order (sequential: elements 0, 1, 2, 3, …), “down” order (reverse sequential: N , N −1, N −2, …), and “random” order (as it sounds).

Wow! Down order is just a bit slower than up, but random order seems about 40 times slower. Why?

Random order is defeating many of the internal architectural optimizations that make memory access fast on modern machines. Sequential order, since it’s more predictable, is much easier to optimize.

Foreshadowing. This part of the lecture is a teaser for the Storage unit, where we cover access patterns and caching, including the processor caches that explain this phenomenon, in much more depth.

The C++ programming language offers several collection mechanisms for grouping subobjects together into new kinds of object. The collections are arrays, structs, and unions. (Classes are a kind of struct. All library types, such as vectors, lists, and hash tables, use combinations of these collection types.) The abstract machine defines how subobjects are laid out inside a collection. This is important, because it lets C/C++ programs exchange messages with hardware and even with programs written in other languages: messages can be exchanged only when both parties agree on layout.

Array layout in C++ is particularly simple: The objects in an array are laid out sequentially in memory, with no gaps or overlaps. Assume a declaration like T x[N], where x is an array of N objects of type T, and say that the address of x is a. Then the address of element x[i] equals a + i * sizeof(T), and sizeof(x) == N * sizeof(T).
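A quick check of both formulas (a sketch):

```cpp
#include <cassert>
#include <cstdint>

int main() {
    double x[4] = {0};
    uintptr_t a = (uintptr_t) x;
    assert((uintptr_t) &x[2] == a + 2 * sizeof(double));  // address formula
    assert(sizeof(x) == 4 * sizeof(double));              // size formula
}
```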

Sidebar: Vector representation

The C++ library type std::vector defines an array that can grow and shrink. For instance, this function creates a vector containing the numbers 0 up to N in sequence:
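The original listing is omitted; a sketch whose shape follows the text (the function name is an assumption):

```cpp
#include <vector>

std::vector<int> make_sequence(int N) {
    std::vector<int> v;
    for (int i = 0; i != N; ++i)
        v.push_back(i);   // v grows on demand
    return v;
}
```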

Here, v is an object with automatic lifetime. This means its size (in the sizeof sense) is fixed at compile time. Remember that the sizes of static- and automatic-lifetime objects must be known at compile time; only dynamic-lifetime objects can have varying size based on runtime parameters. So where and how are v ’s contents stored?

The C++ abstract machine requires that v ’s elements are stored in an array in memory. (The v.data() method returns a pointer to the first element of the array.) But it does not define std::vector ’s layout otherwise, and C++ library designers can choose different layouts based on their needs. We found these to hold for the std::vector in our library:

sizeof(v) == 24 for any vector of any type, and the address of v is a stack address (i.e., v is located in the stack segment).

The first 8 bytes of the vector hold the address of the first element of the contents array—call it the begin address . This address is a heap address, which is as expected, since the contents must have dynamic lifetime. The value of the begin address is the same as that of v.data() .

Bytes 8–15 hold the address just past the contents array—call it the end address . Its value is the same as &v.data()[v.size()] . If the vector is empty, then the begin address and the end address are the same.

Bytes 16–23 hold an address greater than or equal to the end address. This is the capacity address. As a vector grows, it will sometimes outgrow its current location and move its contents to new memory addresses. To reduce the number of copies, vectors usually request more memory from the operating system than they immediately need; this additional space, which is called “capacity,” supports cheap growth. Often the capacity doubles on each growth spurt, since this allows operations like v.push_back() to execute in O(1) time on average.

Compilers must also decide where different objects are stored when those objects are not part of a collection. For instance, consider this program:
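A sketch of the kind of program meant (the variable names i1, c1, i2, c2, c3 come from the discussion below; the helper header name is an assumption):

```cpp
#include "hexdump.hh"   // assumed header declaring hexdump_object()

int main() {
    int i1 = 0;
    char c1 = 'a';
    int i2 = 1;
    char c2 = 'b';
    char c3 = 'c';
    hexdump_object(i1);  // print each object's address and bytes
    hexdump_object(c1);
    hexdump_object(i2);
    hexdump_object(c2);
    hexdump_object(c3);
}
```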

The abstract machine says these objects cannot overlap, but does not otherwise constrain their positions in memory.

On Linux, GCC will put all these variables into the stack segment, which we can see using hexdump. But it can put them in the stack segment in any order, as we can see by reordering the declarations (try declaration order i1, c1, i2, c2, c3), by changing optimization levels, or by adding different scopes (braces). The abstract machine gives the programmer no guarantees about how object addresses relate. In fact, the compiler may move objects around during execution, as long as it ensures that the program behaves according to the abstract machine. Modern optimizing compilers often do this, particularly for automatic objects.

But what order does the compiler choose? With optimization disabled, the compiler appears to lay out objects in decreasing order by declaration, so the first declared variable in the function has the highest address. With optimization enabled, the compiler follows roughly the same guideline, but it also rearranges objects by type—for instance, it tends to group char s together—and it can reuse space if different variables in the same function have disjoint lifetimes. The optimizing compiler tends to use less space for the same set of variables. This is because it’s arranging objects by alignment.

The C++ compiler and library restrict the addresses at which some kinds of data appear. In particular, the address of every int value is always a multiple of 4, whether it’s located on the stack (automatic lifetime), the data segment (static lifetime), or the heap (dynamic lifetime).

A bunch of observations will show you these rules:
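The observations themselves aren't reproduced here; a sketch that prints them (expected values, shown in comments, are the x86-64 Linux ones):

```cpp
#include <cstdio>

int main() {
    // On x86-64 Linux, alignof equals sizeof for fundamental types.
    std::printf("char   size %zu align %zu\n", sizeof(char),   alignof(char));    // 1 1
    std::printf("short  size %zu align %zu\n", sizeof(short),  alignof(short));   // 2 2
    std::printf("int    size %zu align %zu\n", sizeof(int),    alignof(int));     // 4 4
    std::printf("long   size %zu align %zu\n", sizeof(long),   alignof(long));    // 8 8
    std::printf("double size %zu align %zu\n", sizeof(double), alignof(double));  // 8 8
    std::printf("void*  size %zu align %zu\n", sizeof(void*),  alignof(void*));   // 8 8
}
```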

These are the alignment restrictions for an x86-64 Linux machine.

These restrictions hold for most x86-64 operating systems, except that on Windows, the long type has size and alignment 4. (The long long type has size and alignment 8 on all x86-64 operating systems.)

Just like every type has a size, every type has an alignment. The alignment of a type T is a number a ≥1 such that the address of every object of type T must be a multiple of a . Every object with type T has size sizeof(T) —it occupies sizeof(T) contiguous bytes of memory; and has alignment alignof(T) —the address of its first byte is a multiple of alignof(T) . You can also say sizeof(x) and alignof(x) where x is the name of an object or another expression.

Alignment restrictions can make hardware simpler, and therefore faster. For instance, consider cache blocks. CPUs access memory through a transparent hardware cache. Data moves from primary memory, or RAM (which is large—a couple gigabytes on most laptops—and uses cheaper, slower technology) to the cache in units of 64 or 128 bytes. Those units are always aligned: on a machine with 128-byte cache blocks, the bytes with memory addresses [127, 128, 129, 130] live in two different cache blocks (with addresses [0, 127] and [128, 255]). But the 4 bytes with addresses [4n, 4n+1, 4n+2, 4n+3] always live in the same cache block. (This is true for any small power of two: the 8 bytes with addresses [8n,…,8n+7] always live in the same cache block.) In general, it’s often possible to make a system faster by leveraging restrictions—and here, the CPU hardware can load data faster when it can assume that the data lives in exactly one cache line.

The compiler, library, and operating system all work together to enforce alignment restrictions.

On x86-64 Linux, alignof(T) == sizeof(T) for all fundamental types (the types built in to C: integer types, floating point types, and pointers). But this isn’t always true; on x86-32 Linux, double has size 8 but alignment 4.

It’s possible to construct user-defined types of arbitrary size, but the largest alignment required by a machine is fixed for that machine. C++ lets you find the maximum alignment for a machine with alignof(std::max_align_t) ; on x86-64, this is 16, the alignment of the type long double (and the alignment of some less-commonly-used SIMD “vector” types ).

We now turn to the abstract machine rules for laying out all collections. The sizes and alignments for user-defined types—arrays, structs, and unions—are derived from a couple simple rules or principles. Here they are. The first rule applies to all types.

1. First-member rule. The address of the first member of a collection equals the address of the collection.

Thus, the address of an array is the same as the address of its first element. The address of a struct is the same as the address of the first member of the struct.

The next three rules depend on the class of collection. Every C abstract machine enforces these rules.

2. Array rule. Arrays are laid out sequentially as described above.

3. Struct rule. The second and subsequent members of a struct are laid out in order, with no overlap, subject to alignment constraints.

4. Union rule. All members of a union share the address of the union.

In C, every struct follows the struct rule, but in C++, only simple structs follow the rule. Complicated structs, such as structs with some public and some private members, or structs with virtual functions, can be laid out however the compiler chooses. The typical situation is that C++ compilers for a machine architecture (e.g., “Linux x86-64”) will all agree on a layout procedure for complicated structs. This allows code compiled by different compilers to interoperate.

The next rule defines the operation of the malloc library function.

5. Malloc rule. Any non-null pointer returned by malloc has alignment appropriate for any type. In other words, assuming the allocated size is adequate, the pointer returned from malloc can safely be cast to T* for any T .

Oddly, this holds even for small allocations. The C++ standard (the abstract machine) requires that malloc(1) return a pointer whose alignment is appropriate for any type, including types that don’t fit.

And the final rule is not required by the abstract machine, but it’s how sizes and alignments on our machines work.

6. Minimum rule. The sizes and alignments of user-defined types, and the offsets of struct members, are minimized within the constraints of the other rules.

The minimum rule, and the sizes and alignments of basic types, are defined by the x86-64 Linux “ABI” —its Application Binary Interface. This specification standardizes how x86-64 Linux C compilers should behave, and lets users mix and match compilers without problems.

Consequences of the size and alignment rules

From these rules we can derive some interesting consequences.

First, the size of every type is a multiple of its alignment .

To see why, consider an array with two elements. By the array rule, these elements have addresses a and a+sizeof(T) , where a is the address of the array. Both of these addresses contain a T , so they are both a multiple of alignof(T) . That means sizeof(T) is also a multiple of alignof(T) .

We can also characterize the sizes and alignments of different collections .

  • The size of an array of N elements of type T is N * sizeof(T) : the sum of the sizes of its elements. The alignment of the array is alignof(T) .
  • The size of a union is the maximum of the sizes of its components (because the union can only hold one component at a time). Its alignment is also the maximum of the alignments of its components.
  • The size of a struct is at least as big as the sum of the sizes of its components. Its alignment is the maximum of the alignments of its components.

Thus, the alignment of every collection equals the maximum of the alignments of its components.

It’s also true that the alignment equals the least common multiple of the alignments of its components. You might have thought lcm was a better answer, but the max is the same as the lcm for every architecture that matters, because all fundamental alignments are powers of two.

The size of a struct might be larger than the sum of the sizes of its components, because of alignment constraints. Since the compiler must lay out struct components in order, and it must obey the components’ alignment constraints, and it must ensure different components occupy disjoint addresses, it must sometimes introduce extra space in structs. Here’s an example: the struct will have 3 bytes of padding after char c , to ensure that int i2 has the correct alignment.
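A sketch of the struct described (member names from the text):

```cpp
struct example {
    char c;    // offset 0
    // 3 bytes of padding here, so that i2 starts at a multiple of 4
    int i2;    // offset 4; alignof(int) == 4
};
// sizeof(example) == 8; alignof(example) == 4
```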

Thanks to padding, reordering struct components can sometimes reduce the total size of a struct. Padding can happen at the end of a struct as well as the middle. Padding can never happen at the start of a struct, however (because of Rule 1).

The rules also imply that the offset of any struct member —which is the difference between the address of the member and the address of the containing struct— is a multiple of the member’s alignment .

To see why, consider a struct s with member m at offset o . The malloc rule says that any pointer returned from malloc is correctly aligned for s . Every pointer returned from malloc is maximally aligned, equalling 16*x for some integer x . The struct rule says that the address of m , which is 16*x + o , is correctly aligned. That means that 16*x + o = alignof(m)*y for some integer y . Divide both sides by a = alignof(m) and you see that 16*x/a + o/a = y . But 16/a is an integer—the maximum alignment is a multiple of every alignment—so 16*x/a is an integer. We can conclude that o/a must also be an integer!

Finally, we can also derive the necessity for padding at the end of structs. (How?)

What happens when an object is uninitialized? The answer depends on its lifetime.

  • static lifetime (e.g., int global; at file scope): The object is initialized to 0.
  • automatic or dynamic lifetime (e.g., int local; in a function, or int* ptr = new int ): The object is uninitialized and reading the object’s value before it is assigned causes undefined behavior (see the sketch below).
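A minimal illustration:

```cpp
int global;             // static lifetime: initialized to 0

int f() {
    int local;          // automatic lifetime: starts uninitialized
    return local;       // undefined behavior: read before assignment
}

int* g() {
    int* p = new int;   // dynamic lifetime: *p starts uninitialized
    return p;           // reading *p before assigning it is undefined
}
```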

Compiler hijinks

In C++, most dynamic memory allocation uses special language operators, new and delete , rather than library functions.

Though this seems more complex than the library-function style, it has advantages. A C compiler cannot tell what malloc and free do (especially when they are redefined to debugging versions, as in the problem set), so a C compiler cannot necessarily optimize calls to malloc and free away. But the C++ compiler may assume that all uses of new and delete follow the rules laid down by the abstract machine. That means that if the compiler can prove that an allocation is unnecessary or unused, it is free to remove that allocation!

For example, we compiled this program in the problem set environment (based on test003.cc ):
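A sketch of that program (the include and exact loop shape are assumptions; the key property is that every allocation is freed and nothing escapes main):

```cpp
#include "m61.hh"   // the problem set's debugging allocator interface

int main() {
    for (int i = 0; i != 10; ++i) {
        int* ptrs[10];
        for (int j = 0; j != 10; ++j) {
            ptrs[j] = new int[1000];   // allocate...
        }
        for (int j = 0; j != 10; ++j) {
            delete[] ptrs[j];          // ...and free everything
        }
    }
    m61_printstatistics();
}
```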

The optimizing C++ compiler removes all calls to new and delete , leaving only the call to m61_printstatistics() ! (For instance, try objdump -d testXXX to look at the compiled x86-64 instructions.) This is valid because the compiler is explicitly allowed to eliminate unused allocations, and here, since the ptrs variable is local and doesn’t escape main , all allocations are unused. The C compiler cannot perform this useful transformation. (But the C compiler can do other cool things, such as unroll the loops .)

One of C’s more interesting choices is that it explicitly relates pointers and arrays. Although arrays are laid out in memory in a specific way, they generally behave like pointers when they are used. This property probably arose from C’s desire to explicitly model memory as an array of bytes, and it has beautiful and confounding effects.

We’ve already seen one of these effects. The hexdump function has this signature (arguments and return type):
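```cpp
void hexdump(const void* ptr, size_t size);   // a sketch of the declaration
```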

But we can just pass an array as argument to hexdump :
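```cpp
char c[10];
hexdump(c, sizeof(c));    // illustrative array name and size
```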

When used in an expression like this—here, as an argument—the array magically changes into a pointer to its first element. The above call has the same meaning as this:
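```cpp
hexdump(&c[0], sizeof(c));
```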

C programmers transition between arrays and pointers very naturally.

A confounding effect is that unlike all other types, in C arrays are passed to and returned from functions by reference rather than by value. C is a call-by-value language except for arrays. This means that all function arguments and return values are copied, so that parameter modifications inside a function do not affect the objects passed by the caller—except for arrays. For instance:

```cpp
#include <cstdio>

void f(int a[2]) {
    a[0] = 1;
}

int main() {
    int x[2] = {100, 101};
    f(x);
    printf("%d\n", x[0]);   // prints 1!
}
```

If you don’t like this behavior, you can get around it by using a struct or a C++ std::array.

```cpp
#include <array>
#include <cstdio>

struct array1 {
    int a[2];
};

void f1(array1 arg) {
    arg.a[0] = 1;
}

void f2(std::array<int, 2> a) {
    a[0] = 1;
}

int main() {
    array1 x = {{100, 101}};
    f1(x);
    printf("%d\n", x.a[0]);   // prints 100

    std::array<int, 2> x2 = {100, 101};
    f2(x2);
    printf("%d\n", x2[0]);    // prints 100
}
```

C++ extends the logic of this array–pointer correspondence to support arithmetic on pointers as well.

Pointer arithmetic rule. In the C abstract machine, arithmetic on pointers produces the same result as arithmetic on the corresponding array indexes.

Specifically, consider an array T a[n] and pointers T* p1 = &a[i] and T* p2 = &a[j] . Then:

Equality : p1 == p2 if and only if (iff) p1 and p2 point to the same address, which happens iff i == j .

Inequality : Similarly, p1 != p2 iff i != j .

Less-than : p1 < p2 iff i < j .

Also, p1 <= p2 iff i <= j ; and p1 > p2 iff i > j ; and p1 >= p2 iff i >= j .

Pointer difference : What should p1 - p2 mean? Using array indexes as the basis, p1 - p2 == i - j . (But the type of the difference is always ptrdiff_t , which on x86-64 is long , the signed version of size_t .)

Addition : p1 + k (where k is an integer type) equals the pointer &a[i + k] . ( k + p1 returns the same thing.)

Subtraction : p1 - k equals &a[i - k] .

Increment and decrement : ++p1 means p1 = p1 + 1 , which means p1 = &a[i + 1] . Similarly, --p1 means p1 = &a[i - 1] . (There are also postfix versions, p1++ and p1-- , but C++ style prefers the prefix versions.)
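Putting several of these rules together (illustrative values):

```cpp
int a[10];
int* p1 = &a[5];
int* p2 = &a[2];

bool eq = (p1 == p2);     // false: 5 != 2
bool lt = (p2 < p1);      // true: 2 < 5
ptrdiff_t d = p1 - p2;    // 3 == 5 - 2
int* q = p1 + 3;          // &a[8]
++p2;                     // now p2 == &a[3]
```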

No other arithmetic operations on pointers are allowed. You can’t multiply pointers, for example. (You can multiply addresses by casting the pointers to the address type, uintptr_t —so (uintptr_t) p1 * (uintptr_t) p2 —but why would you?)

From pointers to iterators

Let’s write a function that can sum all the integers in an array.
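A straightforward version might look like this (a sketch):

```cpp
int sum(int a[], int size) {
    int sum = 0;
    for (int i = 0; i != size; ++i) {
        sum += a[i];
    }
    return sum;
}
```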

This function can compute the sum of the elements of any int array. But because of the pointer–array relationship, its a argument is really a pointer . That allows us to call it with subarrays as well as with whole arrays. For instance:
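```cpp
int a[6] = {1, 2, 3, 4, 5, 6};   // illustrative values
int s1 = sum(a, 6);              // the whole array: 21
int s2 = sum(a, 3);              // the first three elements: 6
int s3 = sum(&a[2], 4);          // the subarray starting at a[2]: 18
```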

This way of thinking about arrays naturally leads to a style that avoids sizes entirely, using instead a sentinel or boundary argument that defines the end of the interesting part of the array.
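In that style, sum takes a pointer to the first element and a pointer just past the last element of interest (a sketch):

```cpp
int sum(int* first, int* last) {
    int sum = 0;
    while (first != last) {
        sum += *first;    // include *first in the running total
        ++first;          // advance toward the boundary
    }
    return sum;
}
```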

These expressions compute the same sums as the above:
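```cpp
int s1 = sum(a, a + 6);            // 21
int s2 = sum(a, a + 3);            // 6
int s3 = sum(&a[2], &a[2] + 4);    // 18
```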

Note that the data from first to last forms a half-open range . In mathematical notation, we care about elements in the range [first, last) : the element pointed to by first is included (if it exists), but the element pointed to by last is not. Half-open ranges give us a simple and clear way to describe empty ranges, such as zero-element arrays: if first == last , then the range is empty.

Note that given a ten-element array a , the pointer a + 10 can be formed and compared, but must not be dereferenced—the element a[10] does not exist. The C/C++ abstract machines allow users to form pointers to the “one-past-the-end” boundary elements of arrays, but users must not dereference such pointers.

So in C, two pointers naturally express a range of an array. The C++ standard template library, or STL, brilliantly abstracts this pointer notion to allow two iterators , which are pointer-like objects, to express a range of any standard data structure—an array, a vector, a hash table, a balanced tree, whatever. This version of sum works for any container of int s; notice how little it changed:
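```cpp
// a sketch: the pointer parameters become a template iterator type
template <typename It>
int sum(It first, It last) {
    int sum = 0;
    while (first != last) {
        sum += *first;    // iterators support * and ++, just like pointers
        ++first;
    }
    return sum;
}
```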

Some example uses:
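```cpp
#include <list>
#include <vector>

// illustrative containers; any standard container of ints works
std::vector<int> v = {1, 2, 3, 4, 5, 6};
int s1 = sum(v.begin(), v.end());        // 21

std::list<int> lst = {1, 2, 3};
int s2 = sum(lst.begin(), lst.end());    // 6
```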

Addresses vs. pointers

What’s the difference between these expressions? (Again, a is an array of type T , and p1 == &a[i] and p2 == &a[j] .)
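The two expressions, reconstructed from the discussion that follows:

```cpp
ptrdiff_t d1 = p1 - p2;                           // pointer arithmetic
ptrdiff_t d2 = (uintptr_t) p1 - (uintptr_t) p2;   // address arithmetic
```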

The first expression is defined analogously to index arithmetic, so d1 == i - j . But the second expression performs the arithmetic on the addresses corresponding to those pointers, so we expect d2 to equal sizeof(T) * d1 . Always be aware of which kind of arithmetic you’re using. Generally arithmetic on pointers should not involve sizeof , since the sizeof is included automatically according to the abstract machine; but arithmetic on addresses almost always should involve sizeof .

Although C++ is a low-level language, the abstract machine is surprisingly strict about which pointers may be formed and how they can be used. Violate the rules and you’re in hell because you have invoked the dreaded undefined behavior .

Given an array a[N] of N elements of type T :

Forming a pointer &a[i] (or a + i ) with 0 ≤ i ≤ N is safe.

Forming a pointer &a[i] with i < 0 or i > N causes undefined behavior.

Dereferencing a pointer &a[i] with 0 ≤ i < N is safe.

Dereferencing a pointer &a[i] with i < 0 or i ≥ N causes undefined behavior.

(For the purposes of these rules, objects that are not arrays count as single-element arrays. So given T x , we can safely form &x and &x + 1 and dereference &x .)

What “undefined behavior” means is horrible. A program that executes undefined behavior is erroneous. But the compiler need not catch the error. In fact, the abstract machine says anything goes : undefined behavior is “behavior … for which this International Standard imposes no requirements.” “Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).” Other possible behaviors include allowing hackers from the moon to steal all of a program’s data, take it over, and force it to delete the hard drive on which it is running. Once undefined behavior executes, a program may do anything, including making demons fly out of the programmer’s nose.

Pointer arithmetic, and even pointer comparisons, are also affected by undefined behavior. It’s undefined to go beyond an array’s bounds using pointer arithmetic. And pointers may be compared for equality or inequality even if they point to different arrays or objects, but if you try to compare different arrays via less-than, like this:
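```cpp
int a[10];
int b[10];
bool lt = &a[0] < &b[0];   // less-than across two unrelated arrays
```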

that causes undefined behavior.

If you really want to compare pointers that might be to different arrays—for instance, you’re writing a hash function for arbitrary pointers—cast them to uintptr_t first.

Undefined behavior and optimization

A program that causes undefined behavior is not a C++ program . The abstract machine says that a C++ program, by definition, is a program whose behavior is always defined. The C++ compiler is allowed to assume that its input is a C++ program. (Obviously!) So the compiler can assume that its input program will never cause undefined behavior. Thus, since undefined behavior is “impossible,” if the compiler can prove that a condition would cause undefined behavior later, it can assume that condition will never occur.

Consider this program:
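```cpp
// a sketch; the real program obtained the pointer value x from its input
#include <cassert>
#include <cstdio>
#include <cstdlib>

int main(int argc, char* argv[]) {
    // interpret the command-line argument as a pointer value (illustrative)
    char* x = (char*) strtoul(argv[1], nullptr, 0);
    assert(x + 1 > x);
    printf("x = %p, x + 1 = %p\n", (void*) x, (void*) (x + 1));
}
```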

If we supply a value equal to (char*) -1 , we’re likely to see output like this:
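Assuming the sketch’s print format:

```
x = 0xffffffffffffffff, x + 1 = (nil)
```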

with no assertion failure! But that’s an apparently impossible result. The printout can only happen if x + 1 > x (otherwise, the assertion will fail and stop the printout). But x + 1 , which equals 0 , is less than x , which is the largest 8-byte value!

The impossible happens because of undefined behavior reasoning. When the compiler sees an expression like x + 1 > x (with x a pointer), it can reason this way:

“Ah, x + 1 . This must be a pointer into the same array as x (or it might be a boundary pointer just past that array, or just past the non-array object x ). This must be so because forming any other pointer would cause undefined behavior.

“The pointer comparison is the same as an index comparison. x + 1 > x means the same thing as &x[1] > &x[0] . But that holds iff 1 > 0 .

“In my infinite wisdom, I know that 1 > 0 . Thus x + 1 > x always holds, and the assertion will never fail.

“My job is to make this code run fast. The fastest code is code that’s not there. This assertion will never fail—might as well remove it!”

Integer undefined behavior

Arithmetic on signed integers also has important undefined behaviors. Signed integer arithmetic must never overflow. That is, the compiler may assume that the mathematical result of any signed arithmetic operation, such as x + y (with x and y both int ), can be represented inside the relevant type. It causes undefined behavior, therefore, to add 1 to the maximum positive integer. (The ubexplore.cc program demonstrates how this can produce impossible results, as with pointers.)

Arithmetic on unsigned integers is much safer with respect to undefined behavior. Unsigned integers are defined to perform arithmetic modulo their size. This means that if you add 1 to the maximum positive unsigned integer, the result will always be zero.

Dividing an integer by zero causes undefined behavior whether or not the integer is signed.
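A minimal sketch contrasting the two kinds of overflow:

```cpp
#include <climits>

int main() {
    unsigned u = UINT_MAX;   // largest unsigned: 4294967295 on x86-64
    ++u;                     // well defined: u is now 0 (arithmetic modulo 2^32)

    int i = INT_MAX;         // largest int: 2147483647
    ++i;                     // undefined behavior!
}
```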

Sanitizers, which in our makefiles are turned on by supplying SAN=1 , can catch many undefined behaviors as soon as they happen. Sanitizers are built in to the compiler itself; a sanitizer involves cooperation between the compiler and the language runtime. This has the major performance advantage that the compiler introduces exactly the required checks, and the optimizer can then use its normal analyses to remove redundant checks.

That said, undefined behavior checking can still be slow. Undefined behavior allows compilers to make assumptions about input values, and those assumptions can directly translate to faster code. Turning on undefined behavior checking can make some benchmark programs run 30% slower.

Signed integer undefined behavior

File cs61-lectures/datarep5/ubexplore2.cc contains the following program.
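```cpp
// a sketch of the program; the real file differs in details
#include <cstdio>
#include <cstdlib>

int main(int argc, char* argv[]) {
    int n1 = (int) strtol(argv[1], nullptr, 0);
    int n2 = (int) strtol(argv[2], nullptr, 0);
    for (int i = n1; i <= n2; ++i) {
        printf("%d\n", i);
    }
}
```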

What will be printed if we run the program with ./ubexplore2 0x7ffffffe 0x7fffffff ?

0x7fffffff is the largest positive value that can be represented by type int . Adding one to this value yields 0x80000000 . In two's complement representation this is the smallest negative number representable by type int .

Assuming that the program behaves this way (that the addition simply wraps around), the loop exit condition i > n2 can never be met, and the program should run (and print out numbers) forever.

However, if we run the optimized version of the program, it prints only two numbers and exits:
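The two numbers are the decimal values of the two arguments:

```
2147483646
2147483647
```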

The unoptimized program does print forever and never exits.

What’s going on here? We need to look at the compiled assembly of the program with and without optimization (via objdump -S ).

The unoptimized version basically looks like this:
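Roughly, expressed as numbered steps rather than verbatim assembly:

1. Compare i with n2.
2. If i is greater than n2, jump past the loop (exit).
3. Load i into the argument register.
4. Call printf to print i.
5. Increment i.
6. Jump back to step 1.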

This is a pretty direct translation of the loop.

The optimized version, though, does it differently. As always, the optimizer has its own ideas. (Your compiler may produce different results!)
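Roughly, again as numbered steps:

1. Set i to n1.
2. Compute n2 + 1 (in wrapping machine arithmetic).
3. Fall through into the loop.
4. Call printf to print i.
5. Increment i.
6. Compare i with n2 + 1.
7. If i is not equal to n2 + 1, jump back to step 4.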

The compiler changed the source’s less-than-or-equal-to comparison, i <= n2 , into a not-equal-to comparison in the executable, i != n2 + 1 (in both cases using signed computer arithmetic, i.e., modulo 2^32)! The comparison i <= n2 will always return true when n2 == 0x7FFFFFFF , the maximum signed integer, so the loop goes on forever. But the i != n2 + 1 comparison does not always return true when n2 == 0x7FFFFFFF : when i wraps around to 0x80000000 (the smallest negative integer), then i equals n2 + 1 (which also wrapped), and the loop stops.

Why did the compiler make this transformation? In the original loop, the step-6 jump is immediately followed by another comparison and jump in steps 1 and 2. The processor jumps all over the place, which can confuse its prediction circuitry and slow down performance. In the transformed loop, the step-7 jump is never followed by a comparison and jump; instead, step 7 goes back to step 4, which always prints the current number. This more streamlined control flow is easier for the processor to make fast.

But the streamlined control flow is only a valid substitution under the assumption that the addition n2 + 1 never overflows . Luckily (sort of), signed arithmetic overflow causes undefined behavior, so the compiler is totally justified in making that assumption!

Programs based on ubexplore2 have demonstrated undefined behavior differences for years, even as the precise reasons why have changed. In some earlier compilers, we found that the optimizer just upgraded the int s to long s—arithmetic on long s is just as fast on x86-64 as arithmetic on int s, since x86-64 is a 64-bit architecture, and sometimes using long s for everything lets the compiler avoid conversions back and forth. The ubexplore2l program demonstrates this form of transformation: since the loop variable is added to a long counter, the compiler opportunistically upgrades i to long as well. This transformation is also only valid under the assumption that i + 1 will not overflow—which it can’t, because of undefined behavior.

Using an unsigned type prevents all this undefined behavior, because arithmetic overflow on unsigned integers is well defined in C/C++. The ubexplore2u.cc file uses an unsigned loop index and comparison, and ./ubexplore2u and ./ubexplore2u.noopt behave exactly the same (though you have to give arguments like ./ubexplore2u 0xfffffffe 0xffffffff to see the overflow).

Computer arithmetic and bitwise operations

Basic bitwise operators.

Computers offer not only the usual arithmetic operators like + and - , but also a set of bitwise operators. The basic ones are & (and), | (or), ^ (xor/exclusive or), and the unary operator ~ (complement). In truth table form:
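```
a  b    a & b   a | b   a ^ b        a    ~a
0  0      0       0       0          0     1
0  1      0       1       1          1     0
1  0      0       1       1
1  1      1       1       1
```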

In C or C++, these operators work on integers. But they work bitwise: the result of an operation is determined by applying the operation independently at each bit position. Here’s how to compute 12 & 4 in 4-bit unsigned arithmetic:
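```
   1100   (12)
 & 0100   (4)
 ------
   0100   (4)
```

So 12 & 4 == 4: each result bit is 1 only where both operands have a 1.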

These basic bitwise operators simplify certain important computations. For example, (x & (x - 1)) == 0 tests whether x is zero or a power of 2.

Negation of signed integers can also be expressed using a bitwise operator: -x == ~x + 1 . This is in fact how we define two's complement representation. We can verify that x and (-x) do add up to zero under this representation:
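For example, with 4-bit integers and x = 5:

```
x            == 0101   (5)
~x           == 1010
-x == ~x + 1 == 1011   (-5 in two's complement)
x + (-x)     == 0101 + 1011 == 1 0000
```

The carry out of the top bit is discarded in 4-bit arithmetic, leaving 0000, i.e., zero.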

Bitwise "and" ( & ) can help with modular arithmetic. For example, x % 32 == (x & 31) . We essentially "mask off", or clear, higher order bits to do modulo-powers-of-2 arithmetics. This works in any base. For example, in decimal, the fastest way to compute x % 100 is to take just the two least significant digits of x .

Bitwise shift of unsigned integer

x << i appends i zero bits starting at the least significant bit of x . High order bits that don't fit in the integer are thrown out. For example, assuming 4-bit unsigned integers
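```
0110 << 1 == 1100   (6 << 1 == 12)
0110 << 2 == 1000   (6 << 2 == 24, which doesn't fit in 4 bits;
                     the high 1 is thrown out, leaving 8)
```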

Similarly, x >> i appends i zero bits at the most significant end of x . Lower bits are thrown out.

Bitwise shift helps with division and multiplication. For example:
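```
x << 3 == x * 8
x >> 3 == x / 8   (for unsigned x; the quotient rounds down)
```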

A modern compiler can optimize y = x * 66 into y = (x << 6) + (x << 1) .

Bitwise operations also allow us to treat the bits within an integer separately. This can be useful for "options".

For example, when we call a function to open a file, we have a lot of options:

  • Open for reading?
  • Open for writing?
  • Read from the end?
  • Optimize for writing?

We have a lot of true/false options.

One bad way to implement this is to have this function take a bunch of arguments -- one argument for each option. This makes the function call look like this:
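In sketch form, with a hypothetical open_file() that takes one bool per option:

```cpp
// hypothetical signature: one bool argument per true/false option
void open_file(const char* name,
               bool reading, bool writing,
               bool from_end, bool optimize_writes);

int main() {
    open_file("file.txt",
              true,    // open for reading?
              false,   // open for writing?
              false,   // read from the end?
              false);  // optimize for writing?
}
```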

The long list of arguments slows down the function call, and one can also easily lose track of the meaning of the individual true/false values passed in.

A cheaper way to achieve this is to use a single integer to represent all the options. Have each option defined as a power of 2, and simply | (or) them together and pass them as a single integer.
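In sketch form, with hypothetical flag names:

```cpp
// each flag is a distinct power of 2, so each sets exactly one bit
constexpr int OPEN_READ     = 1 << 0;   // 0b0001
constexpr int OPEN_WRITE    = 1 << 1;   // 0b0010
constexpr int OPEN_FROM_END = 1 << 2;   // 0b0100
constexpr int OPEN_OPTIMIZE = 1 << 3;   // 0b1000

// hypothetical open_file(): all options arrive packed into one int
void open_file(const char* name, int flags) {
    bool reading = (flags & OPEN_READ) != 0;   // test one option with &
    // ...
}

int main() {
    open_file("file.txt", OPEN_READ | OPEN_FROM_END);  // combine options with |
}
```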

Flags are usually defined as powers of 2 so we set one bit at a time for each flag. It is less common but still possible to define a combination flag that is not a power of 2, so that it sets multiple bits in one go.

File cs61-lectures/datarep5/mb-driver.cc contains a memory allocation benchmark. The core of the benchmark looks like this:
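In sketch form (reconstructed from the description below; the helper names stand in for the arena interface):

```cpp
struct memnode;                       // defined later in this lecture

memnode* allocate_memnode();          // stand-ins for the arena's allocate()
void deallocate_memnode(memnode* m);  // ...and deallocate()

void memnode_benchmark(unsigned noperations) {
    memnode* m[4096];
    for (unsigned i = 0; i != 4096; ++i) {
        m[i] = allocate_memnode();        // 4096 initial allocations
    }
    for (unsigned i = 0; i != noperations; ++i) {
        unsigned pos = i % 4096;          // pick a slot (illustrative policy)
        deallocate_memnode(m[pos]);       // free one memnode...
        m[pos] = allocate_memnode();      // ...then allocate a fresh one
    }
    for (unsigned i = 0; i != 4096; ++i) {
        deallocate_memnode(m[i]);         // free everything at the end
    }
}
```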

The benchmark tests the performance of the memnode_arena::allocate() and memnode_arena::deallocate() functions. In the handout code, these functions do the same thing as new memnode and delete memnode —they are wrappers for malloc and free . The benchmark allocates 4096 memnode objects, then free-and-then-allocates them noperations times, and then frees all of them.

We only allocate memnode s, and all memnode s are of the same size, so we don't need metadata that keeps track of the size of each allocation. Furthermore, since all dynamically allocated data is freed at the end of the function, for each individual memnode_free() call we don't really need to return memory to the system allocator. We can simply reuse this memory during the function and return all memory to the system at once when the function exits.

If we run the benchmark with 100,000,000 allocations and use the system malloc() and free() functions to implement the memnode allocator, the benchmark finishes in 0.908 seconds.

Our alternative implementation of the allocator can finish in 0.355 seconds, beating the heavily optimized system allocator by a factor of about 2.5. We will reveal how we achieved this in the next lecture.

We continue our exploration with the memnode allocation benchmark introduced in the last lecture.

File cs61-lectures/datarep6/mb-malloc.cc contains a version of the benchmark using the system new and delete operators.

In this function we allocate an array of 4096 pointers to memnode s, which occupies 2^3 × 2^12 = 2^15 bytes (4096 pointers × 8 bytes each = 32,768 bytes) on the stack. We then allocate 4096 memnode s. Our memnode is defined like this:
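```cpp
// as described below: a std::string plus an unsigned integer
struct memnode {
    std::string file;
    unsigned line;
};
```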

Each memnode contains a std::string object and an unsigned integer. Each std::string object internally contains a pointer to a character array in the heap. Therefore, every time we create a new memnode , we need 2 allocations: one to allocate the memnode itself, and another one performed internally by the std::string object when we initialize/assign a string value to it.

Every time we deallocate a memnode by calling delete , we also destroy the std::string object, and the string object knows that it should deallocate the heap character array it internally maintains. So there are also 2 deallocations occurring each time we free a memnode.

We make the benchmark return a seemingly meaningless result to prevent an aggressive compiler from optimizing everything away. We also use this result to make sure our subsequent optimizations to the allocator are correct, by checking that they generate the same result.

This version of the benchmark, using the system allocator, finishes in 0.335 seconds. Not bad at all.

Spoiler alert: We can do 15x better than this.

1st optimization: std::string

We only deal with one file name, namely "datarep/mb-filename.cc", which is constant throughout the program for all memnode s. It's also a string literal, which as a constant string has static lifetime. Why can't we simply use a const char* in place of the std::string and let the pointer point to the static constant string? This saves us the internal allocation/deallocation performed by std::string every time we initialize/delete a string.

The fix is easy: we simply change the memnode definition:
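```cpp
struct memnode {
    const char* file;   // now points at the static string literal: no allocation
    unsigned line;
};
```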

This version of the benchmark now finishes in 0.143 seconds, a 2x improvement over the original benchmark. This 2x improvement is consistent with the 2x reduction in the number of allocations/deallocations mentioned earlier.

You may ask why people still use std::string if it involves an additional allocation and is slower than const char* , as shown in this benchmark. std::string is much more flexible in that it also handles data that doesn't have static lifetime, such as input from a user or data the program receives over the network. In short, when the program deals with strings that are not constant, heap data is likely to be very useful, and std::string provides facilities to conveniently handle on-heap data.

2nd optimization: the system allocator

We still use the system allocator to allocate/deallocate memnode s. The system allocator is a general-purpose allocator, which means it must handle allocation requests of all sizes. Such general-purpose designs usually come with a performance compromise. Since we only allocate memnode s, which are fairly small objects (and all have the same size), we can build a special-purpose allocator just for them.

In cs61-lectures/datarep5/mb2.cc , we actually implement a special-purpose allocator for memnode s:
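In sketch form (assuming the memnode definition above):

```cpp
#include <vector>

struct memnode_arena {
    std::vector<memnode*> free_list;

    memnode* allocate() {
        memnode* n;
        if (free_list.empty()) {
            n = new memnode;          // fall back to the system allocator
        } else {
            n = free_list.back();     // reuse a previously freed memnode
            free_list.pop_back();
        }
        return n;
    }

    void deallocate(memnode* n) {
        free_list.push_back(n);       // don't free; stash for reuse
    }
};
```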

This allocator maintains a free list (a C++ vector ) of freed memnode s. allocate() simply pops a memnode off the free list if there is one, and deallocate() simply puts the memnode on the free list. This free list serves as a buffer between the system allocator and the benchmark function, so that the system allocator is invoked less frequently. In fact, in the benchmark, the system allocator is invoked only 4096 times, when it initializes the pointer array. That's a huge reduction, because the 10 million "recycle" operations in the middle now don't involve the system allocator at all.

With this special-purpose allocator we can finish the benchmark in 0.057 seconds, another 2.5x improvement.

However this allocator now leaks memory: it never actually calls delete ! Let's fix this by letting it also keep track of all allocated memnode s. The modified definition of memnode_arena now looks like this:
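A sketch of the updated arena:

```cpp
#include <vector>

struct memnode_arena {
    std::vector<memnode*> allocated;   // every memnode ever created by this arena
    std::vector<memnode*> free_list;   // memnodes available for reuse

    memnode* allocate() {
        if (free_list.empty()) {
            memnode* n = new memnode;
            allocated.push_back(n);    // remember it so destroy_all() can delete it
            return n;
        }
        memnode* n = free_list.back();
        free_list.pop_back();
        return n;
    }

    void deallocate(memnode* n) {
        free_list.push_back(n);        // recycle rather than delete
    }

    void destroy_all() {
        for (memnode* n : allocated) {
            delete n;
        }
        allocated.clear();
        free_list.clear();
    }

    ~memnode_arena() {
        destroy_all();                 // automatic cleanup when the arena goes out of scope
    }
};
```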

With the updated allocator we simply need to invoke arena.destroy_all() at the end of the function to fix the memory leak. And we don't even need to invoke this method manually! We can use the C++ destructor for the memnode_arena struct, defined as ~memnode_arena() in the code above, which is automatically called when our arena object goes out of scope. We simply make the destructor invoke the destroy_all() method, and we are all set.

Fixing the leak doesn’t appear to affect performance at all. This is because the overhead added by tracking the allocated list and calling delete only affects the initial allocation of the 4096 memnode s plus the final cleanup at the very end. These 8192 additional operations are a relatively small number compared to the 10 million recycle operations, so the added overhead is hardly noticeable.

Spoiler alert: We can improve this by another factor of 2.

3rd optimization: std::vector

In our special-purpose allocator memnode_arena , we maintain an allocated list and a free list, both using C++ std::vector s. std::vector s are dynamic arrays, and like std::string they involve an additional level of indirection and store the actual array in the heap. We don't access the allocated list during the "recycling" part of the benchmark (which takes the bulk of the benchmark time, as we showed earlier), so the allocated list is probably not our bottleneck. We do, however, add and remove elements from the free list for each recycle operation, and the indirection introduced by the std::vector here may actually be our bottleneck. Let's find out.

Instead of using a std::vector , we could use a linked list of all free memnode s for the actual free list. We will need to include some extra metadata in the memnode to store pointers for this linked list. However, unlike in the debugging allocator pset, in a free list we don't need to store this metadata in addition to the actual memnode data: the memnode is free, and not in use, so we can reuse its memory, using a union:
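```cpp
// a sketch: the two members share the same storage
union freeable_memnode {
    memnode n;
    freeable_memnode* next_free;   // valid only while the memnode is free
};
```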

We then maintain the free list like this:
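A sketch, omitting the allocated-list bookkeeping shown earlier:

```cpp
struct memnode_arena {
    freeable_memnode* free_list = nullptr;

    memnode* allocate() {
        if (!free_list) {
            freeable_memnode* fresh = new freeable_memnode;
            fresh->next_free = nullptr;    // grab fresh memory when the list is empty
            free_list = fresh;
        }
        freeable_memnode* fn = free_list;
        free_list = fn->next_free;         // pop the head of the list
        return &fn->n;
    }

    void deallocate(memnode* n) {
        // n is the union's first member, so its address is the union's address
        freeable_memnode* fn = (freeable_memnode*) n;
        fn->next_free = free_list;         // push onto the head of the list
        free_list = fn;
    }
};
```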

Compared to the std::vector free list, this free list always points directly to an available memnode when it is not empty ( free_list != nullptr ), without going through any indirection. In the std::vector free list one would first have to go into the heap to access the actual array containing pointers to free memnode s, and then access the memnode itself.

With this change we can now finish the benchmark in under 0.03 seconds! Another 2x improvement over the previous one!

Compared to the benchmark with the system allocator (which finished in 0.335 seconds), we managed to achieve a speedup of nearly 15x with arena allocation.

Unit 1: Data Representation

Data Representation in Computer: Number Systems, Characters, Audio, Image and Video

What is Data Representation in Computer?

A computer uses a fixed number of bits to represent a piece of data which could be a number, a character, image, sound, video, etc. Data representation is the method used internally to represent data in a computer. Let us see how various types of data can be represented in computer memory.

Before discussing data representation of numbers, let us see what a number system is.

Number Systems

Number systems are the techniques used to represent numbers in a computer system's architecture; every value that you save into or retrieve from computer memory has a defined number system.

A number is a mathematical object used to count, label, and measure. A number system is a systematic way to represent numbers. The number system we use in our day-to-day life is the decimal number system that uses 10 symbols or digits.

The number 289 is pronounced as two hundred and eighty-nine and it consists of the symbols 2, 8, and 9. Similarly, there are other number systems. Each has its own symbols and method for constructing a number.

A number system has a unique base, which depends upon the number of symbols. The number of symbols used in a number system is called the base or radix of a number system.

Let us discuss some of these number systems. Computer architecture supports the following number systems: binary, octal, decimal and hexadecimal.

Binary Number System

A binary number system has only two digits: 0 and 1. Every value is represented with 0s and 1s in this number system. The base of the binary number system is 2, because it has only two digits.

Octal Number System

The octal number system has only eight (8) digits, from 0 to 7. Every value is represented with the digits 0 through 7 in this number system. The base of the octal number system is 8, because it has only 8 digits.

Decimal Number System

The decimal number system has only ten (10) digits, from 0 to 9. Every value is represented with the digits 0 through 9 in this number system. The base of the decimal number system is 10, because it has only 10 digits.

Hexadecimal Number System

A hexadecimal number system has sixteen (16) alphanumeric values, from 0 to 9 and A to F. Every value is represented with 0-9 and A-F in this number system. The base of the hexadecimal number system is 16, because it has 16 alphanumeric values.

Here A is 10, B is 11, C is 12, D is 13, E is 14 and F is 15 .
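For example, the hexadecimal number 2F represents 2 × 16 + 15 = 47 in decimal, and the binary number 1011 represents 1×8 + 0×4 + 1×2 + 1×1 = 11 in decimal.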

Data Representation of Characters

There are different methods to represent characters . Some of them are discussed below:

The code called ASCII (pronounced "AS-key"), which stands for American Standard Code for Information Interchange, uses 7 bits to represent each character in computer memory. The ASCII representation has been adopted as a standard by the U.S. government and is widely accepted.

A unique integer number is assigned to each character. This number, called the ASCII code of that character, is converted into binary for storage in memory. For example, the ASCII code of A is 65; its binary equivalent in 7 bits is 1000001.

Since there are exactly 128 unique combinations of 7 bits, this 7-bit code can represent only 128 characters. Another version, ASCII-8, also called extended ASCII, uses 8 bits for each character and can represent 256 different characters.

For example, the letter A is represented by 01000001, B by 01000010 and so on. ASCII code is enough to represent all of the standard keyboard characters.

EBCDIC stands for Extended Binary Coded Decimal Interchange Code. It is similar to ASCII and is an 8-bit code used in computers manufactured by International Business Machines (IBM). It is capable of encoding 256 characters.

If ASCII-coded data is to be used in a computer that uses EBCDIC representation, it is necessary to transform ASCII code to EBCDIC code. Similarly, if EBCDIC coded data is to be used in an ASCII computer, EBCDIC code has to be transformed to ASCII.

ISCII stands for Indian Standard Code for Information Interchange or Indian Script Code for Information Interchange. It is an encoding scheme for representing various writing systems of India. ISCII uses 8-bits for data representation.

It was developed by a standardization committee under the Department of Electronics during 1986-88 and adopted by the Bureau of Indian Standards (BIS). Nowadays ISCII has been replaced by Unicode.

Using 8-bit ASCII we can represent only 256 characters. This cannot represent all characters of written languages of the world and other symbols. Unicode is developed to resolve this problem. It aims to provide a standard character encoding scheme, which is universal and efficient.

It provides a unique number for every character, no matter what the language and platform be. Unicode originally used 16 bits which can represent up to 65,536 characters. It is maintained by a non-profit organization called the Unicode Consortium.

The Consortium first published version 1.0.0 in 1991 and continues to develop standards based on that original work. Nowadays Unicode uses more than 16 bits and hence it can represent more characters. Unicode can represent characters in almost all written languages of the world.

Data Representation of Audio, Image and Video

In most cases, we may have to represent and process data other than numbers and characters. This may include audio data, images, and videos. We can see that like numbers and characters, the audio, image, and video data also carry information.

We will see different file formats for storing sound, image, and video .

Multimedia data such as audio, image, and video are stored in different types of files. The variety of file formats is due to the fact that there are quite a few approaches to compressing the data and a number of different ways of packaging the data.

For example, an image is most popularly stored in the Joint Photographic Experts Group (JPEG) file format. An image file consists of two parts: header information and image data. Information such as the name of the file, its size, modification date, file format, etc. is stored in the header part.

The intensity value of all pixels is stored in the data part of the file. The data can be stored uncompressed or compressed to reduce the file size. Normally, the image data is stored in compressed form. Let us understand what compression is.

Take a simple example of a pure black image of size 400×400 pixels. We could repeat the information black, black, …, black for all 160,000 (400×400) pixels. This is the uncompressed form, while in the compressed form black is stored only once, along with the information needed to repeat it 160,000 times.

Numerous such techniques are used to achieve compression. Depending on the application, images are stored in various file formats such as bitmap (BMP), Tagged Image File Format (TIFF), Graphics Interchange Format (GIF), and Portable Network Graphics (PNG).

What we said about header information and compression also applies to audio and video files. Digital audio data can be stored in different file formats like WAV, MP3, MIDI, AIFF, etc. An audio file format, sometimes referred to as the 'container format', describes how digital audio data is stored.

For example, the WAV file format typically contains uncompressed sound, while MP3 files typically contain compressed audio data. Synthesized music data is stored in MIDI (Musical Instrument Digital Interface) files.

Similarly, video is also stored in different file formats such as AVI (Audio Video Interleave), a file format designed to store both audio and video data in a standard package that allows synchronous audio-with-video playback, as well as MP4, MPEG-2, WMV, etc.

FAQs About Data Representation in Computer

What is a number system, with examples?

Computer architecture supports the following number systems: 1. the binary number system; 2. the octal number system; 3. the decimal number system; 4. the hexadecimal number system.

UNIT - 1 The Basic Computer Architecture and Data Representation

Lesson Structure

1.0 Objective

1.1 Introduction

1.2 The Von Neumann Architecture

1.3 Instruction Cycle

1.4 Interrupts

1.4.1 Interrupts and Instruction Cycle

1.5 Data Representation

1.6 Number System

1.7 Decimal Representation in Computers

1.8 Alphanumeric Representation

1.9 Error Detection and Correction Codes

1.10 Summary

1.11 Questions

1.12 Suggested Readings

1.0 Objective

After going through this unit we will be able to:

  • define the logical structure of the computer;
  • define the instruction cycle;
  • use binary, octal and hexadecimal numbers;
  • convert decimal numbers to other systems and vice versa;
  • describe the character representation in computers;
  • describe the data error checking mechanism and error detection and correction codes.

1.1 Introduction

Information Technology (IT) has become a must for the survival of business houses with the growing information technology trends. The use of IT is well recognised, and the computer is the main component of an Information Technology network. Today, computer technology has a presence in every field, from railway reservations to medical diagnosis, from TV programmes to satellite launching, from matchmaking to criminal catching. Everywhere we witness the elegance, sophistication and efficiency possible only with the help of computers.

In this unit, we will be introduced to one of the important computer system structures: the von Neumann architecture. In addition, we will be introduced to the concepts of a simple model of instruction execution. The computer system is based on the binary system; therefore, we will explain the concepts of binary data representation in the computer system. This unit will introduce us to the number system concepts, which include the binary, octal, and hexadecimal notations. In addition, details of various number representations such as floating-point representation, BCD representation and character-based representations are described in this unit. Finally, error detection and correction codes are described.

1.2 The Von Neumann Architecture

John von Neumann (1903-57), a mathematician, was the first to publish, in 1945, a proposal for a new computer, the EDVAC (Electronic Discrete Variable Automatic Computer). The EDVAC differed from most of its predecessors in that it stored and processed numbers in true binary, or base 2, form. The von Neumann architecture was the first proposed structure for a general-purpose computer. But before describing the main components of the von Neumann architecture, we will define the term 'computer', as this will help us in discussing the von Neumann architecture in detail.

A computer is defined as "an automatic electronic device for making calculations or controlling operations that are expressible in numerical or logical terms". It is a device that takes some input, processes it and produces the desired output.


The definition clearly categorises the computer as an electronic device, although the first computers were mechanical and electro-mechanical devices. The definition also points towards the two major areas of computer application: (a) data processing and (b) controls or operations. Another important aspect of the definition is the fact that the computer can perform only those operations/calculations which can be expressed in logical or numerical terms.

The basic function performed by a computer is the execution of a program. A program is defined as a sequence of instructions, which operates on data, to perform certain tasks such as finding the greatest of three numbers. The computer controls the execution of the program.

In modern digital computers data is represented in binary form by using two symbols, 0 and 1. These are called binary digits or bits . But the data which we deal with consists of numeric data and characters such as decimal digits 0 to 9, alphabets A to Z, arithmetic operators (e.g. +, -, etc.), relational operators (e.g. =, >, etc.), and many other special characters (e.g. ;, @, {, ], etc.). Therefore, there has to be a mechanism for data representation and data conversion. Old computers use eight bits to represent a character. This allows up to 2^8 = 256 different items to be represented uniquely. This collection of eight bits is called a byte . Thus, one byte is used to represent one character internally. Most computers use two bytes or four bytes to represent numbers (positive and negative) internally. The data also includes operational data such as integers, decimal numbers etc. We will discuss more about data representation in the next section.

Thus, the prime task of a computer is to perform instruction execution. The two most important questions that can be asked here are: (a) how are the instructions given to the computer? (b) how are the instructions interpreted and executed?

Let us answer the second question first. All computers have a unit that performs the arithmetic and logical functions. This unit is called the Arithmetic Logic Unit (ALU). But how will the computer determine what operation is to be performed by the ALU, or in other words, who will interpret the operation that is to be performed by the ALU? This interpretation is done by the Control Unit of the computer. The control unit accepts the binary form of an instruction and interprets the instruction to generate control signals. These control signals then direct the ALU to perform a specified arithmetic or logic function on the data. Therefore, by changing the control signals the desired function can be performed on the data.

Or conversely, the operations that need to be performed on the data can be obtained by providing a set of control signals; thus, for a new operation one only needs to change the set of control signals. The unit that interprets a code (a machine instruction) to generate respective control signals is termed the Control Unit (CU).

A program now consists of a sequence of codes. Each code is, in effect, an instruction for the computer. The hardware interprets each of these instructions and generates respective control signals such that the desired operation is performed on the data.

The Arithmetic Logic Unit (ALU) and the Control Unit (CU) together are termed the Central Processing Unit (CPU). The CPU is the most important component of a computer's hardware. All these arithmetic and logical operations are performed in the CPU in special storage areas called registers. The size of the register is one of the important considerations in determining the processing capabilities of the CPU. Register size refers to the amount of information that can be held in a register at a time for processing. The larger the register size, the faster may be the speed of processing.

Now let us answer the first question: how can the instructions and data be put into the computers? The instructions and data to a computer are given by the external environment; therefore, input devices are needed in the computer to supply data to the computer. The main responsibility of input devices is to put the data in the form of signals that can be recognized by the system. Similarly, we need another component which will report the results in proper format; this component is called the output device. These components together are referred to as Input/Output (I/O) devices. In addition, to transfer the information, the computer system internally needs system interconnections. At present we will not discuss Input/Output devices and system interconnections in detail, except for the information that the most common input/output devices are the keyboard, monitor and printer, and the most common interconnection structure is the bus structure.

Input devices can bring instructions or data only sequentially. But a program may not be executed sequentially; it may consist of jumps, looping and decision-making instructions. In addition, more than one data element may be required at a time. Therefore, a temporary storage area is needed in a computer to store the instructions and the data temporarily. This component is referred to as memory. The memory unit stores all the information in groups of memory cells, such as a group of 8 binary digits (that is, a byte) or 16 bits or 32 bits etc. These groups of memory cells or bits are called memory locations.

Each memory location has a unique address and can be addressed independently. The contents of the desired memory locations are provided to the CPU by referring to the address of the memory location. The amount of information that can be held in the main memory is known as memory capacity. The capacity of the main memory is measured in Mega Bytes (MB) or Giga Bytes (GB). One kilobyte stands for 2^10 bytes, which is 1024 bytes (or approximately 1000 bytes); a megabyte stands for 2^20 bytes, which is approximately a little over one million bytes; a gigabyte is 2^30 bytes.

Let us now define the key features of the von Neumann architecture:

  • The basic function performed by a computer is the execution of a program, which involves the execution of instructions. An instruction supplies the information about an operation and about the data on which the operation is to be performed. The control unit (CU) interprets each of these instructions and generates respective control signals.

  • The Arithmetic Logic Unit (ALU) performs the arithmetic and logical operations in special storage areas called registers, as per the instructions of the control unit. The size of the register is one of the important considerations in determining the processing capabilities of the CPU. Register size refers to the amount of information that can be held in a register at a time for processing. The larger the register size, the faster may be the speed of processing.

  • An Input/Output system involving I/O devices allows data input and reporting of the results in proper form and format. For transfer of information a computer system internally needs system interconnections; one such interconnection structure is the bus interconnection.

  • Main memory is needed in a computer to store instructions and data at the time of program execution. The memory-to-CPU path is an important data transfer path, and the amount of information which can be transferred between CPU and memory depends on the size of the bus connecting the two.

  • It was pointed out by von Neumann that the same memory can be used for storing data and instructions. In such a case the data can be treated as data on which processing can be performed, while instructions can be treated as data which can be used for the generation of control signals.

  • The von Neumann machine uses the stored program concept: the program and data are stored in the same memory unit for execution. The computers prior to this idea used to store programs and data in separate memories. Entering and modifying these programs was very difficult, as they were entered manually by setting switches and by plugging and unplugging.

  • Execution of instructions in the von Neumann machine is carried out in a sequential fashion (unless explicitly altered by the program itself) from one instruction to the next.

The block diagram of the von Neumann machine is given below in figure 1.

[Figure 1: Von Neumann architecture — a block diagram showing the Central Processing Unit (comprising the Control Unit and the Arithmetic/Logic Unit) connected to the Memory Unit and to the Input and Output Devices.]

A von Neumann machine has only a single path between the main memory and the control unit (CU). This is referred to as the von Neumann bottleneck, and it often limits the performance of the system. Several other architectures have been suggested for modern computers; non-von Neumann architectures are covered in the suggested readings.

1.3 Instruction Cycle

The sequence of operations performed by the CPU in processing an instruction constitutes an instruction cycle. While the details of the instruction cycle vary with the type of instruction, all instructions require two major steps:

  • Step 1: A fetch step, during which a new instruction is read from the external memory M.
  • Step 2: An execute step, during which the operations specified by the instruction are executed.

The flow chart for the instruction cycle is given below in figure 2, which shows how the computer executes a program and the various types of operations that may be required by the computer for the execution of a program or instruction.

[Figure 2: Instruction cycle — a flowchart of the CPU's processing steps: calculate the address of the next instruction; fetch the instruction; decode the instruction; calculate the operand address; fetch the operand (looping back if more operands are needed); perform the calculation; store the results. If program execution is not complete, the cycle repeats; otherwise the next program is loaded into memory or jumped to.]

Thus we can see in the diagram that the execution cycle for a particular instruction may involve more than one stage and more than one memory reference. In addition, an instruction may ask for an I/O operation. Please note that in the diagram above some steps may be bypassed while some may be visited more than once. The instruction cycle shown in figure 2 consists of the following stages:

  • First, the address of the next instruction is calculated, based on the size of the instruction and the memory organization. For example, if in a computer an instruction is of 16 bits and memory is organized as 16-bit words, then the address of the next instruction is evaluated by adding one to the address of the current instruction. In case the memory is organized as bytes, which can be addressed individually, then we need to add two to the current instruction address to get the address of the next instruction to be executed in sequence.
  • Now, the next instruction is fetched from a memory location to a CPU register such as the instruction register.
  • The next stage decodes the instruction to determine the type of operation desired and the operands to be used.
  • In case the operands need to be fetched from memory or via input devices, the address of the memory location or input device is calculated.
  • Next, the operand is fetched (or operands are fetched one by one) from the memory or read from the input devices.
  • Now, the operation asked for by the instruction is performed.
  • Finally, the results are written back to memory or output devices, wherever desired, by first calculating the address of the destination and then transferring the values there.

In a number of computers multiple operands and multiple results are allowed. An example of such a case may be an instruction MUL X, Y. This instruction requires operands X and Y to be fetched. In certain machines an operation may be performed on an array of numbers or a string of characters. Such an operation involves repeated fetches of operands without fetching the instruction again; that is, the instruction cycle loops at the operand fetch. Thus, a program is executed as per the instruction cycle of figure 2.

Let us explain this with the help of an example written in the C language. C programs are simple to understand.

Problem: Write a program to multiply two numbers.

A sample C program (assuming two fixed values, x = 10 and y = 20):

```c
1. #include <stdio.h>
2. int main ()
3. {
4.     int x = 10, y = 20, product;
5.     product = x*y;
6.     printf ("\n The product value is: %d", product);
7. }
```

The program at line 4 declares variables that will be equivalent to 3 memory locations, namely x, y and product. At line 5 these variables are multiplied and at line 6 the value of product is printed.

Now let us see how these instructions are executed by the CPU. First we need to compile this program to convert it to machine language, because C is a high-level language. Let us take as an example a machine whose instructions and data are 16 binary digits (bits) in size. Each instruction of the machine consists of two components: (a) an operation code, which specifies the operation that is to be performed by the instruction, and (b) the address of the operand in memory on which the given operation is to be performed. Let us further assume that the size of the operation code is six bits; therefore, the remaining 10 bits are for the address of the operand. The memory word size is also assumed to be 16 bits. Figure 3 shows the instruction and data formats for this machine. However, to simplify our discussion, let us present the operation code using mnemonics like LOAD, ADD and STORE, decimal values for operand addresses, and signed decimal values for data.

[Figure 3: Instruction and data formats of the assumed machine — (a) instruction format: bits 0-5 hold the operation code, bits 6-15 hold the operand address; (b) data format: 16 bits holding the sign and magnitude of the number.]

The instruction execution is performed in the CPU registers, the temporary storage locations in the CPU for program execution. Let us define the minimum set of registers required for a von Neumann machine:

  • Accumulator Register (AC): This register stores data temporarily for computation by the ALU. The AC is considered to contain one of the operands. The result of computation by the ALU is also stored back to the AC; it implies that the operand value is over-written by the result.
  • Memory Address Register (MAR): It specifies the address of the memory location from which data or an instruction is to be accessed (read operation) or to which data is to be stored (write operation).
  • Memory Buffer Register (MBR): It is a register which contains the data to be written in the memory (write operation) or receives the data from the memory (read operation).
  • Program Counter (PC): It keeps track of the instruction that is to be executed next, that is, after the execution of the on-going instruction.
  • Instruction Register (IR): Here the instructions are loaded prior to execution.

Sometimes a program is required to terminate in between. The reason for termination may be a program error or an interrupt due to some other reason. We need to answer the question: at what point of time is an interruption to a program execution allowed? To answer this question, let us discuss the process used in computers that is called interrupt handling.

1.4 Interrupts

The term interrupt refers to an exceptional event that causes the CPU to temporarily transfer its control from the currently executing program to a different program which provides service to the exceptional event. In other words, an interrupt is a signal to the processor, emitted by hardware or software, indicating an event that needs immediate attention. An interrupt alerts the processor to a high-priority condition requiring the interruption of the current code the processor is executing.

It is like asking a question in a class. When we ask a question in a class by raising our hand, the teacher who is explaining some point may respond to our request only after completing his/her point. Similarly, an interrupt is acknowledged by the CPU when it has completed the currently executing instruction.

An interrupt may be generated by a number of sources, which may be either internal or external to the CPU. There are basically two types of interrupts: hardware interrupts and software interrupts.

Hardware interrupts are used by devices to communicate that they require attention from the operating system. Internally, hardware interrupts are implemented using electronic alerting signals that are sent to the processor from an external device, which is either a part of the computer itself, such as a disk controller, or an external peripheral. For example, pressing a key on the keyboard or moving the mouse triggers hardware interrupts that cause the processor to read the keystroke or mouse position. Unlike the software type (described below), hardware interrupts are asynchronous and can occur in the middle of instruction execution, requiring additional care in programming. The act of initiating a hardware interrupt is referred to as an Interrupt Request (IRQ).

A software interrupt is caused either by an exceptional condition in the processor itself, or by a special instruction in the instruction set which causes an interrupt when it is executed. The former is often called a trap or exception and is used for errors or events occurring during program execution that are exceptional enough that they cannot be handled within the program itself. Each interrupt has its own interrupt handler. The number of hardware interrupts is limited by the number of interrupt request (IRQ) lines to the processor, but there may be hundreds of different software interrupts. Interrupts are a commonly used technique for computer multitasking, especially in real-time computing; such a system is said to be interrupt-driven.

Some of the basic issues of interrupts are :
  • What are the different kinds of interrupt conditions ?
  • What are the advantages of having an interruption mechanism ?
  • How is the CPU informed about the occurrence of an interrupt ?
  • What does the CPU do on the occurrence of an interrupt ?

Table 1 gives a list of some common interrupts and the events that cause them.

1. Interrupts generated by the executing program itself (also called traps) :
   * Division by zero.
   * The number exceeds the maximum allowed (overflow).
   * Attempt to execute an illegal/privileged instruction.
   * Trying to reference a memory location other than those allowed for that program.
2. Interrupt generated by the clock in the processor :
   * Generally raised on expiry of the time allocated to a program, in multiprogramming operating systems.
3. Interrupts generated by I/O devices and their interfaces :
   * Request to start an Input/Output operation.
   * Normal completion of an Input/Output operation.
   * Occurrence of an error in an Input/Output operation.
4. Interrupts on hardware failure :
   * Power failure.
   * Memory parity error.

Table 1 : Types of Interrupts and their causes

Interrupts are a useful mechanism for improving the efficiency of processing. How ? Almost all external devices are slower than the processor; therefore, in a typical system the processor would have to continually test whether an input value has arrived or a printout has been completed, wasting a lot of CPU time. With the interrupt facility the CPU is freed from the task of testing the status of Input/Output devices and can do useful processing during this time, thus increasing the processing efficiency.

Now the most important question is : how does the CPU know that an interrupt has occurred ? There needs to be a line, a register or a status word in the CPU that can be raised on the occurrence of an interrupt condition. Once the CPU knows that an interrupt has occurred, the following needs to be done :
  • First the condition is to be checked as to why the interrupt has occurred. That includes determining not only the device but also why that device has raised the interrupt.
  • Once the interrupt condition is determined, the necessary program called an ISR (Interrupt Servicing Routine) must be executed such that the CPU can resume further operations. For example, assume that the interrupt occurs due to an attempt by an executing program to execute an illegal or privileged instruction; then the ISR for such an interrupt may terminate the execution of the program that caused this condition.

Thus, on the occurrence of an interrupt the related ISR is executed by the CPU. The ISRs are pre-defined programs written for specific interrupt conditions. Considering these requirements, let us work out the steps which the CPU must perform on the occurrence of an interrupt :
  • The CPU must find out the source of the interrupt, as this will determine which interrupt service routine is to be executed.
  • The CPU then acquires the address of the interrupt service routine, which is stored in memory (in general).

  • What happens to the program the CPU was executing before the interrupt ? This program needs to be suspended until the CPU has executed the interrupt service routine. Do we need to do something for this program ? Well, the context of this program is to be saved. We will discuss this a bit later.
  • Finally, the CPU executes the interrupt service routine till the completion of the routine. A RETURN statement marks the end of the routine. After that, control is passed back to the interrupted program.

Let us first discuss saving the context of a program. The execution of a program in the CPU is done using a certain set of registers and their respective circuitry. As the CPU registers are also used for the execution of the interrupt service routine, it is highly likely that the routine alters the contents of several registers. Therefore, it is the responsibility of the operating system that, before an interrupt service routine is executed, the previous contents of the CPU registers are stored, such that the execution of the interrupted program can be restarted without any change from the point of interruption. Therefore, at the beginning of interrupt processing the essential context of the processor is saved either into a special save area in main memory or onto a stack. This context is restored when the interrupt service routine is finished; thus the interrupted program execution can be restarted from the point of interruption.

1.4.1 Interrupts and Instruction Cycle

Let us summarize the interrupt process : on the occurrence of an interrupt, an interrupt request (in the form of a signal) is issued to the CPU. The CPU, on receipt of the interrupt request, suspends the operation of the currently executing program, saves the context of the currently executing program and starts executing the program which services that interrupt request. This program is also known as the interrupt handler. After the interrupting condition/device has been serviced, the execution of the original program is resumed. Thus, an interrupt can be considered as the interruption of the execution of an ongoing user program. The execution of the user program resumes as soon as the interrupt processing is completed. Therefore, the user program does not contain any code for interrupt handling. This job is performed by the processor and the operating system, which are also responsible for suspending the execution of the user program and, later, after interrupt handling, resuming the user program from the point of interruption. But when can a user program's execution be interrupted ? It would not be desirable to interrupt a program while an instruction is getting executed and is in a state like instruction decode. The most desirable place for program interruption is when the CPU has completed the previous instruction and is

about to start a new instruction. The complete interrupt cycle is given below in Figure 4.

Flowchart summary of Figure 4 : the CPU calculates the address of the next instruction and fetches it; an operand is fetched, looping back while more operands are needed; the instruction is executed; when execution of the instruction is complete, the CPU checks for a pending interrupt; if an interrupt is present, it is acknowledged and processed by saving the context and loading the address of the first instruction of the selected ISR; otherwise, if the program is not yet complete, the cycle repeats with the next instruction; if the program is over, control jumps to another program (and once an ISR is over, the interrupted program is started again).

Figure 4 : Instruction Cycle with Interrupt Cycle

Figure 4 shows the instruction execution cycle with the interrupt cycle, where the interrupt condition is acknowledged. Please note that the interrupt service routine is itself a program : after acknowledging the interrupt, the next instruction executed through the instruction cycle is the first instruction of the interrupt servicing routine. In the interrupt cycle, the responsibility of the CPU/processor is to check whether any interrupts have occurred by checking for the presence of the interrupt signal. In case no interrupt needs service, the processor proceeds to the next instruction of the current program. In case an interrupt needs servicing, the interrupt is processed as follows :
  • Suspend the execution of the current program and save its context.
  • Set the Program Counter to the starting address of the interrupt service routine of the acknowledged interrupt.
  • The processor then executes the instructions in the interrupt servicing program.
  • After completing the interrupt servicing program, the CPU resumes the program it suspended in the first step above.
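At the level of an operating system, the same pattern of registering a handler, letting the main program run, and servicing the event when it arrives can be seen in POSIX signals. The sketch below is an analogy added to these notes, not part of the original text : a SIGINT handler plays the role of a tiny interrupt service routine that only records the event, after which the main program resumes.

    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    volatile sig_atomic_t interrupted = 0;   /* flag set by our "ISR" */

    void isr(int signo)                      /* our interrupt service routine */
    {
        (void) signo;
        interrupted = 1;                     /* record the event; keep the handler short */
    }

    int main(void)
    {
        signal(SIGINT, isr);  /* register the handler : like storing an ISR address */

        while (!interrupted) {
            /* ... the main program would do its useful work here ... */
            pause();          /* suspend until a signal arrives and is serviced */
        }
        printf("Interrupt serviced; main program resumes from where it stopped\n");
        return 0;
    }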

1.5 Data Representation

A computer must be able to take input, process it and produce output. The two most important questions here, and their respective answers, are :
Q.1. How is information represented in a computer ?
Ans. In the form of Binary Digits, popularly called Bits.
Q.2. How are the input and output presented in a form that is understood by us ?
Ans. One of the minimum requirements in this case is to have a representation for characters. Thus, a mechanism that fulfils this requirement is needed. In computers information is represented in digital form; therefore, to represent characters in a computer we need codes. Some common character codes are ASCII, EBCDIC etc. These character codes are discussed in the sections below.

1.6 Number System

A number system is used to represent information in quantitative form. Some of the common number systems are binary, octal, decimal and hexadecimal.

A number system of base (also called radix) r is a system which has r distinct symbols for the r digits. A string of these symbolic digits represents a number. To determine the value that a number represents, we multiply each digit by its place value, an integer power of r depending on the position of the digit, and then take the sum of the weighted digits. (We will use subscript 8 for an octal number and, similarly, subscript 2 for binary, 10 for decimal and 16 for hexadecimal numbers. In case no subscript is specified, the number will be treated as a decimal number, or else as belonging to whatever number system is specified for it.)

Decimal Numbers : The decimal number system has ten digits, represented by 0 to 9. Any decimal number can be represented as a string of these digits and, since there are ten decimal digits, the base or radix of this system is 10. Thus, the number 60.5 can be represented as :

6 × 10^1 + 0 × 10^0 + 5 × 10^-1

Binary Numbers : In binary we have two digits, 0 and 1, and a number is represented as a string of these two digits, called bits. The base of the binary number system is 2. A group of 4 bits is called a nibble and a group of 8 bits a byte. The highest decimal number that can be represented by an n-bit binary number is 2^n – 1. Thus with an 8-bit number the maximum decimal number that can be represented is 2^8 – 1 = 255. For example, 53.625 is a decimal number which can be converted to its binary equivalent as follows :

Step 1 : The integer part is 53 and the fraction is 0.625.
Step 2 : Integer conversion.

53 / 2 = 26   remainder 1
26 / 2 = 13   remainder 0
13 / 2 =  6   remainder 1
 6 / 2 =  3   remainder 0
 3 / 2 =  1   remainder 1
 1 / 2 =  0   remainder 1

We read the remainders from bottom to top, which gives the binary equivalent.

Therefore, (53)10 = (110101)2

Step 3 : Fractional conversion : If the decimal number contains a fraction, its binary equivalent is obtained by multiplying the fraction continuously by 2, recording a carry into the integer position each time. The carries read in forward order give the required binary fraction.

Multiplication        Generated Integer (carry)
0.625 × 2 = 1.25   →  1
0.250 × 2 = 0.50   →  0
0.500 × 2 = 1.00   →  1

Further multiplication is not required since the fraction part has become zero. The binary equivalent is obtained by reading the carry terms from top to bottom. Combining the integer and fraction parts gives the binary equivalent as :

(53.625)10 = (110101.101)2
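The two hand procedures above translate directly into code. The following sketch was added for illustration (the function names are our own) : the integer part is converted by repeated division by 2 and the fractional part by repeated multiplication by 2, exactly as in Steps 2 and 3.

    #include <stdio.h>

    /* Integer part : repeated division by 2; remainders are generated
       least-significant first, so they are stored and printed in reverse. */
    void int_to_binary(unsigned n)
    {
        char bits[32];
        int i = 0;
        if (n == 0) { putchar('0'); return; }
        while (n > 0) {
            bits[i++] = '0' + (n % 2);    /* remainder of the division by 2 */
            n /= 2;
        }
        while (i > 0) putchar(bits[--i]); /* read remainders bottom to top  */
    }

    /* Fractional part : repeated multiplication by 2; the integer carries
       appear most-significant first and are printed in forward order. */
    void frac_to_binary(double f, int max_bits)
    {
        putchar('.');
        while (f > 0.0 && max_bits-- > 0) {
            f *= 2.0;
            if (f >= 1.0) { putchar('1'); f -= 1.0; }
            else          putchar('0');
        }
    }

    int main(void)
    {
        int_to_binary(53);
        frac_to_binary(0.625, 10);        /* prints 110101.101 */
        putchar('\n');
        return 0;
    }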

Binary to Decimal Conversion : For example, 101111.1101 is a valid binary number. A binary number can be converted into a decimal number by multiplying each binary digit (1 or 0) by its positional weight and adding the products.

1 × 2^5 = 32
0 × 2^4 =  0
1 × 2^3 =  8
1 × 2^2 =  4
1 × 2^1 =  2
1 × 2^0 =  1
Sum     = 47

Therefore (101111)2 = (47)10

Conversion of (.1101)2 is done in a similar manner.

1 × 2^-1 = 0.5000
1 × 2^-2 = 0.2500
0 × 2^-3 = 0.0000
1 × 2^-4 = 0.0625
Sum      = 0.8125

Therefore (0.1101)2 = (0.8125)10. Combining the two parts we have

(101111.1101)2 = (47.8125)10

Octal Numbers : An octal number uses the digits 0 to 7. The base or radix of the octal number system is 8. Each significant position in an octal number has a positional weight; the least significant position has a weight of 8^0, that is 1. The octal equivalent of a decimal number can be obtained by dividing the given decimal number by 8 repeatedly until a quotient of 0 is obtained. We explain the decimal to octal conversion in the following example :

Ex : Convert (444.456)10 to an octal number. Step 1 : Integer conversion :

Division            Remainder
444 / 8 = 55        4
 55 / 8 =  6        7
  6 / 8 =  0        6

Reading the remainders from bottom to top, the decimal number (444)10 = (674)8.

Step 2 : Fractional conversion :

Multiplication        Generated Integer (carry)
0.456 × 8 = 3.648  →  3
0.648 × 8 = 5.184  →  5
0.184 × 8 = 1.472  →  1
0.472 × 8 = 3.776  →  3
0.776 × 8 = 6.208  →  6

The process is terminated when the desired number of significant digits has been obtained. Combining the integer and fractional parts we have

(444.456)10 = (674.35136)8
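As an aside added to these notes, the C standard library can be used to cross-check such hand conversions : printf understands radix 8 and 16 directly, and strtol parses a digit string in any base from 2 to 36.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        long n = 444;
        /* %lo prints in octal, %lX in hexadecimal */
        printf("%ld in octal is %lo, in hex is %lX\n", n, n, n);  /* 674, 1BC */

        /* strtol converts in the other direction, from a digit string
           in a given base back to a long integer. */
        long from_octal = strtol("674",    NULL, 8);   /* 444 */
        long from_hex   = strtol("EB",     NULL, 16);  /* 235 */
        long from_bin   = strtol("110101", NULL, 2);   /*  53 */
        printf("%ld %ld %ld\n", from_octal, from_hex, from_bin);
        return 0;
    }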

Octal to decimal conversion

Ex. (i) Convert (120)8 to its decimal equivalent.
Sol. (120)8 = 1 × 8^2 + 2 × 8^1 + 0 × 8^0
            = 64 + 16 + 0
            = (80)10.

Octal to Binary conversion : Conversion from octal to binary and vice-versa can be easily carried out. For obtaining the binary equivalent of an octal number, each significant digit in the given number is replaced by its 3-bit binary equivalent. For example :

(376)8 = 3 7 6 = 011 111 110 → binary equivalent

Thus (376)8 = (011111110)2. Similarly, for converting a binary number to its octal equivalent, we reverse the process, as follows :

(1 010 111 111 101)2 = 001 010 111 111 101

Thus (1010111111101)2 = (12775)8. (We make groups of 3 bits starting from the least significant bit, i.e. from right to left, padding the leftmost group with leading zeros.)

Hexadecimal Numbers : The hexadecimal number system has a base or radix of 16. It uses the symbols 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E and F. The symbols A to F represent the decimal values 10, 11, 12, 13, 14 and 15 respectively. Each significant position in a hexadecimal number has a positional weight. The hexadecimal equivalent of a decimal number can be obtained by dividing the given decimal number by 16 repeatedly until a quotient of 0 is obtained.

Ex. Convert (235)10 to hexadecimal number

Division              Remainder
235 / 16 = 14         11 (B)
 14 / 16 =  0         14 (E)

Reading the remainders from bottom to top, the decimal number (235)10 = (EB)16.

The conversion from hexadecimal to decimal can be carried out by multiplying each significant digit of the hexadecimal number by its respective weight and adding the products. For example :

Convert (A3B)16 to its decimal equivalent.
Sol. (A3B)16 = A × 16^2 + 3 × 16^1 + B × 16^0
             = 10 × 256 + 3 × 16 + 11 × 1
             = 2560 + 48 + 11
             = (2619)10

Hexadecimal to Binary conversion : Conversion from hexadecimal to binary and vice-versa can easily be carried out by putting the four-bit binary equivalent for each hexadecimal digit. For example :

(2D5)16 = 2 D 5 = 0010 1101 0101

Thus (2D5)16 = (001011010101)2. The reverse procedure is used for converting a binary number to hexadecimal. For example :

(111 1011 0101)2 = 0111 1011 0101 = 7 B 5

Thus (11110110101)2 = (7B5)16.

Hexadecimal-Octal Conversion : Conversion from hexadecimal to octal and vice-versa is sometimes required. To convert a hexadecimal number to octal, the following steps are applied :

(i) Convert the given hexadecimal number to its binary equivalent using 4 bits per digit.
(ii) Form groups of 3 bits starting from the least significant bit (LSB).
(iii) Write the equivalent octal digit for each group of 3 bits.
For example :

(47)16 = (0100 0111)2

= 1 0 7 = (107)8. (Try to convert from octal to hexadecimal in the same way.)

Floating point representation of numbers : In the decimal system, very large and very small numbers are expressed in scientific notation as follows : 4.69 × 10^23 and 1.601 × 10^-19. Binary numbers can also be expressed in floating point representation. The floating point representation of a number consists of two parts : (i) the first part represents a signed, fixed point number called the mantissa (m); (ii) the second part designates the position of the decimal (or binary) point and is called the exponent (e). The fixed point mantissa may be a fraction or an integer. The number of bits required to express the exponent and mantissa is determined by the accuracy desired from the computing system as well as its capability to handle such numbers. For example, the decimal number +8145.679 is represented in floating point as follows :

Sign  Mantissa    Sign  Exponent
  0   .8145679      0   04

The mantissa has a 0 in the leftmost (sign) position to denote a plus. Here the mantissa is considered to be a fixed point fraction, so this representation is equivalent to the number expressed as a fraction times 10 raised to the exponent, that is, +0.8145679 × 10^+4. The mantissa is sometimes called the fraction part. Let us consider, for example, a computer that assumes an integer representation for the mantissa and radix 8 for the numbers. The octal number +36.754 = +36754 × 8^-3 in its floating point representation will look like this :

Sign  Mantissa    Sign  Exponent
  0   367541        0   03

When this number is represented in a register in its binary-coded form, the actual content of the register becomes :

0 | 011 110 111 101 100 001 | 0 | 000 011

which codes 0 | 3 6 7 5 4 1 | 0 | 0 3. Most computers and all electronic calculators have a built-in capacity to perform floating point arithmetic operations. The Institute of Electrical and Electronics Engineers (IEEE), a society which has created many standards regarding various aspects of computers, has created IEEE standard 754 for floating point representation and arithmetic. The basic objective of developing this standard was to facilitate the portability of programs from one computer to another. This standard has resulted in the development of standard numerical capabilities in various microprocessors.
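To make the sign/exponent/mantissa split concrete, here is a small sketch added to these notes. It assumes IEEE 754 single precision for the C type float (true on practically all current machines) and extracts the three fields from the bit pattern :

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    int main(void)
    {
        float x = -6.25f;                    /* -1.1001 × 2^2 in binary     */
        uint32_t bits;
        memcpy(&bits, &x, sizeof bits);      /* reinterpret the bit pattern */

        uint32_t sign     = bits >> 31;          /* 1 bit                       */
        uint32_t exponent = (bits >> 23) & 0xFF; /* 8 bits, biased by 127       */
        uint32_t mantissa = bits & 0x7FFFFF;     /* 23 bits, implicit leading 1 */

        printf("sign=%u exponent=%u (unbiased %d) mantissa=0x%06X\n",
               sign, exponent, (int)exponent - 127, mantissa);
        return 0;   /* prints : sign=1 exponent=129 (unbiased 2) mantissa=0x480000 */
    }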

1.7 Decimal Representation in Computers

Normally, decimal digits are coded in 7 or 8 bits as alphanumeric characters. But for the purpose of arithmetic calculation each decimal digit is treated as a four-bit binary code. As we know, 2 binary bits can represent 2^2 = 4 different combinations, 3 bits can represent 2^3 = 8 combinations and 4 bits can represent 2^4 = 16 combinations. To represent the decimal digits 0 to 9 in binary form we therefore use 4 bits per digit. This is called the Binary Coded Decimal (BCD) number system. The table below shows the BCD numbers.

Decimal Number   Binary Number   BCD
0                0000            0000
1                0001            0001
2                0010            0010
3                0011            0011
4                0100            0100
5                0101            0101
6                0110            0110
7                0111            0111
8                1000            1000
9                1001            1001

Decimal Number   Binary Number   BCD
10               1010            0001 0000
11               1011            0001 0001
12               1100            0001 0010
13               1101            0001 0011
14               1110            0001 0100
15               1111            0001 0101

Let us represent 86.065 in BCD :

 8    6   .  0    6    5
1000 0110 . 0000 0110 0101

Note that if we compare the BCD representation with the equivalent binary value, we will find that the two are different.
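Packed BCD, where two decimal digits share one byte, is easy to build in code. The sketch below was added for illustration (the function name is our own); note that printing the packed value in hexadecimal makes the original decimal digits visible, since each 4-bit group holds one digit :

    #include <stdio.h>

    /* Pack the decimal digits of n into 4-bit groups (assumes n has
       at most 8 digits so the result fits in an unsigned int). */
    unsigned to_packed_bcd(unsigned n)
    {
        unsigned bcd = 0;
        int shift = 0;
        while (n > 0) {
            bcd |= (n % 10) << shift;   /* place next decimal digit in 4 bits */
            n /= 10;
            shift += 4;
        }
        return bcd;
    }

    int main(void)
    {
        /* prints "86" : 0x86 = 1000 0110, matching the table above */
        printf("BCD of 86 is %X\n", to_packed_bcd(86));
        return 0;
    }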

1.8 Alphanumeric Representation

The codes that represent numbers, alphabetic letters and special symbols are called alphanumeric codes. A complete set of necessary characters includes :
(i) 26 lower case letters
(ii) 26 upper case letters
(iii) 10 numeric digits (0 to 9)
(iv) about 25 special characters such as +, /, #, $, etc.

These total up to about 87 symbols. Representing these 87 symbols with a binary code requires at least 7 bits.

ASCII Code : The full form of ASCII is American Standard Code for Information Interchange. It uses 7 bits to represent 2^7 = 128 characters. ASCII was extended to 8 bits to represent 2^8 = 256 characters, called the extended ASCII codes. The extended ASCII codes are the codes used in most microcomputers.

EBCDIC : Another alphanumeric code, used in IBM equipment, is EBCDIC, the Extended Binary Coded Decimal Interchange Code. It differs from ASCII only in its code grouping for the different alphanumeric characters. It uses eight bits for each character.

ISCII (Indian Standard Code for Information Interchange) :

It is an eight-bit code that retains the standard ASCII values up to 127 and uses the values 128-255 for the characters required in the ten Brahmi-based Indian scripts. It is defined in the IS 13194 : 1991 standard. It supports the INSCRIPT keyboard, which provides a logical arrangement of vowels and consonants based on the phonetic properties and usage frequencies of the letters of the Brahmi scripts. Any software that uses ISCII codes can be used with any Indian script, enhancing its commercial viability.

UNICODE : This is a newer international standard for character representation. Unicode provides a unique number for every character, no matter what the platform, no matter what the program and no matter what the language. The Unicode standard has been adopted by industry leaders like Apple, HP, IBM, Microsoft, ORACLE, SAP etc. One of the advantages of Unicode in client-server or multi-tiered applications and websites is the cost saving over the use of legacy character sets : websites and software products can be targeted across multiple platforms, languages and countries without re-engineering. It allows data to be transported through many different systems without corruption.

1.9 Error Detection and Correction Codes

Before ending the unit we must discuss errors : how they are detected and corrected. During the process of binary data transmission, errors may occur. In order to detect and correct such errors, two types of codes may be used : (i) error-detecting codes and (ii) error-correcting codes. If a single error transforms a valid code word into an invalid one, the code is said to be a single-error detecting code. The simplest and most commonly used error detecting method is the parity check. In this method an extra parity bit is included with the binary message to make the total number of 1's either odd or even, resulting in either even parity or odd parity. In the even parity method the total number of 1's in the code group (including the parity bit) must be an even number. Similarly, in the odd parity method, the total number of 1's (including the parity bit) must be an odd number. The parity bit can be placed at either end of the code. Table 2 shows a message of 4 bits and its corresponding even and odd parity bits.

Message   Even-parity code (P)   Odd-parity code (P)
0000      0000 0                 0000 1
0001      0001 1                 0001 0
0101      0101 0                 0101 1
1110      1110 1                 1110 0

Table 2 : Even and odd parity bits for 4-bit messages

If a single error occurs, it transforms the valid code into an invalid one. This helps in the detection of single-bit errors. Though the parity code is meant for single error detection, it can in fact detect any odd number of errors. In both cases, however, the original code word cannot be recovered. If an even number of errors occurs, the parity check is satisfied and the errors go undetected, giving erroneous results.

Hamming : Error detection and correction codes. R. W. Hamming developed a system that provides a methodical way to add one or more parity bits to a data character in order to detect and correct errors. The distance between two code words is defined as the number of bit positions in which they differ. Let Ci and Cj be any two words in a particular block code; the Hamming distance dij between the two words Ci and Cj is defined by the number of bits in which they differ. The minimum value of dij over all pairs of code words is called the Hamming distance dmin. For linear block codes the minimum weight is equal to the minimum distance. For example, for

Ci = 1 0 0 0 1 1 1
Cj = 0 0 0 1 0 1 1

these codes differ in the leftmost bit position and in the 4th and 5th positions from the left; therefore dij = 3. From Hamming's analysis of code distances the following important properties have been derived :
(i) A minimum distance of at least two is required for single error detection.
(ii) Since the number of errors that can be corrected is E ≤ (dmin – 1) / 2, a minimum distance of three is required for single error correction.
(iii) A greater distance will provide detection and/or correction of a larger number of errors.
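Both the parity bits of Table 2 and the Hamming parity bits defined below are nothing more than exclusive-OR sums over selected message bits. A small sketch added to these notes, computing the even parity bit of a 4-bit message as in Table 2 :

    #include <stdio.h>

    /* Even parity over a 4-bit message : the parity bit is the XOR of
       the message bits, which makes the total number of 1's even. */
    int even_parity(unsigned msg)           /* msg uses its low 4 bits */
    {
        int p = 0;
        for (int i = 0; i < 4; i++)
            p ^= (msg >> i) & 1;            /* XOR accumulates 1-bits modulo 2 */
        return p;                           /* the odd parity bit would be 1 - p */
    }

    int main(void)
    {
        unsigned msgs[] = { 0x0, 0x1, 0x5, 0xE };   /* 0000 0001 0101 1110 */
        for (int i = 0; i < 4; i++)
            printf("message %X -> even parity bit %d\n", msgs[i], even_parity(msgs[i]));
        return 0;
    }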

The 7-bit Hamming (7,4) code word h1 h2 h3 h4 h5 h6 h7 associated with a four-bit binary number b3 b2 b1 b0 is :

h1 = b3 ⊕ b2 ⊕ b0        h3 = b3
h2 = b3 ⊕ b1 ⊕ b0        h5 = b2
h4 = b2 ⊕ b1 ⊕ b0        h6 = b1
                         h7 = b0

where ⊕ denotes the exclusive-OR operation. Note that bits h1, h2 and h4 are even parity bits for the bit fields b3 b2 b0, b3 b1 b0 and b2 b1 b0, respectively. In general, the parity bits (h1, h2, h4, h8, ...) are located in the positions corresponding to ascending powers of two (i.e., 2^0, 2^1, 2^2, 2^3, ... = 1, 2, 4, 8, ...). To decode a Hamming code, one checks for odd parity over the bit fields over which even parity was taken during encoding. For example, a single-bit error is indicated by a non-zero parity word c4 c2 c1, where

c1 = h1 ⊕ h3 ⊕ h5 ⊕ h7
c2 = h2 ⊕ h3 ⊕ h6 ⊕ h7
c4 = h4 ⊕ h5 ⊕ h6 ⊕ h7

If c4 c2 c1 = 0 0 0, there is no error in the Hamming code. If it has a non-zero value, it indicates the bit position in error. For example, if c4 c2 c1 = 1 0 1, then bit 5 is in error; to correct the error, bit 5 has to be complemented.

Example 1 : Encode the data bits 0 1 0 1 into a 7-bit even parity Hamming code.

Solution : Given b3 b2 b1 b0 = 0 1 0 1. Therefore,

h1 = b3 ⊕ b2 ⊕ b0 = 0 ⊕ 1 ⊕ 1 = 0
h2 = b3 ⊕ b1 ⊕ b0 = 0 ⊕ 0 ⊕ 1 = 1
h4 = b2 ⊕ b1 ⊕ b0 = 1 ⊕ 0 ⊕ 1 = 0
h3 = b3 = 0
h5 = b2 = 1
h6 = b1 = 0
h7 = b0 = 1

Therefore h1 h2 h3 h4 h5 h6 h7 = 0 1 0 0 1 0 1.

Example 2 : A 7-bit Hamming code is received as 0 1 0 1 1 0 1. What is its correct code ?

Solution :
h1 h2 h3 h4 h5 h6 h7
0  1  0  1  1  0  1

Now, to find the error :

c1 = h1 ⊕ h3 ⊕ h5 ⊕ h7 = 0 ⊕ 0 ⊕ 1 ⊕ 1 = 0
c2 = h2 ⊕ h3 ⊕ h6 ⊕ h7 = 1 ⊕ 0 ⊕ 0 ⊕ 1 = 0
c4 = h4 ⊕ h5 ⊕ h6 ⊕ h7 = 1 ⊕ 1 ⊕ 0 ⊕ 1 = 1

Thus c4 c2 c1 = 1 0 0. Therefore bit 4 is in error, and the corrected code word is obtained by complementing the fourth bit in the received code word, giving 0 1 0 0 1 0 1.
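The two worked examples can be reproduced mechanically. The following sketch, added to these notes, encodes b3 b2 b1 b0 using the equations above, injects a single-bit error, locates it through the syndrome c4 c2 c1 and corrects it :

    #include <stdio.h>

    /* Encode b3 b2 b1 b0 into h1..h7 exactly as defined above.
       Positions h[1..7] are used; h[0] is ignored for readability. */
    void hamming_encode(int b3, int b2, int b1, int b0, int h[8])
    {
        h[3] = b3;  h[5] = b2;  h[6] = b1;  h[7] = b0;
        h[1] = b3 ^ b2 ^ b0;
        h[2] = b3 ^ b1 ^ b0;
        h[4] = b2 ^ b1 ^ b0;
    }

    /* Recompute the parity checks; a non-zero c4 c2 c1 is the
       position of the bit in error. */
    int hamming_syndrome(const int h[8])
    {
        int c1 = h[1] ^ h[3] ^ h[5] ^ h[7];
        int c2 = h[2] ^ h[3] ^ h[6] ^ h[7];
        int c4 = h[4] ^ h[5] ^ h[6] ^ h[7];
        return (c4 << 2) | (c2 << 1) | c1;
    }

    int main(void)
    {
        int h[8];
        hamming_encode(0, 1, 0, 1, h);     /* Example 1 : 0101 -> 0100101 */

        h[4] ^= 1;                         /* inject an error in bit 4, giving
                                              0101101 as in Example 2      */
        int pos = hamming_syndrome(h);     /* syndrome c4 c2 c1 = 100 = 4  */
        if (pos) h[pos] ^= 1;              /* correct by complementing     */

        for (int i = 1; i <= 7; i++) printf("%d", h[i]);
        printf("  (error was at bit %d)\n", pos);
        return 0;
    }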

1.10 Summary

In this unit we covered the basic concepts of computer architecture. We discussed the von Neumann architecture, and the unit provided in-depth coverage of the instruction cycle, interrupts, and data representation in a computer system. We also covered aspects relating to the error detection mechanism. The unit covered number systems and the conversion of numbers from one number system to another. It introduced the concept of computer arithmetic using 2's complement notation and provided an introduction to information representation codes like ASCII, EBCDIC, etc. The concept of floating point numbers was also covered. Finally, error detection and correction mechanisms were detailed. The information given on the various topics such as data representation, error detection codes etc., although fairly thorough, can be supplemented with additional reading. In fact, a course in any area of computing must be supplemented by further reading to keep your knowledge up to date, as the computer world is changing by leaps and bounds. In addition to further readings, the student is advised to study several Indian journals on computers to enhance his or her knowledge.

1.11 Questions

1. What is the von Neumann architecture ? Describe the block diagram of a von Neumann machine.
2. What is an instruction cycle ? Explain each step of the instruction cycle.
3. What is the role of memory in a computer ?
4. Define interrupt. Describe the different types of interrupt conditions and the events that cause the occurrence of those interrupts.
5. Explain the flowchart of the instruction cycle with the interrupt cycle.
6. Convert the following binary numbers to decimal :
   (i) 11001 (ii) 1010.10 (iii) 11101.110 (iv) 010101.001

7. Convert the following decimal numbers to binary, octal and hexadecimal :
   (i) 65 (ii) 861 (iii) 1234 (iv) 2656.01 (v) 6060.60 (vi) 3426.006
8. Convert the following binary numbers to octal and hexadecimal :
   (i) 11010101 (ii) 1110110.001 (iii) 1111010.010 (iv) 100011.0100
9. Write the BCD equivalent of the given numbers :
   (i) 45 (ii) 73 (iii) 146 (iv) 621.10 (v) 553.02 (vi) 653.321
10. Find the even and odd parity bits for the following 7-bit data :
   (i) 0101010 (ii) 0001011 (iii) 1110101 (iv) 1001001
11. Explain Hamming error correcting codes.
12. A 7-bit Hamming code is received as 0110010. What is its correct code ?
13. Encode the data bits 1101 into a 7-bit even parity Hamming code.

1.12 Suggested Readings

1. Digital Logic and Computer Design, M. Morris Mano.
2. Computer Organization and Architecture, William Stallings.


Introduction to Data Representation

About Data Representation

Data can be anything, including a number, a name, musical notes, or the colour of an image. The way we store, process and transmit data is referred to as data representation. We can use any device, including computers, smartphones, and iPads, to store data in digital format. The stored data is handled by electronic circuitry. A bit is a 0 or 1 used in digital data representation.


Classification of Computers

Computers are classified broadly based on their speed and computing power.

1. Microcomputers or PCs (Personal Computers): It is a single-user computer system with a medium-power microprocessor. It is referred to as a computer with a microprocessor as its central processing unit.


2. Mini-Computer: It is a multi-user computer system that can support hundreds of users at the same time.


3. Mainframe Computer: It is a multi-user computer system that can support hundreds of users at the same time. Software technology is distinct from minicomputer technology.


4. Super-Computer: With the ability to process hundreds of millions of instructions per second, it is a very quick computer. They  are used for specialised applications requiring enormous amounts of mathematical computations, but they are very expensive.


Types of Computer Number System

Every value saved to or obtained from computer memory uses a specific number system, which is the method used to represent numbers in the computer system architecture. One needs to be familiar with number systems in order to read computer language or interact with the system. 


1. Binary Number System 

There are only two digits in a binary number system: 0 and 1. In this number system, 0 and 1 stand in for every number (value). Because the binary number system only has two digits, its base is 2.

A bit is another name for each binary digit. The binary number system is also a positional value system, where each digit's value is expressed in powers of 2.

Characteristics of Binary Number System

The following are the primary characteristics of the binary system:

It only has two digits, zero and one.

Depending on its position, each digit has a different value.

Each position has a value that is a power of two (the base).

Because computers work internally with two voltage levels, the binary system is used in all types of computers.


2. Decimal Number System

The decimal number system is a base ten number system with ten digits ranging from 0 to 9. This means that these ten digits can represent any numerical quantity. A positional value system is also a decimal number system. This means that the value of digits will be determined by their position. 

Characteristics of Decimal Number System

Ten units of a given order equal one unit of the higher order, making it a decimal system.

The number 10 serves as the foundation for the decimal number system.

The value of each digit or number will depend on where it is located within the numeric figure because it is a positional system.

The value of the number is obtained by multiplying each digit by its positional weight and adding the results.


3. Octal Number System

There are only eight (8) digits in the octal number system, from 0 to 7. In this number system, each number (value) is represented by the digits 0, 1, 2, 3,4,5,6, and 7. Since the octal number system only has 8 digits, its base is 8.

Characteristics of Octal Number System:

Contains eight digits: 0,1,2,3,4,5,6,7.

Also known as the base 8 number system.

Each position in an octal number represents a power of the base (8); the rightmost position has weight 8^0, and the weight increases by a factor of 8 for each position to the left.


4. Hexadecimal Number System

There are sixteen (16) alphanumeric values in the hexadecimal number system, ranging from 0 to 9 and A to F. In this number system, each number (value) is represented by 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F. Because the hexadecimal number system has 16 alphanumeric values, its base is 16. Here, the letters stand for A = 10, B = 11, C = 12, D = 13, E = 14, and F = 15.

Characteristics of Hexadecimal Number System:

A system of positional numbers.

Has 16 symbols or digits overall (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F). Its base is, therefore, 16.

Decimal values 10, 11, 12, 13, 14, and 15 are represented by the letters A, B, C, D, E, and F, respectively.

A single digit may have a maximum value of 15. 

Each digit position corresponds to a different base power (16).

Since there are only 16 digits, each hexadecimal digit can be represented in binary with 4 bits.


So, we've seen how to convert decimals and use the Number System to communicate with a computer. The full character set of the English language, which includes all alphabets, punctuation marks, mathematical operators, special symbols, etc., must be supported by the computer in addition to numerical data. 

Learning By Doing

Choose the correct answer :

1. Which computer is the largest in terms of size?

Minicomputer

Micro Computer

2. The binary number 11011001 is converted to what decimal value?

Solved Questions

1. Give some examples where Supercomputers are used.

Ans: Weather Prediction, Scientific simulations, graphics, fluid dynamic calculations, Nuclear energy research, electronic engineering and analysis of geological data.

2. Which of these is the most costly?

Mainframe computer

Ans: C) Supercomputer


FAQs on Introduction to Data Representation

1. What is the distinction between the Hexadecimal and Octal Number System?

The octal number system is a base-8 number system in which the digits 0 through 7 are used to represent numbers. The hexadecimal number system is a base-16 number system that employs the digits 0 through 9 as well as the letters A through F to represent numbers.

2. What is the smallest data representation?

The smallest data storage unit in a computer's memory is called a BYTE, which comprises 8 BITS.

3. What is the largest data unit?

The largest commonly available data storage unit is a terabyte or TB. A terabyte equals 1,000 gigabytes, while a tebibyte equals 1,024 gibibytes.

