Data Representation

Class 11 - Computer Science with Python, Sumita Arora, Checkpoint 2.1.

What are the bases of decimal, octal, binary and hexadecimal systems ?

The bases are:

  • Decimal — Base 10
  • Octal — Base 8
  • Binary — Base 2
  • Hexadecimal — Base 16

What is the common property of decimal, octal, binary and hexadecimal number systems ?

Decimal, octal, binary and hexadecimal number systems are all positional value systems.

Complete the sequence of following binary numbers : 100, 101, 110, ............... , ............... , ............... .

100, 101, 110, 111, 1000, 1001.

Complete the sequence of following octal numbers : 525, 526, 527, ............... , ............... , ............... .

525, 526, 527, 530, 531, 532.

Complete the sequence of following hexadecimal numbers : 17, 18, 19, ............... , ............... , ............... .

17, 18, 19, 1A, 1B, 1C.

Convert the following binary numbers to decimal and hexadecimal:

(a) 1010

(b) 111010

(c) 101011111

(d) 1100

(e) 10010101

(f) 11011100

Converting to decimal:

Equivalent decimal number = 8 + 2 = 10

Therefore, (1010) 2 = (10) 10

Converting to hexadecimal:

Grouping in bits of 4:

1010

Therefore, (1010) 2 = (A) 16

Equivalent decimal number = 32 + 16 + 8 + 2 = 58

Therefore, (111010) 2 = (58) 10

0011 1010

Therefore, (111010) 2 = (3A) 16

Equivalent decimal number = 256 + 64 + 16 + 8 + 4 + 2 + 1 = 351

Therefore, (101011111) 2 = (351) 10

0001 0101 1111

Therefore, (101011111) 2 = (15F) 16

Equivalent decimal number = 8 + 4 = 12

Therefore, (1100) 2 = (12) 10

1100

Therefore, (1100) 2 = (C) 16

Equivalent decimal number = 1 + 4 + 16 + 128 = 149

Therefore, (10010101) 2 = (149) 10

1001 0101

Therefore, (10010101) 2 = (95) 16

Equivalent decimal number = 4 + 8 + 16 + 64 + 128 = 220

Therefore, (11011100) 2 = (220) 10

1101 1100

Therefore, (11011100) 2 = (DC) 16
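As a quick cross-check (not part of the textbook answer), the same conversions can be reproduced with Python's built-in int(), which parses a binary string, and format(), which prints the hexadecimal form:

    # Verify the binary -> decimal and binary -> hexadecimal conversions above.
    for bits in ["1010", "111010", "101011111", "1100", "10010101", "11011100"]:
        value = int(bits, 2)                               # binary string -> decimal integer
        print(bits, "=", value, "=", format(value, "X"))   # e.g. 1010 = 10 = A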

Convert the following decimal numbers to binary and octal :

Converting to binary:

Therefore, (23) 10 = (10111) 2

Converting to octal:

Therefore, (23) 10 = (27) 8

Therefore, (100) 10 = (1100100) 2

Therefore, (100) 10 = (144) 8

Therefore, (145) 10 = (10010001) 2

Therefore, (145) 10 = (221) 8

Therefore, (19) 10 = (10011) 2

Therefore, (19) 10 = (23) 8

Therefore, (121) 10 = (1111001) 2

Therefore, (121) 10 = (171) 8

Therefore, (161) 10 = (10100001) 2

Therefore, (161) 10 = (241) 8
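Again as an optional cross-check in Python, bin() and oct() give the binary and octal forms directly (the 0b/0o prefixes are stripped for readability):

    # Verify the decimal -> binary and decimal -> octal conversions above.
    for n in [23, 100, 145, 19, 121, 161]:
        print(n, "=", bin(n)[2:], "(binary) =", oct(n)[2:], "(octal)")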

Convert the following hexadecimal numbers to binary :

(A6) 16 = (10100110) 2

(A07) 16 = (101000000111) 2

(7AB4) 16 = (111101010110100) 2

(BE) 16 = (10111110) 2

(BC9) 16 = (101111001001) 2

(9BC8) 16 = (1001101111001000) 2
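These can also be checked in Python by parsing each hexadecimal string with int(s, 16) and printing its binary form (an optional aside, not part of the exercise):

    # Verify the hexadecimal -> binary conversions above.
    for h in ["A6", "A07", "7AB4", "BE", "BC9", "9BC8"]:
        print(h, "=", bin(int(h, 16))[2:])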

Convert the following binary numbers to hexadecimal and octal :

(a) 10011011101

(b) 1111011101011011

(c) 11010111010111

(d) 1010110110111

(e) 10110111011011

(f) 1111101110101111

0100 1101 1101

Therefore, (10011011101) 2 = (4DD) 16

Converting to Octal:

Grouping in bits of 3:

010 011 011 101

Therefore, (10011011101) 2 = (2335) 8

1111 0111 0101 1011

Therefore, (1111011101011011) 2 = (F75B) 16

001 111 011 101 011 011

Therefore, (1111011101011011) 2 = (173533) 8

0011 0101 1101 0111

Therefore, (11010111010111) 2 = (35D7) 16

011 010 111 010 111

Therefore, (11010111010111) 2 = (32727) 8

0001 0101 1011 0111

Therefore, (1010110110111) 2 = (15B7) 16

001 010 110 110 111

Therefore, (1010110110111) 2 = (12667) 8

0010 1101 1101 1011

Therefore, (10110111011011) 2 = (2DDB) 16

010 110 111 011 011

Therefore, (10110111011011) 2 = (26733) 8

1111 1011 1010 1111

Therefore, (1111101110101111) 2 = (FBAF) 16

001 111 101 110 101 111

Therefore, (1111101110101111) 2 = (175657) 8
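The grouping method itself can be sketched in Python; this illustrative snippet (the group() helper is written just for this page) pads the binary string on the left, splits it into groups of 4 (for hexadecimal) or 3 (for octal), and confirms the result with int()/format():

    def group(bits, size):
        # Pad on the left so the length is a multiple of `size`, then split into groups.
        bits = bits.zfill(-(-len(bits) // size) * size)
        return " ".join(bits[i:i + size] for i in range(0, len(bits), size))

    b = "10011011101"
    print(group(b, 4), "->", format(int(b, 2), "X"))   # 0100 1101 1101 -> 4DD
    print(group(b, 3), "->", format(int(b, 2), "o"))   # 010 011 011 101 -> 2335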

Checkpoint 2.2

Multiple choice questions.

The value of radix in binary number system is ..........

The value of radix in octal number system is ..........

The value of radix in decimal number system is ..........

The value of radix in hexadecimal number system is ..........

Which of the following are not valid symbols in octal number system ?

Which of the following are not valid symbols in hexadecimal number system ?

Which of the following are not valid symbols in decimal number system ?

The hexadecimal digits are 0 to 9 and A to ..........

The binary equivalent of the decimal number 10 is ..........

Question 10

ASCII code is a 7 bit code for ..........

  • other symbol
  • all of these ✓

Question 11

How many bytes are there in the number 1011 1001 0110 1110 ?

Question 12

The binary equivalent of the octal number 13.54 is ..........

  • 1011.1011 ✓
  • None of these

Question 13

The octal equivalent of 111 010 is.....

Question 14

The input hexadecimal representation of 1110 is ..........

Question 15

Which of the following is not a binary number ?

Question 16

Convert the hexadecimal number 2C to decimal:

Question 17

UTF8 is a type of .......... encoding.

  • extended ASCII

Question 18

UTF32 is a type of .......... encoding.

Question 19

Which of the following is not a valid UTF8 representation?

  • 2 octet (16 bits)
  • 3 octet (24 bits)
  • 4 octet (32 bits)
  • 8 octet (64 bits) ✓

Question 20

Which of the following is not a valid encoding scheme for characters ?

Fill in the Blanks

The Decimal number system is composed of 10 unique symbols.

The Binary number system is composed of 2 unique symbols.

The Octal number system is composed of 8 unique symbols.

The Hexadecimal number system is composed of 16 unique symbols.

The illegal digits of octal number system are 8 and 9 .

Hexadecimal number system recognizes symbols 0 to 9 and A to F .

Each octal digit is replaced with 3 bits in octal to binary conversion.

Each hexadecimal digit is replaced with 4 bits in hex to binary conversion.

ASCII is a 7-bit code while extended ASCII is an 8-bit code.

The Unicode encoding scheme can represent all symbols/characters of most languages.

The ISCII encoding scheme represents Indian Languages' characters on computers.

UTF8 can take up to 4 bytes to represent a symbol.

UTF32 takes exactly 4 bytes to represent a symbol.

Unicode value of a symbol is called code point .

True/False Questions

A computer can work with Decimal number system. False

A computer can work with Binary number system. True

The number of unique symbols in Hexadecimal number system is 15. False

Number systems can also represent characters. False

ISCII is an encoding scheme created for Indian language characters. True

Unicode is able to represent nearly all languages' characters. True

UTF8 is a fixed-length encoding scheme. False

UTF32 is a fixed-length encoding scheme. True

UTF8 is a variable-length encoding scheme and can represent characters in 1 through 4 bytes. True

UTF8 and UTF32 are the only encoding schemes supported by Unicode. False

Type A: Short Answer Questions

What are some number systems used by computers ?

The most commonly used number systems are decimal, binary, octal and hexadecimal number systems.

What is the use of Hexadecimal number system on computers ?

The hexadecimal number system is used in computers to specify memory addresses (which are 16 or 32 bits long). For example, the memory address 1101011010101111 is a long binary address, but its hexadecimal form D6AF is much easier to remember. The hexadecimal number system is also used to represent colour codes. For example, FFFFFF represents white and FF0000 represents red.

What does radix or base signify ?

The radix or base of a number system signifies how many unique symbols or digits are used in the number system to represent numbers. For example, the decimal number system has a radix or base of 10 meaning it uses 10 digits from 0 to 9 to represent numbers.

What is the use of encoding schemes ?

Encoding schemes help computers represent and recognize letters, numbers and symbols. An encoding scheme provides a predetermined code for each recognized letter, number and symbol. The most popular encoding schemes are ASCII, Unicode, ISCII, etc.

Discuss UTF-8 encoding scheme.

UTF-8 is a variable-width encoding that can represent every character in the Unicode character set. The code unit of UTF-8 is 8 bits, called an octet. It uses 1 to 4 octets to represent a code point depending on its value, i.e. sometimes it uses 8 bits to store a character and at other times 16, 24 or 32 bits. It is a type of multi-byte encoding.
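A short Python illustration of this variable-width behaviour (not part of the textbook answer): str.encode() shows how many octets UTF-8 uses for different characters, while UTF-32 always uses 4 bytes per character (the explicit utf-32-be form is used here to avoid the byte-order mark):

    for ch in ["A", "é", "क"]:
        print(ch,
              "UTF-8 bytes:", len(ch.encode("utf-8")),       # 1, 2 and 3 octets respectively
              "UTF-32 bytes:", len(ch.encode("utf-32-be")))  # always 4 bytes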

How is UTF-8 encoding scheme different from UTF-32 encoding scheme ?

UTF-8 is a variable-length encoding scheme that uses a different number of bytes to represent different characters, whereas UTF-32 is a fixed-length encoding scheme that uses exactly 4 bytes to represent every Unicode code point.

What is the most significant bit and the least significant bit in a binary code ?

In a binary code, the leftmost bit is called the most significant bit or MSB. It carries the largest weight. The rightmost bit is called the least significant bit or LSB. It carries the smallest weight. For example:

1 0 1 1 0 1 1 0 (the leftmost 1 is the MSB; the rightmost 0 is the LSB)
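As a small illustrative aside, the MSB and LSB of this 8-bit code can be extracted in Python with a shift and a mask:

    code = 0b10110110
    msb = (code >> 7) & 1   # leftmost bit of the 8-bit code
    lsb = code & 1          # rightmost bit
    print(msb, lsb)         # 1 0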

What are ASCII and extended ASCII encoding schemes ?

The ASCII encoding scheme uses a 7-bit code and represents 128 characters. Its advantages are simplicity and efficiency. The extended ASCII encoding scheme uses an 8-bit code and represents 256 characters.

What is the utility of ISCII encoding scheme ?

ISCII or Indian Standard Code for Information Interchange can be used to represent Indian languages on the computer. It supports Indian languages that follow both Devanagari script and other scripts like Tamil, Bengali, Oriya, Assamese, etc.

What is Unicode ? What is its significance ?

Unicode is a universal character encoding scheme that can represent sets of characters belonging to different languages by assigning a number to each character. It has the following significance:

  • It defines all the characters needed for writing the majority of known languages in use today across the world.
  • It is a superset of all other character sets.
  • It is used to represent characters across different platforms and programs.

What all encoding schemes does Unicode use to represent characters ?

Unicode uses UTF-8, UTF-16 and UTF-32 encoding schemes.

What are ASCII and ISCII ? Why are these used ?

ASCII stands for American Standard Code for Information Interchange. It uses a 7-bit code and can represent 128 characters. ASCII code is mostly used to represent the characters of the English language, standard keyboard characters as well as control characters like Carriage Return and Form Feed. ISCII stands for Indian Standard Code for Information Interchange. It uses an 8-bit code and can represent 256 characters. It retains all ASCII characters and offers coding for Indian scripts as well. The majority of Indian languages can be represented using ISCII.

What are the UTF-8 and UTF-32 encoding schemes ? Which one is the more popular encoding scheme ?

UTF-8 is a variable-length encoding scheme that uses a different number of bytes to represent different characters, whereas UTF-32 is a fixed-length encoding scheme that uses exactly 4 bytes to represent every Unicode code point. UTF-8 is the more popular encoding scheme.

What do you understand by code point ?

Code point refers to a code from a code space that represents a single character of the character set covered by an encoding scheme. For example, 0x41 is one code point of ASCII, representing the character 'A'.

What is the difference between fixed length and variable length encoding schemes ?

A variable-length encoding scheme uses a different number of bytes or octets (sets of 8 bits) to represent different characters, whereas a fixed-length encoding scheme uses the same number of bytes to represent every character.

Type B: Application Based Questions

Convert the following binary numbers to decimal:

Equivalent decimal number = 1 + 4 + 8 = 13

Therefore, (1101) 2 = (13) 10

Equivalent decimal number = 2 + 8 + 16 + 32 = 58

Therefore, (111010) 2 = (58) 10

Equivalent decimal number = 1 + 2 + 4 + 8 + 16 + 64 + 256 = 351

Therefore, (101011111) 2 = (351) 10

Convert the following binary numbers to decimal :

(a) 1100

Equivalent decimal number = 4 + 8 = 12

Therefore, (1100) 2 = (12) 10

(b) 10010101

Equivalent decimal number = 1 + 4 + 16 + 128 = 149

Therefore, (10010101) 2 = (149) 10

(c) 11011100

Equivalent decimal number = 4 + 8 + 16 + 64 + 128 = 220

Therefore, (11011100) 2 = (220) 10

Convert the following decimal numbers to binary:

Therefore, (0.25) 10 = (0.01) 2

Therefore, (122) 10 = (1111010) 2

(We stop after 5 iterations if fractional part doesn't become 0)

Therefore, (0.675) 10 = (0.10101) 2

Convert the following decimal numbers to octal:

Therefore, (122) 10 = (172) 8

Therefore, (0.675) 10 = (0.53146) 8
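The repeated-multiplication method used above can be sketched in Python (frac_to_base is an illustrative helper written for this page; it stops after 5 digits if the fractional part never becomes 0):

    def frac_to_base(frac, base, digits=5):
        out = []
        for _ in range(digits):
            frac *= base
            digit = int(frac)                     # integral part becomes the next digit
            out.append("0123456789ABCDEF"[digit])
            frac -= digit                         # keep only the fractional part
            if frac == 0:
                break
        return "0." + "".join(out)

    print(frac_to_base(0.25, 2))    # 0.01
    print(frac_to_base(0.675, 2))   # 0.10101
    print(frac_to_base(0.675, 8))   # 0.53146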

Convert the following hexadecimal numbers to binary:

(23D) 16 = (1000111101) 2

Convert the following binary numbers to hexadecimal:

(a) 1010110110111

(b) 10110111011011

(c) 0110101100

0001 1010 1100

Therefore, (0110101100) 2 = (1AC) 16

Convert the following octal numbers to decimal:

Equivalent decimal number = 7 + 40 + 128 = 175

Therefore, (257) 8 = (175) 10

Equivalent decimal number = 7 + 16 + 320 + 1536 = 1879

Therefore, (3527) 8 = (1879) 10

Equivalent decimal number = 3 + 16 + 64 = 83

Therefore, (123) 8 = (83) 10

Integral part

Fractional part.

Equivalent decimal number = 5 + 384 + 0.125 + 0.03125 = 389.15625

Therefore, (605.12) 8 = (389.15625) 10

Convert the following hexadecimal numbers to decimal:

Equivalent decimal number = 6 + 160 = 166

Therefore, (A6) 16 = (166) 10

Equivalent decimal number = 11 + 48 + 256 + 40960 = 41275

Therefore, (A13B) 16 = (41275) 10

Equivalent decimal number = 5 + 160 + 768 = 933

Therefore, (3A5) 16 = (933) 10

Equivalent decimal number = 9 + 224 = 233

Therefore, (E9) 16 = (233) 10

Equivalent decimal number = 3 + 160 + 3072 + 28672 = 31907

Therefore, (7CA3) 16 = (31907) 10
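For an optional check, Python's int(s, base) applies exactly these positional weights:

    print(int("257", 8), int("3527", 8), int("123", 8))      # 175 1879 83
    print(int("A6", 16), int("A13B", 16), int("7CA3", 16))   # 166 41275 31907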

Convert the following decimal numbers to hexadecimal:

Therefore, (132) 10 = (84) 16

Therefore, (2352) 10 = (930) 16

Therefore, (122) 10 = (7A) 16

Therefore, (0.675) 10 = (0.ACCCC) 16

Therefore, (206) 10 = (CE) 16

Therefore, (3619) 10 = (E23) 16

Convert the following hexadecimal numbers to octal:

(38AC) 16 = (11100010101100) 2

011 100 010 101 100

(38AC) 16 = (34254) 8

(7FD6) 16 = (111111111010110) 2

111 111 111 010 110

(7FD6) 16 = (77726) 8

(ABCD) 16 = (1010101111001101) 2

001 010 101 111 001 101

(ABCD) 16 = (125715) 8
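The hexadecimal -> binary -> octal route used above can be cross-checked in Python (illustrative only):

    for h in ["38AC", "7FD6", "ABCD"]:
        n = int(h, 16)
        print(h, "=", bin(n)[2:], "=", oct(n)[2:])   # e.g. 38AC = 11100010101100 = 34254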

Convert the following octal numbers to binary:

Therefore, (123) 8 = (001 010 011) 2

Therefore, (3527) 8 = (011 101 010 111) 2

Therefore, (705) 8 = (111 000 101) 2

Therefore, (7642) 8 = (111 110 100 010) 2

Therefore, (7015) 8 = (111 000 001 101) 2

Therefore, (3576) 8 = (011 101 111 110) 2

Convert the following binary numbers to octal

111 010

Therefore, (111010) 2 = (72) 8

(b) 110110101

110 110 101

Therefore, (110110101) 2 = (665) 8

(c) 1101100001

001 101 100 001

Therefore, (1101100001) 2 = (1541) 8

011 001

Therefore, (11001) 2 = (31) 8

(b) 10101100

010 101 100

Therefore, (10101100) 2 = (254) 8

(c) 111010111

111 010 111

Therefore, (111010111) 2 = (727) 8

Add the following binary numbers:

(i) 10110111 and 1100101

  1 0 1 1 0 1 1 1
+   1 1 0 0 1 0 1
-----------------
1 0 0 0 1 1 1 0 0

Therefore, (10110111) 2 + (1100101) 2 = (100011100) 2

(ii) 110101 and 101111

  1 1 0 1 0 1
+ 1 0 1 1 1 1
-------------
1 1 0 0 1 0 0

Therefore, (110101) 2 + (101111) 2 = (1100100) 2

(iii) 110111.110 and 11011101.010

      1 1 0 1 1 1 . 1 1 0
+ 1 1 0 1 1 1 0 1 . 0 1 0
-------------------------
1 0 0 0 1 0 1 0 1 . 0 0 0

Therefore, (110111.110) 2 + (11011101.010) 2 = (100010101) 2

(iv) 1110.110 and 11010.011

    1 1 1 0 . 1 1 0
+ 1 1 0 1 0 . 0 1 1
-------------------
1 0 1 0 0 1 . 0 0 1

Therefore, (1110.110) 2 + (11010.011) 2 = (101001.001) 2
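These sums can be verified in Python by parsing each operand with int(s, 2) and printing the total in binary; for the fractional sum the operands are first scaled to whole numbers of eighths (3 fractional bits), which is an assumption of this illustrative check, not part of the textbook method:

    print(bin(int("10110111", 2) + int("1100101", 2))[2:])   # 100011100
    print(bin(int("110101", 2) + int("101111", 2))[2:])      # 1100100
    total = int("110111110", 2) + int("11011101010", 2)      # both operands scaled by 2**3
    print(bin(total)[2:])   # 100010101000, i.e. 100010101.000 with the point restored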

Question 21

Given that A's code point in ASCII is 65, and a's code point is 97. What is the binary representation of 'A' in ASCII ? (and what's its hexadecimal representation). What is the binary representation of 'a' in ASCII ?

Binary representation of 'A' in ASCII will be binary representation of its code point 65.

Converting 65 to binary:

Therefore, binary representation of 'A' in ASCII is 1000001.

Converting 65 to Hexadecimal:

Therefore, hexadecimal representation of 'A' in ASCII is (41) 16 .

Similarly, converting 97 to binary:

Therefore, binary representation of 'a' in ASCII is 1100001.
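A short Python check (optional): ord() gives the code point, and bin()/hex() give its representations:

    for ch in "Aa":
        print(ch, ord(ch), bin(ord(ch))[2:], hex(ord(ch))[2:].upper())
    # A 65 1000001 41
    # a 97 1100001 61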

Question 22

Convert the following binary numbers to decimal, octal and hexadecimal numbers.

(i) 100101.101

Decimal Conversion of integral part:

Decimal Conversion of fractional part:

Equivalent decimal number = 1 + 4 + 32 + 0.5 + 0.125 = 37.625

Therefore, (100101.101) 2 = (37.625) 10

Octal Conversion

100 101 . 101

Therefore, (100101.101) 2 = (45.5) 8

Hexadecimal Conversion

0010 0101 . 1010

Therefore, (100101.101) 2 = (25.A) 16

(ii) 10101100.01011

Equivalent decimal number = 4 + 8 + 32 + 128 + 0.25 + 0.0625 + 0.03125 = 172.34375

Therefore, (10101100.01011) 2 = (172.34375) 10

010 101 100 . 010 110

Therefore, (10101100.01011) 2 = (254.26) 8

1010 1100 . 0101 1000

Therefore, (10101100.01011) 2 = (AC.58) 16

(iii) 1010

Decimal Conversion:

Equivalent decimal number = 2 + 8 = 10

Therefore, (1010) 2 = (10) 10

Octal Conversion

001 010

Therefore, (1010) 2 = (12) 8

Hexadecimal Conversion

1010

Therefore, (1010) 2 = (A) 16

(iv) 10101100.010111

Equivalent decimal number = 4 + 8 + 32 + 128 + 0.25 + 0.0625 + 0.03125 + 0.015625 = 172.359375

Therefore, (10101100.010111) 2 = (172.359375) 10

010 101 100 . 010 111

Therefore, (10101100.010111) 2 = (254.27) 8

1010 1100 . 0101 1100

Therefore, (10101100.010111) 2 = (AC.5C) 16
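The fractional conversions in this question can be cross-checked with a small Python helper (illustrative only, written for this page): split at the binary point, convert the integral part directly, and weight each fractional bit by the corresponding negative power of 2:

    def bin_frac_to_decimal(s):
        whole, _, frac = s.partition(".")
        value = int(whole, 2) if whole else 0
        for i, bit in enumerate(frac, start=1):
            value += int(bit) * 2 ** -i
        return value

    print(bin_frac_to_decimal("100101.101"))        # 37.625
    print(bin_frac_to_decimal("10101100.010111"))   # 172.359375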


Data representation exercises

Exercises not as directly relevant to this year’s class are marked with ⚠️.

DATAREP-1. Sizes and alignments

QUESTION DATAREP-1A. True or false: For any non-array type X, the size of X ( sizeof(X) ) is greater than or equal to the alignment of type X ( alignof(X) ).

True. This is also mostly true for arrays. The exception is zero-length arrays: sizeof(X[0]) == 0, but alignof(X[0]) == alignof(X).

QUESTION DATAREP-1B. True or false: For any type X, the size of struct Y { X a; char newc; } is greater than the size of X.

QUESTION DATAREP-1C. True or false: For any types A1 ... An (with n ≥ 1), the size of struct Y is greater than the size of struct X , given:

struct X { A1 a1; ... An an; };
struct Y { A1 a1; ... An an; char newc; };
False (example: A1 = int , A2 = char )

QUESTION DATAREP-1D. True or false: For any types A1 ... An (with n ≥ 1), the size of struct Y is greater than the size of union X , given:

union X { A1 a1; ... An an; };
struct Y { A1 a1; ... An an; };
False (if n = 1 )

QUESTION DATAREP-1E. Assume that structure struct Y { ... } contains K char members and M int members, with K ≤ M , and nothing else. Write an expression defining the maximum sizeof(struct Y) .

QUESTION DATAREP-1F. You are given a structure struct Z { T1 a; T2 b; T3 c; } that contains no padding. What does (sizeof(T1) + sizeof(T2) + sizeof(T3)) % alignof(struct Z) equal?

QUESTION DATAREP-1G. Arrange the following types in increasing order by size. Sample answer: “1 < 2 = 4 < 3” (choose this if #1 has smaller size than #2, which has equal size to #4, which has smaller size than #3).

  • struct minipoint { uint8_t x; uint8_t y; uint8_t z; }
  • unsigned short[1]
#6 < #1 < #4 < #2 < #3 < #5

DATAREP-2. Expressions

QUESTION DATAREP-2A. Here are eight expressions. Group the expressions into four pairs so that the two expressions in each pair have the same value, and each pair has a different value from every other pair. There is one unique answer that meets these constraints. m has the same type and value everywhere it appears (there’s one unique value for m that meets the problem’s constraints). Assume an x86-32 machine: a 32-bit architecture in which pointers are 32 bits long.

  • sizeof(&m)
  • 16 >> 2
1—5; 2—7; 3—8; 4—6. 1—5 is easy. m + ~m + 1 == m + (-m) == 0, and m & ~m == 0, giving us 3—8. Now what about the others? m & -m (#3) is either 0 or a power of 2, so it cannot be -1 (#2). The remaining possibilities are m and 1. If (m & -m) == m, then the remaining pair would be 1 and -1, which clearly doesn’t work. Thus m & -m matches with 1, and m == -1.

DATAREP-3. Hello binary

This problem locates 8-bit numbers horizontally and vertically in the following 16x16 image. Black pixels represent 1 bits and white pixels represent 0 bits. For horizontal arrangements, the most significant bit is on the left as usual. For vertical arrangements, the most significant bit is on top.

Examples : The 8-bit number 15 (hexadecimal 0x0F, binary 0b00001111) is located horizontally at 3,4, which means X=3, Y=4.

  • The pixel at 3,4 is white, which has bit value 0.
  • 4,4 is white, also 0.
  • 5,4 is white, also 0.
  • 6,4 is white, also 0.
  • 7,4 is black, which has bit value 1.
  • 8,4, 9,4, and 10,4 are black, giving three more 1s.
  • Reading them all off, this is 0b00001111, or 15.

15 is also located horizontally at 7,6.

The 8-bit number 0 is located vertically at 0,0. It is also located horizontally at 0,0 and 1,0.

The 8-bit number 134 (hexadecimal 0x86, binary 0b10000110) is located vertically at 8,4.

QUESTION DATAREP-3A. Where is 3 located vertically? (All questions refer to 8-bit numbers.)

QUESTION DATAREP-3B. Where is 12 located horizontally?

QUESTION DATAREP-3C. Where is 255 located vertically?

DATAREP-4. Hello memory

Shintaro Tsuji wants to represent the image of Question DATAREP-3 in computer memory. He stores it in an array of 16-bit unsigned integers:

Row Y of the image is stored in integer cute[Y] .

QUESTION DATAREP-4A. What is sizeof(cute) , 2, 16, 32, or 64?

QUESTION DATAREP-4B. printf("%d\n", cute[0]); prints 16384 . Is Shintaro’s machine big-endian or little-endian?

Little-endian

DATAREP-5. Hello program

Now that Shintaro has represented the image in memory as an array of unsigned short objects, he can manipulate the image using C. For example, here’s a function.

Running swap produces the following image:

Shintaro has written several other functions. Here are some images (A is the original):

For each function, what image does that function create?

QUESTION DATAREP-5A.

H. The code flips all bits in the input.

QUESTION DATAREP-5B.

QUESTION DATAREP-5C.

The following programs generated the other images. Can you match them with their images?

f3 —I; f4 —B; f5 —C; f6 —F; f7 —G; f8 —A; f9 —E

DATAREP-6. Memory regions

Consider the following program:

This program allocates objects a through g on the heap and then stores those pointers in some stack and global variables. We recommend you draw a picture of the state setup creates.

QUESTION DATAREP-6A. Assume that (uintptr_t) a == 0x8300000 , and that malloc returns increasing addresses. Match each address to the most likely expression with that address value. The expressions are evaluated within the context of main . You will not reuse an expression.

1—D; 2—C; 3—E; 4—B; 5—A Since p has automatic storage duration, it is located on the stack, giving us 5—A. The global variable has static storage duration, and so does its component global.y ; so the pointer &global.y has an address that is below all heap-allocated pointers. This gives us 2—C. The remaining expressions go like this: global.y == e ; p.y == &e[1] , so *p.y == e[1] == (int) &d[100000] , and (int *) *p.y == &d[100000] ; p.x == g , so p.x[0] == g[0] == *g == c , and *p.x[0] == *c == (int) a . Address #4 has value 0x8300000, which by assumption is a ’s address; so 4—B. Address #3 is much larger than the other heap addresses, so 3—E. This leaves 1—D.

DATAREP-7. ⚠️ Garbage collection ⚠️

⚠️ Here is the top-level function for the conservative garbage collector we wrote in class.

This garbage collector is not correct because it doesn’t capture all memory roots.

Consider the program from the previous section, and assume that an object is reachable if do_stuff can access an address within the object via variable references and memory dereferences without casts or pointer arithmetic . Then:

QUESTION DATAREP-7A. Which reachable objects will m61_collect() free? Circle all that apply.

b , f . The collector searches the stack for roots. This yields just the values in struct ptrs p (the only pointer-containing variable with automatic storage duration at the time m61_collect is called). The objects directly pointed to by p are g and e . The collector then recursively marks objects pointed to by these objects. From g , it finds c . From e , it finds nothing. Then it checks one more time. From c , it finds the value of a ! Now, a is actually not a pointer here—the type of *c is int —so by the definition above, a is not actually reachable. But the collector doesn’t know this. Putting it together, the collector marks a , c , e , and g . It won’t free these objects; it will free the others ( b , d , and f ). But b and f are reachable from global .

QUESTION DATAREP-7B. Which unreachable objects will m61_collect() not free? Circle all that apply.

QUESTION DATAREP-7C. Conservative garbage collection in C is often slower than precise garbage collection in languages such as Java. Why? Circle all that apply.

  • C is generally slower than other languages.
  • Conservative garbage collectors must search all reachable memory for pointers. Precise garbage collectors can ignore values that do not contain pointers, such as large character buffers.
  • C programs generally use the heap more than programs in other languages.
  • None of the above.

DATAREP-8. Memory errors

The following function constructs and returns a lower-triangular matrix of size N . The elements are random 2-dimensional points in the unit square. The matrix is represented as an array of pointers to arrays.

This code is running on an x86-32 machine (size_t is 32 bits, not 64). You may assume that the machine has enough free physical memory and the process has enough available virtual address space to satisfy any memory allocation request.

QUESTION DATAREP-8A. Give a value of N so that, while make_random_lt_matrix(N) is running, no new fails, but a memory error (such as a null pointer dereference or an out-of-bounds dereference) happens on Line A. The memory error should happen specifically when i == 1 .

(This problem is probably easier when you write your answer in hexadecimal.)

We are asked to produce a value of N so that no memory error happens on Line A when i == 0, but a memory error does happen when i == 1. So reason that through. What memory errors could happen on Line A if malloc() returns non-nullptr? There’s only one memory operation, namely the dereference m[i]. Perhaps this dereference is out of bounds. If no memory error happens when i == 0, then an m[0] dereference must not cause a memory error. So the m object must contain at least 4 bytes. But a memory error does happen on Line A when i == 1. So the m object must contain less than 8 bytes.

How many bytes were allocated for m? sizeof(point2_vector) * N == sizeof(point2*) * N == 4 * N. So we have:

4 ≤ (4 * N) < 8

It seems like the only possible answer is N == 1. But no, this doesn’t cause a memory error, because the loop body would never be executed with i == 1! The key insight is that the multiplications above use 32-bit unsigned computer arithmetic. Let’s write N as X + 1. Then these inequalities become:

4 ≤ 4 * (X + 1) = 4 * X + 4 < 8
0 ≤ (4 * X) < 4

(Multiplication distributes over addition in computer arithmetic.) What values of X satisfy this inequality? It might be easier to see if we remember that multiplication by powers of two is equivalent to shifting:

0 ≤ (X << 2) < 4

The key insight is that this shift eliminates the top two bits of X. There are exactly four values for X that work: 0, 0x40000000, 0x80000000, and 0xC0000000. For any of these, 4 * X == 0 in 32-bit computer arithmetic, because 4·X ≡ 0 (mod 2^32) in normal arithmetic. Plugging X back in to N, we see that N ∈ {0x40000001, 0x80000001, 0xC0000001}. These are the only values that work. Partial credit was awarded for values that acknowledged the possibility of overflow.

QUESTION DATAREP-8B. Give a value of N so that no new fails, and no memory error happens on Line A, but a memory error does happen on Line B.

If no memory error happens on Line A, then N < 2^30 (otherwise overflow would happen as seen above). But a memory error does happen on Line B. Line B dereferences m[i][j], for 0 ≤ j ≤ i; so how big is m[i]? It was allocated on Line A with size sizeof(point2) * (i + 1) == 2 * sizeof(double) * (i + 1) == 16 * (i + 1). If i + 1 ≥ 2^32 / 16 = 2^28, this multiplication will overflow. Since i < N, we can finally reason that any N greater than or equal to 2^28 = 0x10000000 and less than 2^30 = 0x40000000 will cause the required memory error.

DATAREP-9. Data representation

Assume a 64-bit x86-64 architecture unless explicitly told otherwise.

Write your assumptions if a problem seems unclear, and write down your reasoning for partial credit.

QUESTION DATAREP-9A. Arrange the following values in increasing numeric order. Assume that x is an int with value 8192.

A possible answer might be “a < b < c = d < e < f < g < h.”

h < a = d = f < b < c < e < g

For each of the remaining questions, write one or more arguments that, when passed to the provided function, will cause it to return the integer 61 (which is 0x3d hexadecimal ). Write the expected number of arguments of the expected types.

QUESTION DATAREP-9B.

QUESTION DATAREP-9C.

"61"

QUESTION DATAREP-9D. Your answer should be different from the previous answer.

" 0x3d" , " 61 " , etc.

QUESTION DATAREP-9E. For this problem, you will also need to define a global variable. Give its type and value.

This code was compiled from: int f4(int a, int b) { extern unsigned char y; return (a & 5) * 2 + b - y; } A valid solution is a = 0, b = 61, unsigned char y = 0.

DATAREP-10. Sizes and alignments

QUESTION DATAREP-10A. Use the following members to create a struct of size 16, using each member exactly once, and putting char a first; or say “impossible” if this is impossible.

  • char a; (we’ve written this for you)
  • unsigned char b;

QUESTION DATAREP-10B. Repeat Part A, but create a struct with size 12.

abdc, acbd, acdb, adbc, adcb, …

QUESTION DATAREP-10C. Repeat Part A, but create a struct with size 8.

QUESTION DATAREP-10D. Consider the following structs:

Give definitions for T, U, and V so that there is one byte of padding in struct x after x2 , and two bytes of padding in struct y after y1 .

Example: T = short[2] , U = char , V = int

DATAREP-11. Dynamic memory allocation

QUESTION DATAREP-11A. True or false?

  • free(nullptr) is an error.
  • malloc(0) can never return nullptr .
False, False

QUESTION DATAREP-11B. Give values for sz and nmemb so that calloc(sz, nmemb) will always return nullptr (on a 32-bit x86 machine), but malloc(sz * nmemb) might or might not return null.

(size_t) -1, (size_t) -1 —anything that causes an overflow

Consider the following 8 statements. ( p and q have type char* .)

  • q = nullptr;
  • p = (char*) malloc(12);
  • q = (char*) malloc(8);

QUESTION DATAREP-11C. Put the statements in an order that would execute without error or evoking undefined behavior. Memory leaks count as errors. Use each statement exactly once. Sample answer: “abcdefgh.”

cdefghab (and others). Expect “OK”

QUESTION DATAREP-11D. Put the statements in an order that would cause one double-free error, and no other error or undefined behavior (except possibly one memory leak). Use each statement exactly once.

efghbcad (and others). Expect “double-free + memory leak”

QUESTION DATAREP-11E. Put the statements in an order that would cause one memory leak (one allocated piece of memory is not freed), and no other error or undefined behavior. Use each statement exactly once.

efghadbc (and others). Expect “memory leak”

QUESTION DATAREP-11F. Put the statements in an order that would cause one boundary write error, and no other error or undefined behavior. Use each statement exactly once.

eafhcgbd (and others). Expect “out of bounds write”

DATAREP-12. Pointers and debugging allocators

You are debugging some students’ m61 code from Problem Set 1. The codes use the following metadata:

Their linked-list manipulations in m61_malloc are similar.

But their linked-list manipulations in m61_free differ.

Alice’s code: void m61_free(void* ptr, ...) { ... meta* m = (meta*) ptr - 1; if (m->next != nullptr) { m->next->prev = m->prev; } if (m->prev == nullptr) { mhead = nullptr; } else { m->prev->next = m->next; } ... }
Bob’s code: void m61_free(void* ptr, ...) { ... meta* m = (meta*) ptr - 1; if (m->next) { m->next->prev = m->prev; } if (m->prev) { m->prev->next = m->next; } ... }
Chris’s code: void m61_free(void* ptr, ...) { ... meta* m = (meta*) ptr - 1; m->next->prev = m->prev; m->prev->next = m->next; ... }
Donna’s code: void m61_free(void* ptr, ...) { ... meta* m = (meta*) ptr - 1; if (m->next) { m->next->prev = m->prev; } if (m->prev) { m->prev->next = m->next; } else { mhead = m->next; } ... }

You may assume that all code not shown is correct.

QUESTION DATAREP-12A. Whose code will segmentation fault on this input? List all students that apply.

QUESTION DATAREP-12B. Whose code might report something like “ invalid free of pointer [ptr1], not allocated ” on this input? (Because a list traversal starting from mhead fails to find ptr1 .) List all students that apply. Don’t include students whose code would segfault before the report.

QUESTION DATAREP-12C. Whose code would improperly report something like “ LEAK CHECK: allocated object [ptr1] with size 1 ” on this input? (Because the mhead list appears not empty, although it should be.) List all students that apply. Don’t include students whose code would segfault before the report.

QUESTION DATAREP-12D. Whose linked-list code is correct for all inputs? List all that apply.

DATAREP-13. Arena allocation

Chimamanda Ngozi Adichie is a writing a program that needs to allocate and free a lot of nodes, where a node is defined as follows:

She uses an arena allocator variant. Here’s her code.

QUESTION DATAREP-13A. True or false?

  • This allocator never has external fragmentation.
  • This allocator never has internal fragmentation.

QUESTION DATAREP-13B. Chimamanda’s frenemy Paul Auster notices that if many nodes are allocated right in a row, every 1024th allocation seems much more expensive than the others. The reason is that every 1024th allocation initializes a new group, which in turn adds 1024 nodes to the free list. Chimamanda decides instead to allow a single element of the free list to represent many contiguous free nodes . The average allocation might get a tiny bit slower, but no allocation will be much slower than average. Here’s the start of her idea:

Complete this function by writing code to replace // ??? .

if (n->key == 1) { a->frees = n->right; } else { a->frees = n + 1; a->frees->key = n->key - 1; a->frees->right = n->right; }

Another solution:

if (n->right) { a->frees = n->right; } else if (n->key == 1) { a->frees = NULL; } else { a->frees = n + 1; a->frees->key = n->key - 1; }

QUESTION DATAREP-13C. Write a node_free function that works with the node_alloc function from the previous question.

void node_free(arena* a, node* n) { n->right = a->frees; n->key = 1; a->frees = n; }

Or, if you use the solution above:

void node_free(arena* a, node* n) { n->right = a->frees; a->frees = n; }

QUESTION DATAREP-13D. Complete the following new function.

arena_group* node_find_group(arena* a, node* n) { for (arena_group* g = a->groups; g; g = g->next_group) { if ((uintptr_t) &g->nodes[0] <= (uintptr_t) n && (uintptr_t) n <= (uintptr_t) &g->nodes[1023]) { return g; } } return nullptr; }

QUESTION DATAREP-13E. Chimamanda doesn’t like that the node_find_group function from part D takes O ( G ) time, where G is the number of allocated arena_groups. She remembers a library function that might help, posix_memalign :

The function posix_memalign() allocates size bytes and places the address of the allocated memory in *memptr . The address of the allocated memory will be a multiple of alignment , which must be a power of two and a multiple of sizeof(void*) . ...

“Cool,” she says, “I can use this to speed up node_find_group !” She now allocates a new group with the following code:

Given this allocation strategy, write a version of node_find_group that takes O (1) time.

arena_group* node_find_group(arena* a, node* n) { uintptr_t n_addr = (uintptr_t) n; return (arena_group*) (n_addr - n_addr % 32768); }

DATAREP-14. Data representation

Sort the following expressions in ascending order by value, using the operators <, =, >. For example, if we gave you:

  • int B = 0x6;

you would write C < A = B .

  • unsigned char a = 0x191;
  • char b = 0x293;
  • unsigned long c = 0xFFFFFFFF;
  • int d = 0xFFFFFFFF;
  • int e = d + 3;
  • size_t g = sizeof(*s) (given short *s )
  • long h = 256;
  • i = 0b100000000000000000000000000000000000 (binary)
  • unsigned long j = 0xACE - 0x101;
b < d < e = g < a < h < j < c < f < i

DATAREP-15. Memory

For the following questions, select the part(s) of memory from the list below that best describes where you will find the object.

  • between the heap and the stack
  • in a read-only data segment
  • in a text segment starting at address 0x08048000
  • in a read/write data segment
  • in a register

Assume the following code, compiled without optimization.

QUESTION DATAREP-15A. The value 0xdeadbeef, when we are returning from main.

7, in a register

QUESTION DATAREP-15B. The variable maxitems

4, in a read-only data segment

QUESTION DATAREP-15C. The structure s

6, in a read/write data segment

QUESTION DATAREP-15D. The structure at sp[9]

QUESTION DATAREP-15E. The variable u

2, stack, or 7, in a register

QUESTION DATAREP-15F. main

5, in a text segment starting at address 0x08048000

QUESTION DATAREP-15G. printf

3, between the heap and the stack

QUESTION DATAREP-15H. argc

QUESTION DATAREP-15I. The number the user enters

QUESTION DATAREP-15J. The variable L

DATAREP-16. Memory and pointers

⚠️ This question may benefit from Unit 4, kernel programming. ⚠️

If multiple processes are sharing data via mmap , they may have the file mapped at different virtual addresses. In this case, pointers to the same object will have different values in the different processes. One way to store pointers in mmapped memory so that multiple processes can access them consistently is using relative pointers. Rather than storing a regular pointer, you store the offset from the beginning of the mmapped region and add that to the address of the mapping to obtain a real pointer. An alternative representation is called self-relative pointers. In this case, you store the difference in address between the current location (i.e., the location containing the pointer) and the location to which you want to point. Neither representation addresses pointers between the mmapped region and the rest of the address space; you may assume such pointers do not exist.
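A minimal sketch of the two representations, using Python and made-up indices rather than the C structures in the questions below (illustrative only):

    # Model the mmapped region as a list; links are stored as offsets, not addresses.
    region = [0] * 16
    LOCATION = 3     # index of the slot that holds the pointer
    TARGET = 10      # index of the object being pointed to

    region[LOCATION] = TARGET            # relative pointer: offset from the start of the region
    self_relative = TARGET - LOCATION    # self-relative pointer: offset from the pointer's own slot

    # Both forms recover the same target; only the self-relative one needs to know
    # where the pointer itself is stored.
    assert region[LOCATION] == TARGET
    assert LOCATION + self_relative == TARGET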

QUESTION DATAREP-16A. State one advantage that relative pointers have over self-relative pointers.

The key thing to understand is that both of these approaches use relative pointers and both can be used to solve the problem of sharing a mapped region among processes that might have the region mapped at different addresses. Possible advantages:

  • Within a region, you can safely use memcpy, as moving pointers around inside the region does not change their value. If you copy a self-relative pointer to a new location, its value has to change. That is, imagine that you have a self-relative pointer at offset 4 from the region and it points to the object at offset 64 from the region. The value of the self-relative pointer is 60. If I copy that pointer to offset 100 from the region, I have to change it to be -36.
  • If you save the region as a uintptr_t or a char*, then you can simply add the offset to the region; self-relative pointers will always be adding/subtracting from the address of the location storing the pointer, which may have a type other than char*, so you'd need to cast it before performing the addition/subtraction.
  • You can use a larger region: if we assume that we have only N bits to store the pointer, then in the base+offset model, the offset could be an unsigned value, which will be larger than the maximum offset possible with a signed pointer, which you need for the self-relative case. That is, although the number of values that can be represented by signed and unsigned numbers differs by one, the implementation must allow for a pointer from the beginning of the region to reference an item at the very last location of the region; thus, your region size is limited by the largest positive number you can represent.

QUESTION DATAREP-16B. State one advantage that self-relative pointers have over relative pointers.

You don't have to know the address at which the region is mapped to use them. That is, given a location containing a self-relative pointer, you can find the target of that pointer.

For the following questions, assume the following setup:

QUESTION DATAREP-16C. Propose a type for TYPE1 and give 1 sentence why you chose that type.

A good choice is ptrdiff_t , which represents differences between pointers. Other reasonable choices include uintptr_t and unsigned long .

QUESTION DATAREP-16D. Write a C expression to generate a (properly typed) pointer to the element referenced by the r_next field of ll1 .

(ll1*) (region + node1.r_next)

QUESTION DATAREP-16E. Propose a type for TYPE2 and give 1 sentence why you chose that type.

The same choices work; again ptrdiff_t is best.

QUESTION DATAREP-16F. Write a C expression to generate a (properly typed) pointer to the element referenced by the sr_next field of ll2 .

(ll2*) ((char*) &node2.sr_next + node2.sr_next)

DATAREP-17. Data representation: Allocation sizes

How much user-accessible space is allocated on the stack and/or the heap by each of the following statements? Assume x86-64.

QUESTION DATAREP-17A. union my_union { ... };

0; this declares the type , not any object

QUESTION DATAREP-17B. void* p = malloc(sizeof(char*));

16: 8 on the heap plus 8 on the stack

QUESTION DATAREP-17C. my_union u;

16 (on the stack)

QUESTION DATAREP-17D. my_union* up = &u;

8 (on the stack)

DATAREP-18. Data representation: ENIAC

Professor Kohler has been developing Eddie’s NIfty Awesome Computer (ENIAC). When he built the C compiler for ENIAC, he assigned the following sizes and alignments to C’s fundamental data types. (Assume that every other fundamental type has the same size and alignment as one of these.)

QUESTION DATAREP-18A. This set of sizes is valid: it obeys all the requirements set by C’s abstract machine. Give one different size assignment that would make the set as a whole invalid.

Some examples: sizeof(char) = 0; sizeof(char) = 2; sizeof(short) = 8 (i.e., longer than int); sizeof(int) = 1 (though not discussed in class, it turns out that C++ requires ints to be at least 2 bytes big); etc.

QUESTION DATAREP-18B. What alignment must the ENIAC malloc guarantee?

For the following two questions, assume the following struct on the ENIAC:

QUESTION DATAREP-18C. What is sizeof(struct s) ?

f1 is 7 bytes. f2 is 16 bytes with 16-byte alignment, so add 9B padding. f3 is 4 bytes (and is already aligned). f4 is 8 bytes with 8-byte alignment, so add 4B padding. That adds up to 7 + 9 + 16 + 4 + 4 + 8 = 16 + 16 + 16 = 48 bytes. That’s a multiple of the structure’s alignment, which is 16, so no need for any end padding.

QUESTION DATAREP-18D. What is alignof(struct s) ?

The remaining questions refer to this structure definition:

Indicate for each statement whether the statement is always true, possibly true, or never true on the ENIAC.

QUESTION DATAREP-18E : sizeof(outer) > sizeof(inner) (Always / Possibly / Never)

QUESTION DATAREP-18F : sizeof(outer) is a multiple of sizeof(inner) (Always / Possibly / Never)

QUESTION DATAREP-18G : alignof(outer) > alignof(struct inner) (Always / Possibly / Never)

QUESTION DATAREP-18H : sizeof(outer) - sizeof(inner) < 4 (Always / Possibly / Never)

QUESTION DATAREP-18I : sizeof(outer) - sizeof(inner) > 32 (Always / Possibly / Never)

QUESTION DATAREP-18J : alignof(inner) == 2 (Always / Possibly / Never)

DATAREP-19. Undefined behavior

Which of the following expressions, instruction sequences, and code behaviors cause undefined behavior? For each question, write Defined or Undefined. (Note that the INT_MAX and UINT_MAX constants have types int and unsigned , respectively.)

QUESTION DATAREP-19A. INT_MAX + 1 (Defined / Undefined)

QUESTION DATAREP-19B. UINT_MAX + 1 (Defined / Undefined)

QUESTION DATAREP-19C.

(Defined / Undefined)

Defined (only C++ programs can have undefined behavior; the behavior of x86-64 instructions is always defined)

QUESTION DATAREP-19D. Failed memory allocation, i.e., malloc returns nullptr (Defined / Undefined)

QUESTION DATAREP-19E. Use-after-free (Defined / Undefined)

QUESTION DATAREP-19F. Here are two functions and a global variable:

C’s undefined behavior rules would allow an aggressive optimizing compiler to simplify the code generated for f . Fill in the following function with the simplest C code you can, under the constraint that an aggressive optimizing compiler might generate the same object code for f and f_simplified .

return i * 2;

DATAREP-20. Bit manipulation

It’s common in systems code to need to switch data between big-endian and little-endian representations. This is because networks represent multi-byte integers using big-endian representation, whereas x86-family processors store multi-byte integers using little-endian representation.

QUESTION DATAREP-20A. Complete this function, which translates an integer from big-endian representation to little-endian representation by swapping bytes. For instance, big_to_little(0x01020304) should return 0x04030201 . Your return statement must refer to the u.c array, and must not refer to x . This function is compiled on x86-64 Linux (as every function is unless we say otherwise).

return (u.c[0] << 24) | (u.c[1] << 16) | (u.c[2] << 8) | u.c[3];

QUESTION DATAREP-20B. Complete the function again, but this time write a single expression that refers to x (you may refer to x multiple times, of course).

return ((x & 0xFF) << 24) | ((x & 0xFF00) << 8) | ((x & 0xFF0000) >> 8) | (x >> 24);

QUESTION DATAREP-20C. Now write the function little_to_big , which will translate a little-endian integer into big-endian representation. You may introduce helper variables or even call big_to_little if that’s helpful.

return big_to_little (x);

DATAREP-21. Computer arithmetic

Bitwise operators and computer arithmetic can represent vectors of bits, which in turn are useful for representing sets . For example, say we have a function bit that maps elements to distinct bits; thus, bit(X) == (1 << i) for some i . Then a set {X0, X1, X2, …, Xn} can be represented as bit(X0) | bit(X1) | bit(X2) | … | bit(Xn) . Element Xi is in the set with integer representation z if and only if (bit(Xi) & z) != 0 .

QUESTION DATAREP-21A. What is the maximum number of set elements that can be represented in a single unsigned variable on an x86 machine?

QUESTION DATAREP-21B. Match each set operation with the C operator(s) that could implement that operation. (Complement is a unary operation.)

  • intersection: &
  • equality: ==
  • complement: ~
  • union: |
  • toggle membership: ^

QUESTION DATAREP-21C. Complete this function, which should return the set difference between the sets with representations a and b . This is the set containing exactly those elements of set a that are not in set b .

Any of these work: return a & ~b; return a - (a & b); return a & ~(a & b);

QUESTION DATAREP-21D. Below we’ve given a number of C++ expressions, some of their values, and some of their set representations for a set of elements. For example, the first row says that the integer value of expression 0 is just 0, which corresponds to an empty set. Fill in the blanks. This will require figuring out which bits correspond to the set elements A , B , C , and D , and the values for the 32-bit int variables a , x , and s . No arithmetic operation overflows; abs(x) returns the absolute value of x (that is, x < 0 ? -x : x ).

Expression e                    Integer value    Represented set
0                               0                {}
a == a                          1                {A}
(unsigned) ~a < (unsigned) a    1                {A}
a < 0                           1                {A}
(1 << (s/2)) - 1                15               {A,B,C,D}
a * a                           4                {C}
abs(a)                          2                {D}
x & (x - 1)                     0                {}
x - 1                           3                {A,D}
x                               4                {C}
s                               8                {B}

DATAREP-22. Bit Tac Toe

Brenda Bitdiddle is implementing tic-tac-toe using bitwise arithmetic. (If you’re unfamiliar with tic-tac-toe, see below.) Her implementation starts like this:

Each position on the board is assigned a distinct bit.

Tic-tac-toe, also known as noughts and crosses, is a simple paper-and-pencil game for two players, X and O. The board is a 3x3 grid. The players take turns writing their symbol (X or O) in an empty square on the grid. The game is won when one player gets their symbol in all three squares in one of the rows, one of the columns, or one of the two diagonals. X goes first; played perfectly, the game always ends in a draw. You may access the Wikipedia page for tic-tac-toe .

QUESTION DATAREP-22A. Brenda’s current code doesn’t check whether a move reuses a position. Write a snippet of C code that returns –1 if an attempted move is reusing a position. This snippet will replace line 3.

Lots of people misinterpreted this to mean the player reused their own position and ignored the other player. That mistake was allowed with no points off. The code below checks whether the position was already used by either player. Any of the following work:

if ((b->moves[XS] | b->moves[OS]) & ttt_values[row][col]) {
    return -1;
}

if ((b->moves[XS] | b->moves[OS] | ttt_values[row][col]) == (b->moves[XS] | b->moves[OS])) {
    return -1;
}

if ((b->moves[XS] + b->moves[OS]) & ttt_values[row][col]) {
    return -1;
}

if ((b->moves[p] ^ ttt_values[row][col]) < b->moves[p]) {
    return -1;
}

etc.

QUESTION DATAREP-22B. Complete the following function. You may use the following helper function:

  • int popcount(unsigned n) Return the number of 1 bits in n . (Stands for “population count”; is implemented on recent x86 processors by a single instruction, popcnt .)

For full credit, your code should consist of a single “ return ” statement with a simple expression, but for substantial partial credit write any correct solution.

return popcount(b->moves[XS] | b->moves[OS]);

QUESTION DATAREP-22C. Write a simple expression that, if nonzero, indicates that player XS has a win on board b across the main diagonal (has marks in positions 0,0 , 1,1 , and 2,2 ).

(b->moves[XS] & 0x421) == 0x421
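The constant 0x421 is consistent with a layout in which position (row, col) gets bit number 4*row + col; that exact assignment is an assumption here (Brenda's ttt_values table isn't reproduced above), but it shows why the win check works:

#include <assert.h>

static const unsigned ttt_bit[3][3] = {   // hypothetical stand-in for ttt_values
    {1 << 0, 1 << 1, 1 << 2},
    {1 << 4, 1 << 5, 1 << 6},
    {1 << 8, 1 << 9, 1 << 10},
};

int main() {
    unsigned diag = ttt_bit[0][0] | ttt_bit[1][1] | ttt_bit[2][2];
    assert(diag == 0x421);
    unsigned xs = ttt_bit[0][0] | ttt_bit[0][2] | ttt_bit[1][1] | ttt_bit[2][2];
    assert((xs & 0x421) == 0x421);    // all three diagonal squares marked: a win
    unsigned os = ttt_bit[0][1] | ttt_bit[1][1];
    assert((os & 0x421) != 0x421);    // only one diagonal square: not a win
}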

Lydia Davis notices Brenda’s code and has a brainstorm. “If you use different values,” she suggests, “it becomes easy to detect any win.” She suggests:

QUESTION DATAREP-22D. Repeat part A for Lydia’s values: Write a snippet of C code that returns –1 if an attempted move is reusing a position. This snippet will replace line 3 in Brenda’s code.

The same answers as for part A work.

QUESTION DATAREP-22E. Repeat part B for Lydia’s values: Use popcount to complete tictactoe_nmoves .

Either of:
return popcount((b->moves[0] | b->moves[1]) & 0x777);
return popcount((b->moves[0] | b->moves[1]) & 0x777000);

QUESTION DATAREP-22F. Complete the following function for Lydia’s values. For full credit, your code should consist of a single “ return ” statement containing exactly two constants, but for substantial partial credit write any correct solution.

return (b->moves[p] + 0x11111111) & 0x88888888;
// Another amazing possibility (Allen Chen and others):
return b->moves[p] & (b->moves[p] << 1) & (b->moves[p] << 2);

DATAREP-23. Memory and Pointers

Two processes are mapping a file into their address space. The mapped file contains an unsorted linked list of integers. As the processes cannot ensure that the file will be mapped at the same virtual address, they use relative pointers to link elements in the list. A relative pointer holds not an address, but an offset that user code can use to calculate a true address. Our processes define the offset as relative to the start of the file.

Thus, each element in the linked list is represented by the following structure:
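A sketch of such a structure, consistent with the find_element answer below (the exact field order and names are assumptions):

#include <stddef.h>

typedef struct ll_node {
    int value;       // the integer stored in this element
    size_t offset;   // offset of the next element from the start of the file,
                     // or (size_t) -1 at the end of the list
} ll_node;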

offset == (size_t) -1 indicates the end of the list. Other offset values represent the position of the next item in the list, calculated relative to the start of the file.

QUESTION DATAREP-23A. Write a function to find an item in the list. The function's prototype is:

The mapped_file parameter is the address of the mapped file data; the list parameter is a pointer to the first node in the list; and the value parameter is the value for which we are searching. The function should return a pointer to the linked list element if the value appears in the list or nullptr if the value is not in the list.

ll_node* find_element(void* mapped_file, ll_node* list, int value) {
    while (1) {
        if (list->value == value) {
            return list;
        }
        if (list->offset == (size_t) -1) {
            return NULL;
        }
        list = (ll_node*) ((char*) mapped_file + list->offset);
    }
}

DATAREP-24. Integer representation

Write the value of the variable or expression in each problem, using signed decimal representation.

For example, if we gave you:

  • int i = 0xA;
  • int j = 0xFFFFFFFF;

you would write A) 10 B) -1.

QUESTION DATAREP-24A. int i = 0xFFFF; (You may write this either in decimal or as an expression using a power of 2)

2^16 - 1, or 65535

QUESTION DATAREP-24B. short s = 0xFFFF; (You may write this either in decimal or as an expression using a power of 2)

-1 (0xFFFF sets all 16 bits of a short; interpreted as a signed 16-bit value, that is -1)

QUESTION DATAREP-24C. unsigned u = 1 << 10;

1024, or 2^10

QUESTION DATAREP-24D. ⚠️ From WeensyOS: unsigned long l = PTE_P | PTE_U;

5 (PTE_P == 1 and PTE_U == 4)

QUESTION DATAREP-24E. int j = ~0;

-1 (all bits set)

QUESTION DATAREP-24F. ⚠️ From WeensyOS: sizeof(x86_64_pagetable);

4096, or 2^12

QUESTION DATAREP-24G. Given this structure:

This expression: sizeof(ps);

TRICK QUESTION! 8

QUESTION DATAREP-24H. Using the structure above: sizeof(*ps);

QUESTION DATAREP-24I. unsigned char u = 0xABC;

0xBC == 11*16 + 12 == 160 + 16 + 12 == 188

QUESTION DATAREP-24J. signed char c = 0xABC;

0xBC has most-significant bit on, so the value as a signed char is less than zero. We seek x so that 0xBC + x == 0x100 . The answer is 0x44: 0xBC + 4 == 0xC0 , and 0xC0 + 0x40 == 0x100 . So −0x44 == −4*16 − 4 == −68 .
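A quick check of both truncations (the initializers are out of range, so compilers typically warn, which is part of the point):

#include <assert.h>

int main() {
    unsigned char u = 0xABC;   // keeps only the low byte, 0xBC == 188
    signed char c = 0xABC;     // implementation-defined conversion; -68 on x86-64
    assert(u == 188);
    assert(c == -68);
}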

DATAREP-25. Data representation

In gdb, you observe the following values for a set of memory locations.

For each C expression below, write its value in hexadecimal. For example, if we gave you:

the answer would be 0xa0 .

Assume the following structure and union declarations and variable definitions.

QUESTION DATAREP-25A. cp[4] =

QUESTION DATAREP-25B. cp + 7 =

0x100001027

QUESTION DATAREP-25C. s1 + 1 =

0x100001038

QUESTION DATAREP-25D. s1->i =

0xd3c2b1a0 (-742215264)

QUESTION DATAREP-25E. sizeof(s1) =

8 (s1 is a pointer, and pointers are 8 bytes on x86-64)

QUESTION DATAREP-25F. &s2->s =

0x100001028

QUESTION DATAREP-25G. &u->s =

0x100001020

QUESTION DATAREP-25H. s1->l =

0x9f8e7d6c5b4a3928 (-6949479270644565720)

QUESTION DATAREP-25I. s2->s.s =

0xf201 (-3583)

QUESTION DATAREP-25J. u->l =

0x1706f5e4d3c2b1a0 (1659283875886707104)

DATAREP-26. Sizes and alignments

Here’s a test struct with n members of types T1, …, Tn, where si and ai denote sizeof(Ti) and alignof(Ti). Assume an x86-64 machine, where each Ti either is a basic x86-64 type (e.g., int, char, double) or is a type derived from such types (e.g., arrays, structs, pointers, unions, possibly recursively), and assume that ai ≤ 8 for all i.

In these questions, you will compare this struct with other structs that have the same members, but in other orders.

QUESTION DATAREP-26A. True or false: The size of struct test is minimized when its members are sorted by size. In other words, if s1 ≤ s2 ≤…≤ sn , then sizeof(struct test) is less than or equal to the struct size for any other member order.

If true, briefly explain your answer; if false, give a counterexample (i.e., concrete types for T1 , …,  Tn that do not minimize sizeof(struct test) ).

False. Counterexample: T1 = char, T2 = int, T3 = char[5] (see the check below).
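A compile-time check of the counterexample on x86-64:

struct by_size { char a; int b; char c[5]; };   // 1 + 3 pad + 4 + 5 + 3 pad = 16
struct better  { char a; char c[5]; int b; };   // 1 + 5 + 2 pad + 4 = 12
static_assert(sizeof(struct by_size) == 16, "sorted by size is not minimal");
static_assert(sizeof(struct better) == 12, "a different order is smaller");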

QUESTION DATAREP-26B. True or false: The size of struct test is minimized when its members are sorted by alignment. In other words, if a1 ≤ a2 ≤…≤ an , then sizeof(struct test) is less than or equal to the struct size for any other member order.

If true, briefly explain your answer; if false, give a counterexample.

True. Padding only occurs between objects with different alignments, and is limited by the second alignment; sorting by alignment therefore minimizes padding.

QUESTION DATAREP-26C. True or false: The alignment of struct test is minimized when its members are sorted in increasing order by alignment. In other words, if a1 ≤ a2 ≤…≤ an , then alignof(struct test) is less than or equal to the struct alignment for any other member order.

True. It’s all the same; alignment is max alignment of every component, and is independent of order.

QUESTION DATAREP-26D. What is the maximum number of bytes of padding that struct test could contain for a given n ? The answer will be a pretty simple formula involving n . (Remember that ai ≤8 for all i .)

Alternating char and long gives the most padding, which is 7*(n/2) when n is even and 7*(n+1)/2 otherwise.
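For example, with n = 4 the alternating layout gives 7 * (4/2) = 14 bytes of padding:

struct alternating { char a; long b; char c; long d; };   // 18 bytes of members
static_assert(sizeof(struct alternating) == 32, "14 of the 32 bytes are padding");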

QUESTION DATAREP-26E. What is the minimum number of bytes of padding that struct test could contain for a given n ?

0; for example, if every member is a char, no padding is required at all.

DATAREP-27. Undefined behavior

QUESTION DATAREP-27A. Sometimes a conforming C compiler can assume that a + 1 > a , and sometimes it can’t. For each type below, consider this expression:

and say whether the compiler:

  • Must reject the expression as a type error.
  • May assume that the expression is true (that a + (int) 1 > a for all a ).
  • Must not assume that the expression is true.
  • unsigned char a
  • struct {int m;} a
1—May assume; 2—Must not assume; 3—May assume; 4—May assume (in fact due to integer promotion, this statement really is always true, even in mathematical terms); 5—Must reject.
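A sketch of why the signed and unsigned-char cases differ (the function names are made up; only the comparison matters):

int signed_case(int a) {
    // Signed overflow is undefined, so an aggressive optimizer may compile
    // this as if it were simply "return 1;".
    return a + 1 > a;
}

int unsigned_char_case(unsigned char a) {
    // a is promoted to int before the addition, so a + 1 never overflows
    // and the comparison really is always true, with no undefined behavior.
    return a + 1 > a;
}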

QUESTION DATAREP-27B. The following code checks its arguments for sanity, but not well: each check can cause undefined behavior.

Rewrite these checks to avoid all undefined behavior. You will likely add one or more casts to uintptr_t . For full credit, write each check as a single comparison (no && or || , even though the current ptr_into_array check uses || ).

array_size check:

ptr_into_array check:

array_size check: (uintptr_t) array + 4 * array_size < (uintptr_t) array
ptr_into_array check: (uintptr_t) ptr_into_array - (uintptr_t) array > 4 * array_size
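A sketch of how the corrected checks might be packaged; the function name and signature are assumptions (the original listing isn't shown above), but the two comparisons are exactly the ones in the answer, performed on uintptr_t so that any wraparound is well-defined unsigned arithmetic:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

bool args_look_sane(int* array, size_t array_size, int* ptr_into_array) {
    uintptr_t a = (uintptr_t) array;
    if (a + 4 * array_size < a) {
        return false;   // the array's byte range wraps around the address space
    }
    if ((uintptr_t) ptr_into_array - a > 4 * array_size) {
        return false;   // the pointer lies outside [array, array + array_size]
    }
    return true;
}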

QUESTION DATAREP-27C. In lecture, we discussed several ways to tell if a signed integer x is negative. One of them was the following:

But this is incorrect: it has undefined behavior. Correct it by adding two characters.

(x & (1UL << (sizeof(x) * CHAR_BIT - 1))) != 0
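In context (assuming x is a long), the corrected check might be wrapped like this; without the UL suffix the shift would be performed on a plain int, and shifting by 63 bits, or into the sign bit, is undefined:

#include <limits.h>
#include <stdbool.h>

bool is_negative(long x) {
    return (x & (1UL << (sizeof(x) * CHAR_BIT - 1))) != 0;
}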

DATAREP-28. ⚠️ Memory errors and garbage collection ⚠️

⚠️ We didn’t discuss garbage collectors in class this year.

Recall that a conservative garbage collector is a program that can automatically free dynamically-allocated memory by detecting when that memory is no longer referenced. Such a GC works by scanning memory for currently-referenced pointers, starting from stack and global memory, and recursing over each referenced object until all referenced memory has been scanned. We built a conservative garbage collector in lecture datarep6.

QUESTION DATAREP-28A. An application program that uses conservative GC, and does not call free directly, will avoid certain errors and undefined behaviors. Which of the following errors are avoided? List all that apply.

  • Use-after-free
  • Double free
  • Signed integer overflow
  • Boundary write error
  • Unaligned access

Use-after-free and double free. Since the program never calls free and the GC frees memory only when it is no longer referenced, memory cannot be freed twice or used after it is freed; the other errors are unaffected.

QUESTION DATAREP-28B. Write a C program that leaks unbounded memory without GC, but does not do so with GC. You should need less than 5 lines. (Leaking “unbounded” memory means the program will exhaust the memory capacity of any machine on which it runs.)

while (1) { (void) malloc(1); }

QUESTION DATAREP-28C. Not every valid C program works with a conservative GC, because the C abstract machine allows a program to manipulate pointers in strange ways. Which of the following pointer manipulations might cause the conservative GC from class to inappropriately free a memory allocation? List all that apply.

Storing the pointer in a uintptr_t variable

Writing the pointer to a disk file and reading it back later

Using the least-significant bit of the pointer to store a flag:

Storing the pointer in textual form:

Splitting the pointer into two parts and storing the parts in an array:

DATAREP-29. Bitwise

QUESTION DATAREP-29A. Consider this C fragment:

Or, shorter:

Write a single expression that evaluates to the same value (x when a < b, and 0 otherwise), but that does not use the conditional ?: operator. You will use the fact that a < b always equals 0 or 1. For full credit, do not use expensive operators (multiply, divide, modulus).

Examples: (a < b) * x, or, avoiding the multiply, (-(uintptr_t) (a < b)) & x.
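A quick sanity check that the branchless forms agree with the conditional (the fragment itself isn't reproduced above; per the answer it evaluates to x when a < b and to 0 otherwise):

#include <assert.h>
#include <stdint.h>

int main() {
    unsigned a = 3, b = 7, x = 61;
    unsigned with_conditional = a < b ? x : 0;
    unsigned with_multiply    = (a < b) * x;                 // uses a multiply
    unsigned with_mask        = (-(uintptr_t) (a < b)) & x;  // all-ones or all-zero mask
    assert(with_conditional == with_multiply);
    assert(with_conditional == with_mask);
}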

QUESTION DATAREP-29B. This function returns one more than the index of the least-significant 1 bit in its argument, or 0 if its argument is zero.

This function runs in O ( B ) time, where B is the number of bits in an unsigned long . Write a version of ffs that runs instead in O (log  B ) time.

int ffs(unsigned long x) {
    if (!x) {
        return 0;
    }
    int ans = 1;
    if (!(x & 0x00000000FFFFFFFFUL)) {
        ans += 32;
        x >>= 32;
    }
    if (!(x & 0x0000FFFF)) {
        ans += 16;
        x >>= 16;
    }
    if (!(x & 0x00FF)) {
        ans += 8;
        x >>= 8;
    }
    if (!(x & 0x0F)) {
        ans += 4;
        x >>= 4;
    }
    if (!(x & 0x3)) {
        ans += 2;
        x >>= 2;
    }
    return ans + (x & 0x1 ? 0 : 1);
}

DATAREP-30. Data representation

QUESTION DATAREP-30A. Write a type whose size is 19,404,329 times larger than its alignment.

char[19404329]

QUESTION DATAREP-30B. Consider a structure type T with N members, all of which have nonzero size. Assume that sizeof(T) == alignof(T). What is N ?

1. Every type's size is a multiple of (and therefore at least) its alignment, so the member whose alignment equals alignof(T) already occupies sizeof(T) == alignof(T) bytes by itself, leaving no room for any other member of nonzero size.

QUESTION DATAREP-30C. What is a C type that obeys (T) -1 == (T) 255 on x86-64?

char (or unsigned char or signed char )

Parts D–G use this code. The architecture might or might not be x86-64.

Assume that (uintptr_t) s2 - (uintptr_t) s1 == 4 and *s1 > *s2 .

QUESTION DATAREP-30D. What is sizeof(a) ?

QUESTION DATAREP-30E. What is sizeof(unsigned) on this architecture?

QUESTION DATAREP-30F. Is this architecture big-endian or little-endian?

QUESTION DATAREP-30G. Might the architecture be x86-64?

DATAREP-31. Memory errors

Mavis Gallant is starting on her debugging memory allocator. She’s written code that aims to detect invalid frees, where a pointer passed to m61_free was not returned by an earlier m61_malloc .

Help her track down bugs.

QUESTION DATAREP-31A. What is sizeof(struct m61_metadata) ?

QUESTION DATAREP-31B. Give an m61_ function call (function name and arguments) that would cause both unsigned integer overflow and invalid memory accesses.

m61_malloc((size_t) -15). Adding the metadata size to the requested size wraps around, so this turns into malloc(1), and the dereference of meta->magic becomes invalid.

QUESTION DATAREP-31C. Give an m61_ function call (function name and arguments) that would cause integer overflow, but no invalid memory access within the m61_ functions . (The application might or might not make an invalid memory access later.)

m61_malloc((size_t) -1)

QUESTION DATAREP-31D. These functions have some potential null pointer dereferences. Fix one such problem, including the line number(s) where your code should go.

C3: if (p) { memset(p, 0, sz * count); }

QUESTION DATAREP-31E. Put a single line of C code in the blank. The resulting program should (1) be well-defined with no memory leaks when using default malloc / free / calloc , but (2) always cause undefined behavior when using Mavis’s debugging malloc / free / calloc .

free(nullptr);

QUESTION DATAREP-31F. A double free should print a different message than an invalid free. Write code so Mavis’s implementation does this; include the line numbers where the code should go.

At F4: fprintf(stderr, meta->magic == 0xB0B0B0B0 ? "double free of %p" : "invalid free of %p", ptr);
After F6: meta->magic = 0xB0B0B0B0;

DATAREP-32. Memory word search

QUESTION DATAREP-32A. What is decimal 61 in hexadecimal?

0x3d, 2 pts

The following is a partial dump of an x86-64 Linux program’s memory. Note that each byte is represented as a hexadecimal number. Memory addresses increase from top to bottom and from left to right.

QUESTION DATAREP-32B. What memory segment contains the memory dump?

Data segment (globals), 2 pts

For questions 32C–32I, give the last two digits of the address of the given object in this memory dump (so for the object located at address 0x6010a8, you would write “a8”). There’s a single best answer for every question, and all questions have different answers.

QUESTION DATAREP-32C. A long (64 bits) of value 61.

QUESTION DATAREP-32D. A long of value -61.

QUESTION DATAREP-32E. An int of value 61.

QUESTION DATAREP-32F. A pointer to an object in the heap.

QUESTION DATAREP-32G. A pointer to a local variable.

QUESTION DATAREP-32H. A pointer that points to an object present in the memory dump.

QUESTION DATAREP-32I. The first byte of a C string comprising at least 4 printable ASCII characters. (Printable ASCII characters have values from 0x20 to 0x7e.)

DATAREP-33. Architecture

QUESTION DATAREP-33A. Write a C++ expression that will evaluate to false on all typical 32-bit architectures and true on all typical 64-bit architectures.

3pts sizeof(void*) == 8

QUESTION DATAREP-33B. Give an integer value that has the same representation in big-endian and little-endian, and give its C++ type. Assume the same type sizes as x86-64.

2pts ex: int x = 0

QUESTION DATAREP-33C. Repeat question B, but with a different integer value.

2pts ex: int x = -1 :)

QUESTION DATAREP-33D. Complete this C++ function, which should return true iff it is called on a machine with little-endian integer representation. Again, you may assume the same type sizes as x86-64.

3pts
bool is_little_endian() {
    union {
        int a;
        char c[4];
    } u;
    u.a = 0;
    u.c[0] = 1;
    return u.a == 1;
}

DATAREP-34. Undefined behavior

The following code is taken from the well-known textbook Mastering C Pointers (with some stylistic changes). Assume it is run on x86-64.

QUESTION DATAREP-34A. List all situations in which this program will not execute undefined behavior. (There is at least one.)

3pts If malloc returns nullptr .

QUESTION DATAREP-34B. True or false? The expression ++x could be replaced with ++y without changing the meaning of the program.

3pts True! The undefined behavior is the same either way.

Here’s an updated version of the program.

QUESTION DATAREP-34C. True or false? The expression *(x + y) could be replaced with x[y] without changing the meaning of the program.

True: by definition, x[y] means *(x + y).

QUESTION DATAREP-34D. True or false? The expression *(x + y) could be replaced by * (int*) ((uintptr_t) x + y) without changing the meaning of the program.

3pts False. Integer address arithmetic is not the same as pointer arithmetic: adding y to a uintptr_t advances by y bytes, while adding y to an int* advances by y * sizeof(int) bytes, so the two coincide only for pointers to char-sized types.

DATAREP-35. Odd alignments

QUESTION DATAREP-35A. Write a struct definition that contains exactly seven bytes of padding on x86-64.

3pts struct {long a; char b;}

QUESTION DATAREP-35B. Can an x86-64 struct comprise more than half padding? Give an example if so, or explain briefly why not.

3pts Yes. struct {char a; long b; char c;} has size 24 and 14 bytes of padding.
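Both claims can be checked at compile time on x86-64:

struct seven_pad  { long a; char b; };           // 8 + 1 + 7 bytes of padding = 16
struct mostly_pad { char a; long b; char c; };   // 14 of its 24 bytes are padding
static_assert(sizeof(struct seven_pad) == 16, "exactly seven bytes of padding");
static_assert(sizeof(struct mostly_pad) == 24, "more than half padding");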

The remaining questions consider a new architecture called x86-64-rainbow , which is like x86-64 plus special support for a fundamental data type called color . A color is a three-byte data type with red, green, and blue components, kind of like this:

But unlike struct color on x86-64, which has size 3 and alignment 1, color on x86-64-rainbow has size 3 and alignment 3 . All the usual rules for C abstract machine sizes and alignments still hold.

QUESTION DATAREP-35C. What is the alignment of pointers returned by malloc on x86-64-rainbow?

48. malloc must return memory aligned for any fundamental type; the strictest alignment among the usual x86-64 types is 16, and color contributes an alignment of 3, so the returned pointers must be aligned to lcm(16, 3) = 48.

QUESTION DATAREP-35D. Give an example of an x86-64-rainbow struct that ends with more than 16 bytes of padding.

2pts struct { long a[2]; color b[3]; } has alignment lcm(8, 3) = 24. Its members end at offset 27 (a occupies bytes 0-15, and b, after 2 bytes of padding to reach a multiple of 3, occupies bytes 18-26), so its size rounds up to 48: it ends with 21 bytes of trailing padding, 23 bytes of padding in all!

DATAREP-36. Debugging allocators

In problem set 1, you built a debugging allocator that could detect many kinds of error. Some students used exclusively internal metadata , where the debugging allocator’s data was stored within the same block of memory as the payload. Some students used external metadata , such as a separate hash table mapping payload pointers to metadata information.

QUESTION DATAREP-36A. Which kind of error mentioned by the problem set could not be detected by external metadata alone?

2pts Boundary write errors.

The problem set requires the m61 library to detect invalid frees, which includes double frees. However, double frees are so common that a special double-free error message can help users. Jia Tolentino wants to print such a message. Her initial idea is to keep a set of recently freed pointers:
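Her code isn't reproduced here; a sketch consistent with the discussion below (a fixed-size array reused round-robin) might look like this, with the helper names being assumptions:

#include <stdbool.h>

static void* recently_freed[10];   // the 10 most recently freed pointers
static int recently_freed_index;   // next slot to overwrite (round-robin)

// Record a pointer that was just freed, evicting the oldest entry.
void note_freed(void* ptr) {
    recently_freed[recently_freed_index] = ptr;
    recently_freed_index = (recently_freed_index + 1) % 10;
}

// Was this pointer freed recently?
bool was_recently_freed(void* ptr) {
    for (int i = 0; i != 10; ++i) {
        if (recently_freed[i] == ptr) {
            return true;
        }
    }
    return false;
}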

QUESTION DATAREP-36B. What eviction policy is used for the recently_freed array?

2pts FIFO (round-robin).

Jia uses these helpers in m61_free as follows. (You may assume that is_invalid_free(ptr) returns true for every invalid or double free.)

She has not yet changed m61_malloc , though she knows she needs to. Help her evaluate whether her design achieves the following mandatory and desirable (that is, optional) requirements.

Note : As you saw in tests 32 and 33 in pset 1, aggressive attacks from users can corrupt metadata arbitrarily. You should assume for the following questions that such aggressive attacks do not occur. The user might free something that was never allocated, and might double-free something, but will never overwrite metadata.

QUESTION DATAREP-36C. Mandatory : The design must not keep unbounded, un-reusable state for pointers that have been freed. Write “Achieved” or “Not achieved” and explain briefly.

6 pts for C-E together; +5 if one serious error, +3 if two serious errors. Achieved. The design only keeps space for 10 recently-freed pointers.

QUESTION DATAREP-36D. Desirable : The design should report a double-free as a double-free, not an invalid free. Write “Achieved” or “Not achieved” and explain briefly.

Not achieved. If a pointer falls off the list after 10 frees, Jia’s design will report a double-free of that pointer as an invalid free.

QUESTION DATAREP-36E. Mandatory : The design must never report valid frees as double-frees or invalid frees. Write “Achieved” or “Not achieved” and explain briefly.

Not achieved. A recently-freed pointer can be reused by malloc, but she hasn’t accounted for that. So if (1) a pointer is freed, (2) a malloc returns the same pointer, (3) the user frees that pointer again, that valid free will be reported as a double free.

QUESTION DATAREP-36F. Describe code to be added to m61_malloc that will achieve all mandatory requirements, if any are required.

3pts Here’s the cleanest solution to the problem from part E: remove returned pointers from recently_freed. Add this code to m61_malloc before return ptr:
for (int i = 0; i != 10; ++i) {
    if (recently_freed[i] == ptr) {
        recently_freed[i] = nullptr;
    }
}

DATAREP-37. Representations

Your job in the following questions is to complete each C++ program so that it prints 61 to standard output when run. Your answer should fill in the blank _________ , and you may assume system calls execute without error. If a question cannot be answered so that the program reliably prints 61 with no undefined behavior, say so and explain briefly.

To compile these programs, use a command line such as c++ -std=gnu++17 -pthread x.cc .

QUESTION DATAREP-37A.

QUESTION DATAREP-37B.

61L or (long) 61 are best; 61 will cause a compiler warning, because it’s an int and the format expects a long .

QUESTION DATAREP-37C.

6 * 16 + 1 , 0x61 , 97

QUESTION DATAREP-37D.

157 , - x + 61 , 96 + 61

QUESTION DATAREP-37E.

char c[61];

QUESTION DATAREP-37F.

This cannot be done. struct sixtyfun has minimum alignment of 4 (because int has alignment 4), so its size must be a multiple of 4.

QUESTION DATAREP-37G.

Anything with size 20, such as char c[20]; or int x[5]; .

QUESTION DATAREP-37H.

int a = 255, b = 0, c = 61; int a = -1, b = 0 (or anything ≤31), c = 61; unsigned a = 0x10000, b = 12, c = 0xFF; int a = 32, b = 1, c = 63; There are many other possibilities!

QUESTION DATAREP-37I.

This cannot be done without undefined behavior, because y is computed using an uninitialized variable.

QUESTION DATAREP-37J.

Anything that adds 15 total to x[0] … x[2] . For example, 5

QUESTION DATAREP-37K.

{ "echo", "61", nullptr }

QUESTION DATAREP-37L.

dup2(pfd[0], 0); close(pfd[0]); close(pfd[1]); The sleep(3), which is not waitpid, makes many other solutions work too: dup2(pfd[0], 0); by itself works, since pipe hygiene isn't important because the parent exits. printf("61\n"); almost works, but something weird happens on Mac OS X.

QUESTION DATAREP-37M.

waitpid(p, nullptr, 0)

QUESTION DATAREP-37N. Note that there are three blanks to fill in with code (you may also leave them empty).

#1, #3: nothing. #2: std::unique_lock<std::mutex> guard(m); If the lock is in #1 (and there is no unlock in #3), that causes deadlock.


Data Representation Class 11 Computer Science Exam Questions

The exam questions and answers below for Class 11 Computer Science are based on the past trend of questions and the important topics of the Data Representation chapter; working through them will help you understand the important and difficult parts of the chapter and score more marks in upcoming exams.

Very Short answer Type Questions

Question: Convert the following Decimal numbers to binary – (a) 23 (b) 100 (c) 161 (d) 145 Answer:  (a) 10111 (b) 1100100 (c) 10100001 (d) 10010001

Question: Convert (11111011110101)2 to octal. Answer:  (37365)8

Question: Convert the following Octal numbers to Binary – (a) 456 (b) 26 (c) 751 (d) 777 Answer:  (a) 100101110 (b) 010110 (c) 111101001 (d) 111111111

Question: Convert the following binary numbers to Hexadecimal – (a) 101000001 (b) 11100011 (c) 10101111 (d) 101101111 Answer:  (a) 141 (b) E3 (c) AF (d) 16F

Question: Convert the following binary numbers to decimal – (a) 1010 (b) 111000 (c) 10101111 (d) 10110 Answer:  (a) 10 (b) 56 (c) 175 (d) 22

Question: Add the binary numbers (a) 110101 and 101111 (b) 10110 and 1101 Answer:  (a) 1100100 (b) 100011

Question: Convert the following Hexadecimal numbers to Binary – (a) BE (b) BC9 (c) A07 (d) 7AB4 Answer:  (a) 10111110 (b) 101111001001 (c) 101000000111 (d) 0111101010110100

Question: Convert the following: (a) (446)8 to ( )16 (b) (47.5)8 to ( )10 (c) (45.9)10 to ( )2 Answer:  (a) (126)16 (b) (39.625)10 (c) (101101.1110)2, approximately

Short Answer Type Questions  

Question: How is the UTF-8 encoding scheme different from the UTF-32 encoding scheme? Answer:  UTF-8 is a variable-width encoding, backwards compatible with ASCII. ASCII characters (U+0000 to U+007F) take 1 byte, code points U+0080 to U+07FF take 2 bytes, code points U+0800 to U+FFFF take 3 bytes, and code points U+10000 to U+10FFFF take 4 bytes. It is good for English text, not so good for Asian text. UTF-32 uses a 32-bit value for each character, giving a fixed-width code for every character. It uses the most memory (each character is a fixed 4 bytes wide), but because every character has this precise length, string manipulation becomes far simpler: you can compute the number of characters in a string simply from the length in bytes of the string. You cannot do that with UTF-8.

Question: What are ASCII and extended ASCII schemes? Answer:  The standard ASCII character set uses just 7 bits for each character. There are several larger character sets that use 8 bits, which gives them 128 additional characters. The extra characters are used to represent non-English characters, graphics symbols, and mathematical symbols.

The extended ASCII  character set uses 8 bits, which gives it an additional 128 characters. The extra characters represent characters from foreign languages and special symbols for drawing pictures.

Question: What is the use of encoding schemes? Answer:  A character encoding provides the key to interpret text data: it is a set of mappings between the bytes in the computer and the characters in the character set. Without the key, the data looks like garbage. So, when you input text using a keyboard or in some other way, the character encoding maps the characters you choose to specific bytes in computer memory, and to display the text it reads the bytes back into characters. Unfortunately, there are many different character sets and character encodings, i.e. many different ways of mapping between bytes, code points and characters.

Question: What is Unicode? What is its significance? Answer:  Unicode is a character encoding standard that has widespread acceptance; Microsoft software, for example, uses Unicode at its core. Whether you realize it or not, you are using Unicode already. Computers ultimately just deal with numbers: they store letters and other characters by assigning a number to each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers, and no single encoding could contain enough characters. If you use a legacy encoding, your font can conflict with the font someone in another part of the world uses: you might have one character at a code point while someone else placed a different character at the same code point, so your files are incompatible. Unicode provides a unique number for every character, so this problem disappears: if your document calls for U+0289, it is clear to any computer program what the character should be.

Question: Discuss the UTF-8 encoding scheme. Answer:  UTF-8 is a compromise character encoding that can be as compact as ASCII (if the file is just plain English text) but can also contain any Unicode characters (with some increase in file size). UTF stands for Unicode Transformation Format. The ‘8’ means it uses 8-bit blocks to represent a character.
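A small illustration of the variable width (assuming the source and execution character sets are UTF-8, the default for GCC and Clang); in UTF-32 every one of these characters would occupy exactly 4 bytes:

#include <stdio.h>
#include <string.h>

int main() {
    printf("%zu\n", strlen("A"));   // U+0041: 1 byte in UTF-8
    printf("%zu\n", strlen("é"));   // U+00E9: 2 bytes
    printf("%zu\n", strlen("€"));   // U+20AC: 3 bytes
}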

Question: What is the utility of the ISCII encoding scheme? Answer:  ISCII is a bilingual character encoding (not glyph) scheme. Roman characters and punctuation marks as defined in standard lower ASCII take up the first half of the character set (the first 128 slots), while characters for Indic languages are allocated to the upper slots (128-255).

Question: What are ASCII and ISCII? Why are these used? Answer:  ASCII uses a 7-bit encoding, and ISCII uses an 8-bit encoding that is an extension of ASCII. Both are encoding schemes used to represent character sets in a computer system.

Question: What do you understand by code point and code unit? Answer:  A code point is the atomic unit of information: each code point is a number which is given meaning by the Unicode standard. A code unit is the unit of storage of a part of an encoded code point; in UTF-8 this means 8 bits, and in UTF-16 it means 16 bits.

Question: What is code space? How is it related to code point? Answer:  In computing, code space can refer to the region of memory address space where machine code is stored, or, for a character encoding, to the range of valid code points. In the character-encoding sense, every code point is a single value taken from the encoding's code space.

Question: Compare UTF-8 and UTF-32 encoding schemes. Which one is the most popular scheme? Answer: UTF-8 is a variable-width encoding, backwards compatible with ASCII: ASCII characters (U+0000 to U+007F) take 1 byte, code points U+0080 to U+07FF take 2 bytes, code points U+0800 to U+FFFF take 3 bytes, and code points U+10000 to U+10FFFF take 4 bytes. It is compact for English text, less so for Asian text. UTF-32 uses a fixed 32-bit value for every character, so it uses the most memory (each character is 4 bytes wide), but string manipulation becomes far simpler: the number of characters in a string can be computed directly from its length in bytes, which is not possible with UTF-8. Of the two, UTF-8 is by far the more widely used scheme.



Teach Computer Science

Data Representation

Topics include binary, decimal, and hexadecimal numbers, and the conversions between them.

Computational Thinking

We may think that computers “think” and that they outsmart humans “just like that”. However that’s not the case, computers do exactly what we humans tell them to do, or better said, what we program them to do. Once programmed a computer can only execute problems and produce solutions more efficiently than humans. Computational thinking …

For Data Units, candidates should be able to: define the terms bit, nibble, byte, kilobyte, megabyte, gigabyte, terabyte understand that data needs to be converted into a binary format to be processed by a computer. Data units in computer systems Bit This is a single unit of memory and can only store 2 possible binary …

Image Representation

For Image Representation, candidates should be able to: explain the representation of an image as a series of pixels represented in binary explain the need for metadata to be included in the file such as height, width and colour depth discuss the effect of colour depth and resolution on the size of an image file. …

Sound Representation

For Sound Representation, candidates should be able to: explain how sound can be sampled and stored in digital form explain how sampling intervals and other considerations affect the size of a sound file and the quality of its playback. How can sound be sampled and stored in digital form? A microphone converts sound waves into …

Number Systems

For Number Systems, candidates should be able to: convert positive denary whole numbers (0-255) into 8-bit binary numbers and vice versa add two 8-bit binary integers and explain overflow errors which may occur convert positive denary whole numbers (0-255) into 2-digit hexadecimal numbers and vice versa convert between binary and hexadecimal equivalents of the same …

Character Sets

Candidates should be able to: explain the use of binary codes to represent characters explain the term character set describe with examples (for example ASCII and Unicode) the relationship between the number of bits per character in a character set and the number of characters which can be represented. How are binary codes used to …

Computer Instructions

In regards to computer instructions, candidates should be able to: explain how instructions are coded as bit patterns explain how the computer distinguishes between instructions and data. Computer Instructions: How are program instructions coded? Machine code instructions are binary numbers and are coded as bit patterns, for example, a 16 bit machine code instruction could …

Converting Decimal to Binary

Converting Decimal to Binary: The Decimal Numbering System Decimal is a base 10 numbering Binary Numbering System Binary is a base 2 numbering system that is made up of two numbers: 0 and 1.  0 means OFF and 1 means ON.  The computer’s central processing unit (CPU) only recognizes these two states – ON and …

Converting Hexadecimal to Decimal

Hexadecimal Numbering System Hexadecimal is a base 16 numbering system that is made up of 16 digits: 0 – 9 and six more, which is A through F. Uses of Hexadecimal The hexadecimal numbering system is often used by programmers to simplify the binary numbering system.  Since 16 is equivalent to 24, there is a …

Converting Hexadecimal to Binary

Converting Hexadecimal to Binary Hexadecimal Numbering System Hexadecimal is a base 16 numbering system which is made up of 16 digits: 0 – 9 and six more, which is A through F. Uses of Hexadecimal Hexadecimal numbering system is often used by programmers to simplify the binary numbering system.  Since 16 is equivalent to 24, …

Converting Decimal to Hexadecimal

Converting Decimal to Hexadecimal: The Decimal Numbering System Decimal is a base 10 numbering system that is made up of 10 numbers: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9.  It is the most commonly used numbering system.  The reason behind that is convenience.  We have 10 fingers that we use for …

Uses of Hexadecimal

Hexadecimal Numbering System Hexadecimal is a base 16 numbering system that is made up of 16 digits: 0 – 9 and six more, which is A through F.The table below shows how the hexadecimal system works and its equivalent decimal number: Hexadecimal Decimal   Hexadecimal Decimal 0 0   11 = (1 x 16) + …

Converting Binary to Hexadecimal

Binary Numbering System Binary is a base 2 numbering system which is made up of two numbers: 0 and 1.  0 means OFF and 1 means ON.  The computer’s central processing unit (CPU) only recognizes these two states – ON and OFF.  It is the foundation for all binary code, which is used in computer …

Converting Binary to Decimal

Converting Binary to Decimal: The Binary Numbering System Binary is a base 2 numbering system that is made up of two numbers: 0 and 1.  0 means OFF and 1 means ON.  The computer’s central processing unit (CPU) only recognizes these two states – ON and OFF.  It is the foundation for all binary code, …


Colour mapping and direct colour.

What is the difference between Colour Mapping and Direct Colour? Colour mapping: with low colour depths (up to 8-bit) it is practical to map every colour to a binary code. 1-bit colour mapping – (2 colours) monochrome, often black and white. 2-bit colour mapping – (4 colours) CGA – used …


National Academies Press: OpenBook

Computer Science: Reflections on the Field, Reflections from the Field (2004)

Chapter 5: Data, Representation, and Information

The preceding two chapters address the creation of models that capture phenomena of interest and the abstractions both for data and for computation that reduce these models to forms that can be executed by computer. We turn now to the ways computer scientists deal with information, especially in its static form as data that can be manipulated by programs.

Gray begins by narrating a long line of research on databases—storehouses of related, structured, and durable data. We see here that the objects of research are not data per se but rather designs of “schemas” that allow deliberate inquiry and manipulation. Gray couples this review with introspection about the ways in which database researchers approach these problems.

Databases support storage and retrieval of information by defining—in advance—a complex structure for the data that supports the intended operations. In contrast, Lesk reviews research on retrieving information from documents that are formatted to meet the needs of applications rather than predefined schematized formats.

Interpretation of information is at the heart of what historians do, and Ayers explains how information technology is transforming their paradigms. He proposes that history is essentially model building—constructing explanations based on available information—and suggests that the methods of computer science are influencing this core aspect of historical analysis.

DATABASE SYSTEMS: A TEXTBOOK CASE OF RESEARCH PAYING OFF

Jim Gray, Microsoft Research

A small research investment helped produce U.S. market dominance in the $14 billion database industry. Government and industry funding of a few research projects created the ideas for several generations of products and trained the people who built those products. Continuing research is now creating the ideas and training the people for the next generation of products.

Industry Profile

The database industry generated about $14 billion in revenue in 2002 and is growing at 20 percent per year, even though the overall technology sector is almost static. Among software sectors, the database industry is second only to operating system software. Database industry leaders are all U.S.-based corporations: IBM, Microsoft, and Oracle are the three largest. There are several specialty vendors: Tandem sells over $1 billion/ year of fault-tolerant transaction processing systems, Teradata sells about $1 billion/year of data-mining systems, and companies like Information Resources Associates, Verity, Fulcrum, and others sell specialized data and text-mining software.

In addition to these well-established companies, there is a vibrant group of small companies specializing in application-specific databases—for text retrieval, spatial and geographical data, scientific data, image data, and so on. An emerging group of companies offer XML-oriented databases. Desktop databases are another important market focused on extreme ease of use, small size, and disconnected (offline) operation.

Historical Perspective

Companies began automating their back-office bookkeeping in the 1960s. The COBOL programming language and its record-oriented file model were the workhorses of this effort. Typically, a batch of transactions was applied to the old-tape-master, producing a new-tape-master and printout for the next business day. During that era, there was considerable experimentation with systems to manage an online database that could capture transactions as they happened. At first these systems were ad hoc, but late in that decade network and hierarchical database products emerged. A COBOL subcommittee defined a network data model standard (DBTG) that formed the basis for most systems during the 1970s. Indeed, in 1980 DBTG-based Cullinet was the leading software company.

However, there were some problems with DBTG. DBTG uses a low-level, record-at-a-time procedural language to access information. The programmer has to navigate through the database, following pointers from record to record. If the database is redesigned, as often happens over a decade, then all the old programs have to be rewritten.

The relational data model, enunciated by IBM researcher Ted Codd in a 1970 Communications of the Association for Computing Machinery article, was a major advance over DBTG. The relational model unified data and metadata so that there was only one form of data representation. It defined a non-procedural data access language based on algebra or logic. It was easier for end users to visualize and understand than the pointers-and-records-based DBTG model.

The research community (both industry and university) embraced the relational data model and extended it during the 1970s. Most significantly, researchers showed that a non-procedural language could be compiled to give performance comparable to the best record-oriented database systems. This research produced a generation of systems and people that formed the basis for products from IBM, Ingres, Oracle, Informix, Sybase, and others. The SQL relational database language was standardized by ANSI/ISO between 1982 and 1986. By 1990, virtually all database systems provided an SQL interface (including network, hierarchical, and object-oriented systems).

Meanwhile the database research agenda moved on to geographically distributed databases and to parallel data access. Theoretical work on distributed databases led to prototypes that in turn led to products. Today, all the major database systems offer the ability to distribute and replicate data among nodes of a computer network. Intense research on data replication during the late 1980s and early 1990s gave rise to a second generation of replication products that are now the mainstays of mobile computing.

Research of the 1980s showed how to execute each of the relational data operators in parallel—giving hundred-fold and thousand-fold speedups. The results of this research began to appear in the products of several major database companies. With the proliferation of data mining in the 1990s, huge databases emerged. Interactive access to these databases requires that the system use multiple processors and multiple disks to read all the data in parallel. In addition, these problems require near-linear time search algorithms. University and industrial research of the previous decade had solved these problems and forms the basis of the current VLDB (very large database) data-mining systems.

Rollup and drilldown data reporting systems had been a mainstay of decision-support systems ever since the 1960s. In the middle 1990s, the research community really focused on data-mining algorithms. They invented very efficient data cube and materialized view algorithms that form the basis for the current generation of business intelligence products.

The most recent round of government-sponsored research creating a new industry comes from the National Science Foundation’s Digital Libraries program, which spawned Google. It was founded by a group of “database” graduate students who took a fresh look at how information should be organized and presented in the Internet era.

Current Research Directions

There continues to be active and valuable research on representing and indexing data, adding inference to data search, compiling queries more efficiently, executing queries in parallel, integrating data from heterogeneous data sources, analyzing performance, and extending the transaction model to handle long transactions and workflow (transactions that involve human as well as computer steps). The availability of huge volumes of data on the Internet has prompted the study of data integration, mediation, and federation in which a portal system presents a unification of several data sources by pulling data on demand from different parts of the Internet.

In addition, there is great interest in unifying object-oriented concepts with the relational model. New data types (image, document, and drawing) are best viewed as the methods that implement them rather than by the bytes that represent them. By adding procedures to the database system, one gets active databases, data inference, and data encapsulation. This object-oriented approach is an area of active research and ferment both in academe and industry. It seems that in 2003, the research prototypes are mostly done and this is an area that is rapidly moving into products.

The Internet is full of semi-structured data—data that has a bit of schema and metadata, but is mostly a loose collection of facts. XML has emerged as the standard representation of semi-structured data, but there is no consensus on how such data should be stored, indexed, or searched. There have been intense research efforts to answer these questions. Prototypes have been built at universities and industrial research labs, and now products are in development.

The database research community now has a major focus on stream data processing. Traditionally, databases have been stored locally and are updated by transactions. Sensor networks, financial markets, telephone calls, credit card transactions, and other data sources present streams of data rather than a static database. The stream data processing researchers are exploring languages and algorithms for querying such streams and providing approximate answers.

Now that nearly all information is online, data security and data privacy are extremely serious and important problems. A small, but growing, part of the database community is looking at ways to protect people’s privacy by limiting the ways data is used. This work also has implications for protecting intellectual property (e.g., digital rights management, watermarking) and protecting data integrity by digitally signing documents and then replicating them so that the documents cannot be altered or destroyed.

Case Histories

The U.S. government funded many database research projects from 1972 to the present. Projects at the University of California at Los Angeles gave rise to Teradata and produced many excellent students. Projects at Computer Corp. of America (SDD-1, Daplex, Multibase, and HiPAC) pioneered distributed database technology and object-oriented database technology. Projects at Stanford University fostered deductive database technology, data integration technology, query optimization technology, and the popular Yahoo! and Google Internet sites. Work at Carnegie Mellon University gave rise to general transaction models and ultimately to the Transarc Corporation. There have been many other successes from AT&T, the University of Texas at Austin, Brown and Harvard Universities, the University of Maryland, the University of Michigan, Massachusetts Institute of Technology, Princeton University, and the University of Toronto among others. It is not possible to enumerate all the contributions here, but we highlight three representative research projects that had a major impact on the industry.

Project INGRES

Project Ingres started at the University of California at Berkeley in 1972. Inspired by Codd’s paper on relational databases, several faculty members (Stonebraker, Rowe, Wong, and others) started a project to design and build a relational system. Incidental to this work, they invented a query language (QUEL), relational optimization techniques, a language binding technique, and interesting storage strategies. They also pioneered work on distributed databases.

The Ingres academic system formed the basis for the Ingres product now owned by Computer Associates. Students trained on Ingres went on to start or staff all the major database companies (AT&T, Britton Lee, HP, Informix, IBM, Oracle, Tandem, Sybase). The Ingres project went on to investigate distributed databases, database inference, active databases, and extensible databases. It was rechristened Postgres, which is now the basis of the digital library and scientific database efforts within the University of California system. Recently, Postgres spun off to become the basis for a new object-relational system from the start-up Illustra Information Technologies.

Codd’s ideas were inspired by seeing the problems IBM and its customers were having with IBM’s IMS product and the DBTG network data model. His relational model was at first very controversial; people thought that the model was too simplistic and that it could never give good performance. IBM Research management took a gamble and chartered a small (10-person) systems effort to prototype a relational system based on Codd’s ideas. That system produced a prototype that eventually grew into the DB2 product series. Along the way, the IBM team pioneered ideas in query optimization, data independence (views), transactions (logging and locking), and security (the grant-revoke model). In addition, the SQL query language from System R was the basis for the ANSI/ISO standard.

The System R group went on to investigate distributed databases (project R*) and object-oriented extensible databases (project Starburst). These research projects have pioneered new ideas and algorithms. The results appear in IBM’s database products and those of other vendors.

Not all research ideas work out. During the 1970s there was great enthusiasm for database machines—special-purpose computers that would be much faster than general-purpose operating systems running conventional database systems. These research projects were often based on exotic hardware like bubble memories, head-per-track disks, or associative RAM. The problem was that general-purpose systems were improving at 50 percent per year, so it was difficult for exotic systems to compete with them. By 1980, most researchers realized the futility of special-purpose approaches and the database-machine community switched to research on using arrays of general-purpose processors and disks to process data in parallel.

The University of Wisconsin hosted the major proponents of this idea in the United States. Funded by the government and industry, those researchers prototyped and built a parallel database machine called Gamma. That system produced ideas and a generation of students who went on to staff all the database vendors. Today the parallel systems from IBM, Tandem, Oracle, Informix, Sybase, and Microsoft all have a direct lineage from the Wisconsin research on parallel database systems. The use of parallel database systems for data mining is the fastest-growing component of the database server industry.

The Gamma project evolved into the Exodus project at Wisconsin (focusing on an extensible object-oriented database). Exodus has now evolved to the Paradise system, which combines object-oriented and parallel database techniques to represent, store, and quickly process huge Earth-observing satellite databases.

And Then There Is Science

In addition to creating a huge industry, database theory, science, and engineering constitute a key part of computer science today. Representing knowledge within a computer is one of the central challenges of computer science ( Box 5.1 ). Database research has focused primarily on this fundamental issue. Many universities have faculty investigating these problems and offer classes that teach the concepts developed by this research program.

COMPUTER SCIENCE IS TO INFORMATION AS CHEMISTRY IS TO MATTER

Michael Lesk, Rutgers University

In other countries computer science is often called “informatics” or some similar name. Much computer science research derives from the need to access, process, store, or otherwise exploit some resource of useful information. Just as chemistry is driven to large extent by the need to understand substances, computing is driven by a need to handle data and information. As an example of the way chemistry has developed, see Oliver Sacks’s book Uncle Tungsten: Memories of a Chemical Boyhood (Vintage Books, 2002). He describes his explorations through the different metals, learning the properties of each, and understanding their applications. Similarly, in the history of computer science, our information needs and our information capabilities have driven parts of the research agenda. Information retrieval systems take some kind of information, such as text documents or pictures, and try to retrieve topics or concepts based on words or shapes. Deducing the concept from the bytes can be difficult, and the way we approach the problem depends on what kind of bytes we have and how many of them we have.

Our experimental method is to see if we can build a system that will provide some useful access to information or service. If it works, those algorithms and that kind of data become a new field: look at areas like geographic information systems. If not, people may abandon the area until we see a new motivation to exploit that kind of data. For example, face-recognition algorithms have received a new impetus from security needs, speeding up progress in the last few years. An effective strategy to move computer science forward is to provide some new kind of information and see if we can make it useful.

Chemistry, of course, involves a dichotomy between substances and reactions. Just as we can (and frequently do) think of computer science in terms of algorithms, we can talk about chemistry in terms of reactions. However, chemistry has historically focused on substances: the encyclopedias and indexes in chemistry tend to be organized and focused on compounds, with reaction names and schemes getting less space on the shelf. Chemistry is becoming more balanced as we understand reactions better; computer science has always been more heavily oriented toward algorithms, but we cannot ignore the driving force of new kinds of data.

The history of information retrieval, for example, has been driven by the kinds of information we could store and use. In the 1960s, for example, storage was extremely expensive. Research projects were limited to text
materials. Even then, storage costs meant that a research project could just barely manage to have a single ASCII document available for processing. For example, Gerard Salton’s SMART system, one of the leading text retrieval systems for many years (see Salton’s book, The SMART Automatic Retrieval System , Prentice-Hall, 1971), did most of its processing on collections of a few hundred abstracts. The only collections of “full documents” were a collection of 80 extended abstracts, each a page or two long, and a collection of under a thousand stories from Time Magazine , each less than a page in length. The biggest collection was 1400 abstracts in aeronautical engineering. With this data, Salton was able to experiment on the effectiveness of retrieval methods using suffixing, thesauri, and simple phrase finding. Salton also laid down the standard methodology for evaluating retrieval systems, based on Cyril Cleverdon’s measures of “recall” (percentage of the relevant material that is retrieved in response to a query) and “precision” (the percentage of the material retrieved that is relevant). A system with perfect recall finds all relevant material, making no errors of omission and leaving out nothing the user wanted. In contrast, a system with perfect precision finds only relevant material, making no errors of commission and not bothering the user with stuff of no interest. The SMART system produced these measures for many retrieval experiments and its methodology was widely used, making text retrieval one of the earliest areas of computer science with agreed-on evaluation methods. Salton was not able to do anything with image retrieval at the time; there were no such data available for him.

Another idea shaped by the amount of information available was “relevance feedback,” the idea of identifying useful documents from a first retrieval pass in order to improve the results of a later retrieval. With so few documents, high precision seemed like an unnecessary goal. It was simply not possible to retrieve more material than somebody could look at. Thus, the research focused on high recall (also stimulated by the insistence by some users that they had to have every single relevant document). Relevance feedback helped recall. By contrast, the use of phrase searching to improve precision was tried but never got much attention simply because it did not have the scope to produce much improvement in the running systems.

The basic problem is that we wish to search for concepts, and what we have in natural language are words and phrases. When our documents are few and short, the main problem is not to miss any, and the research at the time stressed algorithms that found related words via associations or improved recall with techniques like relevance feedback.

Then, of course, several other advances—computer typesetting and word processing to generate material and cheap disks to hold it—led to much larger text collections. Figure 5.1 shows the decline in the price of disk space since the first disks in the mid-1950s, generally following the cost-performance trends of Moore's law.

FIGURE 5.1 Decline in the price of disk space, 1950 to 2004.

Cheaper storage led to larger and larger text collections online. Now there are many terabytes of data on the Web. These vastly larger volumes mean that precision has now become more important, since a common problem is to wade through vastly too many documents. Not surprisingly, in the mid-1980s efforts started on separating the multiple meanings of words like “bank” or “pine” and became the research area of “sense disambiguation.” 2 With sense disambiguation, it is possible to imagine searching for only one meaning of an ambiguous word, thus avoiding many erroneous retrievals.

Large-scale research on text processing took off with the availability of the TREC (Text Retrieval Evaluation Conference) data. Thanks to the National Institute of Standards and Technology, several hundred megabytes of text were provided (in each of several years) for research use. This stimulated more work on query analysis, text handling, searching
algorithms, and related areas; see the series titled TREC Conference Proceedings, edited by Donna Harman of NIST.

Document clustering appeared as an important way to shorten long search results. Clustering enables a system to report not, say, 5000 documents but rather 10 groups of 500 documents each, and the user can then explore the group or groups that seem relevant. Salton anticipated the future possibility of such algorithms, as did others. 3 Until we got large collections, though, clustering did not find application in the document retrieval world. Now one routinely sees search engines using these techniques, and faster clustering algorithms have been developed.

Thus the algorithms explored switched from recall aids to precision aids as the quantity of available data increased. Manual thesauri, for example, have dropped out of favor for retrieval, partly because of their cost but also because their goal is to increase recall, which is not today’s problem. In terms of finding the concepts hinted at by words and phrases, our goals now are to sharpen rather than broaden these concepts: thus disambiguation and phrase matching, and not as much work on thesauri and term associations.

Again, multilingual searching started to matter, because multilingual collections became available. Multilingual research shows a more precise example of particular information resources driving research. The Canadian government made its Parliamentary proceedings (called Hansard ) available in both French and English, with paragraph-by-paragraph translation. This data stimulated a number of projects looking at how to handle bilingual material, including work on automatic alignment of the parallel texts, automatic linking of similar words in the two languages, and so on. 4

A similar effect was seen with the Brown corpus of tagged English text, where the part of speech of each word (e.g., whether a word is a noun or a verb) was identified. This produced a few years of work on algorithms that learned how to assign parts of speech to words in running text based on statistical techniques, such as the work by Garside. 5

One might see an analogy to various new fields of chemistry. The recognition that pesticides like DDT were environmental pollutants led to a new interest in biodegradability, and the Freon propellants used in aerosol cans stimulated research in reactions in the upper atmosphere. New substances stimulated a need to study reactions that previously had not been a top priority for chemistry and chemical engineering.

As storage became cheaper, image storage was now as practical as text storage had been a decade earlier. Starting in the 1980s we saw the IBM QBIC project demonstrating that something could be done to retrieve images directly, without having to index them by text words first. 6 Projects like this were stimulated by the availability of “clip art” such as the COREL image disks. Several different projects were driven by the easy access to images in this way, with technology moving on from color and texture to more accurate shape processing. At Berkeley, for example, the “Blobworld” project made major improvements in shape detection and recognition, as described in Carson et al. 7 These projects demonstrated that retrieval could be done with images as well as with words, and that properties of images could be found that were usable as concepts for searching.

Another new kind of data that became feasible to process was sound, in particular human speech. Here it was the Defense Advanced Research Projects Agency (DARPA) that took the lead, providing the SWITCH-BOARD corpus of spoken English. Again, the availability of a substantial file of tagged information helped stimulate many research projects that used this corpus and developed much of the technology that eventually went into the commercial speech recognition products we now have. As with the TREC contests, the competitions run by DARPA based on its spoken language data pushed the industry and the researchers to new advances. National needs created a new technology; one is reminded of the development of synthetic rubber during World War II or the advances in catalysis needed to make explosives during World War I.

Yet another kind of new data was geo-coded data, introducing a new set of conceptual ideas related to place. Geographical data started showing up in machine-readable form during the 1980s, especially with the release of the Dual Independent Map Encoding (DIME) files after the 1980
census and the Topologically Integrated Geographic Encoding and Referencing (TIGER) files from the 1990 census. The availability, free of charge, of a complete U.S. street map stimulated much research on systems to display maps, to give driving directions, and the like. 8 When aerial photographs also became available, there was the triumph of Microsoft’s “Terraserver,” which made it possible to look at a wide swath of the world from the sky along with correlated street and topographic maps. 9

More recently, in the 1990s, we have started to look at video search and retrieval. After all, if a CD-ROM contains about 300,000 times as many bytes per pound as a deck of punched cards, and a digitized video has about 500,000 times as many bytes per second as the ASCII script it comes from, we should be about where we were in the 1960s with video today. And indeed there are a few projects, most notably the Informedia project at Carnegie Mellon University, that experiment with video signals; they do not yet have ways of searching enormous collections, but they are developing algorithms that exploit whatever they can find in the video: scene breaks, closed-captioning, and so on.

Again, there is the problem of deducing concepts from a new kind of information. We started with the problem of words in one language needing to be combined when synonymous, picked apart when ambiguous, and moved on to detecting synonyms across multiple languages and then to concepts depicted in pictures and sounds. Now we see research such as that by Jezekiel Ben-Arie associating words like “run” or “hop” with video images of people doing those actions. In the same way we get again new chemistry when molecules like “buckyballs” are created and stimulate new theoretical and reaction studies.

Defining concepts for search can be extremely difficult. For example, despite our abilities to parse and define every item in a computer language, we have made no progress on retrieval of software; people looking for search or sort routines depend on metadata or comments. Some areas seem more flexible than others: text and naturalistic photograph processing software tends to be very general, while software to handle CAD diagrams and maps tends to be more specific. Algorithms are sometimes portable; both speech processing and image processing need Fourier transforms, but the literature is less connected than one might like (partly
because of the difference between one-dimensional and two-dimensional transforms).

There are many other examples of interesting computer science research stimulated by the availability of particular kinds of information. Work on string matching today is often driven by the need to align sequences in either protein or DNA data banks. Work on image analysis is heavily influenced by the need to deal with medical radiographs. And there are many other interesting projects specifically linked to an individual data source. Among examples:

The British Library scanning of the original manuscript of Beowulf in collaboration with the University of Kentucky, working on image enhancement until the result of the scanning is better than reading the original;

The Perseus project, demonstrating the educational applications possible because of the earlier Thesaurus Linguae Graecae project, which digitized all the classical Greek authors;

The work in astronomical analysis stimulated by the Sloan Digital Sky Survey;

The creation of the field of “forensic paleontology” at the University of Texas as a result of doing MRI scans of fossil bones;

And, of course, the enormous amount of work on search engines stimulated by the Web.

When one of these fields takes off, and we find wide usage of some online resource, it benefits society. Every university library gained readers as their catalogs went online and became accessible to students in their dorm rooms. Third World researchers can now access large amounts of technical content their libraries could rarely acquire in the past.

In computer science, and in chemistry, there is a tension between the algorithm/reaction and the data/substance. For example, should one look up an answer or compute it? Once upon a time logarithms were looked up in tables; today we also compute them on demand. Melting points and other physical properties of chemical substances are looked up in tables; perhaps with enough quantum mechanical calculation we could predict them, but it’s impractical for most materials. Predicting tomorrow’s weather might seem a difficult choice. One approach is to measure the current conditions, take some equations that model the atmosphere, and calculate forward a day. Another is to measure the current conditions, look in a big database for the previous day most similar to today, and then take the day after that one as the best prediction for tomorrow. However, so far the meteorologists feel that calculation is better. Another complicated example is chess: given the time pressure of chess tournaments
against the speed and storage available in computers, chess programs do the opening and the endgame by looking in tables of old data and calculating in the middle game.

To conclude, a recipe for stimulating advances in computer science is to make some data available and let people experiment with it. With the incredibly cheap disks and scanners available today, this should be easier than ever. Unfortunately, what we gain with technology we are losing to law and economics. Many large databases are protected by copyright; few motion pictures, for example, are old enough to have gone out of copyright. Content owners generally refuse to grant permission for wide use of their material, whether out of greed or fear: they may have figured out how to get rich off their files of information or they may be afraid that somebody else might have. Similarly it is hard to get permission to digitize in-copyright books, no matter how long they have been out of print. Jim Gray once said to me, “May all your problems be technical.” In the 1960s I was paying people to key in aeronautical abstracts. It never occurred to us that we should be asking permission of the journals involved (I think what we did would qualify as fair use, but we didn’t even think about it). Today I could scan such things much more easily, but I would not be able to get permission. Am I better off or worse off?

There are now some 22 million chemical substances in the Chemical Abstracts Service Registry and 7 million reactions. New substances continue to intrigue chemists and cause research on new reactions, with of course enormous interest in biochemistry both for medicine and agriculture. Similarly, we keep adding data to the Web, and new kinds of information (photographs of dolphins, biological flora, and countless other things) can push computer scientists to new algorithms. In both cases, synthesis of specific instances into concepts is a crucial problem. As we see more and more kinds of data, we learn more about how to extract meaning from it, and how to present it, and we develop a need for new algorithms to implement this knowledge. As the data gets bigger, we learn more about optimization. As it gets more complex, we learn more about representation. And as it gets more useful, we learn more about visualization and interfaces, and we provide better service to society.

HISTORY AND THE FUNDAMENTALS OF COMPUTER SCIENCE

Edward L. Ayers, University of Virginia

We might begin with a thought experiment: What is history? Many people, I’ve discovered, think of it as books and the things in books. That’s certainly the explicit form in which we usually confront history. Others, thinking less literally, might think of history as stories about the past; that would open us to oral history, family lore, movies, novels, and the other forms in which we get most of our history.

All these images are wrong, of course, in the same way that images of atoms as little solar systems are wrong, or pictures of evolution as profiles of ever taller and more upright apes and people are wrong. They are all models, radically simplified, that allow us to think about such things in the exceedingly small amounts of time that we allot to these topics.

The same is true for history, which is easiest to envision as technological progress, say, or westward expansion, or the emergence of freedom—or of increasing alienation, exploitation of the environment, or the growth of intrusive government.

Those of us who think about specific aspects of society or nature for a living, of course, are never satisfied with the stories that suit the purposes of everyone else so well.

We are troubled by all the things that don’t fit, all the anomalies, variance, and loose ends. We demand more complex measurement, description, and fewer smoothing metaphors and lowest common denominators.

Thus, to scientists, atoms appear as clouds of probability; evolution appears as a branching, labyrinthine bush in which some branches die out and others diversify. It can certainly be argued that past human experience is as complex as anything in nature and likely much more so, if by complexity we mean numbers of components, variability of possibilities, and unpredictability of outcomes.

Yet our means of conveying that complexity remain distinctly analog: the story, the metaphor, the generalization. Stories can be wonderfully complex, of course, but they are complex in specific ways: of implication, suggestion, evocation. That’s what people love and what they remember.

But maybe there is a different way of thinking about the past: as information. In fact, information is all we have. Studying the past is like studying scientific processes for which you have the data but cannot run the experiment again, in which there is no control, and in which you can never see the actual process you are describing and analyzing. All we have is information in various forms: words in great abundance, billions of numbers, millions of images, some sounds and buildings, artifacts.

The historian’s goal, it seems to me, should be to account for as much of the complexity embedded in that information as we can. That, it appears, is what scientists do, and it has served them well.

And how has science accounted for ever-increasing amounts of complexity in the information they use? Through ever more sophisticated instruments. The connection between computer science and history could be analogous to that between telescopes and stars, microscopes and cells. We could be on the cusp of a new understanding of the patterns of complexity in human behavior of the past.

The problem may be that there is too much complexity in that past, or too much static, or too much silence. In the sciences, we’ve learned how to filter, infer, use indirect evidence, and fill in the gaps, but we have a much more literal approach to the human past.

We have turned to computer science for tasks of more elaborate description, classification, representation. The digital archive my colleagues and I have built, the Valley of the Shadow Project, permits the manipulation of millions of discrete pieces of evidence about two communities in the era of the American Civil War. It uses sorting mechanisms, hypertextual display, animation, and the like to allow people to handle the evidence of this part of the past for themselves. This isn’t cutting-edge computer science, of course, but it’s darned hard and deeply disconcerting to some, for it seems to abdicate responsibility, to undermine authority, to subvert narrative, to challenge story.

Now, we’re trying to take this work to the next stage, to analysis. We have composed a journal article that employs an array of technologies, especially geographic information systems and statistical analysis in the creation of the evidence. The article presents its argument, evidence, and historiographical context as a complex textual, tabular, and graphical representation. XML offers a powerful means to structure text and XSL an even more powerful means to transform it and manipulate its presentation. The text is divided into sections called “statements,” each supported with “explanation.” Each explanation, in turn, is supported by evidence and connected to relevant historiography.

Linkages, forward and backward, between evidence and narrative are central. The historiography can be automatically sorted by author, date, or title; the evidence can be arranged by date, topic, or type. Both evidence and historiographical entries are linked to the places in the analysis where they are invoked. The article is meant to be used online, but it can be printed in a fixed format with all the limitations and advantages of print.

So, what are the implications of thinking of the past in the hardheaded sense of admitting that all we really have of the past is information? One implication might be great humility, since all we have for most
of the past are the fossils of former human experience, words frozen in ink and images frozen in line and color. Another implication might be hubris: if we suddenly have powerful new instruments, might we be on the threshold of a revolution in our understanding of the past? We’ve been there before.

A connection between history and social science was tried before, during the first days of accessible computers. Historians taught themselves statistical methods and even programming languages so that they could adopt the techniques, models, and insights of sociology and political science. In the 1950s and 1960s the creators of the new political history called on historians to emulate the precision, explicitness, replicability, and inclusivity of the quantitative social sciences. For two decades that quantitative history flourished, promising to revolutionize the field. And to a considerable extent it did: it changed our ideas of social mobility, political identification, family formation, patterns of crime, economic growth, and the consequences of ethnic identity. It explicitly linked the past to the present and held out a history of obvious and immediate use.

But that quantitative social science history collapsed suddenly, the victim of its own inflated claims, limited method and machinery, and changing academic fashion. By the mid-1980s, history, along with many of the humanities and social sciences, had taken the linguistic turn. Rather than software manuals and codebooks, graduate students carried books of French philosophy and German literary interpretation. The social science of choice shifted from sociology to anthropology; texts replaced tables. A new generation defined itself in opposition to social scientific methods just as energetically as an earlier generation had seen in those methods the best means of writing a truly democratic history. The first computer revolution largely failed.

The first effort at that history fell into decline in part because historians could not abide the distance between their most deeply held beliefs and what the statistical machinery permitted, the abstraction it imposed. History has traditionally been built around contingency and particularity, but the most powerful tools of statistics are built on sampling and extrapolation, on generalization and tendency. Older forms of social history talked about vague and sometimes dubious classifications in part because that was what the older technology of tabulation permitted us to see. It has become increasingly clear across the social sciences that such flat ways of describing social life are inadequate; satisfying explanations must be dynamic, interactive, reflexive, and subtle, refusing to reify structures of social life or culture. The new technology permits a new cross-fertilization.

Ironically, social science history faded just as computers became widely available, just as new kinds of social science history became feasible. No longer is there any need for white-coated attendants at huge mainframes
and expensive proprietary software. Rather than reducing people to rows and columns, searchable databases now permit researchers to maintain the identities of individuals in those databases and to represent entire populations rather than samples. Moreover, the record can now include things social science history could only imagine before the Web: completely indexed newspapers, with the original readable on the screen; completely searchable letters and diaries by the thousands; and interactive maps with all property holders identified and linked to other records. Visualization of patterns in the data, moreover, far outstrips the possibilities of numerical calculation alone. Manipulable histograms, maps, and time lines promise a social history that is simultaneously sophisticated and accessible. We have what earlier generations of social science historians dreamed of: a fast and widely accessible network linked to cheap and powerful computers running common software with well-established standards for the handling of numbers, texts, and images. New possibilities of collaboration and cumulative research beckon. Perhaps the time is right to reclaim a worthy vision of a disciplined and explicit social scientific history that we abandoned too soon.

What does this have to do with computer science? Everything, it seems to me. If you want hard problems, historians have them. And what’s the hardest problem of all right now? The capture of the very information that is history. Can computer science imagine ways to capture historical information more efficiently? Can it offer ways to work with the spotty, broken, dirty, contradictory, nonstandardized information we work with?

The second hard problem is the integration of this disparate evidence in time and space, offering new precision, clarity, and verifiability, as well as opening new questions and new ways of answering them.

If we can think of these ways, then we face virtually limitless possibilities. Is there a more fundamental challenge or opportunity for computer science than helping us to figure out human society over human time?


Computer Science: Reflections on the Field, Reflections from the Field provides a concise characterization of key ideas that lie at the core of computer science (CS) research. The book offers a description of CS research recognizing the richness and diversity of the field. It brings together two dozen essays on diverse aspects of CS research, their motivation and results. By describing in accessible form computer science’s intellectual character, and by conveying a sense of its vibrancy through a set of examples, the book aims to prepare readers for what the future might hold and help to inspire CS researchers in its creation.

Data Representation in Computer Science


Dive deep into the realm of Computer Science with this comprehensive guide about data representation. Data representation, a fundamental concept in computing, refers to the various ways that information can be expressed digitally. The interpretation of this data plays a critical role in decision-making procedures in businesses and scientific research. Gain an understanding of binary data representation, the backbone of digital computing.

Binary data representation uses a system of numerical notation that has just two possible states represented by 0 and 1 (also known as 'binary digits' or 'bits'). Grasp the practical applications of binary data representation and explore its benefits.

Finally, explore the vast world of data model representation. Different types of data models offer a variety of ways to organise data in databases. Understand the strategic role of data models in data representation, and explore how they are used to design efficient database systems. This comprehensive guide positions you at the heart of data representation in Computer Science.

Understanding Data Representation in Computer Science

In the realm of Computer Science, data representation plays a paramount role. It refers to the methods or techniques used to represent, or express information in a computer system. This encompasses everything from text and numbers to images, audio, and beyond.

Basic Concepts of Data Representation

Data representation in computer science is about how a computer interprets and functions with different types of information. Different information types require different representation techniques. For instance, a video will be represented differently than a text document.

When working with various forms of data, it is important to grasp a fundamental understanding of:

  • Binary system
  • Bits and Bytes
  • Number systems: decimal, hexadecimal
  • Character encoding: ASCII, Unicode

Data in a computer system is represented in binary format, as a sequence of 0s and 1s, denoting 'off' and 'on' states respectively. The smallest component of this binary representation is known as a bit, which stands for 'binary digit'.

A byte, on the other hand, generally encompasses 8 bits. Essential aspects of expressing numbers and text in a computer system are the decimal and hexadecimal number systems, and character encodings like ASCII and Unicode.

Role of Data Representation in Computer Science

Data Representation is the foundation of computing systems and affects both hardware and software designs. It enables both logic and arithmetic operations to be performed in the binary number system, on which computers are based.

An illustrative example of the importance of data representation is when you write a text document. The characters you type are represented in ASCII code - a set of binary numbers. Each number is sent to the memory, represented as electrical signals; everything you see on your screen is a representation of the underlying binary data.

Computing operations and functions, like searching, sorting or adding, rely heavily on appropriate data representation for efficient execution. Also, computer programming languages and compilers require a deep understanding of data representation to successfully interpret and execute commands.

As technology evolves, so too do our data representation techniques. Quantum computing, for example, uses quantum bits or "qubits". A qubit can represent a 0, 1, or both at the same time, thanks to the phenomenon of quantum superposition.

Types of Data Representation

In computer systems , various types of data representation techniques are utilized:

Numbers can be represented in real, integer, and rational formats. Text is represented by using different types of encodings, such as ASCII or Unicode. Images can be represented in various formats like JPG, PNG, or GIF, each having its specific rendering algorithm and compression techniques.

Tables are another important way of data representation, especially in the realm of databases .

This approach is particularly effective in storing structured data, making information readily accessible and easy to handle. By understanding the principles of data representation, you can better appreciate the complexity and sophistication behind our everyday interactions with technology.

Data Representation and Interpretation

To delve deeper into the world of Computer Science, it is essential to study the intricacies of data representation and interpretation. While data representation is about the techniques through which data are expressed or encoded in a computer system, data interpretation refers to the computing machines' ability to understand and work with these encoded data.

Basics of Data Representation and Interpretation

The core of data representation and interpretation is founded on the binary system. Represented by 0s and 1s, the binary system signifies the 'off' and 'on' states of electric current, seamlessly translating them into a language comprehensible to computing hardware.

For instance, \(1101\) in binary is equivalent to \(13\) in decimal. This interpretation happens consistently in the background during all of your interactions with a computer system.
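
As a quick illustration (a minimal Python sketch, not part of the original explanation), the same interpretation can be reproduced with the built-in `int` and `format` functions:

```python
# Interpret the bit pattern 1101 as a decimal value, then convert it back.
binary_string = "1101"

decimal_value = int(binary_string, 2)        # parse base-2 text -> 13
back_to_binary = format(decimal_value, "b")  # 13 -> '1101'

print(decimal_value)   # 13
print(back_to_binary)  # 1101
```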

Now, try imagining a vast array of these binary numbers. It could get overwhelming swiftly. To bring order and efficiency to this chaos, binary digits (or bits) are grouped into larger sets like bytes, kilobytes, and so on. A single byte , the most commonly used set, contains eight bits. Here's a simplified representation of how bits are grouped:

  • 1 bit = Binary Digit
  • 8 bits = 1 byte
  • 1024 bytes = 1 kilobyte (KB)
  • 1024 KB = 1 megabyte (MB)
  • 1024 MB = 1 gigabyte (GB)
  • 1024 GB = 1 terabyte (TB)

However, the binary system isn't the only number system pivotal for data interpretation. Both decimal (base 10) and hexadecimal (base 16) systems play significant roles in processing numbers and text data. Moreover, translating human-readable language into computer interpretable format involves character encodings like ASCII (American Standard Code for Information Interchange) and Unicode.

These systems interpret alphabetic characters, numerals, punctuation marks, and other common symbols into binary code. For example, the ASCII value for capital 'A' is 65, which corresponds to \(01000001\) in binary.
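
The same mapping can be inspected directly in Python; this is only an illustrative sketch using the built-in `ord`, `chr`, and `format` functions:

```python
# 'A' has ASCII code 65, which is 01000001 as an 8-bit pattern.
code = ord("A")             # 65
bits = format(code, "08b")  # '01000001'
char = chr(int(bits, 2))    # back to 'A'

print(code, bits, char)     # 65 01000001 A
```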

In the world of images, different encoding schemes interpret pixel data, with JPG, PNG, and GIF being common examples of such encoded formats. Similarly, audio files utilise encoding formats like MP3 and WAV to store sound data.

Importance of Data Interpretation in Computer Science

Understanding data interpretation in computer science is integral to unlocking the potential of any computing process or system. When coded data is input into a system, your computer must interpret this data accurately to make it usable.

Consider typing a document in a word processor like Microsoft Word. As you type, each keystroke is converted to an ASCII code by your keyboard. Stored as binary, these codes are transmitted to the active word processing software. The word processor interprets these codes back into alphabetic characters, enabling the correct letters to appear on your screen, as per your keystrokes.

Data interpretation is not just an isolated occurrence, but a recurring necessity - needed every time a computing process must deal with data. This is no different when you're watching a video, browsing a website, or even when the computer boots up.

Rendering images and videos is an ideal illustration of the importance of data interpretation.

Digital photos and videos are composed of tiny dots, or pixels, each encoded with specific numbers to denote colour composition and intensity. Every time you view a photo or play a video, your computer interprets the underlying data and reassembles the pixels to form a comprehensible image or video sequence on your screen.

Data interpretation further extends to more complex territories like facial recognition, bioinformatics, data mining, and even artificial intelligence. In these applications, data from various sources is collected, converted into machine-acceptable format, processed, and interpreted to provide meaningful outputs.

In summary, data interpretation is vital for the functionality, efficiency, and progress of computer systems and the services they provide. Understanding the basics of data representation and interpretation, thereby, forms the backbone of computer science studies.

Delving into Binary Data Representation

Binary data representation is the most fundamental and elementary form of data representation in computing systems. At the lowermost level, every piece of information processed by a computer is converted into a binary format.

Understanding Binary Data Representation

Binary data representation is based on the binary numeral system. This system, also known as the base-2 system, uses only two digits, 0 and 1, to represent all kinds of data. The concept dates back to early 18th-century mathematics and has since become the bedrock of modern computers. In computing, the binary system's digits are called bits (short for 'binary digit'), and they are the smallest indivisible unit of data.

Each bit can be in one of two states representing 0 ('off') or 1 ('on'). Formally, the binary number \( b_n b_{n-1} ... b_2 b_1 b_0 \), is interpreted using the formula: \[ B = b_n \times 2^n + b_{n-1} \times 2^{n-1} + ... + b_2 \times 2^2 + b_1 \times 2^1 + b_0 \times 2^0 \] Where \( b_i \) are the binary digits and \( B \) is the corresponding decimal number.

For example, for the binary number 1011, the process looks like this: \[ B = 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = 8 + 0 + 2 + 1 = 11 \]

This mathematical translation makes it possible for computing machines to perform complex operations even though they understand only the simple language of 'on' and 'off' signals.
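
The positional formula above can also be spelled out directly in code. The sketch below is purely illustrative (real hardware works very differently), but it shows the same weighting of each bit by a power of two:

```python
def binary_to_decimal(bits: str) -> int:
    """Apply B = b_n*2^n + ... + b_1*2^1 + b_0*2^0, digit by digit."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * (2 ** position)
    return total

print(binary_to_decimal("1011"))  # 11, matching the worked example above
```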

When representing character data, computing systems use binary-encoded formats. ASCII and Unicode are common examples. In ASCII, each character is assigned a unique 7-bit binary code. For example, the 7-bit binary representation for the uppercase letter 'A' is 1000001 (decimal 65). Interpreting such encoded data back to a human-readable format is a core responsibility of computing systems and forms the basis for the exchange of digital information globally.

Practical Application of Binary Data Representation

Binary data representation is used across every single aspect of digital computing. From simple calculations performed by a digital calculator to the complex animations rendered in a high-definition video game, binary data representation is at play in the background.

Consider a simple calculation like 7+5. When you input this into a digital calculator, the numbers and the operation get converted into their binary equivalents. The microcontroller inside the calculator processes these binary inputs, performs the sum operation in binary, and finally, returns the result as a binary output. This binary output is then converted back into a decimal number which you see displayed on the calculator screen.
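
A hedged sketch of the same idea: the snippet below writes 7 and 5 as binary literals, adds them, and shows both the binary and decimal forms of the result. The calculator's microcontroller does this with logic gates rather than Python, but the arithmetic on bit patterns is the same.

```python
a = 0b111   # 7 in binary
b = 0b101   # 5 in binary

result = a + b
print(bin(result))  # 0b1100, the binary sum
print(result)       # 12, the decimal value shown on the display
```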

When it comes to text files, every character typed into the document is converted to its binary equivalent using a character encoding system, typically ASCII or Unicode. It is then saved onto your storage device as a sequence of binary digits.

Similarly, for image files, every pixel is represented as a binary number. The grid of these numbers, called a bitmap, specifies the colour and intensity of each pixel. When you open the image file, the computer reads the binary data and presents it on your screen as a colourful, coherent image. The concept extends even further into the internet and network communications, data encryption, data compression, and more.

When you are downloading a file over the internet, it is sent to your system as a stream of binary data. The web browser on your system receives this data, recognizes the type of file and accordingly interprets the binary data back into the intended format.

In essence, every operation that you can perform on a computer system, no matter how simple or complex, essentially boils down to large-scale manipulation of binary data. And that sums up the practical application and universal significance of binary data representation in digital computing.

Binary Tree Representation in Data Structures

Binary trees occupy a central position in data structures, especially in algorithms and database designs. As a non-linear data structure, a binary tree is essentially a tree-like model where each node has a maximum of two children, often distinguished as 'left child' and 'right child'.

Fundamentals of Binary Tree Representation

A binary tree is a tree data structure where each parent node has no more than two children, typically referred to as the left child and the right child. Each node in the binary tree contains:

  • A data element
  • Pointer or link to the left child
  • Pointer or link to the right child

The topmost node of the tree is known as the root. The nodes without any children, usually dwelling at the tree's last level, are known as leaf nodes or external nodes. Binary trees are fundamentally differentiated by their properties and the relationships among the elements. Some types include:

  • Full Binary Tree: A binary tree where every node has 0 or 2 children.
  • Complete Binary Tree: A binary tree where all levels are completely filled except possibly the last level, which is filled from left to right.
  • Perfect Binary Tree: A binary tree where all internal nodes have two children and all leaves are at the same level.
  • Skewed Binary Tree: A binary tree where every node has only a left child or only a right child.

In a binary tree, the maximum number of nodes \( N \) at any level \( L \) (counting the root as level 1) can be calculated using the formula \( N = 2^{L-1} \). Conversely, a tree with \( N \) nodes needs at least \( \lceil \log_2(N+1) \rceil \) levels, which is its minimum possible height.
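
Both formulas are easy to check numerically; the following lines are an illustrative sketch only:

```python
import math

def max_nodes_at_level(level: int) -> int:
    """Maximum number of nodes at a level, counting the root as level 1."""
    return 2 ** (level - 1)

def min_levels(n_nodes: int) -> int:
    """Fewest levels needed to hold n_nodes in a binary tree."""
    return math.ceil(math.log2(n_nodes + 1))

print(max_nodes_at_level(3))  # 4
print(min_levels(7))          # 3, since a perfect tree of 7 nodes has 3 levels
```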

Binary tree representation employs arrays and linked lists. Sometimes, an implicit array-based representation suffices, especially for complete binary trees. The root is stored at index 0, while for each node at index \( i \), the left child is stored at index \( 2i + 1 \), and the right child at \( 2i + 2 \).
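
A minimal sketch of the array-based scheme, assuming a complete binary tree stored in a Python list (the node values are invented for illustration):

```python
# The complete binary tree     10
#                             /  \
#                            6    14
#                           / \   /
#                          4   8 12
tree = [10, 6, 14, 4, 8, 12]   # level-order storage, root at index 0

def left(i: int) -> int:
    return 2 * i + 1

def right(i: int) -> int:
    return 2 * i + 2

print(tree[left(0)], tree[right(0)])  # 6 14  (children of the root)
print(tree[left(1)], tree[right(1)])  # 4 8   (children of node 6)
```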

However, the most common representation is the linked-node representation that utilises a node-based structure. Each node in the binary tree is a data structure that contains a data field and two pointers pointing to its left and right child nodes.
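
A corresponding linked-node sketch (again only illustrative), with an in-order traversal added to show how the pointers are followed:

```python
class Node:
    """One node: a data field plus pointers to the left and right children."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def in_order(node):
    """Visit the left subtree, then the node itself, then the right subtree."""
    if node is None:
        return []
    return in_order(node.left) + [node.data] + in_order(node.right)

root = Node(10, Node(6, Node(4), Node(8)), Node(14, Node(12)))
print(in_order(root))  # [4, 6, 8, 10, 12, 14]
```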

Usage of Binary Tree in Data Structures

Binary trees are typically used for expressing hierarchical relationships, and thus find application across various areas in computer science. In mathematical applications, binary trees are ideal for expressing certain elements' relationships.

For example, binary trees are used to represent expressions in arithmetic and Boolean algebra.

Consider an arithmetic expression like (4 + 5) * 6. This can be represented using a binary tree where the operators are parent nodes, and the operands are children. The expression gets evaluated by performing operations in a specific tree traversal order.
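
A tiny evaluation sketch, using nested tuples to stand in for tree nodes (operators as parents, operands as leaves); only + and * are handled because that is all this example needs:

```python
# (4 + 5) * 6 as an expression tree
expression_tree = ("*", ("+", 4, 5), 6)

def evaluate(node):
    if not isinstance(node, tuple):         # a leaf: just an operand
        return node
    op, left, right = node
    a, b = evaluate(left), evaluate(right)  # post-order: children first
    return a + b if op == "+" else a * b

print(evaluate(expression_tree))  # 54
```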

Among the more complex usages, binary search trees — a variant of binary trees — are employed in database engines and file systems .

  • Binary Heaps, a type of binary tree, are used as an efficient priority queue in many algorithms like Dijkstra's algorithm and the Heap Sort algorithm.
  • Binary trees are also used in creating binary space partition trees, which are used for quickly finding objects in games and 3D computer graphics.
  • Syntax trees used in compilers are a direct application of binary trees. They help translate high-level language expressions into machine code.
  • Huffman Coding Trees, which are used in data compression algorithms, are another variant of binary trees.

The theoretical underpinnings of all these binary tree applications are the traversal methods and operations, such as insertion and deletion, which are intrinsic to the data structure.

Binary trees are also used in advanced machine-learning algorithms. Decision Tree is a type of binary tree that uses a tree-like model of decisions. It is one of the most successful forms of supervised learning algorithms in data mining and machine learning.

The advantages of binary trees lie in their efficient organisation and quick data access, making them a cornerstone of many complex data structures and algorithms. Understanding the workings and fundamentals of binary tree representation will equip you with a stronger foundation in the world of data structures and computer science in general.

Grasping Data Model Representation

When dealing with vast amounts of data, organising and understanding the relationships between different pieces of data is of utmost importance. This is where data model representation comes into play in computer science. A data model provides an abstract, simplified view of real-world data. It defines the data elements and the relationships among them, providing an organised and consistent representation of data.

Exploring Different Types of Data Models

Understanding the intricacies of data models will equip you with a solid foundation in making sense of complex data relationships. Some of the most commonly used data models include:

  • Hierarchical Model
  • Network Model
  • Relational Model
  • Entity-Relationship Model
  • Object-Oriented Model
  • Semantic Model

The Hierarchical Model presents data in a tree-like structure, where each record has one parent record and many children. This model is largely applied in file systems and XML documents. Its main limitation is that a child cannot have multiple parents, which restricts its real-world applications.

The Network Model, an enhancement of the hierarchical model, allows a child node to have multiple parent nodes, resulting in a graph structure. This model is suitable for representing complex relationships but comes with its own challenges such as iteration and navigation, which can be intricate.

The Relational Model, created by E.F. Codd, uses a tabular structure to depict data and their relationships. Each row represents a collection of related data values, and each column represents a particular attribute. This is the most widely used model due to its simplicity and flexibility.
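
As a small, hedged illustration of the relational idea, the sketch below uses Python's built-in sqlite3 module; the table and column names are invented for the example, not taken from the text:

```python
import sqlite3

# Each row is a collection of related values; each column is one attribute.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER, name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", 91), (2, "Ravi", 84)],
)

for row in conn.execute("SELECT name, marks FROM student WHERE marks > 85"):
    print(row)  # ('Asha', 91)

conn.close()
```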

The Entity-Relationship Model illustrates the conceptual view of a database. It uses three basic concepts: Entities, Attributes (the properties of these entities), and Relationships among entities. This model is most commonly used in database design .

The Object-Oriented Model goes a step further and adds methods (functions) to the entities besides attributes. This data model integrates the data and the operations applicable to the data into a single component known as an object. Such an approach enables encapsulation, a significant characteristic of object-oriented programming.

The Semantic Model aims to capture more meaning of data by defining the nature of data and the relationships that exist between them. This model is beneficial in representing complex data interrelations and is used in expert systems and artificial intelligence fields.

The Role of Data Models in Data Representation

Data models provide a method for the efficient representation and interaction of data elements, thus forming an integral part of any database system. They provide the theoretical foundation for designing databases, thereby playing an essential role in the development of applications.

A data model is a set of concepts and rules for formally describing and representing real-world data. It serves as a blueprint for designing and implementing databases and assists communication between system developers and end-users.

Databases serve as vast repositories, storing a plethora of data. Such vast data needs effective organisation and management for optimal access and usage. Here, data models come into play, providing a structural view of data, thereby enabling the efficient organisation, storage and retrieval of data.

Consider a library system. The system needs to record data about books, authors, publishers, members, and loans. All these items represent different entities. Relationships exist between these entities. For example, a book is published by a publisher, an author writes a book, or a member borrows a book. Using an Entity-Relationship Model, we can effectively represent all these entities and relationships, aiding the library system's development process.
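
A purely illustrative sketch of those entities and relationships using Python dataclasses; every class and field name here is an assumption made for the example:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Author:
    name: str

@dataclass
class Book:
    title: str
    author: Author   # relationship: "an author writes a book"

@dataclass
class Member:
    member_id: int
    name: str

@dataclass
class Loan:
    book: Book       # relationship: "a member borrows a book"
    member: Member
    borrowed_on: date

loan = Loan(Book("The Hobbit", Author("J. R. R. Tolkien")),
            Member(1, "Priya"), date(2024, 1, 15))
print(loan.book.title, "->", loan.member.name)
```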

Designing such a model requires careful consideration of what data is required to be stored and how different data elements relate to each other. Depending on their specific requirements, database developers can select the most suitable data model representation. This choice can significantly affect the functionality, performance, and scalability of the resulting databases.

From decision-support systems and expert systems to distributed databases and data warehouses, data models find a place in various applications.

Modern NoSQL databases often use several models simultaneously to meet their needs. For example, a document-based model for unstructured data and a column-based model for analyzing large data sets. In this way, data models continue to evolve and adapt to the burgeoning needs of the digital world.

Therefore, acquiring a strong understanding of data model representations and their roles forms an integral part of the database management and design process. It empowers you with the ability to handle large volumes of diverse data efficiently and effectively.

Data Representation - Key takeaways

  • Data representation refers to techniques used to express information in computer systems, encompassing text, numbers, images, audio, and more.
  • Data Representation is about how computers interpret and function with different information types, including binary systems, bits and bytes, number systems (decimal, hexadecimal) and character encoding (ASCII, Unicode).
  • Binary Data Representation is the conversion of all kinds of information processed by a computer into binary format.
  • Binary trees express hierarchical relationships and are used across various areas in computer science.
  • Binary trees represent relationships in mathematical applications and are employed in database engines, file systems, and priority queues in algorithms.
  • Data Model Representation is an abstract, simplified view of real-world data that defines the data elements and their relationships, and provides a consistent, organised way of representing data.

Frequently Asked Questions about Data Representation in Computer Science

What is data representation?

Data representation is the method used to encode information into a format that can be used and understood by computer systems. It involves the conversion of real-world data, such as text, images, sounds, numbers, into forms like binary or hexadecimal which computers can process. The choice of representation can affect the quality, accuracy and efficiency of data processing. Precisely, it's how computer systems interpret and manipulate data.

What does data representation mean?

Data representation refers to the methods or techniques used to express, display or encode data in a readable format for a computer or a user. This could be in different forms such as binary, decimal, or alphabetic forms. It's crucial in computer science since it links the abstract world of thought and concept to the concrete domain of signals, signs and symbols. It forms the basis of information processing and storage in contemporary digital computing systems.

Why is data representation important?

Data representation is crucial as it allows information to be processed, transferred, and interpreted in a meaningful way. It helps in organising and analysing data effectively, providing insights for decision-making processes. Moreover, it facilitates communication between the computer system and the real world, enabling computing outcomes to be understood by users. Finally, accurate data representation ensures integrity and reliability of the data, which is vital for effective problem solving.

How to make a graphical representation of data?

To create a graphical representation of data, first collect and organise your data. Choose a suitable form of data representation such as bar graphs, pie charts, line graphs, or histograms depending on the type of data and the information you want to display. Use a data visualisation tool or software such as Excel or Tableau to help you generate the graph. Always remember to label your axes and provide a title and legend if necessary.

What is data representation in statistics?

Data representation in statistics refers to the various methods used to display or present data in meaningful ways. This often includes the use of graphs, charts, tables, histograms or other visual tools that can help in the interpretation and analysis of data. It enables efficient communication of information and helps in drawing statistical conclusions. Essentially, it's a way of providing a visual context to complex datasets, making the data easily understandable.

Test your knowledge with multiple choice flashcards

What is data representation in computer science?

What are some of the fundamental concepts to understand when dealing with data representation?

Why is data representation crucial in computer science?


Data representation in computer science refers to the methods used to express information in a computer system. It's how a computer interprets and functions with different information types, ranging from text and numbers to images, audio, and beyond.

When dealing with data representation, one should understand the binary system, bits and bytes, number systems like decimal and hexadecimal, and character encoding such as ASCII and Unicode.

Data representation forms the foundation of computer systems and affects hardware and software designs. It enables logic and arithmetic operations to be performed in the binary number system, and is integral to computer programming languages and compilers.

What is the relationship between data representation and the binary system in computer systems?

The core of data representation in computer systems is based on the binary system, which uses 0s and 1s, representing 'off' and 'on' states of electric current. These translate into a language that computer hardware can understand.

What are the larger sets into which binary digits or bits are grouped for efficiency and order?

Binary digits or bits are grouped into larger sets like bytes, kilobytes, megabytes, gigabytes and terabytes. For instance, 8 bits make up a byte, and 1024 bytes make up a kilobyte.

How does data interpretation contribute to the functionality of computer systems and services?

Data interpretation is vital as it allows coded data to be accurately translated into a usable format for any computer process or system. It is a recurring necessity whenever a computing process has to deal with data.


Data Representation in Computer: Number Systems, Characters, Audio, Image and Video


What is Data Representation in Computer?

A computer uses a fixed number of bits to represent a piece of data which could be a number, a character, image, sound, video, etc. Data representation is the method used internally to represent data in a computer. Let us see how various types of data can be represented in computer memory.

Before discussing data representation of numbers, let us see what a number system is.

Number Systems

Number systems are the techniques used to represent numbers in computer system architecture; every value that you save to or retrieve from computer memory has a defined number system.

A number is a mathematical object used to count, label, and measure. A number system is a systematic way to represent numbers. The number system we use in our day-to-day life is the decimal number system that uses 10 symbols or digits.

The number 289 is pronounced as two hundred and eighty-nine and it consists of the symbols 2, 8, and 9. Similarly, there are other number systems. Each has its own symbols and method for constructing a number.

A number system has a unique base, which depends upon the number of symbols. The number of symbols used in a number system is called the base or radix of a number system.

Let us discuss some of these number systems. Computer architecture supports the following number systems:

  • Binary Number System
  • Octal Number System
  • Decimal Number System
  • Hexadecimal Number System

The binary number system has only two digits, 0 and 1. Every value is represented using combinations of 0 and 1, so the base of the binary number system is 2.

The octal number system has eight digits, 0 to 7. Every value is represented using the digits 0, 1, 2, 3, 4, 5, 6 and 7, so the base of the octal number system is 8.

The decimal number system has ten digits, 0 to 9. Every value is represented using the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9, so the base of the decimal number system is 10.

The hexadecimal number system has sixteen alphanumeric symbols, 0 to 9 and A to F. Every value is represented using 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E and F, so the base of the hexadecimal number system is 16.

Here A is 10, B is 11, C is 12, D is 13, E is 14 and F is 15.
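As a quick illustration (not part of the original text), the following small C++ sketch prints one and the same decimal value in binary, octal and hexadecimal notation, showing that the four systems are just different ways of writing the same number:

```cpp
#include <bitset>
#include <iostream>

int main() {
    int value = 289;  // the decimal example used above

    std::cout << "Decimal     : " << std::dec << value << '\n';                    // 289
    std::cout << "Binary      : " << std::bitset<16>(value) << '\n';               // 0000000100100001
    std::cout << "Octal       : " << std::oct << value << '\n';                    // 441
    std::cout << "Hexadecimal : " << std::hex << std::uppercase << value << '\n';  // 121
}
```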

Data Representation of Characters

There are different methods to represent characters. Some of them are discussed below:


The code called ASCII (pronounced "AS-key"), which stands for American Standard Code for Information Interchange, uses 7 bits to represent each character in computer memory. The ASCII representation has been adopted as a standard by the U.S. government and is widely accepted.

A unique integer is assigned to each character. This number, called the ASCII code of that character, is converted into binary for storing in memory. For example, the ASCII code of A is 65, and its 7-bit binary equivalent is 1000001.

Since there are exactly 128 unique combinations of 7 bits, this 7-bit code can represent only 128 characters. Another version, ASCII-8 (also called extended ASCII), uses 8 bits for each character and can represent 256 different characters.

For example, the letter A is represented by 01000001, B by 01000010 and so on. ASCII code is enough to represent all of the standard keyboard characters.
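As a small, purely illustrative C++ sketch (an addition to the original text), you can inspect a character's ASCII code and its binary form directly:

```cpp
#include <bitset>
#include <iostream>

int main() {
    char ch = 'A';
    // The character is stored as its ASCII code; casting to int reveals that number.
    std::cout << ch << " has ASCII code " << static_cast<int>(ch)        // 65
              << ", which is " << std::bitset<7>(ch) << " in 7 bits\n";  // 1000001
}
```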

EBCDIC stands for Extended Binary Coded Decimal Interchange Code. It is similar to ASCII and is an 8-bit code used in computers manufactured by International Business Machines (IBM). It is capable of encoding 256 characters.

If ASCII-coded data is to be used in a computer that uses EBCDIC representation, it is necessary to transform ASCII code to EBCDIC code. Similarly, if EBCDIC coded data is to be used in an ASCII computer, EBCDIC code has to be transformed to ASCII.

ISCII stands for Indian Standard Code for Information Interchange or Indian Script Code for Information Interchange. It is an encoding scheme for representing various writing systems of India. ISCII uses 8-bits for data representation.

It was evolved by a standardization committee under the Department of Electronics during 1986-88 and adopted by the Bureau of Indian Standards (BIS). Nowadays ISCII has been replaced by Unicode.

Using 8-bit ASCII we can represent only 256 characters. This cannot represent all characters of written languages of the world and other symbols. Unicode is developed to resolve this problem. It aims to provide a standard character encoding scheme, which is universal and efficient.

It provides a unique number for every character, no matter what the language and platform be. Unicode originally used 16 bits which can represent up to 65,536 characters. It is maintained by a non-profit organization called the Unicode Consortium.

The Consortium first published version 1.0.0 in 1991 and continues to develop standards based on that original work. Nowadays Unicode uses more than 16 bits and hence it can represent more characters. Unicode can represent characters in almost all written languages of the world.

Data Representation of Audio, Image and Video

In most cases, we may have to represent and process data other than numbers and characters. This may include audio data, images, and videos. We can see that like numbers and characters, the audio, image, and video data also carry information.

We will see different file formats for storing sound, image, and video .

Multimedia data such as audio, image, and video are stored in different types of files. The variety of file formats is due to the fact that there are quite a few approaches to compressing the data and a number of different ways of packaging the data.

For example, an image is most popularly stored in the Joint Photographic Experts Group (JPEG) file format. An image file consists of two parts – header information and image data. Information such as the name of the file, its size, modification date, file format, etc. is stored in the header part.

The intensity value of all pixels is stored in the data part of the file. The data can be stored uncompressed or compressed to reduce the file size. Normally, the image data is stored in compressed form. Let us understand what compression is.

Take a simple example of a pure black image of size 400 x 400 pixels. We could repeat the information black, black, …, black for all 1,60,000 (400 x 400) pixels. This is the uncompressed form, while in the compressed form black is stored only once, along with the information needed to repeat it 1,60,000 times.
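One very simple compression idea consistent with this example is run-length encoding: store each pixel value once together with the number of consecutive times it repeats. The sketch below is an illustration only (real formats such as JPEG and PNG use far more sophisticated techniques); it compresses the pure black image to a single run:

```cpp
#include <iostream>
#include <utility>
#include <vector>

// Run-length encode a sequence of pixel values: each run stores a value
// and the number of consecutive times it occurs.
std::vector<std::pair<unsigned char, int>> runLengthEncode(const std::vector<unsigned char>& pixels) {
    std::vector<std::pair<unsigned char, int>> runs;
    for (unsigned char p : pixels) {
        if (!runs.empty() && runs.back().first == p)
            ++runs.back().second;    // extend the current run
        else
            runs.push_back({p, 1});  // start a new run
    }
    return runs;
}

int main() {
    // A pure black 400 x 400 image: 1,60,000 identical pixel values (0 = black).
    std::vector<unsigned char> image(400 * 400, 0);
    auto runs = runLengthEncode(image);
    std::cout << "Uncompressed pixel values : " << image.size() << '\n';  // 160000
    std::cout << "Runs stored after RLE     : " << runs.size() << '\n';   // 1
}
```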

Numerous such techniques are used to achieve compression. Depending on the application, images are stored in various file formats such as Bitmap (BMP), Tagged Image File Format (TIFF), Graphics Interchange Format (GIF) and Portable Network Graphics (PNG).

What we said about header information and compression also applies to audio and video files. Digital audio data can be stored in different file formats like WAV, MP3, MIDI, AIFF, etc. An audio file format, sometimes referred to as the 'container format', describes how the digital audio data is stored in the file.

For example, the WAV file format typically contains uncompressed sound, while MP3 files typically contain compressed audio data. Synthesized music data is stored in MIDI (Musical Instrument Digital Interface) files.

Similarly, video is stored in different file formats such as AVI (Audio Video Interleave) – a format designed to store both audio and video data in a standard package that allows synchronous audio-with-video playback – as well as MPEG, WMV, etc.

FAQs About Data Representation in Computer

What is a number system? Give an example.

A number system is a systematic way to represent numbers using a fixed set of symbols. Computer architecture supports the following number systems: 1. Binary Number System 2. Octal Number System 3. Decimal Number System 4. Hexadecimal Number System.




Praxis Core Math > Unit 1: Data representations | Lesson


What are data representations?

Data representations are useful for interpreting data and identifying trends and relationships. They help answer questions such as:

  • How much of the data falls within a specified category or range of values?
  • What is a typical value of the data?
  • How much spread is in the data?
  • Is there a trend in the data over time?
  • Is there a relationship between two variables?

What skills are tested?

  • Matching a data set to its graphical representation
  • Matching a graphical representation to a description
  • Using data representations to solve problems

How are qualitative data displayed?

  • A vertical bar chart lists the categories of the qualitative variable along a horizontal axis and uses the heights of the bars on the vertical axis to show the values of the quantitative variable. A horizontal bar chart lists the categories along the vertical axis and uses the lengths of the bars on the horizontal axis to show the values of the quantitative variable. This display draws attention to how the categories rank according to the amount of data within each. Example: the heights of the bars show the number of students who want to study each language. Using the bar chart, we can conclude that the greatest number of students want to study Mandarin and the least number of students want to study Latin.
  • A pictograph is like a horizontal bar chart but uses pictures instead of the lengths of bars to represent the values of the quantitative variable. Each picture represents a certain quantity, and each category can have multiple pictures. Pictographs are visually interesting, but require us to use the legend to convert the number of pictures to quantitative values. Example: each picture represents 40 students, and the number of pictures shows the number of students who want to study each language. Using the pictograph, we can conclude that twice as many students want to study French as want to study Latin.
  • A circle graph (or pie chart) is a circle that is divided into as many sections as there are categories of the qualitative variable. The area of each section represents, for each category, the value of the quantitative data as a fraction of the sum of values. The fractions sum to 1. Sometimes the section labels include both the category and the associated value or percent value for that category. Example: the area of each section represents the fraction of students who want to study that language. Using the circle graph, we can conclude that just under 1/2 of the students want to study Mandarin and about 1/3 want to study Spanish.

How are quantitative data displayed?

  • Dotplots use one dot for each data point. The dots are plotted above their corresponding values on a number line. The number of dots above each specific value represents the count of that value. Dotplots show the value of each data point and are practical for small data sets. Example: each dot represents the typical travel time to school for one student. Using the dotplot, we can conclude that the most common travel time is 10 minutes. We can also see that the values for travel time range from 5 to 35 minutes.
  • Histograms divide the horizontal axis into equal-sized intervals and use the heights of the bars to show the count or percent of data within each interval. By convention, each interval includes the lower boundary but not the upper one. Histograms show only totals for the intervals, not specific data points. Example: the height of each bar represents the number of students having a typical travel time within the corresponding interval. Using the histogram, we can conclude that the most common travel time is between 10 and 15 minutes and that all typical travel times are between 5 and 40 minutes.

How are trends over time displayed?

Trends over time are typically displayed with a line graph, which plots the values of a variable against time so that increases and decreases are easy to see.

How are relationships between variables displayed?

Relationships between two quantitative variables are typically displayed with a scatterplot, where each point represents one pair of values.


Things to remember

  • When matching data to a representation, check that the values are graphed accurately for all categories.
  • When reporting data counts or fractions, be clear whether a question asks about data within a single category or a comparison between categories.
  • When finding the number or fraction of the data meeting a criterion, watch for key words such as or, and, less than, and more than.



BHARAT SKILLS

Data Representation in Computer MCQ [PDF] 40 Top Question

Data representation in computer MCQ: questions and answers with PDF for all computer-related entrance and competitive exam preparation. Helpful for Class 11, GATE, IBPS, SBI (Bank PO & Clerk), SSC, Railway, etc.

Data Representation in Computer MCQ

1. To perform calculations on stored data, a computer uses the ……… number system. [SBI Clerk 2009]

(1) decimal

(2) hexadecimal

2. The number system based on ‘0’ and ‘1’ only, is known as

(1) binary system

(2) barter system

(3) number system

(4) hexadecimal system

3. Decimal number system is the group of ………… numbers.

(4) 0 to 9 and A to F

4. A hexadecimal number is represented by

(1) three digits

(2) four binary digits

(3) four digits

(4) All of these

5. A hexadigit can be represented by [IBPS Clerk 2012]

(1) three binary (consecutive) bits

(2) four binary (consecutive) bits

(3) eight binary (consecutive) bits

(4) sixteen binary (consecutive) bits

(5) None of the above

6. What type of information system would be recognised by digital circuits?

(1) Hexadecimal system        

(2) Binary system

(3) Both ‘1’ and ‘2’                 

(4) Only roman system

7. The binary equivalent of decimal number 98 is [IBPS Clerk 2012]

(1) 1110001

(2) 1110100

(3) 1100010

(4) 1111001

(5) None of these

8. What is the value of the binary number 101?

9. The binary number 10101 is equivalent to decimal number ………….

10. To convert a binary number to decimal, multiply each binary digit by the corresponding power of

11. Which of the following is a hexadecimal number equal to 3431 octal number?

12. LSD stands for

(1) Long Significant Digit

(2) Least Significant Digit

(3) Large Significant Digit

(4) Longer Significant Decimal

13. How many values can be represented by a single byte?

14. Which of the following is not a computer code?

(4) UNICODE

15. MSD refers as

(1) Most Significant Digit

(2) Many Significant Digit

(3) Multiple Significant Digit

(4) Most Significant Decimal

16. The most widely used code that represents each character as a unique 8-bit code is [IBPS Clerk 2011]

(2) UNICODE

17. Today’s mostly used coding system is/are

(4) Both ‘1’ and ‘2’

18. In EBCDIC code, maximum possible characters set size is

19. Code ‘EBCDIC’ that is used in computing stands for

(1) Extension BCD Information Code                         

(2) Extended BCD Information Code

(3) Extension BCD Interchange Conduct                   

(4) Extended BCD Interchange Conduct

20. Most commonly used codes for representing bits are

21. The coding system that allows non-English characters and special characters to be represented is

22. Which of the following is invalid hexadecimal number?

23. A gate having output 1 only when one of its inputs is 1 is called

24. The ……… gate is also known as an inverter.

25. The only function of a NOT gate is to ……..

(1) Stop signal

(2) Invert input signal

(3) Act as a universal gate

(4) Double input signal

26. Which of following are known as universal gates?

(1) NAND & NOR

(2) AND & OR

(3) XOR & OR

27. A gate whose output is 0 only when its inputs are different is called

28. In the binary language, each letter of the alphabet, each number and each special character is made up of a unique combination of [BOB Clerk 2010]

c) 8 character

29. Decimal equivalent of (1111) 2 is [IBPS Clerk 2012]

30. ASCII code for letter A is

a) 1100011                 

b) 1000001                 

c) 1111111                 

31. Which of the following is not a binary number? [IBPS Clerk 2011]

32. Which of the following is an example of binary number? [IBPS Clerk 2011]

33. Numbers that are written with base 10 are classified as

(1) decimal number

(2) whole number

(3) hexadecimal number

(4) exponential integers

34. The octal system [IBPS Clerk 2011]

(1) needs less digits to represent a number than in the binary system

(2) needs more digits to represent a number than in the binary system

(3) needs the same number of digits to represent a number as in the binary system

(4) needs the same number of digits to represent a number as in the decimal system

35. Hexadecimal number system has ………. base.

36. ASCII stands for [IBPS Clerk 2011,2014]

(1) American Special Computer for Information Interaction

(2) American Standard Computer for Information Interchange

(3) American Special Code for Information Interchange

(4) American Special Computer for Information Interchange

(5) American Standard Code for Information Interchange

GCSE Computer Science Data Representation (25 questions)

  • 1. Convert 254 to binary: 11110000 / 11111100 / 11111111 / 11111110
  • 2. Convert 00010001 to decimal: 17 / 27 / 37 / 47
  • 3. Convert 11111111 to decimal: 255 / 245 / 235 / 225
  • 4. Convert 01000100 to decimal: 58 / 68 / 78 / 88
  • 5. What two digits are used in binary? 0 & 2 / 1 & 2 / 0 & 1 / 1 & 1
  • 6. Binary is also known as: Base 1 / Base 2 / Base 10 / Base jumping
  • 7. A single 0 or 1 is known as a: Bit / Byte / Nibble / Binary
  • 8. How many bits are there in a byte? 2 / 4 / 8 / 10
  • 9. The numbers we normally use are called ________ numbers and have 10 different values: denary / binary / digital / integers
  • 10. Convert A9 to binary: 1010 1011 / 1010 1001 / 1001 1001 / 1110 0101
  • 11. Convert 1101 1110 to hexadecimal: ED / DE / EC / CE
  • 12. Convert 85 into hexadecimal: 01010101 / 55 / 5A / A5
  • 13. What is the answer to 0110 + 0101? 1011 / 1101 / 1110 / 1111
  • 14. How does increasing colour depth affect an image file? Increases number of colours in image / Increases resolution of image / Increases image width / Decreases metadata for an image
  • 15. What sample rate do most CDs have? 44.1 kHz / 50,000 Hz / 40.2 kHz / 4 Hz
  • 16. What is bit depth? Number of bits assigned to each sample / Rate at which the binary is calculated / How deep the sound can go / The speed of the sound wave
  • 17. What is ASCII? How the internet displays images / A character set / A programming language / Binary information on the computer
  • 18. ASCII stands for…: All Symbolic Computer Instructions Interchange / Alternative Standard Computers Interchange Internet / American Standard Code for Information Interchange
  • 19. What is a pixel? Tiny squares of colour on a screen / A fairy-like creature that lives in a forest / A whole picture / A binary code
  • 20. The higher the colour depth, the _________ the image file size: Smaller / Larger

  • 21. How many megabytes are there in a gigabyte?
  • 22. Which of these is ordered from smallest to largest? Kilobyte, Megabyte, Gigabyte, Terabyte / Megabyte, Kilobyte, Terabyte, Gigabyte / Gigabyte, Kilobyte, Terabyte, Megabyte / Terabyte, Gigabyte, Megabyte, Kilobyte

  • 23. Makes the file smaller by deleting parts of the file permanently (forever): Lossy Compression / Lossless Compression
  • 24. Data compression usually works by: Finding repeating patterns / Deleting random bits of data
  • 25. What type of compression would you use to compress a text file? Lossy / Lossless


C++ Program to Implement Queue using Array

A queue is a linear data structure that consists of elements arranged in a sequential order where one end is used to add elements, and another for removing them which results in the FIFO (First-In First-Out) order of operations. In this article, we will learn how to write a program to implement queues using an array in C++.

Queue using Array in C++

The queue is a linear data structure that has the following properties:-

  • It works on the principle of FIFO (First In First Out)
  • It has mainly two pointers to perform operations: Front and Rear. The Rear is used for insertion and the Front for deletion.
  • It is a linear data structure that gives sequential one-way access to the elements. Even though we can implement it using an array, we cannot access any element through indices directly because access to elements is given only through the rear and front pointers.

To understand the queue data structure better, consider the following workflow of a queue implemented using an array:

Explanation:

  • Initially, our array-based queue consists of: [10, 30, 40, 50, 40, 60]
  • We then perform a dequeue operation (removing 10 from the front) and an enqueue operation (adding 15 at the rear).
  • After performing these operations, the final queue is: [30, 40, 50, 40, 60, 15]

Representation of Queue in C++

We can represent a queue using a class that contains the following (a minimal skeleton is sketched just after this list):

  • An array to store the data.
  • Front and back index pointer.
  • Constructor to initialize the queue.
  • Member functions that provide enqueue and dequeue operations.
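Below is a minimal class skeleton consistent with that description. The names Queue, MAX_SIZE and getFront are assumptions made for this sketch; the member functions are filled in under the operation sections that follow, and a complete runnable program appears after them.

```cpp
#include <iostream>
#define MAX_SIZE 100  // assumed capacity of the underlying array

class Queue {
    int arr[MAX_SIZE];  // array to store the data
    int front;          // index of the front element (-1 when the queue is empty)
    int rear;           // index of the rear element  (-1 when the queue is empty)

public:
    // Constructor initialises an empty queue.
    Queue() : front(-1), rear(-1) {}

    bool isEmpty();
    bool isFull();
    void enqueue(int x);
    int  dequeue();
    int  getFront();
};
```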

Basic Operations on Queue in C++

In order to implement a queue using an array, we must be aware of the basic operations performed on a queue to manipulate the elements inside it: enqueue, dequeue, front, isEmpty and isFull.

The two main operations related to the queue data structure are enqueue and dequeue. Let us look at how each of these operations is implemented.

Implementation of Enqueue Operation

The enqueue operation inserts an element at the rear of the queue, through the rear pointer, provided the queue is not full. We therefore need to check whether the queue is full before enqueueing; attempting to insert into a full queue is called queue overflow.

Algorithm of Enqueue

Following is the algorithm for the enqueue operation:

Check if the queue is full using the isFull function. If not full, increment the rear pointer and add the new element to the rear position. If the queue was empty before enqueueing, also update the front pointer.
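Continuing the sketch above, a possible enqueue member function following this algorithm (with the names assumed earlier) is:

```cpp
// Enqueue: insert x at the rear if the queue is not full.
void Queue::enqueue(int x) {
    if (isFull()) {                      // queue overflow
        std::cout << "Queue overflow\n";
        return;
    }
    if (front == -1) front = 0;          // queue was empty: front now points to the first slot
    arr[++rear] = x;                     // advance rear and store the new element
}
```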

Implementation of Dequeue Operation

The dequeue operation removes an element from the front of the queue, through the front pointer, provided the queue is not empty.

Algorithm of Dequeue

Following is the algorithm for the dequeue operation:

Check if the queue is empty using the isEmpty function. If not empty, store the element at the front position. Increment the front pointer to remove the element from the queue. If the queue becomes empty after dequeueing, reset both front and rear pointers to -1.
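A matching dequeue member function for the same sketch could look like this:

```cpp
// Dequeue: remove and return the element at the front if the queue is not empty.
int Queue::dequeue() {
    if (isEmpty()) {
        std::cout << "Queue is empty\n";
        return -1;                        // -1 signals "no element", as in the front operation
    }
    int x = arr[front++];                 // take the front element, then advance front
    if (front > rear) front = rear = -1;  // queue became empty: reset both pointers
    return x;
}
```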

Implementation of Front Operation

This operation returns the element at the front of the queue. It is the element that will be removed first when we perform dequeue.

Algorithm of Front

Following is the algorithm for the front operation:

Check if the queue is empty. If the queue is empty, return -1. If the queue is not empty, return the element pointed to by the front pointer.
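In the sketch this becomes a one-liner (getFront is an assumed name):

```cpp
// Front (peek): return the front element without removing it, or -1 if the queue is empty.
int Queue::getFront() {
    return isEmpty() ? -1 : arr[front];
}
```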

Implementation of isEmpty Operation

This operation checks whether the given queue has any elements or is empty. The function returns true if the queue is empty and false otherwise. When the queue is empty initially, front = -1; when we have dequeued all the elements, front > rear.

Algorithm of isEmpty

Following is the algorithm for the isEmpty operation:

If front == -1 OR front > rear is true, then the queue is empty. If it evaluates as false, then there are elements in the queue.
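In code, the same check reads:

```cpp
// isEmpty: true when nothing was ever enqueued (front == -1)
// or every element has been dequeued (front > rear).
bool Queue::isEmpty() {
    return front == -1 || front > rear;
}
```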

Implementation of isFull Operation

The isFull operation checks whether the queue is full, i.e., whether it already contains n elements, where n is the size of the array used to implement the queue.

Algorithm of isFull

Following is the algorithm for the isFull operation:

Check if rear == MAX_SIZE - 1, where MAX_SIZE is the size of the underlying array. If it is true, then the queue is full. If it is false, then there is still space left in the queue.
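And the corresponding member function in the sketch:

```cpp
// isFull: the rear index has reached the last slot of the underlying array.
bool Queue::isFull() {
    return rear == MAX_SIZE - 1;
}
```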

The following program demonstrates how we can implement a queue using array in C++:
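The original listing is not reproduced here; the following is a minimal, self-contained sketch assembled from the operations described above (class name, MAX_SIZE and getFront are assumed), with a small main() that reproduces the earlier [10, 30, 40, 50, 40, 60] example:

```cpp
#include <iostream>
#define MAX_SIZE 100  // assumed capacity

class Queue {
    int arr[MAX_SIZE];
    int front, rear;

public:
    Queue() : front(-1), rear(-1) {}

    bool isEmpty() { return front == -1 || front > rear; }
    bool isFull()  { return rear == MAX_SIZE - 1; }

    void enqueue(int x) {
        if (isFull()) { std::cout << "Queue overflow\n"; return; }
        if (front == -1) front = 0;           // first element: fix up front
        arr[++rear] = x;
    }

    int dequeue() {
        if (isEmpty()) { std::cout << "Queue is empty\n"; return -1; }
        int x = arr[front++];
        if (front > rear) front = rear = -1;  // queue became empty: reset pointers
        return x;
    }

    int getFront() { return isEmpty() ? -1 : arr[front]; }
};

int main() {
    Queue q;
    int initial[] = {10, 30, 40, 50, 40, 60};
    for (int x : initial) q.enqueue(x);       // build the example queue

    q.dequeue();                              // removes 10 from the front
    q.enqueue(15);                            // adds 15 at the rear

    while (!q.isEmpty())
        std::cout << q.dequeue() << ' ';      // prints: 30 40 50 40 60 15
    std::cout << '\n';
}
```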

Problem with this Implementation

Consider the situation where we insert elements into the queue until it is full and then remove all of them. The rear pointer still equals MAX_SIZE - 1, which is the condition for a full queue, so even though the queue is now empty, further insertions will still report queue overflow.

To resolve this, we use the concept of the circular queue, where the indices are incremented in a circular (modular) fashion, as sketched below.

Refer to the article on circular queues for more information.
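As a hedged illustration of that fix (a hypothetical CircularQueue with members arr, front, rear, a capacity MAX_SIZE and an element count, all assumed for this sketch), the enqueue and dequeue operations would wrap their indices with the modulo operator:

```cpp
// Hypothetical circular queue fragment: indices wrap around using the modulo
// operator, so slots freed by dequeue at the front can be reused.
// A separate element count distinguishes "full" from "empty".
void CircularQueue::enqueue(int x) {
    if (count == MAX_SIZE) {             // full only when every slot is in use
        std::cout << "Queue overflow\n";
        return;
    }
    rear = (rear + 1) % MAX_SIZE;        // circular (modular) increment of rear
    arr[rear] = x;
    ++count;
}

int CircularQueue::dequeue() {
    if (count == 0) {                    // nothing to remove
        std::cout << "Queue is empty\n";
        return -1;
    }
    int x = arr[front];
    front = (front + 1) % MAX_SIZE;      // circular increment of front as well
    --count;
    return x;
}
```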

Applications of Queue

Due to its FIFO nature, queue is a widely utilised data structure in a variety of real world applications :-

  • Computer Science Fundamentals: queues are used in core operating system concepts such as job scheduling and memory management, where processes are placed in a queue so that they can be picked up and executed by the processor.
  • Algorithms: queues store the addresses of tree nodes, linked list nodes and graph nodes in algorithms such as BFS (Breadth First Search).
  • Real-World Applications: in a music or video player app you must have seen the option "Add to Queue", which is basically the enqueue operation!

In this article we have covered the most important aspects of the queue data structure: how it works, its basic operations, its implementation using an array in C++, and its applications. We have also seen that the array-based queue provides O(1) time for every operation, although the operations give only limited access to the elements (through the front and rear).

