Tabular Presentation of Data: Meaning, Objectives, Features and Merits

What is Tabulation?

The systematic presentation of numerical data in rows and columns is known as Tabulation. It is designed to make presentation simpler and analysis easier. This type of presentation facilitates comparison by putting relevant information close to one another, and it helps in further statistical analysis and interpretation. One of the most important devices for presenting the data in a condensed and readily comprehensible form is tabulation. It aims to provide as much information as possible in the minimum possible space while maintaining the quality and usefulness of the data.

Tabular Presentation of Data

“Tabulation involves the orderly and systematic presentation of numerical data in a form designed to elucidate the problem under consideration.” – L.R. Connor

Objectives of Tabulation

The aim of tabulation is to summarise a large amount of numerical information into the simplest form. The following are the main objectives of tabulation:

  • To make complex data simpler: The main aim of tabulation is to present the classified data in a systematic way. The purpose is to condense the bulk of information (data) under investigation into a simple and meaningful form.
  • To save space: Tabulation tries to save space by condensing data in a meaningful form while maintaining the quality and quantity of the data.
  • To facilitate comparison: It also aims to facilitate quick comparison of various observations by providing the data in a tabular form.
  • To facilitate statistical analysis: Tabulation aims to facilitate statistical analysis because it is the stage between data classification and data presentation. Various statistical measures, including averages, dispersion, correlation, and others, are easily calculated from data that has been systematically tabulated.
  • To provide a reference: Since data may be easily identifiable and used when organised in tables with titles and table numbers, tabulation aims to provide a reference for future studies.

Features of a Good Table

Tabulation is a very specialised job. It requires a thorough knowledge of statistical methods, as well as abilities, experience, and common sense. A good table must have the following characteristics:

  • Title: The top of the table must have a title and it needs to be very appealing and attractive.
  • Manageable Size: The table shouldn’t be too big or too small. The size of the table should be in accordance with its objectives and the characteristics of the data. It should completely cover all significant characteristics of data.
  • Attractive: A table should have an appearance that appeals to both the eye and the mind, so that the reader can grasp it without strain.
  • Special Emphasis: The data to be compared should be placed in the left-hand corner of columns, with their titles in bold letters.
  • Fit with the Objective: The table should reflect the objective of the statistical investigation.
  • Simplicity: To make the table easily understandable, it should be simple and compact.
  • Data Comparison: The data to be compared must be placed closely in the columns.
  • Numbered Columns and Rows: When there are several rows and columns in a table, they must be numbered for reference.
  • Clarity: A table should be prepared so that even a layman can draw conclusions from it. The table should contain all necessary information and must be self-explanatory.
  • Units: The unit designations should be written on the top of the table, below the title. For example, Height in cm, Weight in kg, Price in ₹, etc. However, if different items have different units, then they should be mentioned in the respective rows and columns.
  • Suitably Approximated: If the figures are large, then they should be rounded or approximated.
  • Scientifically Prepared: The preparation of the table should be done in a systematic and logical manner and should be free from any kind of ambiguity and overlapping. 

Components of a Table

A table’s preparation is an art that requires skilled data handling. It’s crucial to understand the components of a good statistical table before constructing one. A table is created when all of these components are put together in a systematic order. In simple terms, a good table should include the following components:

1. Table Number:

Each table needs to have a number so it may be quickly identified and used as a reference.

  • If there are many tables, they should be numbered in a logical order.
  • The table number can be given at the top of the table or the beginning of the table title.
  • A table can also be identified by its location using decimal numbers like 1.2, 2.1, etc. For instance, Table Number 3.1 should be read as the first table of the third chapter.

2. Title:

Each table should have a suitable title. The title briefly describes the table’s contents.

  • The title should be simple, self-explanatory, and free from ambiguity.
  • A title should be brief and presented clearly, usually below the table number.
  • In certain cases, a long title is preferable for clarification. In these cases, a ‘Catch Title’ may be placed above the ‘Main Title’. For instance, the firm’s name may appear as a catch title, followed by a main title describing the table’s contents.
  • Contents of Title: The title should include the following information:  (i) Nature of data, or classification criteria (ii) Subject-matter (iii) Place to which the data relates  (iv) Time to which the data relates  (v) Source to which the data belongs  (vi) Reference to the data, if available.

3. Captions or Column Headings:

A column designation is given to explain the figures in the column at the top of each column in a table. This is referred to as a “Column heading” or “Caption”.

  • Captions are used to describe the names or heads of vertical columns.
  • To save space, captions are generally placed in small letters in the middle of the columns.

4. Stubs or Row Headings:

Each row of the table needs to have a heading, similar to a caption or column heading. The headers of horizontal rows are referred to as stubs. A brief description of the row headers may also be provided at the table’s left-hand top.

5. Body of Table:

The table’s most crucial component is its body, which contains data (numerical information).

  • The location of any one figure or data in the table is fixed and determined by the row and column of the table.
  • Numerical data in the main body are arranged from top to bottom in columns and from left to right in rows.
  • The size and shape of the main body should be planned in accordance with the nature of the figures and the purpose of the study.
  • As the body of the table summarises the facts and conclusions of the statistical investigation, it must be ensured that the table does not have irrelevant information.

6. Unit of Measurement:

If the unit of measurement of the figures in the table (real data) does not change throughout the table, it should always be provided along with the title.

  • However, these units must be mentioned together with stubs or captions if rows or columns have different units.
  • If the figures are large, they should be rounded off and the rounding method should be stated.

7. Head Notes:

If the main title does not convey enough information, a head note is included in small brackets in prominent words right below the main title.

  • A head-note is included to convey any relevant information.
  • For instance, tables frequently state the units of measurement, such as “in million rupees,” “in tonnes,” “in kilometres,” etc. Head notes are also known as Prefatory Notes.

8. Source Note:

A source note refers to the place where information was obtained.

  • In the case of secondary data, a source note is provided.
  • Name of the book, page number, table number, etc., from which the data were collected should all be included in the source. If there are multiple sources, each one must be listed in the source note.
  • If a reader wants to refer to the original data, the source note enables him to locate the data. Usually, the source note appears at the bottom of the table. For example, the source note may be: ‘Census of India, 2011’.
  • Importance: A source note is useful for three reasons: (i) It gives credit to the source (person or group) that collected the data; (ii) It provides a reference to source material that may be more complete; and (iii) It offers some insight into the reliability of the information and its source.

9. Footnotes:

The footnote is the last part of the table. The unique characteristic of the data content of the table that is not self-explanatory and has not previously been explained is mentioned in the footnote.

  • Footnotes are used to provide additional information that is not provided by the heading, title, stubs, caption, etc.
  • When there are many footnotes, they are numbered in order.
  • Footnotes are identified by the symbols *, @, £, etc.
  • In general, footnotes are used for the following reasons: (i) To highlight any exceptions to the data; (ii) To point out any special circumstances affecting the data; and (iii) To clarify any information in the data.
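
The components listed above can be put together in a small program. The following is only a minimal sketch: the class name `StatTable`, its field names, and all figures are hypothetical and chosen for illustration, not taken from any real dataset. It prints the table number, title, head note, captions, stubs, body, source note, and footnote in the order described in this section.

```python
from dataclasses import dataclass, field


@dataclass
class StatTable:
    """Hypothetical container for the components of a statistical table."""
    number: str       # table number, e.g. "3.1"
    title: str        # main title, placed below the table number
    head_note: str    # prefatory note, shown in brackets below the title
    stub_head: str    # short description of the row headings
    captions: list    # column headings
    rows: dict        # stub (row heading) -> list of body figures
    source_note: str = ""
    footnotes: list = field(default_factory=list)

    def render(self) -> str:
        header = [self.stub_head] + self.captions
        body = [[stub] + [str(v) for v in values]
                for stub, values in self.rows.items()]
        # Each column is as wide as its widest cell.
        widths = [max(len(r[i]) for r in [header] + body)
                  for i in range(len(header))]

        def fmt(cells):
            # Stubs are left-aligned; figures are right-aligned.
            first = cells[0].ljust(widths[0])
            rest = [c.rjust(w) for c, w in zip(cells[1:], widths[1:])]
            return "  ".join([first] + rest)

        rule = "-" * len(fmt(header))
        lines = [f"Table {self.number}", self.title, f"({self.head_note})",
                 rule, fmt(header), rule]
        lines += [fmt(r) for r in body]
        lines.append(rule)
        if self.source_note:
            lines.append(f"Source: {self.source_note}")
        lines += self.footnotes  # footnotes come last
        return "\n".join(lines)


# Illustrative, made-up figures; not real data.
table = StatTable(
    number="3.1",
    title="Students Enrolled by Stream and Year",
    head_note="Number of students",
    stub_head="Stream",
    captions=["2022", "2023"],
    rows={"Commerce": [120, 135], "Science*": [90, 110]},
    source_note="Hypothetical school records",
    footnotes=["* Includes students who changed stream mid-year."],
)
print(table.render())
```

Keeping each component as a separate field makes it easy to check that nothing (such as the source note for secondary data) has been left out before the table is printed.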

Merits of Tabular Presentation of Data

The following are the merits of tabular presentation of data:

  • Brief and Simple Presentation: Tabular presentation is possibly the simplest method of data presentation. As a result, information is simple to understand. A significant amount of statistical data is also presented in a very brief manner.
  • Facilitates Comparison: By grouping the data into different classes, tabulation facilitates data comparison.
  • Simple Analysis: Analysing data from tables is quite simple. One can determine the data’s central tendency, dispersion, and correlation by organising the data as a table.
  • Highlights Characteristics of the Data:  Tabulation highlights characteristics of the data. As a result of this, it is simple to remember the statistical facts.
  • Cost-effective: Tabular presentation is a very cost-effective way to convey data. It saves time and space.
  • Provides Reference: As the data provided in a tabular presentation can be used for other studies and research, it acts as a source of reference.

Understanding Data Presentations (Guide + Examples)

In this age of overwhelming information, the skill to effectively convey data has become extremely valuable. Initiating a discussion on data presentation types involves thoughtful consideration of the nature of your data and the message you aim to convey. Different types of visualizations serve distinct purposes. Whether you’re dealing with how to develop a report or simply trying to communicate complex information, how you present data influences how well your audience understands and engages with it. This extensive guide leads you through the different ways of data presentation.

What is a Data Presentation?

What should a data presentation include, line graphs, treemap chart, scatter plot, how to choose a data presentation type, recommended data presentation templates, common mistakes done in data presentation.

A data presentation is a slide deck that aims to disclose quantitative information to an audience through the use of visual formats and narrative techniques derived from data analysis, making complex data understandable and actionable. This process requires a series of tools, such as charts, graphs, tables, infographics, dashboards, and so on, supported by concise textual explanations to improve understanding and boost retention rate.

Data presentations require us to cull data in a format that allows the presenter to highlight trends, patterns, and insights so that the audience can act upon the shared information. In a few words, the goal of data presentations is to enable viewers to grasp complicated concepts or trends quickly, facilitating informed decision-making or deeper analysis.

Data presentations go beyond the mere usage of graphical elements. Seasoned presenters combine visuals with the art of data storytelling, so the speech skillfully connects the points through a narrative that resonates with the audience. The purpose of the presentation (to inspire, persuade, inform, support decision-making processes, etc.) determines which data presentation format is best suited to the task.

What Should a Data Presentation Include?

To nail your upcoming data presentation, make sure it includes the following elements:

  • Clear Objectives: Understand the intent of your presentation before selecting the graphical layout and metaphors to make content easier to grasp.
  • Engaging introduction: Use a powerful hook from the get-go. For instance, you can ask a big question or present a problem that your data will answer. Take a look at our guide on how to start a presentation for tips & insights.
  • Structured Narrative: Your data presentation must tell a coherent story. This means a beginning where you present the context, a middle section in which you present the data, and an ending that uses a call-to-action. Check our guide on presentation structure for further information.
  • Visual Elements: These are the charts, graphs, and other elements of visual communication we ought to use to present data. This article will cover one by one the different types of data representation methods we can use, and provide further guidance on choosing between them.
  • Insights and Analysis: This is not just showcasing a graph and letting people get an idea about it. A proper data presentation includes the interpretation of that data, the reason why it’s included, and why it matters to your research.
  • Conclusion & CTA: Ending your presentation with a call to action is necessary. Whether you intend to wow your audience into acquiring your services, inspire them to change the world, or whatever the purpose of your presentation, there must be a stage in which you convey all that you shared and show the path to staying in touch. Plan ahead whether you want to use a thank-you slide, a video presentation, or which method is apt and tailored to the kind of presentation you deliver.
  • Q&A Session: After your speech is concluded, allocate 3-5 minutes for the audience to raise any questions about the information you disclosed. This is an extra chance to establish your authority on the topic. Check our guide on questions and answer sessions in presentations here.

Bar Charts

Bar charts are a graphical representation of data using rectangular bars to show quantities or frequencies in an established category. They make it easy for readers to spot patterns or trends. Bar charts can be horizontal or vertical, although the vertical format is commonly known as a column chart. They display categorical, discrete, or continuous variables grouped in class intervals [1]. They include an axis and a set of labeled bars horizontally or vertically. These bars represent the frequencies of variable values or the values themselves. Numbers on the y-axis of a vertical bar chart or the x-axis of a horizontal bar chart are called the scale.

Presentation of the data through bar charts

Real-Life Application of Bar Charts

Let’s say a sales manager is presenting sales to their audience. Using a bar chart, he follows these steps.

Step 1: Selecting Data

The first step is to identify the specific data you will present to your audience.

The sales manager has highlighted these products for the presentation.

  • Product A: Men’s Shoes
  • Product B: Women’s Apparel
  • Product C: Electronics
  • Product D: Home Decor

Step 2: Choosing Orientation

Opt for a vertical layout for simplicity. Vertical bar charts help compare different categories in case there are not too many categories [1]. They can also help show different trends. A vertical bar chart is used where each bar represents one of the four chosen products. After plotting the data, it is seen that the height of each bar directly represents the sales performance of the respective product.

The tallest bar (Electronics – Product C) shows the highest sales. However, the shorter bars (Women’s Apparel – Product B and Home Decor – Product D) need attention, as they indicate areas that require further analysis or strategies for improvement.
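
The idea that bar size directly encodes each product's sales can be sketched in a few lines. The sales figures below are hypothetical: the example above gives no numbers, only that Electronics leads and that Women's Apparel and Home Decor lag behind. For ease of printing in a terminal, this sketch draws horizontal text bars rather than the vertical bars chosen in the example; the scaling logic is the same.

```python
# Hypothetical sales figures (units sold); illustrative only.
sales = {
    "Men's Shoes": 320,
    "Women's Apparel": 180,
    "Electronics": 450,
    "Home Decor": 150,
}


def bar_lengths(values, max_width=30):
    """Scale each value so the largest bar is max_width characters long."""
    top = max(values.values())
    return {name: round(v / top * max_width) for name, v in values.items()}


def render(values, max_width=30):
    """Return one labeled text bar per category."""
    lengths = bar_lengths(values, max_width)
    label_w = max(len(name) for name in values)
    return "\n".join(
        f"{name.ljust(label_w)} | {'#' * lengths[name]} {values[name]}"
        for name in values
    )


print(render(sales))

# The longest bar identifies the best-selling product.
best = max(sales, key=sales.get)
print("Highest sales:", best)
```

Because every bar is scaled against the same maximum, relative bar lengths can be compared directly, which is what makes the weaker products stand out.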

Step 3: Colorful Insights

Different colors are used to differentiate each product. It is essential to show a color-coded chart where the audience can distinguish between products.

  • Men’s Shoes (Product A): Yellow
  • Women’s Apparel (Product B): Orange
  • Electronics (Product C): Violet
  • Home Decor (Product D): Blue

Accurate bar chart representation of data with a color coded legend

Bar charts are straightforward and easily understandable for presenting data. They are versatile when comparing products or any categorical data [2]. Bar charts adapt seamlessly to retail scenarios. Despite that, bar charts have a few shortcomings. They cannot illustrate data trends over time. Besides, overloading the chart with numerous products can lead to visual clutter, diminishing its effectiveness.

For more information, check our collection of bar chart templates for PowerPoint.

Line Graphs

Line graphs help illustrate data trends, progressions, or fluctuations by connecting a series of data points called ‘markers’ with straight line segments. This provides a straightforward representation of how values change [5]. Their versatility makes them invaluable for scenarios requiring a visual understanding of continuous data. In addition, line graphs are also useful for comparing multiple datasets over the same timeline. Using multiple line graphs allows us to compare more than one data set. They simplify complex information so the audience can quickly grasp the ups and downs of values. From tracking stock prices to analyzing experimental results, you can use line graphs to show how data changes over a continuous timeline. They show trends with simplicity and clarity.

Real-life Application of Line Graphs

To understand line graphs thoroughly, we will use a real case. Imagine you’re a financial analyst presenting a tech company’s monthly sales for a licensed product over the past year. Investors want insights into sales behavior by month, how market trends may have influenced sales performance and reception to the new pricing strategy. To present data via a line graph, you will complete these steps.

Step 1: Gathering Data

First, you need to gather the data. In this case, your data will be the sales numbers. For example:

  • January: $45,000
  • February: $55,000
  • March: $45,000
  • April: $60,000
  • May: $70,000
  • June: $65,000
  • July: $62,000
  • August: $68,000
  • September: $81,000
  • October: $76,000
  • November: $87,000
  • December: $91,000

Step 2: Choosing Orientation

After choosing the data, the next step is to select the orientation. Like bar charts, line graphs can be vertical or horizontal. However, we want to keep this simple, so we will keep the timeline (x-axis) horizontal and the sales numbers (y-axis) vertical.

Step 3: Connecting Trends

After adding the data to your preferred software, you will plot a line graph. In the graph, each month’s sales are represented by data points connected by a line.

Line graph in data presentation

Step 4: Adding Clarity with Color

If there are multiple lines, you can also add colors to highlight each one, making it easier to follow.

Line graphs excel at visually presenting trends over time. These presentation aids identify patterns, like upward or downward trends. However, too many data points can clutter the graph, making it harder to interpret. Line graphs work best with continuous data but are not suitable for categories.
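
What a line graph shows visually, the rise or fall of each segment between two markers, can also be computed directly. The sketch below uses the monthly sales figures from the example above; the variable names are illustrative assumptions.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
sales = [45000, 55000, 45000, 60000, 70000, 65000,
         62000, 68000, 81000, 76000, 87000, 91000]

# Month-over-month change: the vertical rise (or fall) of each line segment.
changes = [later - earlier for earlier, later in zip(sales, sales[1:])]

# Index i in `changes` describes the segment from months[i] to months[i + 1].
biggest_rise = max(range(len(changes)), key=changes.__getitem__)
biggest_drop = min(range(len(changes)), key=changes.__getitem__)

for i, delta in enumerate(changes):
    direction = "up" if delta > 0 else "down"
    print(f"{months[i]} -> {months[i + 1]}: {direction} {abs(delta):,}")

print("Steepest rise ends in", months[biggest_rise + 1])
print("Steepest drop ends in", months[biggest_drop + 1])
print(f"Total for the year: ${sum(sales):,}")
```

Listing the segment-by-segment changes alongside the graph gives the audience exact figures for the ups and downs that the line only suggests.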

For more information, check our collection of line chart templates for PowerPoint and our article about how to make a presentation graph.

Dashboards

A data dashboard is a visual tool for analyzing information. Different graphs, charts, and tables are consolidated in a layout to showcase the information required to achieve one or more objectives. Dashboards help quickly see Key Performance Indicators (KPIs). You don’t make new visuals in the dashboard; instead, you use it to display visuals you’ve already made in worksheets [3].

Keeping the number of visuals on a dashboard to three or four is recommended. Adding too many can make it hard to see the main points [4]. Dashboards can be used for business analytics to analyze sales, revenue, and marketing metrics at the same time. They are also used in the manufacturing industry, as they allow users to grasp the entire production scenario at the moment while tracking the core KPIs for each line.

Real-Life Application of a Dashboard

Consider a project manager presenting a software development project’s progress to a tech company’s leadership team. He takes the following steps.

Step 1: Defining Key Metrics

To effectively communicate the project’s status, identify key metrics such as completion status, budget, and bug resolution rates. Then, choose measurable metrics aligned with project objectives.

Step 2: Choosing Visualization Widgets

After finalizing the data, presentation aids that align with each metric are selected. For this project, the project manager chooses a progress bar for the completion status and uses bar charts for budget allocation. Likewise, he implements line charts for bug resolution rates.

Data analysis presentation example

Step 3: Dashboard Layout

Key metrics are prominently placed in the dashboard for easy visibility, and the manager ensures that it appears clean and organized.

Dashboards provide a comprehensive view of key project metrics. Users can interact with data, customize views, and drill down for detailed analysis. However, creating an effective dashboard requires careful planning to avoid clutter. Besides, dashboards rely on the availability and accuracy of underlying data sources.

For more information, check our article on how to design a dashboard presentation, and discover our collection of dashboard PowerPoint templates.

Treemap Chart

Treemap charts represent hierarchical data structured in a series of nested rectangles [6]. As each branch of the ‘tree’ is given a rectangle, smaller tiles can be seen representing sub-branches, meaning elements on a lower hierarchical level than the parent rectangle. Each one of those rectangular nodes is built by representing an area proportional to the specified data dimension.

Treemaps are useful for visualizing large datasets in compact space. It is easy to identify patterns, such as which categories are dominant. Common applications of the treemap chart are seen in the IT industry, such as resource allocation, disk space management, website analytics, etc. Also, they can be used in multiple industries like healthcare data analysis, market share across different product categories, or even in finance to visualize portfolios.

Real-Life Application of a Treemap Chart

Let’s consider a financial scenario where a financial team wants to represent the budget allocation of a company. There is a hierarchy in the process, so it is helpful to use a treemap chart. In the chart, the top-level rectangle could represent the total budget, and it would be subdivided into smaller rectangles, each denoting a specific department. Further subdivisions within these smaller rectangles might represent individual projects or cost categories.

Step 1: Define Your Data Hierarchy

While presenting data on the budget allocation, start by outlining the hierarchical structure. The sequence will be like the overall budget at the top, followed by departments, projects within each department, and finally, individual cost categories for each project.

  • Top-level rectangle: Total Budget
  • Second-level rectangles: Departments (Engineering, Marketing, Sales)
  • Third-level rectangles: Projects within each department
  • Fourth-level rectangles: Cost categories for each project (Personnel, Marketing Expenses, Equipment)
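
To make the idea of nested rectangles with area proportional to value concrete, here is a minimal sketch of one simple treemap layout, commonly called slice-and-dice. It is not how PowerPoint or any particular tool draws treemaps. The budget figures, project names, and function names are hypothetical, and only three of the four levels are shown to keep the example short.

```python
# Hypothetical budget figures (in thousands); only leaves carry values.
budget = {
    "Engineering": {"Project X": 300, "Project Y": 200},
    "Marketing": {"Campaign": 250},
    "Sales": {"Field Team": 250},
}


def total(node):
    """Sum of all leaf values under a node."""
    if isinstance(node, dict):
        return sum(total(child) for child in node.values())
    return node


def layout(node, x, y, w, h, depth=0, name="Total Budget"):
    """Slice-and-dice layout.

    Splits the rectangle (x, y, w, h) among the children in proportion to
    their totals, alternating the split direction at each level.
    Returns a list of (name, depth, x, y, w, h) tuples.
    """
    rects = [(name, depth, x, y, w, h)]
    if not isinstance(node, dict):
        return rects
    whole = total(node)
    offset = 0.0  # fraction of the parent already used
    for child_name, child in node.items():
        share = total(child) / whole
        if depth % 2 == 0:
            # Even levels: children sit side by side.
            rects += layout(child, x + offset * w, y, share * w, h,
                            depth + 1, child_name)
        else:
            # Odd levels: children are stacked vertically.
            rects += layout(child, x, y + offset * h, w, share * h,
                            depth + 1, child_name)
        offset += share
    return rects


rects = layout(budget, 0, 0, 100, 100)
for name, depth, x, y, w, h in rects:
    print(f"{'  ' * depth}{name}: area = {w * h:.0f}")
```

Each rectangle's area ends up proportional to the budget it represents, so a department with half the total budget occupies half the canvas.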

Step 2: Choose a Suitable Tool

It’s time to select a data visualization tool supporting Treemaps. Popular choices include Tableau, Microsoft Power BI, PowerPoint, or even coding with libraries like D3.js. It is vital to ensure that the chosen tool provides customization options for colors, labels, and hierarchical structures.

Here, the team uses PowerPoint for this guide because of its user-friendly interface and robust Treemap capabilities.

Step 3: Make a Treemap Chart with PowerPoint

After opening the PowerPoint presentation, they choose “SmartArt” to form the chart. The SmartArt Graphic window has a “Hierarchy” category on the left, which offers multiple options. Any layout that resembles a Treemap can be chosen; the “Table Hierarchy” or “Organization Chart” options can be adapted. The team selects the Table Hierarchy as it looks closest to a Treemap.

Step 4: Input Your Data

After that, a new window opens with a basic structure. The team adds the data one by one by clicking on the text boxes, starting with the top-level rectangle, which represents the total budget.

Treemap used for presenting data

Step 5: Customize the Treemap

By clicking on each shape, they customize its color, size, and label. At the same time, they can adjust the font size, style, and color of labels by using the options in the “Format” tab in PowerPoint. Using different colors for each level enhances the visual difference.

Treemaps excel at illustrating hierarchical structures. These charts make it easy to understand relationships and dependencies. They efficiently use space, compactly displaying a large amount of data, reducing the need for excessive scrolling or navigation. Additionally, using colors enhances the understanding of data by representing different variables or categories.

In some cases, treemaps might become complex, especially with deep hierarchies.  It becomes challenging for some users to interpret the chart. At the same time, displaying detailed information within each rectangle might be constrained by space. It potentially limits the amount of data that can be shown clearly. Without proper labeling and color coding, there’s a risk of misinterpretation.

A heatmap is a data visualization tool that uses color coding to represent values across a two-dimensional surface. In a heatmap, colors replace numbers to indicate the magnitude of each cell. This color-shaded matrix display is valuable for summarizing and understanding data sets at a glance [7]. The intensity of the color corresponds to the value it represents, making it easy to identify patterns, trends, and variations in the data.

As a tool, heatmaps help businesses analyze website interactions, revealing user behavior patterns and preferences to enhance overall user experience. In addition, companies use heatmaps to assess content engagement, identifying popular sections and areas of improvement for more effective communication. They excel at highlighting patterns and trends in large datasets, making it easy to identify areas of interest.

We can use heatmaps to express multiple data types, such as numerical values, percentages, or even categorical data. Heatmaps help us easily spot areas with lots of activity, which makes them helpful for identifying clusters [8]. When making these maps, it is important to pick colors carefully. The colors need to show the differences between groups or levels, and it is good practice to use colors that people with colorblindness can easily distinguish.
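
To make the idea of color intensity tracking value concrete, here is a minimal text-only sketch. It is an illustration under stated assumptions rather than how any particular heatmap tool works: the `clicks` grid is hypothetical data, and the `SHADES` characters stand in for a light-to-dark color scale.

```python
SHADES = " .:*#"  # stand-in for a light-to-dark color scale (assumption)


def normalize(grid):
    """Scale every cell to the 0..1 range so it can drive color intensity."""
    flat = [value for row in grid for value in row]
    lowest, highest = min(flat), max(flat)
    span = (highest - lowest) or 1  # avoid dividing by zero on a flat grid
    return [[(value - lowest) / span for value in row] for row in grid]


def render(grid):
    """Return one string per row, using darker characters for larger values."""
    lines = []
    for row in normalize(grid):
        cells = []
        for x in row:
            index = min(int(x * len(SHADES)), len(SHADES) - 1)
            cells.append(SHADES[index])
        lines.append("".join(cells))
    return lines


# Hypothetical click counts for a 3x3 page grid.
clicks = [[0, 2, 4], [1, 8, 3], [0, 1, 0]]

for line in render(clicks):
    print(f"[{line}]")
```

Normalizing first means the darkest shade always marks the largest value in the data, whatever its absolute size.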

Pie charts are circular statistical graphics divided into slices to illustrate numerical proportions. Each slice represents a proportionate part of the whole, making it easy to visualize the contribution of each component to the total.

When several pie charts are shown together, the size of each pie is influenced by the values of its data points. The total of all data points in a pie determines its size: the pie with the highest total appears as the largest, whereas the others are proportionally smaller. However, you can present all pies at the same size if proportional representation is not required [9]. Sometimes pie charts are difficult to read, or additional information is required. In those cases, a variation known as the donut chart can be used instead. It has the same structure but a blank center, creating a ring shape. Presenters can add extra information in the center, and the ring shape helps to declutter the graph.

Pie charts are used in business to show percentage distribution, compare relative sizes of categories, or present straightforward data sets where visualizing ratios is essential.

Real-Life Application of Pie Charts

Consider a scenario where you want to represent the distribution of the data. Each slice of the pie chart would represent a different category, and the size of each slice would indicate the percentage of the total portion allocated to that category.

Step 1: Define Your Data Structure

Imagine you are presenting the distribution of a project budget among different expense categories.

  • Column A: Expense Categories (Personnel, Equipment, Marketing, Miscellaneous)
  • Column B: Budget Amounts ($40,000, $30,000, $20,000, $10,000)

Column B represents the values of your categories in Column A.

Step 2: Insert a Pie Chart

You can create a pie chart with any tool you have available. The most convenient options for a presentation are presentation tools such as PowerPoint or Google Slides. You will notice that the pie chart assigns each expense category a percentage of the total budget by dividing the category's amount by the total budget.

For instance:

  • Personnel: $40,000 / ($40,000 + $30,000 + $20,000 + $10,000) = 40%
  • Equipment: $30,000 / ($40,000 + $30,000 + $20,000 + $10,000) = 30%
  • Marketing: $20,000 / ($40,000 + $30,000 + $20,000 + $10,000) = 20%
  • Miscellaneous: $10,000 / ($40,000 + $30,000 + $20,000 + $10,000) = 10%
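
The same arithmetic can be checked with a few lines of Python. This is a minimal sketch using the amounts from Step 1; the variable names are illustrative, and the slice angle is added on the assumption that a full pie spans 360 degrees.

```python
budget = {
    "Personnel": 40_000,
    "Equipment": 30_000,
    "Marketing": 20_000,
    "Miscellaneous": 10_000,
}

total = sum(budget.values())

# Each category's share of the whole, expressed as a percentage.
shares = {category: amount * 100 / total for category, amount in budget.items()}

for category, pct in shares.items():
    # A full circle is 360 degrees, so 1% of the budget spans 3.6 degrees.
    angle = pct * 3.6
    print(f"{category}: {pct:.0f}% of the budget, slice of about {angle:.0f} degrees")
```

Because the shares are derived from the total, they always add up to 100%, which is the property that makes a pie chart appropriate for this data.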

You can enter these percentages into a chart yourself or let the tool generate the pie chart directly from the data.

Pie chart template in data presentation

3D pie charts and 3D donut charts are quite popular among the audience. They stand out as visual elements in any presentation slide, so let’s take a look at how our pie chart example would look in 3D pie chart format.

3D pie chart in data presentation

Step 3: Results Interpretation

The pie chart visually illustrates the distribution of the project budget among different expense categories. Personnel constitutes the largest portion at 40%, followed by equipment at 30%, marketing at 20%, and miscellaneous at 10%. This breakdown provides a clear overview of where the project funds are allocated, which helps in informed decision-making and resource management. It is evident that personnel are a significant investment, emphasizing their importance in the overall project budget.

Pie charts provide a straightforward way to represent proportions and percentages. They are easy to understand, even for individuals with limited data analysis experience. These charts work well for small datasets with a limited number of categories.

However, a pie chart can become cluttered and less effective in situations with many categories. Accurate interpretation may be challenging, especially when dealing with slight differences in slice sizes. In addition, these charts are static and do not effectively convey trends over time.

Histograms present the distribution of numerical variables. Unlike a bar chart that records each unique response separately, histograms organize numeric responses into bins and show the frequency of responses within each bin [10]. The x-axis of a histogram shows the range of values for a numeric variable, while the y-axis indicates the frequency for each range, either as a count or as a relative frequency (percentage of the total count).

Whenever you want to understand the distribution of your data, check which values are more common, or identify outliers, histograms are your go-to. Think of them as a spotlight on the story your data is telling. A histogram can provide a quick and insightful overview if you’re curious about exam scores, sales figures, or any numerical data distribution.

Real-Life Application of a Histogram

In the histogram data analysis presentation example, imagine an instructor analyzing a class’s grades to identify the most common score range. A histogram could effectively display the distribution. It will show whether most students scored in the average range or if there are significant outliers.

Step 1: Gather Data

The instructor begins by gathering the exam score of each student in the class.

After arranging the scores in ascending order, he sets the bin ranges.

Step 2: Define Bins

Bins are like categories that group similar values. Think of them as buckets that organize your data. The presenter decides how wide each bin should be based on the range of the values. For instance, the instructor sets the bin ranges based on score intervals: 60-69, 70-79, 80-89, and 90-100.

Step 3: Count Frequency

Now, he counts how many data points fall into each bin. This step is crucial because it tells you how often specific ranges of values occur. The result is the frequency distribution, showing the occurrences of each group.

Here, the instructor counts the number of students in each category.

  • 60-69: 1 student (Kate)
  • 70-79: 4 students (David, Emma, Grace, Jack)
  • 80-89: 7 students (Alice, Bob, Frank, Isabel, Liam, Mia, Noah)
  • 90-100: 3 students (Clara, Henry, Olivia)
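
Steps 2 and 3 can be sketched in Python as follows. The bin ranges and student names come from the example above, but the individual scores are hypothetical: the article gives only the counts, so the marks below were chosen simply to reproduce them.

```python
# Hypothetical scores, chosen only so that the counts match the example.
scores = {
    "Kate": 65,
    "David": 72, "Emma": 75, "Grace": 78, "Jack": 79,
    "Alice": 81, "Bob": 83, "Frank": 85, "Isabel": 86,
    "Liam": 87, "Mia": 88, "Noah": 89,
    "Clara": 92, "Henry": 95, "Olivia": 100,
}

# Inclusive score intervals from Step 2.
bins = [(60, 69), (70, 79), (80, 89), (90, 100)]


def count_frequencies(values, bins):
    """Count how many values fall inside each inclusive (low, high) bin."""
    counts = {b: 0 for b in bins}
    for value in values:
        for low, high in bins:
            if low <= value <= high:
                counts[(low, high)] += 1
                break
    return counts


frequencies = count_frequencies(scores.values(), bins)

# A rough text histogram: one '#' per student in the bin.
for (low, high), n in frequencies.items():
    print(f"{low}-{high:<3} {'#' * n} ({n})")
```

The bins must not overlap and must cover every score, otherwise a student would be counted twice or not at all.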

Step 4: Create the Histogram

It’s time to turn the data into a visual representation. Draw a bar for each bin on a graph. The width of the bar should correspond to the range of the bin, and the height should correspond to the frequency.  To make your histogram understandable, label the X and Y axes.

In this case, the X-axis should represent the bins (e.g., test score ranges), and the Y-axis represents the frequency.

Histogram in Data Presentation

The histogram of the class grades reveals insightful patterns in the distribution. The largest group, seven of the fifteen students, falls within the 80-89 score range. The histogram provides a clear visualization of the class’s performance. It showcases a concentration of grades in the upper-middle range with few outliers at both ends. This analysis helps in understanding the overall academic standing of the class. It also identifies the areas for potential improvement or recognition.

Thus, histograms provide a clear visual representation of data distribution. They are easy to interpret, even for those without a statistical background, and they apply to various types of data, including continuous and discrete variables. One weak point is that histograms do not capture detailed patterns as well as some other visualization methods: once the scores are grouped into bins, the chart shows that seven students fall in the 80-89 range but not their individual scores.

A scatter plot is a graphical representation of the relationship between two variables. It consists of individual data points on a two-dimensional plane. This plane plots one variable on the x-axis and the other on the y-axis. Each point represents a unique observation. It visualizes patterns, trends, or correlations between the two variables.

Scatter plots are also effective in revealing the strength and direction of relationships. They identify outliers and assess the overall distribution of data points. The points’ dispersion and clustering reflect the relationship’s nature, whether it is positive, negative, or lacks a discernible pattern. In business, scatter plots assess relationships between variables such as marketing cost and sales revenue. They help present data correlations and decision-making.

Real-Life Application of Scatter Plot

A group of scientists is conducting a study on the relationship between daily hours of screen time and sleep quality. After reviewing the data, they managed to create this table to help them build a scatter plot graph:

In the provided example, the x-axis represents Daily Hours of Screen Time, and the y-axis represents the Sleep Quality Rating.

Scatter plot in data presentation

The scientists observe a negative correlation between the amount of screen time and the quality of sleep. This is consistent with their hypothesis that blue light, especially before bedtime, has a significant impact on sleep quality and metabolic processes.

There are a few things to remember when using a scatter plot. Even when a scatter diagram indicates a relationship, it doesn’t mean one variable affects the other; a third factor can influence both variables. The more the plot resembles a straight line, the stronger the relationship is perceived [11]. If the data suggest no relationship, any pattern that appears might be due to random fluctuations. When the scatter diagram shows no correlation, it is worth considering whether the data might be stratified.
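
One common way to put a number on how closely the points follow a straight line is the Pearson correlation coefficient. The article does not name this measure, so treat the sketch below as a supplementary illustration. The observations are hypothetical, because the scientists' table is not reproduced here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: near +1 or -1 for a tight line, near 0 for none."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical observations: daily screen time (hours) and sleep rating.
screen_hours = [1, 2, 3, 4, 5, 6]
sleep_quality = [9, 8, 8, 6, 5, 3]

r = pearson_r(screen_hours, sleep_quality)
print(f"r = {r:.2f}")
```

A strongly negative value would match the downward trend the scientists describe, but, as noted above, it says nothing about whether screen time causes poorer sleep.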

Choosing the appropriate data presentation type is crucial when making a presentation. Understanding the nature of your data and the message you intend to convey will guide this selection process. For instance, when showcasing quantitative relationships, scatter plots become instrumental in revealing correlations between variables. If the focus is on emphasizing parts of a whole, pie charts offer a concise display of proportions. Histograms, on the other hand, prove valuable for illustrating distributions and frequency patterns.

Bar charts provide a clear visual comparison of different categories. Likewise, line charts excel in showcasing trends over time, while tables are ideal for detailed data examination. Starting a presentation on data presentation types involves evaluating the specific information you want to communicate and selecting the format that aligns with your message. This ensures clarity and resonance with your audience from the beginning of your presentation.

1. Fact Sheet Dashboard for Data Presentation

Convey all the data you need to present in this one-pager format, an ideal solution tailored for users looking for presentation aids. Global maps, donut charts, column graphs, and text are neatly arranged in a clean layout, presented in light and dark themes.

2. 3D Column Chart Infographic PPT Template

Represent column charts in a highly visual 3D format with this PPT template. A creative way to present data, this template is entirely editable, and we can craft either a one-page infographic or a series of slides explaining what we intend to disclose point by point.

3. Data Circles Infographic PowerPoint Template

An alternative to the pie chart and donut chart diagrams, this template features a series of curved shapes with bubble callouts as ways of presenting data. Expand the information for each arch in the text placeholder areas.

4. Colorful Metrics Dashboard for Data Presentation

This versatile dashboard template helps us in the presentation of the data by offering several graphs and methods to convert numbers into graphics. Implement it for e-commerce projects, financial projections, project development, and more.

5. Animated Data Presentation Tools for PowerPoint & Google Slides

Canvas Shape Tree Diagram Template

A slide deck filled with most of the tools mentioned in this article, including bar charts, column charts, treemap graphs, pie charts, and histograms. Animated effects make each slide look dynamic when sharing data with stakeholders.

6. Statistics Waffle Charts PPT Template for Data Presentations

This PPT template helps us present data beyond the typical pie chart representation. It is widely used for demographics, so it’s a great fit for marketing teams, data science professionals, HR personnel, and more.

7. Data Presentation Dashboard Template for Google Slides

A compendium of tools in dashboard format featuring line graphs, bar charts, column charts, and neatly arranged placeholder text areas. 

8. Weather Dashboard for Data Presentation

Share weather data for agricultural presentation topics, environmental studies, or any kind of presentation that requires a highly visual layout for weather forecasting on a single day. Two color themes are available.

9. Social Media Marketing Dashboard Data Presentation Template

Intended for marketing professionals, this dashboard template for data presentation is a tool for presenting data analytics from social media channels. Two slide layouts featuring line graphs and column charts.

10. Project Management Summary Dashboard Template

A tool crafted for project managers to deliver highly visual reports on a project’s completion, the profits it delivered for the company, and expenses/time required to execute it. 4 different color layouts are available.

11. Profit & Loss Dashboard for PowerPoint and Google Slides

A must-have for finance professionals. This typical profit & loss dashboard includes progress bars, donut charts, column charts, line graphs, and everything that’s required to deliver a comprehensive report about a company’s financial situation.

Overwhelming visuals

One of the mistakes related to using data-presenting methods is including too much data or using overly complex visualizations. They can confuse the audience and dilute the key message.

Inappropriate chart types

Choosing the wrong type of chart for the data at hand can lead to misinterpretation. For example, using a pie chart for data that doesn’t represent parts of a whole misleads the audience, because the slices suggest shares of a total that does not exist.

Lack of context

Failing to provide context or sufficient labeling can make it challenging for the audience to understand the significance of the presented data.

Inconsistency in design

Using inconsistent design elements and color schemes across different visualizations can create confusion and visual disarray.

Failure to provide details

Simply presenting raw data without offering clear insights or takeaways can leave the audience without a meaningful conclusion.

Lack of focus

Not having a clear focus on the key message or main takeaway can result in a presentation that lacks a central theme.

Visual accessibility issues

Overlooking the visual accessibility of charts and graphs can exclude certain audience members who may have difficulty interpreting visual information.

To avoid these mistakes in data presentation, presenters can benefit from using presentation templates. These templates provide a structured framework. They ensure consistency, clarity, and an aesthetically pleasing design, enhancing the overall impact of data communication.

Understanding and choosing data presentation types are pivotal in effective communication. Each method serves a unique purpose, so selecting the appropriate one depends on the nature of the data and the message to be conveyed. The diverse array of presentation types offers versatility in visually representing information, from bar charts showing values to pie charts illustrating proportions. 

Using the proper method enhances clarity, engages the audience, and ensures that data sets are not just presented but comprehensively understood. By appreciating the strengths and limitations of different presentation types, communicators can tailor their approach to convey information accurately, developing a deeper connection between data and audience understanding.

[1] Government of Canada, S.C. (2021) 5 Data Visualization: 5.2 Bar chart. https://www150.statcan.gc.ca/n1/edu/power-pouvoir/ch9/bargraph-diagrammeabarres/5214818-eng.htm

[2] Kosslyn, S.M., 1989. Understanding charts and graphs. Applied cognitive psychology, 3(3), pp.185-225. https://apps.dtic.mil/sti/pdfs/ADA183409.pdf

[3] Creating a Dashboard. https://it.tufts.edu/book/export/html/1870

[4] https://www.goldenwestcollege.edu/research/data-and-more/data-dashboards/index.html

[5] https://www.mit.edu/course/21/21.guide/grf-line.htm

[6] Jadeja, M. and Shah, K., 2015, January. Tree-Map: A Visualization Tool for Large Data. In GSB@ SIGIR (pp. 9-13). https://ceur-ws.org/Vol-1393/gsb15proceedings.pdf#page=15

[7] Heat Maps and Quilt Plots. https://www.publichealth.columbia.edu/research/population-health-methods/heat-maps-and-quilt-plots

[8] EIU QGIS WORKSHOP. https://www.eiu.edu/qgisworkshop/heatmaps.php

[9] About Pie Charts. https://www.mit.edu/~mbarker/formula1/f1help/11-ch-c8.htm

[10] Histograms. https://sites.utexas.edu/sos/guided/descriptive/numericaldd/descriptiven2/histogram/

[11] https://asq.org/quality-resources/scatter-diagram

Statology

Statistics Made Easy

What is Tabular Data? (Definition & Example)

In statistics, tabular data refers to data that is organized in a table with rows and columns.

tabular data format

Within the table, the rows represent observations and the columns represent attributes for those observations.

For example, the following table represents tabular data:

example of tabular data

This dataset has 9 rows and 5 columns.

Each row represents one basketball player and the five columns describe different attributes about the player including:

  • Player name
  • Minutes played
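
As a rough illustration of the row-and-column idea, the sketch below stores a small table in Python. The player names and minutes are hypothetical placeholders, not the values from the dataset pictured above, and only the two attributes named in the list are included.

```python
# Hypothetical values: each dictionary is one row (one observation).
rows = [
    {"player": "Player A", "minutes": 32},
    {"player": "Player B", "minutes": 28},
    {"player": "Player C", "minutes": 15},
]

# The keys shared by every row are the columns (the attributes).
columns = list(rows[0].keys())
print(f"{len(rows)} rows x {len(columns)} columns: {columns}")

# Reading across a row gives every attribute of one observation.
print("first row:", rows[0])

# Reading down a column gives one attribute for every observation.
minutes = [row["minutes"] for row in rows]
print("minutes column:", minutes)
```

Every row has the same set of keys, which is what distinguishes tabular data from a loose collection of records.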

The opposite of tabular data would be visual data, which would be some type of plot or chart that helps us visualize the values in a dataset.

For example, we might have the following bar chart that helps us visualize the total minutes played by each player in the dataset:

tabular data vs. visual data

This would be an example of visual data.

It contains the exact same information about player names and minutes played for the players in the dataset, but it’s simply displayed in a visual form instead of a tabular form.

Or we might have the following scatterplot that helps us visualize the relationship between minutes played and points scored for each player:

This is another example of visual data.

When is Tabular Data Used in Practice?

In practice, tabular data is the most common type of data that you’ll run across.

Most data saved in an Excel spreadsheet is considered tabular data because the rows represent observations and the columns represent attributes for those observations.

For example, here’s what our basketball dataset from earlier might look like in an Excel spreadsheet:

This format is one of the most natural ways to collect and store values in a dataset, which is why it’s used so often.

Additional Resources

The following tutorials explain other common terms in statistics:

  • Why is Statistics Important?
  • Why is Sample Size Important in Statistics?
  • What is an Observation in Statistics?
  • What is Considered Raw Data in Statistics?

Hey there. My name is Zach Bobbitt. I have a Master of Science degree in Applied Statistics and I’ve worked on machine learning algorithms for professional businesses in both healthcare and retail. I’m passionate about statistics, machine learning, and data visualization and I created Statology to be a resource for both students and teachers alike.  My goal with this site is to help you learn statistics through using simple terms, plenty of real-world examples, and helpful illustrations.

Data presentation: A comprehensive guide

Learn how to create data presentation effectively and communicate your insights in a way that is clear, concise, and engaging.

Raja Bothra

Building presentations

team preparing data presentation

Hey there, fellow data enthusiast!

Welcome to our comprehensive guide on data presentation.

Whether you're an experienced presenter or just starting, this guide will help you present your data like a pro.

We'll dive deep into what data presentation is, why it's crucial, and how to master it. So, let's embark on this data-driven journey together.

What is data presentation?

Data presentation is the art of transforming raw data into a visual format that's easy to understand and interpret. It's like turning numbers and statistics into a captivating story that your audience can quickly grasp. When done right, data presentation can be a game-changer, enabling you to convey complex information effectively.

Why are data presentations important?

Imagine drowning in a sea of numbers and figures. That's how your audience might feel without proper data presentation. Here's why it's essential:

  • Clarity : Data presentations make complex information clear and concise.
  • Engagement : Visuals, such as charts and graphs, grab your audience's attention.
  • Comprehension : Visual data is easier to understand than long, numerical reports.
  • Decision-making : Well-presented data aids informed decision-making.
  • Impact : It leaves a lasting impression on your audience.

Types of data presentation

Now, let's delve into the diverse array of data presentation methods, each with its own unique strengths and applications. We have three primary types of data presentation, and within these categories, numerous specific visualization techniques can be employed to effectively convey your data.

1. Textual presentation

Textual presentation harnesses the power of words and sentences to elucidate and contextualize your data. This method is commonly used to provide a narrative framework for the data, offering explanations, insights, and the broader implications of your findings. It serves as a foundation for a deeper understanding of the data's significance.

2. Tabular presentation

Tabular presentation employs tables to arrange and structure your data systematically. These tables are invaluable for comparing various data groups or illustrating how data evolves over time. They present information in a neat and organized format, facilitating straightforward comparisons and reference points.

3. Graphical presentation

Graphical presentation harnesses the visual impact of charts and graphs to breathe life into your data. Charts and graphs are powerful tools for spotlighting trends, patterns, and relationships hidden within the data. Let's explore some common graphical presentation methods:

  • Bar charts: They are ideal for comparing different categories of data. In this method, each category is represented by a distinct bar, and the height of the bar corresponds to the value it represents. Bar charts provide a clear and intuitive way to discern differences between categories.
  • Pie charts: They excel at illustrating the relative proportions of different data categories. Each category is depicted as a slice of the pie, with the size of each slice corresponding to the percentage of the total value it represents. Pie charts are particularly effective for showcasing the distribution of data.
  • Line graphs: They are the go-to choice when showcasing how data evolves over time. Each point on the line represents a specific value at a particular time period. This method enables viewers to track trends and fluctuations effortlessly, making it perfect for visualizing data with temporal dimensions.
  • Scatter plots: They are the tool of choice when exploring the relationship between two variables. In this method, each point on the plot represents a pair of values for the two variables in question. Scatter plots help identify correlations, outliers, and patterns within data pairs.

The selection of the most suitable data presentation method hinges on the specific dataset and the presentation's objectives. For instance, when comparing sales figures of different products, a bar chart shines in its simplicity and clarity. On the other hand, if your aim is to display how a product's sales have changed over time, a line graph provides the ideal visual narrative.

Additionally, it's crucial to factor in your audience's level of familiarity with data presentations. For a technical audience, more intricate visualization methods may be appropriate. However, when presenting to a general audience, opting for straightforward and easily understandable visuals is often the wisest choice.

In the world of data presentation, choosing the right method is akin to selecting the perfect brush for a masterpiece. Each tool has its place, and understanding when and how to use them is key to crafting compelling and insightful presentations. So, consider your data carefully, align your purpose, and paint a vivid picture that resonates with your audience.

What to include in data presentation

When creating your data presentation, remember these key components:

  • Data points : Clearly state the data points you're presenting.
  • Comparison : Highlight comparisons and trends in your data.
  • Graphical methods : Choose the right chart or graph for your data.
  • Infographics : Use visuals like infographics to make information more digestible.
  • Numerical values : Include numerical values to support your visuals.
  • Qualitative information : Explain the significance of the data.
  • Source citation : Always cite your data sources.

How to structure an effective data presentation

Creating a well-structured data presentation is not just important; it's the backbone of a successful presentation. Here's a step-by-step guide to help you craft a compelling and organized presentation that captivates your audience:

1. Know your audience

Understanding your audience is paramount. Consider their needs, interests, and existing knowledge about your topic. Tailor your presentation to their level of understanding, ensuring that it resonates with them on a personal level. Relevance is the key.

2. Have a clear message

Every effective data presentation should convey a clear and concise message. Determine what you want your audience to learn or take away from your presentation, and make sure your message is the guiding light throughout your presentation. Ensure that all your data points align with and support this central message.

3. Tell a compelling story

Human beings are naturally wired to remember stories. Incorporate storytelling techniques into your presentation to make your data more relatable and memorable. Your data can be the backbone of a captivating narrative, whether it's about a trend, a problem, or a solution. Take your audience on a journey through your data.

4. Leverage visuals

Visuals are a powerful tool in data presentation. They make complex information accessible and engaging. Utilize charts, graphs, and images to illustrate your points and enhance the visual appeal of your presentation. Visuals should not just be an accessory; they should be an integral part of your storytelling.

5. Be clear and concise

Avoid jargon or technical language that your audience may not comprehend. Use plain language and explain your data points clearly. Remember, clarity is king. Each piece of information should be easy for your audience to digest.

6. Practice your delivery

Practice makes perfect. Rehearse your presentation multiple times before the actual delivery. This will help you deliver it smoothly and confidently, reducing the chances of stumbling over your words or losing track of your message.

A basic structure for an effective data presentation

Armed with a thorough understanding of how to construct a compelling data presentation, you can now use this basic template for guidance:

In the introduction, initiate your presentation by introducing both yourself and the topic at hand. Clearly articulate your main message or the fundamental concept you intend to communicate.

Moving on to the body of your presentation, organize your data in a coherent and easily understandable sequence. Employ visuals generously to elucidate your points and weave a narrative that enhances the overall story. Ensure that the arrangement of your data aligns with and reinforces your central message.

As you approach the conclusion, succinctly recapitulate your key points and emphasize your core message once more. Conclude by leaving your audience with a distinct and memorable takeaway, ensuring that your presentation has a lasting impact.

Additional tips for enhancing your data presentation

To take your data presentation to the next level, consider these additional tips:

  • Consistent design : Maintain a uniform design throughout your presentation. This not only enhances visual appeal but also aids in seamless comprehension.
  • High-quality visuals : Ensure that your visuals are of high quality, easy to read, and directly relevant to your topic.
  • Concise text : Avoid overwhelming your slides with excessive text. Focus on the most critical points, using visuals to support and elaborate.
  • Anticipate questions : Think ahead about the questions your audience might pose. Be prepared with well-thought-out answers to foster productive discussions.

By following these guidelines, you can structure an effective data presentation that not only informs but also engages and inspires your audience. Remember, a well-structured presentation is the bridge that connects your data to your audience's understanding and appreciation.

Do's and don'ts of a data presentation

Do's:

  • Use visuals : Incorporate charts and graphs to enhance understanding.
  • Keep it simple : Avoid clutter and complexity.
  • Highlight key points : Emphasize crucial data.
  • Engage the audience : Encourage questions and discussions.
  • Practice : Rehearse your presentation.

Don'ts:

  • Overload with data : Less is often more; don't overwhelm your audience.
  • Include unrelated data : Stay on topic; don't include irrelevant information.
  • Neglect the audience : Ensure your presentation suits your audience's level of expertise.
  • Read word-for-word : Avoid reading directly from slides.
  • Lose focus : Stick to your presentation's purpose.

Summarizing key takeaways

  • Definition : Data presentation is the art of visualizing complex data for better understanding.
  • Importance : Data presentations enhance clarity, engage the audience, aid decision-making, and leave a lasting impact.
  • Types : Textual, Tabular, and Graphical presentations offer various ways to present data.
  • Choosing methods : Select the right method based on data, audience, and purpose.
  • Components : Include data points, comparisons, visuals, infographics, numerical values, and source citations.
  • Structure : Know your audience, have a clear message, tell a compelling story, use visuals, be concise, and practice.
  • Do's and don'ts : Do use visuals, keep it simple, highlight key points, engage the audience, and practice. Don't overload with data, include unrelated information, neglect the audience's expertise, read word-for-word, or lose focus.

1. What is data presentation, and why is it important in 2023?

Data presentation is the process of visually representing data sets to convey information effectively to an audience. In an era where the amount of data generated is vast, visually presenting data using methods such as diagrams, graphs, and charts has become crucial. By simplifying complex data sets, a good presentation helps your audience quickly grasp a large amount of information without drowning in a sea of charts, analytics, facts, and figures.

2. What are some common methods of data presentation?

There are various methods of data presentation, including graphs and charts, histograms, and cumulative frequency polygons. Each method has its strengths and is often used depending on the type of data you're using and the message you want to convey. For instance, if you want to show data over time, try using a line graph. If you're presenting geographical data, consider using a heat map.

3. How can I ensure that my data presentation is clear and readable?

To ensure that your data presentation is clear and readable, pay attention to the design and labeling of your charts. Label the axes appropriately, as they are critical for understanding the values they represent. Don't cram all the information into one slide or a single paragraph. Presentation software like Prezent and PowerPoint can help you simplify your vertical axis, charts, and tables, making them much easier to understand.

4. What are some common mistakes presenters make when presenting data?

One common mistake is trying to fit too much data into a single chart, which can distort the information and confuse the audience. Another mistake is not considering the needs of the audience. Remember that your audience won't have the same level of familiarity with the data as you do, so it's essential to present the data effectively and respond to questions during a Q&A session.

5. How can I use data visualization to present important data effectively on platforms like LinkedIn?

When presenting data on platforms like LinkedIn, consider using eye-catching visuals like bar graphs or charts. Use concise captions and examples to highlight the single most important piece of information in your data report. Visuals, such as graphs and tables, can help you stand out in the sea of textual content, making your data presentation more engaging and shareable among your LinkedIn connections.

Create your data presentation with Prezent

Prezent can be a valuable tool for creating data presentations. Here's how Prezent can help you in this regard:

  • Time savings : Prezent saves up to 70% of presentation creation time, allowing you to focus on data analysis and insights.
  • On-brand consistency : Ensure 100% brand alignment with Prezent's brand-approved designs for professional-looking data presentations.
  • Effortless collaboration : Real-time sharing and collaboration features make it easy for teams to work together on data presentations.
  • Data storytelling : Choose from 50+ storylines to effectively communicate data insights and engage your audience.
  • Personalization : Create tailored data presentations that resonate with your audience's preferences, enhancing the impact of your data.

In summary, Prezent streamlines the process of creating data presentations by offering time-saving features, ensuring brand consistency, promoting collaboration, and providing tools for effective data storytelling. Whether you need to present data to clients, stakeholders, or within your organization, Prezent can significantly enhance your presentation-making process.

So, go ahead, present your data with confidence, and watch your audience be wowed by your expertise.

Thank you for joining us on this data-driven journey. Stay tuned for more insights, and remember, data presentation is your ticket to making numbers come alive!

Tabular Presentation of Data

What is Tabular Presentation of Data?

It is a table that helps to represent even a large amount of data in an engaging, easy to read, and coordinated manner. The data is arranged in rows and columns. This is one of the most popularly used forms of presentation of data as data tables are simple to prepare and read.

The most significant benefit of tabulation is that it coordinates data for additional statistical treatment and decision making. The classification used in tabulation is of four types. They are:

  • Qualitative
  • Quantitative
  • Temporal
  • Spatial

1. Qualitative classification: When the classification is done according to traits such as physical status, nationality, social status, etc., it is known as qualitative classification.

2. Quantitative classification: In this, the data is classified on the basis of features that are quantitative in nature. In other words, these features can be estimated quantitatively.

3. Temporal classification: In this classification, time becomes the categorising variable and data are classified according to time. Time may be in years, months, weeks, days, hours, etc.

4. Spatial classification: When the categorisation is done on the basis of location, it is known as spatial classification. The place may be a country, state, district, block, village/town, etc.
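
The four bases of classification can be sketched in a few lines of Python. The records, the field names, and the `classify` and `income_class` helpers below are hypothetical and chosen only for illustration; the same rows are simply grouped by a different kind of key each time.

```python
from collections import defaultdict

# Hypothetical survey records (illustrative values only).
records = [
    {"name": "A", "nationality": "Indian", "income": 12000, "year": 2021, "state": "Kerala"},
    {"name": "B", "nationality": "Nepali", "income": 30000, "year": 2022, "state": "Bihar"},
    {"name": "C", "nationality": "Indian", "income": 18000, "year": 2021, "state": "Bihar"},
]


def classify(rows, key):
    """Group the names of rows by the value that `key` returns for each row."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row["name"])
    return dict(groups)


def income_class(row):
    """Quantitative basis: place income into one of two class intervals."""
    return "below 20000" if row["income"] < 20000 else "20000 and above"


qualitative = classify(records, lambda r: r["nationality"])  # by attribute
quantitative = classify(records, income_class)               # by measurable value
temporal = classify(records, lambda r: r["year"])            # by time
spatial = classify(records, lambda r: r["state"])            # by location
```

Each result is a small table in the making: the group labels become row or column headings, and the group sizes become the figures in the body.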

Related read: T.R. Jain and V.K. Ohri Solutions for Presentation of Data

Basics of Tabular Presentation

Objectives of tabulation.

Following are the objectives of tabulation:

  • To simplify the complex data
  • To bring out essential features of the data
  • To facilitate comparison
  • To facilitate statistical analysis
  • Saving of space

What are the Three Limitations of a Table?

Following are the major limitations of a table:

(1) Lacks description

  • The table represents only figures and not attributes.
  • It ignores the qualitative aspects of the facts.

(2) Incapable of presenting individual items

  • It does not present individual items.
  • It presents aggregate data.

(3) Needs special knowledge

  • The understanding of the table requires special knowledge.
  • It cannot be easily used by a layman.

Explain the Main Parts of a Table:

Following are the main parts of a table:

The above-mentioned concept is for CBSE class 11 Statistics for Economics – Tabular Presentation of Data. For solutions and study materials for class 11 Statistics for Economics, visit BYJU’S or download the app for more information and the best learning experience.

7   Introduction to Tabular Data

An email inbox is a list of messages. For each message, your inbox stores a bunch of information: its sender, the subject line, the conversation it’s part of, the body, and quite a bit more.

A music playlist. For each song, your music player maintains a bunch of information: its name, the singer, its length, its genre, and so on.

A filesystem folder or directory. For each file, your filesystem records a name, a modification date, size, and other information.

Do Now! Can you come up with more examples?

Responses to a party invitation.

A gradebook.

A calendar agenda.

These examples consist of rows and columns. For instance, each song or email message or file is a row. Each of their characteristics (the song title, the message subject, the filename) is a column.

Each row has the same columns as the other rows, in the same order.

A given column has the same type, but different columns can have different types. For instance, an email message has a sender’s name, which is a string; a subject line, which is a string; a sent date, which is a date; whether it’s been read, which is a Boolean; and so on.

The rows are usually in some particular order. For instance, the emails are ordered by which was most recently sent.
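
These characteristics can be checked mechanically. The following is a minimal sketch in Python (not Pyret, which this chapter introduces next); the inbox rows and the `is_tabular` helper are hypothetical. It tests two of the properties above: every row has one value per column, and every column holds values of a single type.

```python
# Hypothetical inbox rows: (sender, subject, is_read).
columns = ("sender", "subject", "is_read")
rows = [
    ("Alice", "Lunch?", True),
    ("Bob", "Report draft", False),
    ("Carol", "Re: Lunch?", False),
]


def is_tabular(column_names, table_rows):
    """Return True if every row has one value per column and each
    column holds values of a single type."""
    if not all(len(row) == len(column_names) for row in table_rows):
        return False
    for index in range(len(column_names)):
        types_in_column = {type(row[index]) for row in table_rows}
        if len(types_in_column) > 1:
            return False
    return True
```

A row with a missing value, or a row that stores the string "yes" where the other rows store a Boolean, makes the check fail.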

Exercise Find the characteristics of tabular data in the other examples described above, as well as in the ones you described.

We will now learn how to program with tables and to think about decomposing tasks involving them. You can also look up the full Pyret documentation for table operations .

7.1   Creating Tabular Data

table: name, age
  row: "Alice", 30
  row: "Bob", 40
  row: "Carol", 25
end

Exercise Change different parts of the above example— e.g., remove a necessary value from a row, add an extraneous one, remove a comma, add an extra comma, leave an extra comma at the end of a row— and see what errors you get.

check:
  table: name, age
    row: "Alice", 30
    row: "Bob", 40
    row: "Carol", 25
  end
  is-not
  table: age, name
    row: 30, "Alice"
    row: 40, "Bob"
    row: 25, "Carol"
  end
end

people = table: name, age
  row: "Alice", 30
  row: "Bob", 40
  row: "Carol", 25
end

create the sheet on your own,

create a sheet collaboratively with friends,

find data on the Web that you can import into a sheet,

create a Google Form that you get others to fill out, and obtain a sheet out of their responses

7.2   Processing Rows

Let’s now learn how we can actually process a table. Pyret offers a variety of built-in operations that make it quite easy to perform interesting computations over tables. In addition, as we will see later [ From Tables to Lists ], if we don’t find these sufficient, we can write our own. For now, we’ll focus on the operations Pyret provides.

Which emails were sent by a particular user?

Which songs were sung by a particular artist?

Which are the most frequently played songs in a playlist?

Which are the least frequently played songs in a playlist?

7.2.1   Keeping

email = table: sender, recipient, subject
  row: 'Matthias Felleisen', 'Pedro Diaz', 'Introduction'
  row: 'Joe Politz', 'Pedro Diaz', 'Class on Friday'
  row: 'Matthias Felleisen', 'Pedro Diaz', 'Book comments'
  row: 'Mia Minnes', 'Pedro Diaz', 'CSE8A Midterm'
end

sieve email using sender:
  sender == 'Matthias Felleisen'
end

sieve playlist using artist:
  (artist == 'Deep Purple') or (artist == 'Van Halen')
end

Exercise Write a table to use as playlist that works with the sieve expression above.

Exercise Write a sieve expression on the email table above that would result in a table with zero rows.

7.2.2   Ordering

order playlist:
  play-count ascending
end

Note that what goes between the : and end is not an expression. Therefore, we cannot write arbitrary code here. We can only name columns and indicate which way they should be ordered.

7.2.3   Combining Keeping and Ordering

Of the emails from a particular person, which is the oldest?

Of the songs by a particular artist, which have we played the least often?

Do Now! Take a moment to think about how you would write these with what you have seen so far.

mf-emails =
  sieve email using sender:
    sender == 'Matthias Felleisen'
  end
order mf-emails:
  sent-date ascending
end

Exercise Write the second example as a composition of keep and order operations on a playlist table.

7.2.4   Extending

extend employees using hourly-wage, hours-worked:
  total-wage: hourly-wage * hours-worked
end

ext-email =
  extend email using subject:
    subject-length: string-length(subject)
  end
order ext-email:
  subject-length descending
end

7.2.5   Transforming, Cleansing, and Normalizing

There are times when a table is “almost right”, but requires a little adjusting. For instance, we might have a table of customer requests for a free sample, and want to limit each customer to at most a certain number. We might get temperature readings from different countries in different formats, and want to convert them all to one single format, because unit errors can be dangerous! We might have a gradebook where different graders have used different levels of precision, and want to standardize all of them to have the same level of precision.

transform orders using count:
  count: num-min(count, 3)
end

transform gradebook using total-grade:
  total-grade: num-round(total-grade)
end

transform weather using temp, unit:
  temp:
    if unit == "F":
      fahrenheit-to-celsius(temp)
    else:
      temp
    end,
  unit:
    if unit == "F":
      "C"
    else:
      unit
    end
end

Do Now! In this example, why do we also transform unit ?

7.2.6   Selecting

select name, total-grade from gradebook end

ss = select artist, song from playlist end
order ss:
  artist ascending
end

7.2.7   Summary of Row-Wise Table Operations

We’ve seen a lot in a short span. Specifically, we have seen several operations that consume a table and produce a new one according to some criterion. It’s worth summarizing the impact each of them has in terms of key table properties (where “-” means the entry is left unchanged):

The italicized entries reflect how the new table may differ from the old. Note that an entry like “reduced” or “altered” should be read as potentially reduced or altered; depending on the specific operation and the content of the table, there may be no change at all. (For instance, if a table is already sorted according to the criterion given in an order expression, the row order will not change.) However, in general one should expect the kind of change described in the above grid.

Observe that both dimensions of this grid provide interesting information. Unsurprisingly, each row has at least some kind of impact on a table (otherwise the operation would be useless and would not exist). Likewise, each column also has at least one way of impacting it. Furthermore, observe that most entries leave the table unchanged: that means each operation has limited impact on the table, careful to not overstep the bounds of its mandate.

On the one hand, the decision to limit the impact of each operation means that to achieve complex tasks, we may have to compose several operations together. We have already seen examples of this earlier this chapter. However, there is also a much more subtle consequence: it also means that to achieve complex tasks, we can compose several operations and get exactly what we want. If we had fewer operations that each did more, then composing them might have various undesired or (worse) unintended consequences, making it very difficult for us to obtain exactly the answer we want. Instead, the operations above follow the principle of orthogonality : no operation shadows what any other operation does, so they can be composed freely.

As a result of having these operations, we can also think of tables algebraically. Concretely, when given a problem, we should again begin with concrete examples of what we’re starting with and where we want to end. Then we can ask ourselves questions like, “Does the number of columns stay the same, grow, or shrink?”, “Does the number of rows stay the same or shrink?”, and so on. The grid above now provides us a toolkit by which we can start to decompose the task into individual operations. Of course, we still have to think: the order of operations matters, and sometimes we have to perform an operation multiple times. Still, this grid is a useful guide to hint us towards the operations that might help solve our problem.
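
The same decomposition habit carries over to other languages. The following is a rough sketch in Python rather than Pyret: the `keep`, `order`, and `select` functions are hypothetical stand-ins written for this example, not Pyret's table operations, and the play counts are invented. Each function changes one property of the table and leaves the others alone, so they compose without surprises.

```python
# Hypothetical playlist; each dict is one row.
playlist = [
    {"artist": "Deep Purple", "song": "Highway Star", "play_count": 12},
    {"artist": "Van Halen", "song": "Jump", "play_count": 30},
    {"artist": "Deep Purple", "song": "Lazy", "play_count": 4},
]


def keep(rows, predicate):
    """Fewer rows; columns, row order, and cell contents are unchanged."""
    return [row for row in rows if predicate(row)]


def order(rows, column, descending=False):
    """Same rows and columns; only the row order may change."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


def select(rows, columns):
    """Same rows in the same order; fewer columns."""
    return [{name: row[name] for name in columns} for row in rows]


# "Of the songs by a particular artist, which have we played the least often?"
by_artist = keep(playlist, lambda row: row["artist"] == "Deep Purple")
least_played_first = order(by_artist, "play_count")
answer = select(least_played_first, ["song", "play_count"])
```

Asking the questions from the paragraph above leads to this chain: the number of rows shrinks (keep), the row order changes (order), and the number of columns shrinks (select).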

What is Tabular Data? (Definition & Example)

In statistics, tabular data refers to data that is organized in a table with rows and columns.

tabular data format

Within the table, the rows represent observations and the columns represent attributes for those observations.

For example, the following table represents tabular data:

example of tabular data

This dataset has 9 rows and 5 columns.

Each row represents one basketball player and the five columns describe different attributes about the player including:

  • Player name
  • Minutes played
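
To make the row and column roles concrete, here is a minimal Python sketch. The player labels and minute values are hypothetical; they are not the figures from the table above, and only two of its columns are modelled.

```python
# Hypothetical values; these are not the figures from the article's table.
header = ["player", "minutes"]
observations = [
    ["Player A", 34],
    ["Player B", 28],
    ["Player C", 19],
]

n_rows = len(observations)   # one row per observation (player)
n_columns = len(header)      # one column per attribute

# Read a single attribute (column) across all observations.
minutes_index = header.index("minutes")
minutes = [row[minutes_index] for row in observations]
```

Reading across a row gives everything known about one player; reading down a column gives one attribute for every player.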

The opposite of tabular data would be visual data , which would be some type of plot or chart that helps us visualize the values in a dataset.

For example, we might have the following bar chart that helps us visualize the total minutes played by each player in the dataset:

tabular data vs. visual data

This would be an example of visual data .

It contains the exact same information about player names and minutes played for the players in the dataset, but it’s simply displayed in a visual form instead of a tabular form.

Or we might have the following scatterplot that helps us visualize the relationship between minutes played and points scored for each player:

This is another example of visual data .

When is Tabular Data Used in Practice?

In practice, tabular data is the most common type of data that you’ll encounter.

For example, most data saved in an Excel spreadsheet is considered tabular data because the rows represent observations and the columns represent attributes for those observations.

For example, here’s what our basketball dataset from earlier might look like in an Excel spreadsheet:

This format is one of the most natural ways to collect and store values in a dataset, which is why it’s used so often.

Additional Resources

The following tutorials explain other common terms in statistics:

  • Why is Statistics Important?
  • Why is Sample Size Important in Statistics?
  • What is an Observation in Statistics?
  • What is Considered Raw Data in Statistics?

Statistical data presentation

1 Department of Anesthesiology and Pain Medicine, Dongguk University Ilsan Hospital, Goyang, Korea.

Sangseok Lee

2 Department of Anesthesiology and Pain Medicine, Sanggye Paik Hospital, Inje University College of Medicine, Seoul, Korea.

Data are usually collected in a raw format and thus the inherent information is difficult to understand. Therefore, raw data need to be summarized, processed, and analyzed. However, no matter how well manipulated, the information derived from the raw data should be presented in an effective format, otherwise, it would be a great loss for both authors and readers. In this article, the techniques of data and information presentation in textual, tabular, and graphical forms are introduced. Text is the principal method for explaining findings, outlining trends, and providing contextual information. A table is best suited for representing individual information and represents both quantitative and qualitative information. A graph is a very effective visual tool as it displays data at a glance, facilitates comparison, and can reveal trends and relationships within the data such as changes over time, frequency distribution, and correlation or relative share of a whole. Text, tables, and graphs for data and information presentation are very powerful communication tools. They can make an article easy to understand, attract and sustain the interest of readers, and efficiently present large amounts of complex information. Moreover, as journal editors and reviewers glance at these presentations before reading the whole article, their importance cannot be ignored.

Introduction

Data are a set of facts, and provide a partial picture of reality. Whether data are being collected with a certain purpose or collected data are being utilized, questions regarding what information the data are conveying, how the data can be used, and what must be done to include more useful information must constantly be kept in mind.

Since most data are available to researchers in a raw format, they must be summarized, organized, and analyzed to usefully derive information from them. Furthermore, each data set needs to be presented in a certain way depending on what it is used for. Planning how the data will be presented is essential before appropriately processing raw data.

First, a question for which an answer is desired must be clearly defined. The more detailed the question is, the more detailed and clearer the results are. A broad question results in vague answers and results that are hard to interpret. In other words, a well-defined question is crucial for the data to be well-understood later. Once a detailed question is ready, the raw data must be prepared before processing. These days, data are often summarized, organized, and analyzed with statistical packages or graphics software. Data must be prepared in such a way they are properly recognized by the program being used. The present study does not discuss this data preparation process, which involves creating a data frame, creating/changing rows and columns, changing the level of a factor, categorical variable, coding, dummy variables, variable transformation, data transformation, missing value, outlier treatment, and noise removal.

We describe the roles and appropriate use of text, tables, and graphs (graphs, plots, or charts), all of which are commonly used in reports, articles, posters, and presentations. Furthermore, we discuss the issues that must be addressed when presenting various kinds of information, and effective methods of presenting data, which are the end products of research, and of emphasizing specific information.

Data Presentation

Data can be presented in one of the three ways:

–as text;

–in tabular form; or

–in graphical form.

Methods of presentation must be determined according to the data format, the method of analysis to be used, and the information to be emphasized. Inappropriately presented data fail to clearly convey information to readers and reviewers. Even when the same information is being conveyed, different methods of presentation must be employed depending on what specific information is going to be emphasized. A method of presentation must be chosen after carefully weighing the advantages and disadvantages of different methods of presentation. For easy comparison of different methods of presentation, let us look at a table ( Table 1 ) and a line graph ( Fig. 1 ) that present the same information [ 1 ]. If one wishes to compare or introduce two values at a certain time point, it is appropriate to use text or the written language. However, a table is the most appropriate when all information requires equal attention, and it allows readers to selectively look at information of their own interest. Graphs allow readers to understand the overall trend in data, and intuitively understand the comparison results between two groups. One thing to always bear in mind regardless of what method is used, however, is the simplicity of presentation.


Values are expressed as mean ± SD. Group C: normal saline, Group D: dexmedetomidine. SBP: systolic blood pressure, DBP: diastolic blood pressure, MBP: mean blood pressure, HR: heart rate. * P < 0.05 indicates a significant increase in each group, compared with the baseline values. † P < 0.05 indicates a significant decrease noted in Group D, compared with the baseline values. ‡ P < 0.05 indicates a significant difference between the groups.

Text presentation

Text is the main method of conveying information as it is used to explain results and trends, and provide contextual information. Data are fundamentally presented in paragraphs or sentences. Text can be used to provide interpretation or emphasize certain data. If quantitative information to be conveyed consists of one or two numbers, it is more appropriate to use written language than tables or graphs. For instance, information about the incidence rates of delirium following anesthesia in 2016–2017 can be presented with the use of a few numbers: “The incidence rate of delirium following anesthesia was 11% in 2016 and 15% in 2017; no significant difference of incidence rates was found between the two years.” If this information were to be presented in a graph or a table, it would occupy an unnecessarily large space on the page, without enhancing the readers' understanding of the data. If more data are to be presented, or other information such as that regarding data trends are to be conveyed, a table or a graph would be more appropriate. By nature, data take longer to read when presented as texts and when the main text includes a long list of information, readers and reviewers may have difficulties in understanding the information.

Table presentation

Tables, which convey information that has been converted into words or numbers in rows and columns, have been used for nearly 2,000 years. Anyone with a sufficient level of literacy can easily understand the information presented in a table. Tables are the most appropriate for presenting individual information, and can present both quantitative and qualitative information. Examples of qualitative information are the level of sedation [ 2 ], statistical methods/functions [ 3 , 4 ], and intubation conditions [ 5 ].

The strength of tables is that they can accurately present information that cannot be presented with a graph. A number such as “132.145852” can be accurately expressed in a table. Another strength is that information with different units can be presented together. For instance, blood pressure, heart rate, number of drugs administered, and anesthesia time can be presented together in one table. Finally, tables are useful for summarizing and comparing quantitative information of different variables. However, the interpretation of information takes longer in tables than in graphs, and tables are not appropriate for studying data trends. Furthermore, since all data are of equal importance in a table, it is not easy to identify and selectively choose the information required.
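
As an illustration of these two strengths, the sketch below prints a small plain-text table in Python. The variables, values, and the `format_table` helper are hypothetical; the point is only that a table can carry a value at full precision next to quantities measured in entirely different units.

```python
# Hypothetical measurements with different units in one table.
header = ("Variable", "Value", "Unit")
rows = [
    ("Systolic blood pressure", "132.145852", "mmHg"),
    ("Heart rate", "72", "beats/min"),
    ("Anesthesia time", "95", "min"),
]


def format_table(header, rows):
    """Return the table as text lines with left-aligned, padded columns."""
    widths = [max(len(line[i]) for line in [header, *rows])
              for i in range(len(header))]
    lines = []
    for line in [header, *rows]:
        cells = [cell.ljust(width) for cell, width in zip(line, widths)]
        lines.append(" | ".join(cells).rstrip())
    return lines


table_lines = format_table(header, rows)
print("\n".join(table_lines))
```

A graph of the same three quantities would need three separate axes and could not show the first value to six decimal places.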

For a general guideline for creating tables, refer to the journal submission requirements.

Heat maps for better visualization of information than tables

Heat maps help to further visualize the information presented in a table by applying colors to the background of cells. By adjusting the colors or color saturation, information is conveyed in a more visible manner, and readers can quickly identify the information of interest ( Table 2 ). Software such as Excel (in Microsoft Office, Microsoft, WA, USA) have features that enable easy creation of heat maps through the options available on the “conditional formatting” menu.
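
The idea behind such cell coloring can be sketched in Python. This is an assumption-laden illustration, not how Excel implements conditional formatting: the `heat_color` function is hypothetical and uses a simple linear scale from white (lowest value) to red (highest value), and the readings are invented.

```python
def heat_color(value, low, high):
    """Map value in [low, high] to a hex color from white (low) to red (high)."""
    if high == low:
        fraction = 0.0
    else:
        fraction = (value - low) / (high - low)
    fraction = min(1.0, max(0.0, fraction))  # clamp out-of-range values
    # Keep red at full strength; fade green and blue as the value rises.
    fade = round(255 * (1 - fraction))
    return "#ff{0:02x}{0:02x}".format(fade)


# Hypothetical heart-rate readings (beats/min), not the article's Table 2.
readings = [60, 75, 90]
colors = [heat_color(v, min(readings), max(readings)) for v in readings]
```

Each cell keeps its number, so the precision of the table is preserved while the background color adds the at-a-glance comparison that a graph would provide.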

All numbers were created by the author. SBP: systolic blood pressure, DBP: diastolic blood pressure, MBP: mean blood pressure, HR: heart rate.

Graph presentation

Whereas tables can be used for presenting all the information, graphs simplify complex information by using images and emphasizing data patterns or trends, and are useful for summarizing, explaining, or exploring quantitative data. While graphs are effective for presenting large amounts of data, they can be used in place of tables to present small sets of data. A graph format that best presents information must be chosen so that readers and reviewers can easily understand the information. In the following, we describe frequently used graph formats and the types of data that are appropriately presented with each format with examples.

Scatter plot

Scatter plots present data on the x- and y-axes and are used to investigate an association between two variables. A point represents each individual or object, and an association between two variables can be studied by analyzing patterns across multiple points. A regression line is added to a graph to determine whether the association between two variables can be explained or not. Fig. 2 illustrates correlations between pain scoring systems that are currently used (PSQ, Pain Sensitivity Questionnaire; PASS, Pain Anxiety Symptoms Scale; PCS, Pain Catastrophizing Scale) and Geop-Pain Questionnaire (GPQ) with the correlation coefficient, R, and regression line indicated on the scatter plot [ 6 ]. If multiple points exist at an identical location as in this example ( Fig. 2 ), the correlation level may not be clear. In this case, a correlation coefficient or regression line can be added to further elucidate the correlation.
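
The two summaries mentioned here can be computed directly from the paired values. The sketch below assumes that the correlation coefficient is Pearson's r and that the regression line is the ordinary least-squares line of y on x; the article does not state either, and the scores used are hypothetical rather than data from the cited study.

```python
from math import sqrt


def pearson_r_and_line(xs, ys):
    """Return (r, slope, intercept) for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)          # strength of the linear association
    slope = sxy / sxx                  # least-squares slope of y on x
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Hypothetical questionnaire scores, not data from the cited study.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r, slope, intercept = pearson_r_and_line(xs, ys)
```

Because both numbers are computed from every pair, they remain informative even when several points are plotted on top of one another.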

Bar graph and histogram

A bar graph is used to indicate and compare values in discrete categories or groups, such as frequencies or other measurement parameters (e.g. the mean). Depending on the number of categories, and the size or complexity of each category, bars may be drawn vertically or horizontally. The height (or length) of a bar represents the amount of information in a category. Bar graphs are flexible, and can be used in a grouped or subdivided bar format in cases of two or more data sets in each category. Fig. 3 is a representative example of a vertical bar graph, with the x-axis representing the length of recovery room stay and drug-treated group, and the y-axis representing the visual analog scale (VAS) score. The mean and standard deviation of the VAS scores are expressed as whiskers on the bars (Fig. 3) [7].

By comparing the endpoints of bars, one can identify the largest and the smallest categories, and understand gradual differences between each category. It is advised to start the x- and y-axes from 0. Comparisons drawn on axes that do not start from 0 can deceive readers' eyes and lead to overrepresentation of the results.
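
The overrepresentation caused by an axis that does not start from 0 can be quantified by comparing the ratio of the drawn bar lengths with the ratio of the underlying values. The sketch below is a small Python illustration; the two group means and the axis origin of 45 are hypothetical values chosen for the arithmetic, and `apparent_ratio` is an illustrative helper name.

```python
def apparent_ratio(value_a, value_b, axis_start=0.0):
    """Ratio of the drawn bar lengths when the value axis starts at axis_start."""
    return (value_b - axis_start) / (value_a - axis_start)

# Hypothetical group means (illustrative only).
mean_a, mean_b = 50.0, 55.0

true_ratio = apparent_ratio(mean_a, mean_b, axis_start=0.0)
drawn_ratio = apparent_ratio(mean_a, mean_b, axis_start=45.0)

print(f"axis from 0:  bar B is drawn {true_ratio:.2f} times as long as bar A")
print(f"axis from 45: bar B is drawn {drawn_ratio:.2f} times as long as bar A")
```

With these numbers a 10% difference between the means is drawn as a doubling of bar length once the axis starts at 45.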

One form of vertical bar graph is the stacked vertical bar graph. A stacked vertical bar graph is used to compare the sum of each category, and analyze parts of a category. While stacked vertical bar graphs are excellent from the aspect of visualization, they do not have a reference line, making comparison of parts of various categories challenging (Fig. 4) [8].

A pie chart, which is used to represent nominal data (in other words, data classified in different categories), visually represents a distribution of categories. It is generally the most appropriate format for representing information grouped into a small number of categories. It is also used for data that have no other way of being represented aside from a table (e.g. a frequency table). Fig. 5 illustrates the distribution of regular waste from operation rooms by their weight [8]. A pie chart is also commonly used to illustrate the number of votes each candidate won in an election.

Line plot with whiskers

A line plot is useful for representing time-series data such as monthly precipitation and yearly unemployment rates; in other words, it is used to study variables that are observed over time. Line graphs are especially useful for studying patterns and trends across data that include climatic influence, large changes or turning points, and are also appropriate for representing not only time-series data, but also data measured over the progression of a continuous variable such as distance. As can be seen in Fig. 1 , mean and standard deviation of systolic blood pressure are indicated for each time point, which enables readers to easily understand changes of systolic pressure over time [ 1 ]. If data are collected at a regular interval, values in between the measurements can be estimated. In a line graph, the x-axis represents the continuous variable, while the y-axis represents the scale and measurement values. It is also useful to represent multiple data sets on a single line graph to compare and analyze patterns across different data sets.

Box and whisker chart

A box and whisker chart does not make any assumptions about the underlying statistical distribution, and represents variations in samples of a population; therefore, it is appropriate for representing nonparametric data. A box and whisker chart consists of boxes that represent the interquartile range (first to third quartile), the median and the mean of the data, and whiskers presented as lines outside of the boxes. Whiskers can be used to present the largest and smallest values in a set of data or only a part of the data (e.g. 95% of all the data). Data points that fall outside the range covered by the whiskers are presented as individual points and are called outliers. The spacing at both ends of the box indicates dispersion in the data. The relative location of the median demonstrated within the box indicates skewness (Fig. 6). The box and whisker chart provided as an example represents calculated volumes of an anesthetic, desflurane, consumed over the course of the observation period (Fig. 7) [9].
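
The quantities that a box and whisker chart displays can be computed as in the following Python sketch. It assumes the common convention of drawing whiskers to the most extreme values within 1.5 times the interquartile range of the box, which is only one of the whisker definitions described above, and the sample values are hypothetical. Quartiles come from the standard library's `statistics.quantiles` with its default method; other software may interpolate quartiles slightly differently.

```python
import statistics

def box_plot_summary(values, fence_factor=1.5):
    """Quartiles, whisker ends, and outliers using the 1.5 x IQR convention."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - fence_factor * iqr
    upper_fence = q3 + fence_factor * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }

# Hypothetical measurements with one extreme value (illustrative only).
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
summary = box_plot_summary(sample)
print(summary)
```

Here the value 50 lies beyond the upper fence, so it would be drawn as an individual point and the upper whisker would end at 9.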

Three-dimensional effects

Most of the recently introduced statistical packages and graphics software have the three-dimensional (3D) effect feature. The 3D effects can add depth and perspective to a graph. However, since they may make reading and interpreting data more difficult, they must only be used after careful consideration. The application of 3D effects on a pie chart makes distinguishing the size of each slice difficult. Even if slices are of similar sizes, slices farther from the front of the pie chart may appear smaller than the slices closer to the front ( Fig. 8 ).

Drawing a graph: example

Finally, we explain how to create a graph by using a line graph as an example ( Fig. 9 ). In Fig. 9 , the mean values of arterial pressure were randomly produced and assumed to have been measured on an hourly basis. In many graphs, the x- and y-axes meet at the zero point ( Fig. 9A ). In this case, information regarding the mean and standard deviation of mean arterial pressure measurements corresponding to t = 0 cannot be conveyed as the values overlap with the y-axis. The data can be clearly exposed by separating the zero point ( Fig. 9B ). In Fig. 9B , the mean and standard deviation of different groups overlap and cannot be clearly distinguished from each other. Separating the data sets and presenting standard deviations in a single direction prevents overlapping and, therefore, reduces the visual inconvenience. Doing so also reduces the excessive number of ticks on the y-axis, increasing the legibility of the graph ( Fig. 9C ). In the last graph, different shapes were used for the lines connecting different time points to further allow the data to be distinguished, and the y-axis was shortened to get rid of the unnecessary empty space present in the previous graphs ( Fig. 9D ). A graph can be made easier to interpret by assigning each group to a different color, changing the shape of a point, or including graphs of different formats [ 10 ]. The use of random settings for the scale in a graph may lead to inappropriate presentation or presentation of data that can deceive readers' eyes ( Fig. 10 ).

Owing to the lack of space, we could not discuss all types of graphs, but have focused on describing graphs that are frequently used in scholarly articles. We have summarized the commonly used types of graphs according to the method of data analysis in Table 3 . For general guidelines on graph designs, please refer to the journal submission requirements 2) .

Conclusions

Text, tables, and graphs are effective communication media that present and convey data and information. They aid readers in understanding the content of research, sustain their interest, and effectively present large quantities of complex information. As journal editors and reviewers will scan through these presentations before reading the entire text, their importance cannot be disregarded. For this reason, authors must pay as close attention to selecting appropriate methods of data presentation as when they were collecting data of good quality and analyzing them. In addition, having a well-established understanding of different methods of data presentation and their appropriate use will enable one to develop the ability to recognize and interpret inappropriately presented data or data presented in such a way that it deceives readers' eyes [ 11 ].

<Appendix>

Output for presentation.

Discovery and communication are the two objectives of data visualization. In the discovery phase, various types of graphs must be tried to understand the rough and overall information the data are conveying. The communication phase is focused on presenting the discovered information in a summarized form. During this phase, it is necessary to polish images including graphs, pictures, and videos, and to consider the fact that the images may look different when printed than they appear on a computer screen. In this appendix, we discuss important concepts that one must be familiar with to print graphs appropriately.

The KJA asks that pictures and images meet the following requirement before submission 3):

“Figures and photographs should be submitted as ‘TIFF’ files. Submit files of figures and photographs separately from the text of the paper. Width of figure should be 84 mm (one column). Contrast of photos or graphs should be at least 600 dpi. Contrast of line drawings should be at least 1,200 dpi. The Powerpoint file (ppt, pptx) is also acceptable.”

Unfortunately, without sufficient knowledge of computer graphics, it is not easy to understand the submission requirement above. Therefore, it is necessary to develop an understanding of image resolution, image format (bitmap and vector images), and the corresponding file specifications.

Resolution is often mentioned to describe the quality of images containing graphs or CT/MRI scans, and video files. The higher the resolution, the clearer and closer to reality the image is, while the opposite is true for low resolutions. The most representative unit used to describe a resolution is “dpi” (dots per inch): this literally translates to the number of dots required to constitute 1 inch. The greater the number of dots, the higher the resolution. The KJA submission requirements recommend 600 dpi for images, and 1,200 dpi 4) for graphs. In other words, resolutions in which 600 or 1,200 dots constitute one inch are required for submission.

There are requirements for the horizontal length of an image in addition to the resolution requirements. While there are no requirements for the vertical length of an image, it must not exceed the vertical length of a page. The width of a column on one side of a printed page is 84 mm, or about 3.3 inches (84 mm ÷ 25.4 mm per inch ≈ 3.3 inches). Therefore, a graph must have a resolution in which 1,200 dots constitute 1 inch, and have a width of 3.3 inches.
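
The arithmetic in this paragraph can be turned into a pixel count for the image file. The Python sketch below converts the 84 mm column width to inches and multiplies by the two resolutions quoted in the requirement; `required_pixels` is an illustrative helper, and rounding up to a whole pixel is an assumption made here so that the resolution is not undershot.

```python
import math

MM_PER_INCH = 25.4

def required_pixels(width_mm, dpi):
    """Smallest whole number of pixels giving at least `dpi` across `width_mm`."""
    width_in = width_mm / MM_PER_INCH
    return math.ceil(width_in * dpi)

column_width_mm = 84  # one-column width quoted in the submission requirement

print(round(column_width_mm / MM_PER_INCH, 2))   # width in inches
print(required_pixels(column_width_mm, 600))     # pixels across at 600 dpi
print(required_pixels(column_width_mm, 1200))    # pixels across at 1,200 dpi
```

An image exported with fewer pixels across than this cannot reach the stated resolution at the full column width.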

Bitmap and Vector

Methods of image construction are important. Bitmap images can be thought of as images drawn on graph paper. Enlarging the image enlarges the picture along with the grid, resulting in a lower resolution; in other words, aliasing occurs. Conversely, reducing the size of the image reduces the size of the picture while increasing the resolution. In other words, the resolution and the size of a bitmap image are inversely proportional to one another, and it is a drawback of bitmap images that resolution must be considered when adjusting the size of an image. To enlarge an image while maintaining the same resolution, the size and resolution of the image must be determined before saving the image. An image that has already been created cannot avoid changes to its resolution according to changes in size. Enlarging an image while maintaining the same resolution will increase the number of horizontal and vertical dots, ultimately increasing the number of pixels 5) of the image, and the file size. In other words, the file size of a bitmap image is affected by the size and resolution of the image (file extensions include JPG [JPEG] 6) , PNG 7) , GIF 8) , and TIF [TIFF] 9) ). To avoid this complexity, the width of an image can be set to 4 inches and its resolution to 900 dpi to satisfy the submission requirements of most journals [ 12 ].
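
The inverse relationship between printed size and resolution follows from the fact that a saved bitmap has a fixed number of pixels. As a rough Python sketch, take the 4 inch, 900 dpi image suggested above and print it at different widths; `effective_dpi` is an illustrative helper and the alternative widths are hypothetical.

```python
def effective_dpi(pixels_wide, printed_width_in):
    """Resolution obtained when a fixed pixel grid is printed at a given width."""
    return pixels_wide / printed_width_in

# A 4-inch-wide image saved at 900 dpi contains 3600 pixels across.
pixels_wide = 4 * 900

for width_in in (2, 4, 8):
    dpi = effective_dpi(pixels_wide, width_in)
    print(f"printed {width_in} in wide: {dpi:.0f} dpi")
```

Doubling the printed width halves the resolution, which is why the size and resolution have to be fixed before the bitmap is saved.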

Vector images overcome the shortcomings of bitmap images. Vector images are created based on mathematical operations of line segments and areas between different points, and are not affected by aliasing or pixelation. Furthermore, they result in a smaller file size that is not affected by the size of the image. They are commonly used for drawings and illustrations (file extensions include EPS 10) , CGM 11) , and SVG 12) ).

Finally, the PDF 13) is a file format developed by Adobe Systems (Adobe Systems, CA, USA) for electronic documents, and can contain general documents, text, drawings, images, and fonts. It can also contain bitmap and vector images. While vector images are used by researchers when working in Powerpoint, they are saved as 960 × 720 dots when saved in TIFF format in Powerpoint. This results in a resolution that is inappropriate for printing on a paper medium. To save high-resolution bitmap images, the image must be saved as a PDF file instead of a TIFF, and the saved PDF file must be imported into an image processing program such as Photoshop™ (Adobe Systems, CA, USA) to be saved in TIFF format [ 12 ].

1) Instructions to authors in KJA; section 5-(9) Table; https://ekja.org/index.php?body=instruction

2) Instructions to Authors in KJA; section 6-1)-(10) Figures and illustrations in Manuscript preparation; https://ekja.org/index.php?body=instruction

3) Instructions to Authors in KJA; section 6-1)-(10) Figures and illustrations in Manuscript preparation; https://ekja.org/index.php?body=instruction

4) Resolution; in KJA, it is represented by “contrast.”

5) A pixel is the minimum unit of an image and contains information on a dot and its color. The number of pixels is derived by multiplying the number of vertical and horizontal dots, regardless of image size. For example, a Full High Definition (FHD) monitor has 1920 × 1080 dots ≈ 2.07 million pixels.

6) Joint Photographic Experts Group.

7) Portable Network Graphics.

8) Graphics Interchange Format.

9) Tagged Image File Format.

10) Encapsulated PostScript.

11) Computer Graphics Metafile.

12) Scalable Vector Graphics.

13) Portable Document Format.

Think about a scenario where your report cards are printed in a textual format. Your grades and the remarks about you are presented in a paragraph instead of in data tables. That would be very confusing, right? This is why data must be presented correctly and clearly. Let us take a look.

Presentation of Data

Presentation of data is of great importance nowadays. After all, anything that is pleasing to the eye readily grabs our attention. Presentation of data refers to exhibiting data in an attractive and useful manner such that it can be easily interpreted. The three main forms of presentation of data are:

  • Textual presentation
  • Data tables
  • Diagrammatic presentation

Here we will study only the textual and tabular presentations (that is, data tables) in some detail.

Textual Presentation

The discussion about the presentation of data starts with its most raw form, which is the textual presentation. In this form of presentation, data is simply mentioned as text, generally in a paragraph. It is commonly used when the data is not very large.

This kind of representation is useful when we are looking to supplement qualitative statements with some data. For this purpose, the data need not be represented voluminously in tables or diagrams. It just has to be a statement that serves as fitting evidence for our qualitative statement and helps the reader get an idea of the scale of a phenomenon.

For example, “the 2002 earthquake proved to be a mass murderer of humans. As many as 10,000 citizens have been reported dead”. The textual representation of data requires some intensive reading. This is because the quantitative statement serves only as evidence for the qualitative statements, and one has to go through the entire text before concluding anything.

Further, if the data under consideration is large then the text matter increases substantially. As a result, the reading process becomes more intensive, time-consuming and cumbersome.

Data Tables or Tabular Presentation

A table facilitates representation of even large amounts of data in an attractive, easy to read and organized manner. The data is organized in rows and columns. This is one of the most widely used forms of presentation of data since data tables are easy to construct and read.

Components of  Data Tables

  • Table Number: Each table should have a specific table number so that it is easy to locate. This number can be mentioned anywhere as a reference and leads us directly to the data in that particular table.
  • Title: A table must contain a title that clearly tells the readers about the data it contains, the time period of study, the place of study and the nature of classification of data.
  • Headnotes: A headnote further aids the purpose of a title and displays more information about the table. Generally, headnotes present the units of data in brackets at the end of a table title.
  • Stubs: These are the titles of the rows in a table. Thus a stub displays information about the data contained in a particular row.
  • Caption: A caption is the title of a column in the data table. In fact, it is the counterpart of a stub and indicates the information contained in a column.
  • Body or field: The body of a table is the content of a table in its entirety. Each item in a body is known as a ‘cell’.
  • Footnotes: Footnotes are rarely used. In effect, they supplement the title of a table if required.
  • Source: When using data obtained from a secondary source, this source has to be mentioned below the footnote.

Construction of Data Tables

There are many ways to construct a good table. However, some basic ideas are:

  • The title should be in accordance with the objective of study:  The title of a table should provide a quick insight into the table.
  • Comparison:  If any two rows or columns may need to be compared, they should be kept close to each other.
  • Alternative location of stubs:  If the rows in a data table are lengthy, then the stubs can be placed on the right-hand side of the table.
  • Headings:  Headings should be written in a singular form. For example, ‘good’ must be used instead of ‘goods’.
  • Footnote:  A footnote should be given only if needed.
  • Size of columns:  Size of columns must be uniform and symmetrical.
  • Use of abbreviations:  Headings and sub-headings should be free of abbreviations.
  • Units: There should be a clear specification of units above the columns.

The Advantages of Tabular Presentation

  • Ease of representation:  A large amount of data can be easily confined in a data table. Evidently, it is the simplest form of data presentation.
  • Ease of analysis:  Data tables are frequently used for statistical analysis like calculation of central tendency, dispersion etc.
  • Helps in comparison:  In a data table, the rows and columns that need to be compared can be placed next to each other. This facilitates comparison, as it becomes easy to compare each value.
  • Economical:  Construction of a data table is fairly easy and presents the data in a manner which is really easy on the eyes of a reader. Moreover, it saves time as well as space.

Classification of Data and Tabular Presentation

Qualitative Classification

In this classification, data in a table is classified on the basis of qualitative attributes. In other words, if the data contains attributes that cannot be quantified, such as rural-urban or boys-girls, it can be identified as a qualitative classification of data.

Quantitative Classification

In quantitative classification, data is classified on the basis of quantitative attributes.

Temporal Classification

Here data is classified according to time. Thus when data is mentioned with respect to different time frames, we term such a classification as temporal.

Spatial Classification

When data is classified according to a location, it becomes a spatial classification.

A Solved Example for You

Q:  The classification in which data in a table is classified according to time is known as:

  • Qualitative
  • Quantitative
  • Temporal
  • Spatial

Ans:  The form of classification in which data is classified based on time frames is known as the temporal classification of data.

Probability, Statistics, and Data

Chapter 10 Tabular data

Tabular data is data on entities that has been aggregated in some way. A typical example would be to count the number of successes and failures in an experiment, and to report those aggregate numbers rather than the outcomes of the individual trials. Another way that tabular data arises is via binning, where we count the number of outcomes of an experiment that fall in certain groups, and report those numbers.

Inference on categorical variables has traditionally been performed by approximating counts with continuous variables and performing parametric methods such as the \(z\) -tests of proportions and the \(\chi^2\) -tests. With modern computing power, it is possible to calculate the probability of each experimental outcome exactly, leading to exact methods that do not rely on continuous approximation. These include the binomial test and the multinomial test. A third approach is to use Monte Carlo methods , where the computer performs simulations to estimate the probability of events under the null hypothesis.

10.1 Tables and plots

For categorical (factor) variables, the most basic information of interest is the count of observations in each value of the variable. Often, the data is better presented as proportions, which are the count divided by the total number of observations. For visual display, categorical variables are naturally shown as barplots or pie charts.

In this section, we demonstrate with two data sets. The first is fosdata::wrist , from a study of wrist fractures that recorded the fracture side and handedness of 104 elderly patients. The wrist data was used by Raittio et al. 78 to evaluate the effectiveness of two types of casts for treating a common type of wrist fracture. The second is fosdata::snails which records features of snail shells collected in England. The snails data was collected in 1950 by Cain and Sheppard 79 as an investigation into natural selection. They explored the relationship between the appearance of snails and the environment in which snails live.

Let’s begin with the wrist data set. Each row in the wrist data is an individual patient. Here we only pay attention to two variables, both coded as 1 for “right” and 2 for “left”:

For ease of interpretation, let’s change the variables into factors, which is really what they are.

The built-in command table can count the number of rows that take each value.

This table shows that there were 97 right-handed patients and 7 left-handed patients. The proportions 80 function converts the table of counts to a table of proportions:

So only 6.7% of patients in this study were left-handed. Does that sound reasonable for a random sample? We will investigate this question in the next section.
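
The counting and proportion steps can be illustrated outside R as well. The following Python sketch is not the book's code: it rebuilds the handedness column from the two counts reported above (97 right-handed, 7 left-handed) and tabulates it with the standard library.

```python
from collections import Counter

# Rebuild the handedness column from the counts reported in the text:
# 97 right-handed and 7 left-handed patients (row order is arbitrary here).
handedness = ["right"] * 97 + ["left"] * 7

# Count the rows taking each value, then divide by the total.
counts = Counter(handedness)
total = sum(counts.values())
proportions = {level: count / total for level, count in counts.items()}

print(dict(counts))
print({level: round(p, 3) for level, p in proportions.items()})
```

The proportion for "left" is 7/104, which rounds to the 6.7% quoted above.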

Passing two variables to table will produce a matrix of counts for each pair of values, but a better tool for the job is the xtabs function. The xtabs function builds a table, called a contingency table or cross table . The first argument to xtabs is a formula, with the factor variables to be tabulated on the right of the ~ (tilde).

One could ask if people are more likely to fracture their wrist on their non-dominant side, since more right-handed patients fractured their left hand (56) than their right hand (41).

Categorical data is often given as counts, rather than individual observations in rows. The snails data gives a count for each combination of Location, Color, and Banding. It does not have a row for each individual snail.

To make a table of Color vs. Banding for snails, use xtabs and give the Count for each group on the left side of the formula:

Frequently when creating tables of this type, we will want to know the row and column sums as well. These are generated by the function addmargins .

Other times, we are interested in the proportions that are in each cell. The proportions function could convert these counts to overall proportions, but more interesting here is to ask what the color distribution was for each type of banding. This is the conditional distribution of color given banding, and proportions will compute it with the margin option:

The sum of proportions is 1 across each row. We see that 38% of unbanded (X0000) snails were brown, but only 2% of five-banded (X12345) snails were brown. The comparison of different banding types is easier to see with a plot. Tables produced by xtabs are not tidy, and therefore not suitable for sending to ggplot. Converting the table to a data frame with as.data.frame works, but instead we compute the counts with dplyr:

Figure 10.1: Snail color and banding.
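
To make the table operations in this section concrete, here is a minimal Python sketch of a cross table built from pre-aggregated counts, with row and column sums and per-row proportions. It is not the R code used in the text, and the records are hypothetical numbers, not the snails data.

```python
from collections import defaultdict

# Hypothetical aggregated records: (banding, color, count). NOT the snails data.
records = [
    ("unbanded", "brown", 30),
    ("unbanded", "yellow", 50),
    ("banded", "brown", 5),
    ("banded", "yellow", 115),
    ("unbanded", "brown", 10),  # a second group with the same banding and color
]

# Cross table: sum the counts for each (row, column) pair.
table = defaultdict(lambda: defaultdict(int))
for banding, color, count in records:
    table[banding][color] += count

# Margins: row sums and column sums.
row_totals = {banding: sum(cols.values()) for banding, cols in table.items()}
col_totals = defaultdict(int)
for cols in table.values():
    for color, count in cols.items():
        col_totals[color] += count

# Row proportions: each row is divided by its own total, so each row sums to 1.
row_props = {
    banding: {color: count / row_totals[banding] for color, count in cols.items()}
    for banding, cols in table.items()
}

for banding, cols in row_props.items():
    print(banding, {color: round(p, 3) for color, p in cols.items()})
```

Dividing by the row total rather than the grand total is what makes rows with very different sample sizes comparable.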

A common approach to visualizing categorical variables is with a pie chart. Pie charts are out of favor among data scientists because colors are hard to distinguish and the sizes of wedges are difficult to compare visually. In fact, ggplot2 does not include a built-in pie chart geometry. Instead, one applies polar coordinates to a barplot. Here is an example showing the proportions of snails found in each habitat:

Can you tell whether there were more snails in the Hedgerows or in the Mixed Deciduous Wood? If you have colorblindness or happen to be reading a black and white copy of this text, you probably cannot even tell which wedge is which.

10.2 Inference on a proportion

The simplest setting for categorical data is that of a single binomial random variable. Here \(X \sim \text{Binom}(n,p)\) is a count of successes on \(n\) independent identical trials with probability of success \(p\) . A typical experiment would fix a value of \(n\) , perform \(n\) Bernoulli trials, and produce a value of \(X\) . From this single value of \(X\) , we are interested in learning about the unknown population parameter \(p\) . For example, we may want to test whether the true proportion of times a die shows a 6 when rolled is actually 1/6. We might choose to toss the die \(n = 1000\) times and count the number of times \(X\) that a 6 occurs.

Polling is an important application. Before an election, a polling organization will sample likely voters and ask them whether they prefer a particular candidate. The results of the poll should give an estimate for the true proportion of voters \(p\) who prefer that candidate. The case of a voter poll is not formally a Bernoulli trial unless you allow the possibility of asking the same person twice; however, if the population is large then polling approximates a Bernoulli trial well enough to use these methods.

If \(X\) is the number of successes on \(n\) trials, the point estimate for the true proportion \(p\) is given by \[ \hat p = \frac{X}{n} \]

Recall that \(E[\hat{p}] = \frac 1nE[X] = \frac{1}{n}np = p\) , so \(\hat{p}\) is an unbiased estimate of \(p\) . The standard deviation \(\sigma(\hat{p})\) is \(\sqrt{p(1-p)/n}\) , so that a larger sample size \(n\) will lead to less variation in \(\hat{p}\) and therefore a better estimate of \(p\) .
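
As a quick numerical illustration of how \(\sigma(\hat{p})\) shrinks with \(n\), the following Python sketch evaluates \(\sqrt{p(1-p)/n}\) for a few sample sizes. The value \(p = 0.5\) and the sample sizes in the loop are arbitrary choices for illustration; the last line uses the die example above with \(p = 1/6\) and \(n = 1000\).

```python
import math

def p_hat_sd(p, n):
    """Standard deviation of the sample proportion for n Bernoulli(p) trials."""
    return math.sqrt(p * (1 - p) / n)

# Quadrupling the sample size halves the standard deviation.
for n in (100, 400, 1600):
    print(n, round(p_hat_sd(0.5, n), 4))

# Die example from the text: p = 1/6, n = 1000 rolls.
print(round(p_hat_sd(1 / 6, 1000), 4))
```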

Our goal is to use the sample statistic \(\hat{p}\) to calculate confidence intervals and perform hypothesis testing with regard to \(p\) .

This section introduces one sample tests of proportions . Here, we present the theory associated with performing exact binomial hypothesis tests using binom.test , as well as prop.test , which uses the normal approximation.

A one sample test of proportions requires a hypothesized value of \(p_0\) . Often \(p_0 = 0.5\) , meaning we expect success and failure to be equally likely outcomes of the Bernoulli trial. Or, \(p_0\) may come from historic values or a known larger population. The hypotheses are:

\[ H_0: p = p_0; \qquad H_a: p \not= p_0\]

You run \(n\) trials and obtain \(x\) successes, so your estimate for \(p\) is given by \(\hat p = x/n\) . (We are thinking of \(x\) as data rather than as a random variable.) Presumably, \(\hat p\) is not exactly equal to \(p_0\) , and you wish to determine the probability of obtaining an estimate that unlikely or more unlikely, assuming \(H_0\) is true.

10.2.1 Exact binomial test

Our first approach is the binomial test , which is an exact test in that it calculates a \(p\) -value using probabilities coming from the binomial distribution.

For the \(p\) -value, we are going to add the probabilities of all outcomes that are no more likely than the outcome that was obtained, since if we are going to reject when we obtain \(x\) successes, we would also reject if we obtain a number of successes that was even less likely to occur. Formally, the \(p\) -value for the exact binomial test is given by:

\[ \sum_{y:\ P(X = y) \leq P(X = x)} P(X=y)\]

Consider the wrist data. Approximately 10.6% of the world’s population is left-handed 81 . Is this sample of elderly Finns consistent with the proportion of left-handers in the world? In this binomial random variable, we choose left-handedness as success. Then \(p\) is the true proportion of elderly Finns who are left-handed and \(p_0 = 0.106\) . Our hypotheses are:

\[ H_0: p = 0.106; \qquad H_a: p \not= 0.106 \]

The sample contains 104 observations and has 7 left-handed patients, giving \(\hat{p} = 7/104 \approx 0.067\) , which is lower than \(p_0\) . The probability of getting exactly 7 successes under \(H_0\) is dbinom(7, 104, 0.106) , or 0.061. Anything less than 7 successes is less likely under the null hypothesis, so we would add all of those to get part of the \(p\) -value. To determine which values we add for successes greater than 7, we look for all outcomes that have probability of occurring (under the null hypothesis) less than 0.061. That is all outcomes 15 through 104, since \(X = 14\) is more likely than \(X = 7\) ( dbinom(14, 104, 0.106) = 0.075 > 0.061) while \(X = 15\) is less likely than \(X = 7\) ( dbinom(15, 104, 0.106) = 0.053 < 0.061).

The calculation is illustrated in Figure 10.2 , where the dashed red line indicates the probability of observing exactly 7 successes. We sum all of the probabilities that are at or below the dashed red line.

Figure 10.2: The pmf for \(X \sim \text{Binom}(104, 0.106)\) , with a line at \(P(X = 7)\) . \(X\) values past 25 are negligible and not shown.

The \(p\) -value is \[ P(X \le 7) + P(X \ge 15) \] where \(X\sim \text{Binom}(n = 104, p = 0.106)\) .
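
As a cross-check on this sum, here is a minimal Python sketch (standard library only, not R) that applies the definition directly: it adds \(P(X = y)\) over every \(y\) whose probability is no larger than \(P(X = 7)\). The small relative tolerance guarding against floating-point ties is an implementation choice made here, and the function names are illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binom(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_binom_p_value(x, n, p0, rel_tol=1e-7):
    """Two-sided exact p-value: sum of P(X = y) over y with P(X = y) <= P(X = x)."""
    observed = binom_pmf(x, n, p0)
    threshold = observed * (1 + rel_tol)  # tolerance for floating-point ties
    return sum(
        prob
        for prob in (binom_pmf(y, n, p0) for y in range(n + 1))
        if prob <= threshold
    )

n, x, p0 = 104, 7, 0.106
p_value = exact_binom_p_value(x, n, p0)

# The same quantity written as the two tails P(X <= 7) + P(X >= 15).
lower_tail = sum(binom_pmf(y, n, p0) for y in range(0, 8))
upper_tail = sum(binom_pmf(y, n, p0) for y in range(15, n + 1))

print(round(binom_pmf(7, n, p0), 3))
print(round(p_value, 2), round(lower_tail + upper_tail, 2))
```

Both routes add up the same set of outcomes, so the two printed p-values agree.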

R will make these computations for us, naturally, in the following way.

With a \(p\) -value of 0.26, we fail to reject the null hypothesis. There is not sufficient evidence to conclude that elderly Finns have a different proportion of left-handers than the world’s proportion of lefties.

The binom.test function also produces the 95% confidence interval for \(p\) . In this example, we are 95% confident that the true proportion of left-handed elderly Finns is in the interval \([0.027, 0.134]\) . Since \(0.106\) lies in the 95% confidence interval, we failed to reject the null hypothesis at the \(\alpha = 0.05\) level.
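
R's documentation describes the interval reported by binom.test as a Clopper and Pearson interval. The Python sketch below is an illustration rather than R's implementation: it finds the two limits by bisection on the binomial distribution function, where the lower limit is the \(p\) at which \(P(X \ge x) = \alpha/2\) and the upper limit is the \(p\) at which \(P(X \le x) = \alpha/2\). The helper names and the fixed number of bisection steps are choices made for this example.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binom(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def bisect_root(f, lo, hi, iterations=60):
    """Root of a monotone function f on [lo, hi] whose endpoint values differ in sign."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, conf_level=0.95):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion."""
    alpha = 1 - conf_level
    # Lower limit: the p at which P(X >= x) equals alpha / 2 (0 when x == 0).
    if x == 0:
        lower = 0.0
    else:
        lower = bisect_root(
            lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, 0.0, 1.0
        )
    # Upper limit: the p at which P(X <= x) equals alpha / 2 (1 when x == n).
    if x == n:
        upper = 1.0
    else:
        upper = bisect_root(lambda p: binom_cdf(x, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper

lower, upper = clopper_pearson(7, 104)
print(round(lower, 3), round(upper, 3))
```

Because the interval is built from the same binomial probabilities as the exact test, it contains 0.106, in line with the failure to reject above.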

10.2.2 One sample test of proportions

When \(n\) is large and \(p\) isn’t too close to 0 or 1, binomial random variables with \(n\) trials and probability of success \(p\) are well approximated by normal random variables with mean \(np\) and standard deviation \(\sqrt{np(1 - p)}\) . This can be used to get approximate \(p\) -values associated with the hypothesis test \(H_0: p = p_0\) versus \(H_a: p\not= p_0\) .

As before, we need to compute the probability under \(H_0\) of obtaining an outcome that is as likely or less likely than obtaining \(x\) successes. However, in this case we are using the normal approximation, which is symmetric about its mean. The \(p\) -value is twice the area of the tail outside of \(x\) .

The prop.test function performs this calculation, and has identical syntax to binom.test .

We return to the wrist example, testing \(H_0: p = 0.106\) versus \(H_a: p\not= 0.106\) . Let \(X\) be a binomial random variable with \(n = 104\) and \(p = 0.106\) . \(X\) is approximated by a normal variable \(Y\) with \[\begin{align} \mu(Y) &= np = 104 \cdot 0.106 = 11.024\\ \sigma(Y) &= \sqrt{np(1 - p)} = \sqrt{104 \cdot 0.106 \cdot 0.894} = 3.13934. \end{align}\]

Figure 10.3: Binomial rv with normal approximation overlaid.

Figure 10.3 is a plot of the pmf of \(X\) with its normal approximation \(Y\) overlaid. The shaded area corresponds to \(Y \leq 7\) . The \(p\) -value is twice that area, which we compute with pnorm :
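One way to carry out this calculation is sketched below, with the mean and standard deviation taken from the formulas above. The names mu, sigma and p_uncorrected are illustrative.

```r
mu <- 104 * 0.106                        # mean of the normal approximation
sigma <- sqrt(104 * 0.106 * (1 - 0.106)) # its standard deviation

# Twice the lower-tail area below 7 (no continuity correction)
p_uncorrected <- 2 * pnorm(7, mean = mu, sd = sigma)
p_uncorrected
```

This gives a value of roughly 0.20.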

For a better approximation, we perform a continuity correction (see also Example 4.26 ). The basic idea is that \(Y\) is a continuous rv, so when \(Y = 7.3\) , for example, we need to decide what integer value that should be associated with. Rounding suggests that \(Y=7.3\) should correspond to \(X=7\) and be included in the shaded area. The continuity correction includes values from 7 to 7.5 in the \(p\) -value, resulting in a corrected \(p\) -value of:
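A sketch of the corrected calculation, under the same normal approximation. Only the cutoff changes, from 7 to 7.5. The variable names are illustrative.

```r
mu <- 104 * 0.106
sigma <- sqrt(104 * 0.106 * (1 - 0.106))

# Continuity correction: include the area between 7 and 7.5
p_corrected <- 2 * pnorm(7.5, mean = mu, sd = sigma)
p_corrected
```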

The continuity correction gives a more accurate approximation to the underlying binomial rv, but not necessarily a closer approximation to the exact binomial test.

The built-in R function for the one sample test of proportions is prop.test :
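A minimal sketch of the call, using the same arguments as the exact test. Storing the result in pt is only a convenience for pulling out individual components.

```r
pt <- prop.test(x = 7, n = 104, p = 0.106)
pt$p.value    # two-sided p-value, continuity corrected by default
pt$conf.int   # approximate 95% confidence interval for p
```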

The prop.test function performs continuity correction by default. The \(p\) -value here is almost identical to the result of binom.test , and as before we fail to reject \(H_0\) . The confidence interval produced is also quite similar to the exact binomial test.

Look at the full output of prop.test(x = 7, n = 104, p = 0.106) , and observe that the test statistic is given as a \(\chi^2\) random variable with 1 degree of freedom. Confirm that the test statistic is \(c = \left((\tilde x - np_0)/\sqrt{np_0(1 - p_0)}\right)^2\) , where \(\tilde x\) is the number of successes after a continuity correction (in this case, \(\tilde x = 7.5\) ).

Use pchisq(c, 1, lower.tail = FALSE) to recompute the \(p\) -value using this test statistic. You should get the same answer \(p=\) 0.2616377.

The Economist/YouGov Poll leading up to the 2016 presidential election sampled 3669 likely voters and found that 1783 intended to vote for Clinton. Assuming that this is a random sample from all likely voters, find a 99% confidence interval for \(p\), the true proportion of likely voters who intended to vote for Clinton at the time of the poll.
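One possible way to get the interval is sketched here, assuming 1783 of the 3669 respondents favored Clinton (the count that matches the point estimate \(\hat{p} \approx 0.486\) reported for this poll). The choice of binom.test is an assumption. prop.test with the same arguments gives nearly the same interval at this sample size.

```r
ci99 <- binom.test(x = 1783, n = 3669, conf.level = 0.99)$conf.int
round(ci99, 3)
```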

We are 99% confident that the true proportion of likely Clinton voters was between .465 and .507. In fact, 48.2% of voters did vote for Clinton, and the true value does fall in the 99% confidence interval range.

Most polls do not report a confidence interval. Typically, they report the point estimator \(\hat{p}\) and the margin of error , which is half the width of the 95% confidence interval. For this poll, \(\hat{p} \approx 0.486\) and the 95% confidence interval is \([0.470, 0.502]\) so the pollsters would report that they found 48.6% in favor of Clinton with a margin of error of 1.6%.

10.3 \(\chi^2\) tests

The \(\chi^2\) test is a general approach to testing the hypothesis that tabular data follows a given distribution. It relies on the Central Limit Theorem, in that the various counts in the tabular data are assumed to be approximately normally distributed.

The setting for \(\chi^2\) testing requires tabular data. For each cell in the table, the count of observations that fall in that cell is a random variable. We denote the observed counts in the \(k\) cells by \(X_1, \dotsc, X_k\) . The null hypothesis requires an expected count for each cell, \(E[X_i]\) . The test statistic is the \(\chi^2\) statistic.

If \(X_1, \dotsc, X_k\) are the observed counts of cells in tabular data, then the \(\chi^2\) statistic is:

\[ \chi^2 = \sum_{i=1}^k \frac{(X_i - E[X_i])^2}{E[X_i]} \]

The \(\chi^2\) statistic is never negative, and will be larger when the observed values \(X_i\) are far from the expected values \(E[X_i]\). In all cases we consider, the \(\chi^2\) statistic will have approximately the \(\chi^2\) distribution with \(d\) degrees of freedom, for some \(d < k\). The \(p\)-value for the test is the probability of a \(\chi^2\) value as large or larger than the observed \(\chi^2\). The R function chisq.test computes \(\chi^2\) and the corresponding \(p\)-value.

The \(\chi^2\) test is always a one-tailed test. For example, if we observe \(\chi^2 = 10\) and have four degrees of freedom, the \(p\) -value corresponds to the shaded area in Figure 10.4 .

Figure 10.4: \(\chi^2\) distribution with \(p\) -value shaded.
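The shaded area in this example can be computed directly as an upper-tail probability. A short sketch, using the values \(\chi^2 = 10\) and four degrees of freedom from the text:

```r
# Upper-tail area of the chi-squared distribution with 4 df beyond 10
p_value <- pchisq(10, df = 4, lower.tail = FALSE)
p_value
```

The result is about 0.040.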

The full theory behind the \(\chi^2\) test is beyond the scope of this book, but in the remainder of this section we give some motivation for the formula for \(\chi^2\) and the meaning of degrees of freedom. A reader less interested in theory could proceed to Section 10.3.1 .

Consider the value in one particular cell of tabular data. For each of the \(n\) observations in the sample, the observation either lies in the cell or it does not, hence the count in that one cell can be considered as a binomial rv \(X_i\) . Let \(p_i\) be the probability a random observation is in that cell. Then \(E[X_i] = np_i\) and \(\sigma(X_i) = \sqrt{np_i(1-p_i)}\) . If \(np_i\) is sufficiently large (at least 5, say) then \(X_i\) is approximately normal and \[ \frac{X_i - np_i}{\sqrt{np_i(1-p_i)}} \sim Z_i, \] where \(Z_i\) is a standard normal variable. Squaring both sides and multiplying by \((1-p_i)\) we have \[ (1-p_i)Z_i^2 \sim \frac{(X_i - np_i)^2}{np_i} = \frac{(X_i - E[X_i])^2}{E[X_i]} \]

As long as all cell counts are large enough, the \(\chi^2\) statistic is approximately \[ \chi^2 = \sum_{i=1}^k (1-p_i)Z_i^2 \] In this expression, the \(Z_i\) are standard normal but not independent random variables. In many circumstances, one can rewrite these \(k\) variables in terms of a smaller number \(d\) of independent standard normal rvs and find that the \(\chi^2\) statistic does have a \(\chi^2\) distribution with \(d\) degrees of freedom. The details of this process require some advanced linear algebra and making precise what we mean when we say \(X_i\) are approximately normal. The details of the dependence are not hard to work out in the simplest case when the table has two cells.

Consider a table with two cells, with \(n\) observations, cell probabilities \(p_1\) , \(p_2\) , and cell counts given by the random variables \(X_1\) and \(X_2\) :

\[ \begin{array}{|c|c|} \hline X_1 & X_2 \\ \hline \end{array} \]

This is simply a single binomial rv in disguise, with \(X_1\) the count of successes and \(X_2\) the count of failures. In particular, \(p_1 + p_2 = 1\) and \(X_1 + X_2 = n\) . Notice that

\[ \frac{X_1 - np_1}{\sqrt{np_1(1-p_1)}} + \frac{X_2 - np_2}{\sqrt{np_2(1-p_2)}} = \frac{X_1 + X_2 - n(p_1 + p_2)}{\sqrt{np_1p_2}} = \frac{n - n(1)}{\sqrt{np_1p_2}} = 0. \]

So the two variables \(Z_1\) and \(Z_2\) are not independent, and satisfy the equation \(Z_1 + Z_2 = 0\) . Then both can be written in terms of a single rv \(Z\) with \(Z_1 = Z\) and \(Z_2 = -Z\) . As long as \(X_1\) and \(X_2\) are both large, \(Z_i\) will be approximately standard normal, and \[ \chi^2 = (1-p_1)Z_1^2 + (1-p_2)Z_2^2 = (1- p_1 + 1 - p_2)Z^2 = Z^2. \] We see that \(\chi^2\) has the \(\chi^2\) distribution with one df.

The table in this example has two entries, giving two possible counts \(X_1\) and \(X_2\) . The constraint that these counts must sum to \(n\) leaves only the single degree of freedom to choose \(X_1\) .
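The identity \(\chi^2 = Z^2\) for a two-cell table can be checked numerically. The sketch below uses hypothetical values (\(n = 100\), \(p_1 = 0.3\), and an observed count of 37 in the first cell) chosen only for illustration.

```r
# Hypothetical two-cell table: n observations, x1 of them in cell 1
n <- 100
p1 <- 0.3
p2 <- 1 - p1
x1 <- 37
x2 <- n - x1

# Chi-squared statistic summed over both cells
chi2 <- (x1 - n * p1)^2 / (n * p1) + (x2 - n * p2)^2 / (n * p2)

# Squared z-score of the single binomial count x1
z <- (x1 - n * p1) / sqrt(n * p1 * p2)

c(chi2 = chi2, z_squared = z^2)
```

Both entries are equal, whatever values of \(n\), \(p_1\), and \(x_1\) are used.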

10.3.1 \(\chi^2\) test for given probabilities

In this section, we consider data coming from a single categorical variable, typically displayed in a \(1 \times k\) table:

\[ \begin{array}{|c|c|c|c|} \hline X_1 & X_2 & \quad\dotsb\quad & X_k \\ \hline \end{array} \]

For our null hypothesis, we take some vector of probabilities \(p_1, \dotsc, p_k\) as given. Because there are \(k\) cells to fill and a single constraint (the cells sum to \(n\) ), the \(\chi^2\) statistic will have \(k-1\) df.

The most common case is when we assume all cells are equally likely, in which case this approach is called the \(\chi^2\) test for uniformity .

Doyle, Bottomley, and Angell 82 investigated the “Relative Age Effect,” the disadvantage of being the youngest in a cohort relative to being the oldest. The difference in outcomes can persist for years beyond any difference in actual ability relative to age difference.

In this study, the authors found the number of boys enrolled in the British elite soccer academies for under 18 years of age. They binned the boys into three age groups: oldest, middle, and youngest, with approximately 1/3 of all British children in each group. The number of boys in the elite soccer academies was:

\[ \begin{array}{|c|c|c|} \hline \text{Oldest} & \text{Middle} & \text{Youngest} \\ \hline 631 & 321 & 155 \\ \hline \end{array} \]

The null hypothesis is that the boys should be equally distributed among the three groups, or equivalently \(H_0: p_i = \frac{1}{3}\) . There are a total of 1107 boys in this study. Under the null hypothesis, we expect \(369 = 1107/3\) in each group. Then \[ \chi^2 = \frac{(631-369)^2}{369} + \frac{(321-369)^2}{369} + \frac{(155-369)^2}{369} \approx 316.38. \] The test statistic \(\chi^2\) has the \(\chi^2\) distribution with \(2 = 3-1\) degrees of freedom, and a quick glance at that distribution shows that our observed 316.38 is impossibly unlikely to occur by chance. The \(p\) -value is essentially 0, and we reject the null hypothesis. Boys’ ages in elite British soccer academies are not uniformly distributed across the three age bands for a given year.

In R, the computation is done with chisq.test :
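A minimal sketch of the call, with the three counts from the table entered by hand. The vector name boys is illustrative.

```r
boys <- c(oldest = 631, middle = 321, youngest = 155)

# Test for uniformity: each age group has probability 1/3 under H0
res <- chisq.test(boys, p = c(1/3, 1/3, 1/3))
res$statistic   # chi-squared statistic
res$parameter   # degrees of freedom
res$p.value
```

Equal probabilities are the default for chisq.test, so the p argument could be omitted here. It is written out to make the null hypothesis explicit.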

Benford’s Law is used in forensic accounting to detect falsified or manufactured data. When data, such as financial or economic data, occurs over several orders of magnitude, the first digits of the values follow the distribution

\[ P(\text{first digit is}~d) = \log_{10}(1 + 1/d) \]

The data fosdata::rio_instagram has the number of Instagram followers for gold medal winners at the 2016 Olympics. First, we extract the first digits of each athlete’s number of followers:
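The extraction step might look like the following sketch. It runs on a short hypothetical vector of follower counts rather than on the rio_instagram data, and it assumes the counts are positive whole numbers.

```r
# Hypothetical follower counts, NOT the rio_instagram data
followers <- c(1200, 53000, 987, 245000, 18, 7300000)

# First character of each number's text form is its leading digit
first_digit <- as.integer(substr(as.character(followers), 1, 1))
first_digit
```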

Let’s visually compare the counts of observed first digits (as bars) to the expected counts from Benford’s Law (red dots):

Is the observed data consistent with Benford’s Law?
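The test itself is a \(\chi^2\) test for given probabilities, with the Benford probabilities as the null distribution. Here is a sketch of the mechanics on simulated first digits. The 500 digits are drawn from Benford's Law purely for illustration and are not the Instagram data.

```r
# Benford probabilities for first digits 1 through 9
benford <- log10(1 + 1 / (1:9))

set.seed(1)   # arbitrary seed
# Simulated first digits (illustrative only, not real data)
digits <- sample(1:9, size = 500, replace = TRUE, prob = benford)
observed <- table(factor(digits, levels = 1:9))

res <- chisq.test(observed, p = benford)
res$parameter   # 9 cells and one constraint give 8 df
res$p.value
```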

The observed value of \(\chi^2\) is 9.876, from a \(\chi^2\) distribution with \(8 = 9 - 1\) degrees of freedom. This is not extraordinary. The \(p\) -value is 0.2738 and we fail to reject \(H_0\) . The data is consistent with Benford’s Law.

10.4 \(\chi^2\) goodness of fit

In this section, we consider tabular data that is hypothesized to follow a parametric model. When the parameters of the model are estimated from the observed data, the model fits the data better than it should. Each estimated parameter reduces the degrees of freedom in the \(\chi^2\) distribution by one.

When testing goodness of fit, the \(\chi^2\) statistic is approximately \(\chi^2\) with degrees of freedom given by the following:

\[ \text{degrees of freedom} = \text{bins} - 1 - \text{parameters estimated from the data}. \]

We will explore this claim through simulation in Section 10.4.1 .

Goals in a soccer game arrive at random moments and could be reasonably modeled by a Poisson process. If so, the total number of goals scored in a soccer game should be a Poisson rv.

The data set world_cup from fosdata contains the results of the 2014 and 2015 FIFA World Cup soccer finals. Let’s get the number of goals scored by each team in each game of the 2015 finals:

We want to perform a hypothesis test to determine whether a Poisson model is a good fit for the distribution of goals scored. The Poisson distribution has one parameter, the rate \(\lambda\) . The expected value of a Poisson rv is \(\lambda\) , so we estimate \(\lambda\) from the data:

Here \(\lambda \approx 1.4\), meaning each team scored 1.4 goals per game, on average. Figure 10.5 displays the observed counts of goals with the expected counts from the Poisson model \(\text{Pois}(\lambda)\) in red.

Figure 10.5: Goals scored by each team in each game of the 2015 World Cup. Poisson model shown with red dots.

Since the \(\chi^2\) test relies on the Central Limit Theorem, each cell in the table should have a large expected value to be approximately normal. Traditionally, the threshold is that a cell’s expected count should be at least five. Here, all cells with 4 or more goals are too small. The solution is to bin these small counts into one category, giving five total categories: zero goals, one goal, two goals, three goals, or 4+ goals. The observed and expected counts for the five categories are:

The \(\chi^2\) test statistic will have \(3 = 5 - 1 - 1\) df, since:

  • There are 5 bins.
  • The bins sum to 104, losing one df.
  • The model’s one parameter \(\lambda\) was estimated from the data, losing one df.

We compute the \(\chi^2\) test statistic and \(p\) -value manually, because the chisq.test function is unaware that our expected values were modeled from the data, and would use the incorrect df.
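The binned counts are not reproduced in this text, so the sketch below starts from the value of the statistic reported for this example (6.15) and shows only the degrees-of-freedom step. With observed and expected count vectors in hand, the statistic itself would be sum((observed - expected)^2 / expected).

```r
chi2 <- 6.15        # statistic reported for the five binned categories
df <- 5 - 1 - 1     # bins - 1 - parameters estimated from the data

p_value <- pchisq(chi2, df = df, lower.tail = FALSE)
p_value

# For comparison: the p-value if the estimated parameter were ignored (4 df)
p_naive <- pchisq(chi2, df = 4, lower.tail = FALSE)
p_naive
```

Ignoring the estimated parameter inflates the \(p\)-value, which is why the manual computation matters.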

The observed value of \(\chi^2\) is 6.15. The \(p\) -value of this test is 0.105, and we would not reject \(H_0\) at the \(\alpha = .05\) level. This test does not give evidence against goal scoring being Poisson.

Note that there is one aspect of this data that is highly unlikely under the assumption that the data comes from a Poisson random variable: ten goals were scored on two different occasions. The \(\chi^2\) test did not consider that, because we binned those large values into a single category. If you believe that data might not be Poisson because you suspect it will have unusually large values (rather than unusually many large values), then the \(\chi^2\) test will not be very powerful.

10.4.1 Simulations

This section investigates the test statistic in the \(\chi^2\) goodness of fit test via simulation. We observe that it does follow the \(\chi^2\) distribution with df equal to bins minus one minus number of parameters estimated from the data.

Suppose that data comes from a Poisson variable \(X\) with mean 2 and there are \(N = 200\) data points.

The expected count in bin 5 is 200 * dpois(5,2) which is 7.2, large enough to use. The expected count in bin 6 is only 2.4, so we combine all bins 5 and higher. In a real experiment, the sample data could affect the number of bins chosen, but we ignore that technicality.

Next, compute the expected counts for each bin using the rate \(\lambda\) estimated from the data. Bins 0-4 can use dpois but bin 5 needs the entire tail of the Poisson distribution.

Finally, we produce one value of the test statistic:
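The whole procedure for one simulated data set might be sketched as follows. The seed is arbitrary, so the resulting statistic will not match any particular value quoted in the text, and the variable names are illustrative.

```r
set.seed(2024)                 # arbitrary seed, for reproducibility only
N <- 200
x <- rpois(N, lambda = 2)
lambda_hat <- mean(x)          # rate estimated from the data

# Bin the data as 0, 1, 2, 3, 4 and "5 or more"
observed <- table(factor(pmin(x, 5), levels = 0:5))

# Expected counts: dpois for bins 0-4, upper tail for the last bin
probs <- c(dpois(0:4, lambda_hat), ppois(4, lambda_hat, lower.tail = FALSE))
expected <- N * probs

chi2 <- sum((observed - expected)^2 / expected)
chi2
```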

Naively using chisq.test with the data and the fit probabilities gives the same value of \(\chi^2 = 0.9264\) , but produces a \(p\) -value using 5 df, which is wrong. The function does not know that we used one df to estimate a parameter.

We now replicate to produce a sample of values of the test statistic to verify that 4 is the correct df for this test:
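A sketch of the replication, wrapping the steps above in a hypothetical helper function one_stat. Instead of plotting, it compares the simulated mean with the degrees of freedom, using the fact that a \(\chi^2\) random variable with \(d\) df has mean \(d\).

```r
set.seed(2024)   # arbitrary seed

# One simulated value of the test statistic (hypothetical helper)
one_stat <- function(N = 200, lambda = 2) {
  x <- rpois(N, lambda)
  lambda_hat <- mean(x)
  observed <- tabulate(pmin(x, 5) + 1, nbins = 6)   # bins 0,...,4 and 5+
  expected <- N * c(dpois(0:4, lambda_hat),
                    ppois(4, lambda_hat, lower.tail = FALSE))
  sum((observed - expected)^2 / expected)
}

sim <- replicate(5000, one_stat())
mean(sim)   # expected to be near 4 rather than 5
```

A density plot of sim against dchisq with 4 and 5 df gives the same comparison visually.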

The black curve is the probability density estimated from our simulated data. The blue curve is \(\chi^2\) with 4 degrees of freedom, equal to (bins - parameters - 1). The red curve is \(\chi^2\) with 5 degrees of freedom and does not match the observations. The simulated density tracks the blue curve closely, which supports 4 as the correct df.

10.5 \(\chi^2\) tests on cross tables

Given two categorical variables \(A\) and \(B\) , we can form a cross table with one cell for each pair of values \((A_i,B_j)\) . That cell’s count is a random variable \(X_{ij}\) :

\[ \begin{array}{c|c|c|c|c|} & B_1 & B_2 & \quad\dotsb\quad & B_n \\ \hline A_1 & X_{11} & X_{12} & \quad\dotsb\quad & X_{1n} \\ \hline A_2 & X_{21} & X_{22} & \quad\dotsb\quad & X_{2n} \\ \hline \vdots & \vdots &\vdots & \quad\ddots\quad & \vdots \\ \hline A_m & X_{m1} & X_{m2} & \quad\dotsb\quad & X_{mn} \\ \hline \end{array} \]

As in all \(\chi^2\) tests, the null hypothesis leads to an expected value for each cell. In this setting, we require a probability \(p_{ij}\) that an observation lies in cell \((i,j)\) , \(p_{ij} = P(A = A_i\ \cap\ B = B_j)\) . These probabilities are called the joint probability distribution of \(A\) and \(B\) .

The hypothesized joint probability distribution needs to come from somewhere. It could come from historical or population data, or by fitting a parametric model, in which case the methods of the previous two sections apply.

We assume that \(B\) is random (and perhaps \(A\) as well, but not necessarily) and we consider the null hypothesis that the probability distribution of \(B\) is independent of the levels of \(A\) . Let \(N\) be the total number of observations. If we let \(a_i = \frac{1}{N} \sum_j X_{ij}\) denote the proportion of observations for which \(A = A_i\) and \(b_j = \frac{1}{N} \sum_i X_{ij}\) denote the proportion of responses for which \(B = B_j\) , then under the assumption of \(H_0\) we would hypothesize that \[ p_{ij} = a_i b_j. \] It follows that \(E[X_{ij}] = N a_i b_j\) .

The test statistic is \[ \chi^2 = \sum_{i,j} \frac{(X_{ij} - E[X_{ij}])^2}{E[X_{ij}]}. \]

When the expected cell counts \(E[X_{ij}]\) are all large enough, the test statistic has approximately a \(\chi^2\) distribution with \((\text{columns} - 1)(\text{rows} - 1)\) degrees of freedom. There are two explanations for why this is the correct degrees of freedom, depending on the details of the experimental design. The mechanics of the test itself, however, do not depend on the experimental design. Sections 10.5.1 and 10.5.2 discuss the details.

10.5.1 \(\chi^2\) test of independence

In the \(\chi^2\) test of independence, the levels of \(A\) and \(B\) are both random. In this case, we are testing

\[ H_0: A {\text{ and }} B {\text{ are independent random variables}} \] versus the alternative that they are not independent. The values of \(p_{ij} = a_i b_j\) have a natural interpretation as \(p_{ij} = P(A = A_i \cap B = B_j) = P(A = A_i) P(B = B_j)\) .

To understand the degrees of freedom in the test for independence, the experimental design matters. We fix \(N\) the total number of observations, and for each subject the two categorical variables \(A\) and \(B\) are measured (see Example 10.9 ). The row and column marginal sums of the cross table are random. Then:

  • There are \(mn\) cells.
  • There are \(m + n\) marginal probabilities \(a_i\) and \(b_j\) estimated from the data, and \(\sum_i a_i = \sum_j b_j = 1\), so we lose \(m + n - 2\) df.
  • All cell counts must add to \(N\) , losing one df.
  • \(mn - (m + n - 2) - 1 = (m - 1)(n - 1)\)

Are grove snail color and banding patterns related? Figure 10.1 suggests that brown snails are more likely to be unbanded than the other colors.

In R, the \(\chi^2\) test for independence is simple: we pass the cross table to chisq.test .

The cross table is \(3 \times 4\) , so the \(\chi^2\) statistic has \((3-1)(4-1) = 6\) df. The \(p\) -value is very small, so we reject \(H_0\) . Snail color and banding are not independent.

Let’s reproduce the results of chisq.test . First, compute marginal probabilities.

Next, compute the joint distribution \(p_{ij} = a_ib_j\) . This uses the matrix multiplication operator %*% and the matrix transpose t to compute all 12 entries at once. The result is multiplied by \(N = 2904\) to get expected cell counts:

Finally, compute the \(\chi^2\) test statistic and the \(p\) -value, which match the results of chisq.test .
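The three steps can be sketched end to end on a small example. The \(3 \times 4\) table below is hypothetical and is not the snail data. It only illustrates the mechanics and the agreement with chisq.test.

```r
# Hypothetical 3 x 4 table of counts (NOT the snail data), filled by column
tab <- matrix(c(20, 30, 25,
                15, 35, 25,
                30, 20, 10,
                40, 30, 20), nrow = 3)
N <- sum(tab)

a <- rowSums(tab) / N          # marginal probabilities of the row variable
b <- colSums(tab) / N          # marginal probabilities of the column variable
expected <- N * a %*% t(b)     # E[X_ij] = N * a_i * b_j for all cells at once

chi2 <- sum((tab - expected)^2 / expected)
df <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_value <- pchisq(chi2, df = df, lower.tail = FALSE)

c(chi2 = chi2, df = df, p_value = p_value)
chisq.test(tab)                # should report the same three numbers
```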

It is instructive to view each cell’s contribution to \(\chi^2\) graphically as a “heatmap” to provide a sense of which cells were most responsible for the dependence between color and banding.

Clearly, most of the interaction between Banding and Color comes from the overabundance of unbanded (X00000) Brown snails. The authors of the original study were interested in environmental effects on color and bandedness of snails. It is possible, though a more thorough analysis would be required, that an environment that favors the survival of brown snails also favors unbanded snails.

To what extent do animals display conformity? That is, will they forgo personal information in order to follow the majority? Researchers 83 studied conformity among dogs. They trained a subject dog to walk around a wall in one direction in order to receive a treat. After training, the subject dog then watched other dogs walk around the wall in the opposite direction. If the subject dog changes its behavior to match the dogs it observed, it is evidence of conforming behavior.

The data from this experiment is available as the dogs data frame in the fosdata package.

This data set has quite a bit going on. In particular, each dog repeated the experiment three times, which means that it would be unwise to assume independence across trials. So, we will restrict to the first trial only. We also restrict to dogs that did not drop out of the experiment.

Subject dogs participated under three conditions. The control group (condition = 0) observed no other dogs, and was simply asked to repeat what they were trained to do. Another group (condition = 1) saw one dog that went the “wrong” way around the wall three times. Another group (condition = 3) saw three different dogs that each went the wrong way around the wall one time.

We summarize the results of the experiment with a table showing the three experimental conditions in the three rows and whether the subject dog conformed or not in the two columns.

The null hypothesis is that conform and condition are independent variables, so that the three groups of dogs would have the same conforming behavior. We store the cross table and apply the \(\chi^2\) test for independence:

The \(p\) -value is 0.61, so there is not significant evidence that the conform and condition variables are dependent. Dogs do not disobey training to conform, at least according to this simple analysis.

The \(\chi^2\) test reports 2 df because we have 3 rows and 2 columns, and \((3-1)(2-1) = 2\) . The test also produces a warning, because the expected cell count for conforming dogs under condition 0 is low. With a high \(p\) -value and good cell counts elsewhere, the lack of accuracy is not a concern.

A link to the paper associated with the dogs data is given in ?fosdata::dogs . Find the place in the paper where they perform the above \(\chi^2\) test, and read the authors’ explanation related to it.

10.5.2 \(\chi^2\) test of homogeneity

In a \(\chi^2\) test of homogeneity, one of the variables \(A\) and \(B\) is not random. For example, if an experimenter decides to collect data on cats by finding 100 American shorthair cats, 100 highlander cats, and 100 munchkin cats and measuring eye color for each of the 300 cats, then the number of cats of each breed is not a random variable. A test of this type is called a \(\chi^2\) test of homogeneity, or a \(\chi^2\) test with one fixed margin. However, we are still interested in whether the distribution of eye color depends on the breed of the cat, and we proceed exactly in the same manner as before, with a slightly reworded null hypothesis and a different justification of the degrees of freedom. We denote \(B\) as the variable that is random. Our null hypothesis is:

\[ H_0: {\text{ the distribution of $B$ does not depend on the level of $A$}} \] and the alternative hypothesis is that the distribution of \(B\) does depend on the level of \(A\) . We compute degrees of freedom as follows:

  • There are \(n\) marginal probabilities \(b_1, \ldots, b_n\) . Since these must sum to 1, we lose \(n - 1\) degrees of freedom.
  • Each row sums to a fixed number, so we lose \(m\) degrees of freedom.
  • We do not lose any degrees of freedom for all bins summing to \(N\), since that is implied by the row condition.
  • Total degrees of freedom are \(mn - (n - 1) - m = (m - 1)(n - 1)\) , as in the case of the \(\chi^2\) test of independence.

The mechanics of a \(\chi^2\) test of homogeneity are the same as a \(\chi^2\) test of independence.

Consider the sharks data set 84 in the fosdata package. Participants were paid 25 cents to listen to either silence, ominous music, or uplifting music while possibly watching a video on sharks. An equal number were recruited for each type of music. They were then asked to give their rating from 1-7 on their willingness to help conserve endangered sharks. We are interested in whether the distribution of the participants’ willingness to conserve sharks depends on the type of music they listened to.

We start by computing the cross table of the data.

The rows do not add up to exactly the same number because some participants dropped out of the study. We ignore this problem and continue.

We see that there is not sufficient evidence to conclude that the distribution of willingness to help conserve endangered sharks depends on the type of music heard ( \(p = .6982\) ).

10.5.3 Two sample test for equality of proportions

An important special case of the \(\chi^2\) test for independence is the two sample test for equality of proportions.

Suppose that \(n_1\) trials are made from population 1 with \(x_1\) successes, and that \(n_2\) trials are made from population 2 with \(x_2\) successes. We wish to test \(H_0: p_1 = p_2\) versus \(H_a: p_1 \not= p_2\) , where \(p_i\) is the true probability of success from population \(i\) . We create a \(2\times 2\) table of values as follows:

\[ \begin{array}{c|c|c|} & \text{Pop. 1} & \text{Pop. 2} \\ \hline \text{Successes} & x_{1} & x_{2} \\ \hline \text{Failures} & n_1 - x_1 & n_2 - x_2 \\ \hline \end{array} \]

The null hypothesis says that \(p_1 = p_2\) . We estimate this common probability using all the data:

\[ \hat{p} = \frac{\text{Successes}}{\text{Trials}} = \frac{x_1 + x_2}{n_1 + n_2} \]

The expected number of successes under \(H_0\) is calculated from \(n_1\) , \(n_2\) , and \(\hat{p}\) :

\[ \begin{array}{c|c|c|} & \text{Pop. 1} & \text{Pop. 2} \\ \hline \text{Exp. Successes} & n_1\hat{p} & n_2\hat{p} \\ \hline \text{Exp. Failures} & n_1(1-\hat{p}) & n_2(1-\hat{p})\\ \hline \end{array} \]

We then compute the \(\chi^2\) test statistic. This has 1 df, since there were 4 cells, two constraints that the columns sum to \(n_1\) , \(n_2\) , and one parameter estimated from the data.

The test statistic and \(p\) -value can be computed with chisq.test . The prop.test function performs the same computation, and allows for finer control over the test in this specific setting.
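The equivalence of the two functions can be seen in a short sketch. The success counts and group sizes below are hypothetical, chosen only to illustrate the two calls.

```r
# Hypothetical data: successes and trials in two populations
x <- c(18, 30)
n <- c(50, 60)

# 2 x 2 table with populations in columns, as in the layout above
tab <- rbind(successes = x, failures = n - x)

p_chisq <- chisq.test(tab)$p.value
p_prop <- prop.test(x = x, n = n)$p.value
c(chisq.test = p_chisq, prop.test = p_prop)
```

Both functions apply a continuity correction by default in this setting, so the two \(p\)-values coincide.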

Researchers randomly assigned patients with wrist fractures to receive a cast in one of two positions, the VFUDC position and the functional position. The assignment of cast position should be independent of which wrist (left or right) was fractured. We produce a cross table from the data in fosdata::wrist and run the \(\chi^2\) test for independence:

For prop.test we need to know the group sizes, \(n_1 = 45\) with right-side fractures and \(n_2 = 60\) with left-side fractures. We also need the number of successes, which we arbitrarily select as cast position 1.

The prop.test function applies a continuity correction by default. chisq.test only applies continuity correction in this \(2 \times 2\) case. There seems to be some disagreement on whether or not continuity correction is desirable. From the point of view of this text, we would choose the version that has observed type I error rate closest to the assigned rate of \(\alpha\) . Let’s run some simulations, using \(n_1 = 45\) , \(n_2 = 60\) , and success probability \(p = 50/105\) to match the wrist example.
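A simulation along these lines might be sketched as follows. The seed and the number of replications (2000) are arbitrary choices, and the variable names are illustrative.

```r
set.seed(1)   # arbitrary seed
n1 <- 45
n2 <- 60
p <- 50 / 105

# For each simulated experiment under H0, record whether each version rejects
reject <- replicate(2000, {
  x <- c(rbinom(1, n1, p), rbinom(1, n2, p))
  c(corrected   = prop.test(x, c(n1, n2))$p.value < 0.05,
    uncorrected = prop.test(x, c(n1, n2), correct = FALSE)$p.value < 0.05)
})

rates <- rowMeans(reject)   # estimated type I error rates
rates
```

The corrected test can never reject more often than the uncorrected one, because the correction only shrinks the test statistic.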

We see that for this sample size and common probability of success, correct = FALSE comes closer to the desired type I error rate of 0.05, but is a bit too high. This holds across a wide range of \(p\) , \(n_1\) and \(n_2\) . Using continuity correction tends to have effective error rates lower than the designed type I error rates, while correct = FALSE has type I error rates closer to the designed type I error rates.

Consider the babynames data set in the babynames package. Is there a statistically significant difference in the proportion of girls named “Bella” 85 in 2007 and the proportion of girls named “Bella” in 2009?

We will need to do some data wrangling on this data and define a binomial variable bella :

We see that the number of girls named “Bella” nearly doubled from 2007 to 2009. The two sample proportions test shows that this was highly significant.

10.6 Exact and Monte Carlo methods

The \(\chi^2\) methods of the previous sections all approximate discrete variables with continuous (normal) variables. Exact and Monte Carlo methods are very general approaches to testing tabular data, and neither method requires assumptions of normality.

Exact methods produce exact \(p\) -values by examining all possible ways the \(N\) outcomes could fill the table. The first step of an exact method is to compute the test statistic associated to the observed data, often \(\chi^2\) . Then for each possible table, compute the test statistic and the probability of that table occurring, assuming the null hypothesis. The \(p\) -value is the sum of the probabilities of the tables whose associated test statistics are as extreme or more extreme than the observed test statistic. This \(p\) -value is exact because (assuming the null hypothesis) it is exactly the probability of obtaining a test statistic as or more extreme than the one coming from the data.

Unfortunately, the number of ways to fill out a table grows exponentially with the number of cells in the table (or more precisely, exponentially in the degrees of freedom). This makes exact methods unreasonably slow when \(N\) is large or the table has many cells. Monte Carlo methods present a compromise that avoids assumptions but stays computationally tractable. Rather than investigate every possible way to fill the table, we randomly create many tables according to the null hypothesis. For each, the \(\chi^2\) statistic is computed. The \(p\) -value is taken to be the proportion of generated tables that have a larger \(\chi^2\) statistic than the observed data. Though we compute the \(\chi^2\) statistic for the observed and simulated tables, we do not rely on assumptions about its distribution – it may not have a \(\chi^2\) distribution at all.

Return to the data on age cohorts in soccer, introduced in Example 10.6 . There were three relative age groups in each cohort year: old, middle, and young. Our null hypothesis is that each age group should be equally likely for an elite soccer player in a given cohort. The data has \(N = 1107\) boys, with 631, 321, and 155 in the old, middle, and young groups.

To apply Monte Carlo methods, we need to generate simulated \(3\times 1\) tables. We use the R function rmultinom , which generates multinomially distributed random number vectors. As with all random variable generation functions in R, the first argument to rmultinom is the number of simulations we want. Then there are two required parameters, the number of observations \(N\) in each table and the null hypothesis probability distribution. Here are ten tables that might result from the experiment, one in each column.
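A sketch of the call described above. The seed is arbitrary, so the simulated counts will differ from any specific values quoted in the text.

```r
set.seed(1)   # arbitrary seed
# Ten simulated tables of N = 1107 boys, one per column, under H0
sims <- rmultinom(10, size = 1107, prob = c(1/3, 1/3, 1/3))
sims
colSums(sims)   # every column sums to 1107
```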

From the first column, one possible outcome of the soccer study would be to find 355, 394, and 358 boys in the old, middle, and young groups. The next nine columns are also possible outcomes, each with \(N = 1107\) observations. It is apparent that the observed value of 631 boys in the “old” group is exceptionally large under \(H_0\).

To get a \(p\) -value, we first compute the \(\chi^2\) statistic for the observed data:
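The listing is not shown here. A minimal sketch of the computation in base R, written directly from the definition of the statistic (variable names are ours):

```r
observed <- c(631, 321, 155)      # old, middle, young
expected <- rep(sum(observed) / 3, 3)  # 369 per group under H0

# Sum of (observed - expected)^2 / expected over the three groups.
chi_sq_obs <- sum((observed - expected)^2 / expected)
chi_sq_obs
```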

The \(\chi^2\) statistic is a measure of how far our observed group sizes are from the expected group sizes. For the observed boys, \(\chi^2\) is 316.3794. Next compute the \(\chi^2\) statistic for each set of simulated group sizes:
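The listing is not shown here. One way to sketch this step, assuming ten simulated tables drawn under the equal-probability null hypothesis (the seed and names are our own), is to apply the same formula to each column:

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

expected <- rep(1107 / 3, 3)
sims <- rmultinom(10, size = 1107, prob = rep(1 / 3, 3))

# One chi-squared statistic per simulated table (per column).
sim_stats <- apply(sims, 2, function(x) sum((x - expected)^2 / expected))
round(sim_stats, 2)
```

Comparing these values with the observed statistic shows how unusual the real data would be if the null hypothesis held.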

Again, it is clear that the observed data is quite different from the data that was simulated under \(H_0\). We should use more than 10 simulations, of course, but for this particular data you will never see a value as large as 316 in the simulations. The true \(p\)-value for this experiment is essentially zero.

R can carry out the Monte Carlo method within the chisq.test function:
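The listing is not shown here. A minimal sketch of such a call, relying on the defaults of `chisq.test` (equal probabilities for a one-dimensional table and `B = 2000` replicates when simulation is requested):

```r
observed <- c(631, 321, 155)

# simulate.p.value = TRUE switches from the chi-squared approximation
# to a Monte Carlo p-value based on B simulated tables (default 2000).
mc_test <- chisq.test(observed, simulate.p.value = TRUE)
mc_test$statistic
mc_test$p.value
```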

The function performed 2000 simulations and none of them had a higher \(\chi^2\) value than the observed data. The \(p\) -value was reported as 1/2001, because R always includes the actual data set in addition to the 2000 simulated values. This is a common technique that makes a relatively small absolute difference in estimates.

Continuing with the boys elite soccer age data, we show how to apply the exact multinomial test .

The idea of the exact test is to sum the probabilities of all tables that lead to test statistics that are as extreme or more extreme than the observed test statistic. The table of boys is \(3 \times 1\) , and we need the three values in the table to sum to 1107. In Exercise 10.29 , you are asked to show that there are 614386 possible ways to fill a \(3 \times 1\) table with numbers that sum to 1107.
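To make the idea concrete, here is a base-R sketch that enumerates every table and sums the probabilities of those at least as extreme as the data. It uses \(\chi^2\) as the measure of extremeness and is our own illustration, not the implementation of any package; the variable names and the small tolerance in the comparison are assumptions.

```r
observed <- c(631, 321, 155)
N <- sum(observed)
expected <- rep(N / 3, 3)  # equal probabilities under H0

# Enumerate every table (a, b, c3) of non-negative integers summing to N.
a <- rep(0:N, times = (N + 1):1)
b <- sequence((N + 1):1) - 1
c3 <- N - a - b

# Multinomial log-probability of each table under H0
# (working on the log scale avoids overflow in the factorials).
log_prob <- lgamma(N + 1) - lgamma(a + 1) - lgamma(b + 1) - lgamma(c3 + 1) +
  N * log(1 / 3)

# Chi-squared statistic for each table and for the observed data.
stat <- (a - expected[1])^2 / expected[1] +
  (b - expected[2])^2 / expected[2] +
  (c3 - expected[3])^2 / expected[3]
obs_stat <- sum((observed - expected)^2 / expected)

# Exact p-value: total probability of tables at least as extreme.
p_exact <- sum(exp(log_prob[stat >= obs_stat - 1e-9]))
p_exact
```

The probabilities of all enumerated tables sum to 1, which is a useful sanity check on the enumeration.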

The multinomial.test function in the EMT package carries out this process.

As before, the \(p\) -value is 0. The EMT::multinomial.test function can also run Monte Carlo tests using the parameter MonteCarlo = TRUE .

Vignette: Tables

Tables are an often overlooked part of data visualization and presentation. They can also be difficult to do well! In this vignette, we introduce the knitr::kable function, which produces tables compatible with .pdf, .docx and .html output inside of your R Markdown documents.

To make a table using knitr::kable , create a data frame and apply kable to it.
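The listing is not shown here. As a minimal sketch, assuming the knitr package is installed, the following builds a small data frame from the built-in `mtcars` data (our choice of example) and passes it to `kable`:

```r
library(knitr)

# A small summary data frame: mean mpg for each number of cylinders.
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# digits, col.names, and caption are standard kable arguments.
tbl <- kable(
  mpg_by_cyl,
  digits = 1,
  col.names = c("Cylinders", "Mean mpg"),
  caption = "Mean fuel economy by number of cylinders"
)
tbl
```

Inside an R Markdown document the result is rendered in the output format of the document; at the console it prints as plain text.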

Suppose you are studying the palmerpenguins::penguins data set, and you want to report the mean, standard deviation, range, and number of samples of bill length in each species type. The dplyr package helps to produce the data frame, and we use kable options to create a caption and better column headings. The table is displayed as Table 10.1 .

The kable function provides only basic table styles. To adjust the width and other features of table style, use the kableExtra package.

Another interesting use of tables is in combination with broom::tidy , which converts the outputs of many common statistical tests into data frames. Let’s see how it works with t.test .

Display the results of a \(t\) -test of the body temperature data from fosdata::normtemp in a table.

We selected only the first six variables so that the table would better fit the page.

As a final example, let’s test groups of cars from mtcars to see if their mean mpg is different from 25. The groups we want are the four possible combinations of transmission ( am ) and engine ( vs ). This requires four \(t\) -tests, and could be a huge hassle! But, check this out:
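The listing is not shown here. As one possible sketch, using only base R rather than the broom-based approach this vignette describes, the four tests can be run in a loop and collected into a single data frame that is ready to pass to `kable` (column names are our own):

```r
# Split mpg into the four am/vs combinations.
groups <- split(mtcars$mpg, list(am = mtcars$am, vs = mtcars$vs))

# Run a one-sample t-test (mu = 25) per group and keep the key results.
results <- do.call(rbind, lapply(names(groups), function(g) {
  tt <- t.test(groups[[g]], mu = 25)
  data.frame(
    group = g,                       # label of the form "am.vs"
    estimate = unname(tt$estimate),  # sample mean of mpg
    statistic = unname(tt$statistic),
    p.value = tt$p.value,
    conf.low = tt$conf.int[1],
    conf.high = tt$conf.int[2]
  )
}))
results
```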

Exercises 10.1 – 10.2 require material through Section 10.1 .

Consider the cern data set in the fosdata package. Create a figure similar to Figure 10.1 which illustrates the total number of likes for each type of post, colored by the platform. French Twitter may not show up because it has so few likes.

Exercises 10.3 – 10.8 require material through Section 10.2 .

Suppose you are testing \(H_0: p = 0.4\) versus \(H_a: p \not= 0.4\) . You collect 20 pieces of data and observe 12 successes. Use dbinom to compute the \(p\) -value associated with the exact binomial test, and check using binom.test .

Suppose you are testing \(H_0: p = 0.4\) versus \(H_a: p \not= 0.4\) . You collect 100 pieces of data and observe 33 successes. Use the normal approximation to the binomial to find an approximate \(p\) -value associated with the hypothesis test.

Shaquille O’Neal (Shaq) was an NBA basketball player from 1992–2011. He was a notoriously bad free throw shooter 87 . Shaq always claimed, however, that the true probability of him making a free throw was greater than 50%. Throughout his career, Shaq made 5,935 out of 11,252 free throws attempted. Is there sufficient evidence to conclude that Shaq indeed had a better than 50/50 chance of making a free throw?

Diaconis, Holmes and Montgomery 88 claim that vigorously flipped coins tend to come up the same way they started. In a real coin tossing experiment 89 , two UC Berkeley students tossed coins a total of 40 thousand times in order to assess whether this is true. Out of the 40,000 tosses, 20,245 landed on the same side as they were tossed from.

  • Find a (two-sided) 99% confidence interval for \(p\) , the true proportion of times a coin will land on the same side it is tossed from.
  • Clearly state the null and alternative hypotheses, defining any parameters that you use.
  • Is there sufficient evidence to reject the null hypothesis at the \(\alpha = .05\) level based on this experiment? What is the \(p\) -value?

This exercise requires material from Section 6.7 or knowledge of loops. The curious case of the dishonest statistician – suppose a statistician wants to “prove” that a coin is not a fair coin. They decide to start flipping the coin, and after 10 tosses they will run a hypothesis test on \(H_0: p = 1/2\) versus \(H_a: p \not= 1/2\) . If they reject at the \(\alpha = .05\) level, they stop. Otherwise, they toss the coin one more time and run the test again. They repeatedly toss and run the test until either they reject \(H_0\) or they toss the coin 100 times (hey, they’re dishonest and lazy). Estimate using simulation the probability that the dishonest statistician will reject \(H_0\) .

Suppose you wish to test whether a die truly comes up “6” 1/6 of the time. You decide to roll the die until you observe 100 sixes. You do this, and it takes 560 rolls to observe 100 sixes.

  • State the appropriate null and alternative hypotheses.
  • Explain why prop.test and binom.test are not formally valid to do a hypothesis test.
  • Use reasoning similar to that in the explanation of binom.test above and the function dnbinom to compute a \(p\) -value.
  • Should you accept or reject the null hypothesis?

Exercises 10.9 – 10.12 require material through Section 10.3 .

Suppose you are collecting categorical data that comes in three levels. You wish to test whether the levels are equally likely using a \(\chi^2\) test. You collect 150 items and obtain a test statistic of 4.32. What is the \(p\) -value associated with this experiment?

Recall that the colors of M&M’s supposedly follow this distribution:

\[ \begin{array}{cccccc} Yellow & Red & Orange & Brown & Green & Blue \\ 0.14 & 0.13 & 0.20 & 0.12 & 0.20 & 0.21 \end{array} \]

Imagine you bought 10,000 M&M’s and got the following color counts:

\[ \begin{array}{cccccc} Yellow & Red & Orange & Brown & Green & Blue \\ 1357 & 1321 & 1946 & 1182 & 2052 & 2142 \end{array} \]

Does your sample appear to follow the known color distribution? Perform the appropriate \(\chi^2\) test at the \(\alpha = .05\) level and interpret.

The data set fosdata::bechdel has information on budget and earnings for many popular movies.

  • Is the budget data consistent with Benford’s Law?
  • Is the intgross data consistent with Benford’s Law?
  • Is the domgross data consistent with Benford’s Law? (Hint: one movie had no domestic gross. Bonus: which one was it?)

The United States Census Bureau produces estimates of population for all cities and towns in the U.S. On the census website http://www.census.gov , find population estimates for all incorporated places (cities and towns) for any one state. Import that data into R. Do the values for city and town population numbers follow Benford’s Law? Report your results with a plot and a \(p\) -value as in Example 10.7 .

Exercises 10.13 – 10.17 require material through Section 10.4 .

Did the goals scored by each team in each game of the 2014 FIFA Men’s World Cup soccer final follow a Poisson distribution? Perform a \(\chi^2\) goodness of fit test at the \(\alpha = 0.05\) level, binning values 4 and above. Data is in fosdata::world_cup .

Consider the austen data set in the fosdata package. In this exercise, we are testing to see whether the number of times that words are repeated after their first occurrence is Poisson. Restrict to the first chapter of Pride and Prejudice , and count the number of times that each word is repeated, and see that we obtain the following table:

Use a \(\chi^2\) goodness of fit test with \(\alpha = .05\) to test whether the distribution of repetitions of words is consistent with a Poisson distribution.

Powerball is a lottery game in which players try to guess the numbers on six balls drawn randomly. The first five are white balls and the sixth is a special red ball called the powerball. The results of all Powerball drawings from February 2010 to July 2020 are available in fosdata::powerball .

  • Plot the numbers drawn over time. Use color to distinguish the six balls. What do you observe? You will need pivot_longer to tidy the data.
  • Use a \(\chi^2\) test of uniformity to check if all numbers ever drawn fit a uniform distribution.
  • Restrict to draws after October 4, 2015, and only consider the white balls drawn, Ball1 - Ball5 . Do they fit a uniform distribution?
  • Restrict to draws after October 4, 2015, and only consider Ball1. Check that it is not uniform. Explain why not.

In this exercise, we explore doing \(\chi^2\) goodness of fit tests for continuous variables. Consider the hdl variable in the adipose data set in fosdata . We wish to test whether the data is normal using a \(\chi^2\) goodness of fit test and 7 bins.

  • Estimate the mean \(\mu\) and the standard deviation \(\sigma\) of the HDL.
  • Use qnorm(seq(0, 1, length.out = 8), mu, sigma) to create the dividing points ( breaks ) between 7 equally likely regions. The first region is \((-\infty, 0.8988)\) .
  • Use table(cut(aa, breaks = breaks)) to obtain the observed distribution of values in bins. The expected number in each bin is the number of data points over 7, since each bin is equally likely.
  • Compute the \(\chi^2\) test statistic as the difference between observed and expected squared, divided by the expected.
  • Compute the probability of getting this test-statistic or larger using pchisq . The degrees of freedom is the number of bins minus 3: one because the sum has to be 71, and two more because you are estimating two parameters from the data.
  • Is there evidence to conclude that HDL is not normally distributed?

Consider the fosdata::normtemp data set. Use a goodness of fit test with 10 bins, all with equal probabilities, to test the normality of the temperature data set. Note that in this case, you will need to estimate two parameters, so the degrees of freedom will need to be adjusted appropriately.

Exercises 10.18 – 10.28 require material through Section 10.5 .

Clark and Westerberg 90 investigated whether people can learn to toss heads more often than tails. The participants were told to start with a heads up coin, toss the coin from the same height every time, and catch it at the same height, while trying to get the number of revolutions to work out so as to obtain heads. After training, the first participant got 162 heads and 138 tails.

  • Find a 95% confidence interval for \(p\) , the proportion of times this participant will get heads.
  • Clearly state the null and alternative hypotheses, defining any parameters.
  • Is there sufficient evidence to reject the null hypothesis at the \(\alpha = .01\) level based on this experiment? What is the \(p\) -value?
  • The second participant got 175 heads and 125 tails. Is there sufficient evidence to conclude that the probability of getting heads is different for the two participants at the \(\alpha = .05\) level?

Left digit bias is when people attribute a difference to two numbers based on the first digit of the number, when there is not really a large difference between the numbers. In an article 91 , researchers studied left digit bias in the context of treatment choices for patients who were just over or just under 80 years old.

Researchers found that 265 of 5036 patients admitted with acute myocardial infarction who were admitted in the two weeks after their 80th birthday underwent Coronary-Artery Bypass Graft (CABG) surgery, while 308 out of 4426 patients with the same diagnosis admitted in the two weeks before their 80th birthday underwent CABG. There is no recommendation in clinical guidelines to reduce CABG use at the age of 80. Is there a statistically significant difference in the percentage of patients receiving CABG in the two groups?

Exercises 10.20 and 10.21 consider the psychology of randomness, as studied in Bar-Hillel et al. 92

The researchers considered whether people are good at creating random sequences of heads and tails in a unique way. The researchers recruited 175 people and asked them to create a random sequence of 10 heads and tails, though the researchers were only interested in the first guess. Of the 175 people, 143 predicted heads on the first toss. Let \(p\) be the probability that a randomly selected person will predict heads on the first toss. Perform a hypothesis test of \(p = 0.5\) versus \(p \not= 0.5\) at the \(\alpha = 0.05\) level.

The researchers also considered whether the linguistic convention of naming heads before tails impacts participants’ choice for their first imaginary coin toss. The authors recruited 54 people and told them to create a sample of size 10 by entering H for heads and T for tails. They recruited 51 people and told them to create a sample of size 10 by entering T for tails and H for heads. A total of 47 of the 54 people in Group 1 chose heads first, while 16 of the 51 people in Group 2 chose heads first. Perform a hypothesis test of \(p_1 = p_2\) versus \(p_1 \not= p_2\) at the \(\alpha = .05\) level, where \(p_i\) is the percentage of heads that people given instructions in Group \(i\) would create as their first guess.

If someone offered you either one really great marble and three mediocre ones, or four mediocre marbles, which would you choose?

Third-grade children in Rijen, the Netherlands, were split into two groups. 93 In group 1, 43 out of 48 children preferred a blue and white striped marble to a solid red marble. In group 2, 12 out of 44 children preferred four solid red marbles to three solid red marbles and one blue and white striped marble. Let \(p_1\) be the proportion of children who would prefer a blue and white marble to a red marble, and let \(p_2\) be the proportion of children who would prefer three red marbles and one blue and white striped marble to four red marbles. Perform a hypothesis test of \(p_1 = p_2\) versus \(p_1 \not= p_2\) at the \(\alpha = .05\) level.

A 2017 study 94 considered the care of patients with burns. A patient who stayed in the hospital for seven or more days past the last surgery for a burn is considered an extended postoperative stay. The researchers examined records and found that for patients with scalds, 30 did not have extended stays while 16 did have extended stays. For patients with flame burns, 51 did not have extended stays while 78 did have extended stays. Test whether the proportion of extended stays is the same for scald patients as for flame burn patients at the \(\alpha = .05\) level.

Ronald Reagan was elected president of the United States in 1980. The babynames::babynames data set contains information on babies named “Reagan” born in the United States. Is there a statistically significant difference in the percentage of babies (of either sex) named “Reagan” in the United States in 1982 and in 1978? If so, which direction was the change?

Consider the dogs data set in the fosdata package. For dogs in trial 1 that were shown a single dog going around the wall in the “wrong” direction three times, is there a statistically significant difference in the proportion that stay and the proportion that switch depending on their start direction?

Consider the sharks data set in the fosdata package. Participants were assigned to listen to either silence, ominous music, or uplifting music while watching a video about sharks. They then ranked sharks on various scales.

  • Create a cross table of the type of music listened to and the response to dangerous ; “how well does dangerous describe sharks.”
  • Perform a \(\chi^2\) test of homogeneity to test whether the ranking of how well “dangerous” describes sharks has the same distribution across the type of music heard.

Police sergeants in the Boston Police Department take an exam for promotion to lieutenant. In 2008, 91 sergeants took the lieutenant promotion test. Of them, 65 were white and 26 were Black or Hispanic. 95 The passing rate for white officers was 94%, while the passing rate for minorities was 69%. Was there a significant difference in the passing rates for whites and for minority test takers?

Figure 10.6: Bicycle signage. (Image credit: Hess and Peterson.)

Hess and Peterson 96 studied whether bicycle signage can affect an automobile driver’s perception of bicycle rights and safety. Load the fosdata::bicycle_signage data, and see the help page for descriptions of the variables.

  • Create a contingency table of the variables bike_move_right2 and treatment .
  • Calculate the proportion of participants who agreed and disagreed for each type of sign treatment. Which sign was most likely to lead participants to disagree?
  • Perform a \(\chi^2\) test of independence on the variables bike_move_right2 and treatment at the \(\alpha = .05\) level. Interpret your answer.

Exercise 10.29 requires material through Section 10.6 .

In Example 10.15 , we stated that the number of possible ways to fill a \(3 \times 1\) table with non-negative integers that sum to 1107 is 614,386. Explain why this is the case. (Hint: if you know the first two values, then the third one is determined.)

Raittio et al., “Two Casting Methods Compared in Patients with Colles’ Fracture.”

A J Cain and P M Sheppard, “Selection in the Polymorphic Land Snail Cepæa Nemoralis,” Heredity 4, no. 3 (1950): 275–94.

The R function proportions is new to R 4.0.1 and is recommended as a drop-in replacement for the unfortunately named prop.table.

M Papadatou-Pastou et al., “Human Handedness: A Meta-Analysis,” Psychological Bulletin 146, no. 6 (2020): 481–524, https://doi.org/10.1037/bul0000229.

John R Doyle, Paul A Bottomley, and Rob Angell, “Tails of the Travelling Gaussian Model and the Relative Age Effect: Tales of Age Discrimination and Wasted Talent,” PLOS One 12, no. 4 (April 2017): 1–22, https://doi.org/10.1371/journal.pone.0176206.

Markus Germar et al., “Dogs (Canis Familiaris) Stick to What They Have Learned Rather Than Conform to Their Conspecifics’ Behavior,” PLOS One 13, no. 3 (March 2018): 1–16, https://doi.org/10.1371/journal.pone.0194808.

Andrew P Nosal et al., “The Effect of Background Music in Shark Documentaries on Viewers’ Perceptions of Sharks,” PLOS One 11, no. 8 (2016): e0159279, https://doi.org/10.1371/journal.pone.0159279.

“Bella” was the name of the character played by Kristen Stewart in the movie Twilight, released in 2008. Fun fact, one of the authors has a family member who appeared in The Twilight Saga: Breaking Dawn - Part 2.

Kate Kahle, Aviv J Sharon, and Ayelet Baram-Tsabari, “Footprints of Fascination: Digital Traces of Public Engagement with Particle Physics on CERN’s Social Media Platforms,” PLOS One 11, no. 5 (2016): e0156409.

Shaq is reported to have said, “Me shooting 40 percent at the foul line is just God’s way of saying that nobody’s perfect. If I shot 90 percent from the line, it just wouldn’t be right.”

Persi Diaconis, Susan Holmes, and Richard Montgomery, “Dynamical Bias in the Coin Toss,” SIAM Review 49, no. 2 (2007): 211–35.

Priscilla Ku and Janet Larwood, “40,000 Coin Tosses Yield Ambiguous Evidence for Dynamical Bias,” 2009, https://www.stat.berkeley.edu/~aldous/Real-World/coin_tosses.html.

Matthew P A Clark and Brian D Westerberg, “Holiday Review. How Random Is the Toss of a Coin?” Canadian Medical Association Journal 181, no. 12 (December 2009): E306–8.

Andrew R Olenski et al., “Behavioral Heuristics in Coronary-Artery Bypass Graft Surgery,” N Engl J Med 382, no. 8 (February 2020): 778–79.

M Bar-Hillel, E Peer, and A Acquisti, “‘Heads or Tails?’ – a Reachability Bias in Binary Choice,” Journal of Experimental Psychology: Learning, Memory, and Cognition 40, no. 6 (2014): 1656–1663, https://doi.org/10.1037/xlm0000005.

Ellen R K Evers, Yoel Inbar, and Marcel Zeelenberg, “Set-Fit Effects in Choice,” J Exp Psychol Gen 143, no. 2 (April 2014): 504–9.

Islam Abdelrahman et al., “Division of Overall Duration of Stay into Operative Stay and Postoperative Stay Improves the Overall Estimate as a Measure of Quality of Outcome in Burn Care,” PLOS One 12, no. 3 (March 2017): e0174579–79.

Zack Huffman, “Boston Police Promotion Exam Deemed Biased” (Courthouse News Service, November 18, 2015), https://www.courthousenews.com/boston-police-promotion-exam-deemed-biased/.

George Hess and M Nils Peterson, “‘Bicycles May Use Full Lane’ Signage Communicates U.S. Roadway Rules and Increases Perception of Safety,” PLOS One 10, no. 8 (August 2015): e0136973.

Data Presentation

Josée Dupuis, PhD, Professor of Biostatistics, Boston University School of Public Health

Wayne LaMorte, MD, PhD, MPH, Professor of Epidemiology, Boston University School of Public Health

Introduction

While graphical summaries of data can certainly be powerful ways of communicating results clearly and unambiguously in a way that facilitates our ability to think about the information, poorly designed graphical displays can be ambiguous, confusing, and downright misleading. The keys to excellence in graphical design and communication are much like the keys to good writing. Adhere to fundamental principles of style and communicate as logically, accurately, and clearly as possible. Excellence in writing is generally achieved by avoiding unnecessary words and paragraphs; it is efficient. In a similar fashion, excellence in graphical presentation is generally achieved by efficient designs that avoid unnecessary ink.

Excellence in graphical presentation depends on:

  • Choosing the best medium for presenting the information
  • Designing the components of the graph in a way that communicates the information as clearly and accurately as possible.

Table or Graph?

  • Tables are generally best if you want to be able to look up specific information or if the values must be reported precisely.
  • Graphics are best for illustrating trends and making comparisons

The side-by-side illustrations below show the same information, first in table form and then in graphical form. While the information in the table is precise, the real goal is to compare a series of clinical outcomes in subjects taking either a drug or a placebo. The graphical presentation on the right makes it possible to quickly see that for each of the outcomes evaluated, the drug produced relief in a greater proportion of subjects. Moreover, the viewer gets a clear sense of the magnitude of improvement, and the error bars provide a sense of the uncertainty in the data.

Principles for Table Display

  • Sort table rows in a meaningful way
  • Avoid alphabetical listing!
  • Use rates, proportions, or ratios in addition to (or instead of) totals
  • Show more than two time points if available
  • Multiple time points may be better presented in a Figure
  • Similar data should go down columns
  • Highlight important comparisons
  • Show the source of the data

Consider the data in the table below from http://www.cancer.gov/cancertopics/types/commoncancers

Our ability to quickly understand the relative frequency of these cancers is hampered by presenting them in alphabetical order. It is much easier for the reader to grasp the relative frequency by listing them from most frequent to least frequent as in the next table.

However, the same information might be presented more effectively with a dot plot, as shown below.

Data from http://www.cancer.gov/cancertopics/types/commoncancers

Principles of Graphical Excellence from E.R. Tufte

Pattern perception.

Pattern perception is done by

  • Detection: recognition of geometry encoding physical values
  • Assembly: grouping of detected symbol elements; discerning overall patterns in data
  • Estimation: assessment of relative magnitudes of two physical values

Geographic Variation in Cancer

As an example, Tufte offers a series of maps that summarize the age-adjusted mortality rates for various types of cancer in the 3,056 counties in the United States. The maps showing the geographic variation in stomach cancer are shown below.

These maps summarize an enormous amount of information and present it efficiently, coherently, and effectively, in a way that invites the viewer to make comparisons and to think about the substance of the findings. Consider, for example, that the region to the west of the Great Lakes was settled largely by immigrants from Germany and Scandinavia, where traditional methods of preserving food included pickling and curing of fish by smoking. Could these methods be associated with an increased risk of stomach cancer?

John Snow's Spot Map of Cholera Cases

Consider also the spot map that John Snow presented after the cholera outbreak in the Broad Street section of London in September 1854. Snow ascertained the place of residence or work of the victims and represented them on a map of the area, using a small black disk to represent each victim and stacking the disks when more than one case occurred at a particular location. Snow reasoned that cholera was probably caused by something that was ingested, because of the intense diarrhea and vomiting of the victims, and he noted that the vast majority of cholera deaths occurred in people who lived or worked in the immediate vicinity of the Broad Street pump (shown with a red dot that we added for clarity). He further ascertained that most of the victims drank water from the Broad Street pump, and it was this evidence that persuaded the authorities to remove the handle from the pump in order to prevent more deaths.

Map of the Broad Street area of London showing stacks of black disks to represent the number of cholera cases that occurred at various locations. The cases seem to be clustered around the Broad Street water pump.

Humans can readily perceive differences like this when presented effectively as in the two previous examples. However, humans are not good at estimating differences without directly seeing them (especially for steep curves), and we are particularly bad at perceiving relative angles (the principal perception task used in a pie chart).

The use of pie charts is generally discouraged. Consider the pie chart on the left below. It is difficult to accurately assess the relative size of the components in the pie chart, because the human eye has difficulty judging angles. The dot plot on the right shows the same data, but it is much easier to quickly assess the relative size of the components and how they changed from Fiscal Year 2000 to Fiscal Year 2007.

Consider the information in the two pie charts below (showing the same information). The 3-dimensional pie chart on the left distorts the relative proportions. In contrast, the 2-dimensional pie chart on the right makes it much easier to compare the relative size of the various components.

More Principles of Graphical Excellence

Exclude unneeded dimensions.

These 3-dimensional techniques distort the data and actually interfere with our ability to make accurate comparisons. The distortion caused by 3-dimensional elements can be particularly severe when the graphic is slanted at an angle or when the viewer ends up unwittingly comparing the areas of the ink rather than the heights of the bars.

It is much easier to make comparisons with a chart like the one below.

Source: Huang C, Guo C, Nichols C, Chen S, Martorell R. Elevated levels of protein in urine in adulthood after exposure to the Chinese famine of 1959–61 during gestation and the early postnatal period. Int. J. Epidemiol. (2014) 43 (6): 1806-1814.

Omit "Chart Junk"

Consider these two examples.

Here is a simple enumeration of the number of pets in a neighborhood. There is absolutely no reason to connect these counts with lines. This is, in fact, confusing and inappropriate and nothing more than "chart junk."

Source: http://www.go-education.com/free-graph-maker.html

Moiré Vibration

Moiré effects are sometimes used in modern art to produce the appearance of vibration and movement. However, when these effects are applied to statistical presentations, they are distracting and add clutter because the visual noise interferes with the interpretation of the data.

Tufte presents the example shown below from Instituto de Expansao Commercial, Brasil, Graphicos Estatisticas (Rio de Janeiro, 1929, p. 15).

While the intention is to present quantitative information about the textile industry, the moiré effects do not add anything, and they are distracting, if not visually annoying.

Present Data to Facilitate Comparisons

Here is an attempt to compare catches of cod fish and crab across regions and to relate the variation to changes in water temperature. The problem here is that the Y-axes are vastly different, making it hard to sort out what's really going on. Even the Y-axes for temperature are vastly different.

Source: http://seananderson.ca/courses/11-multipanel/multipanel.pdf

The ability to make comparisons is greatly facilitated by using the same scales for axes, as illustrated below.

Data source: Dawber TR, Meadors GF, Moore FE Jr. Epidemiological approaches to heart disease: the Framingham Study. Am J Public Health Nations Health. 1951;41(3):279-81. PMID: 14819398

It is also important to avoid distorting the X-axis. Note in the example below that the space between 0.05 and 0.1 is the same as the space between 0.1 and 0.2.

Source: Park JH, Gail MH, Weinberg CR, et al. Distribution of allele frequencies and effect sizes and their interrelationships for common genetic susceptibility variants. Proc Natl Acad Sci U S A. 2011; 108:18026-31.

Consider the range of the Y-axis. In the examples below there is no relevant information below $40,000, so it is not necessary to begin the Y-axis at 0. The graph on the right makes more sense.

Also, consider using a log scale. This can be particularly useful when presenting ratios, as in the example below.

Source: Broman KW, Murray JC, Sheffield VC, White RL, Weber JL (1998) Comprehensive human genetic maps: Individual and sex-specific variation in recombination. American Journal of Human Genetics 63:861-869, Figure 1

We noted earlier that pie charts make it difficult to see differences within a single pie chart, but this is particularly difficult when data is presented with multiple pie charts, as in the example below.

Source: Bell ML, et al. (2007) Spatial and temporal variation in PM2.5 chemical composition in the United States for health effects studies. Environmental Health Perspectives 115:989-995, Figure 3

When multiple comparisons are being made, it is essential to use colors and symbols in a consistent way, as in this example.

Source: Manning AK, LaValley M, Liu CT, et al. Meta-Analysis of Gene-Environment Interaction: Joint Estimation of SNP and SNP x Environment Regression Coefficients. Genet Epidemiol 2011, 35(1):11-8.

Avoid putting too many lines on the same chart. In the example below, the only thing that is readily apparent is that 1980 was a very hot summer.

Data from National Weather Service Weather Forecast Office at http://www.srh.noaa.gov/tsa/?n=climo_tulyeartemp

Make Efficient Use of Space

Reduce the ratio of ink to information.

This isn't efficient, because this graphic is totally uninformative.

Source: Mykland P, Tierney L, Yu B (1995) Regeneration in Markov chain samplers. Journal of the American Statistical Association 90:233-241, Figure 1

Bar graphs add ink without conveying any additional information, and they are distracting. The graph below on the left inappropriately uses bars, which clutter the graph without adding anything. The graph on the right displays the same data, but does so more clearly and with less clutter.

Multiple Types of Information on the Same Figure

Choosing the best graph type, bar charts, error bars and dot plots.

As noted previously, bar charts can be problematic. Here is another one presenting means and error bars, but the error bars are misleading because they only extend in one direction. A better alternative would have been to use full error bars with a scatter plot, as illustrated previously (right).

Consider the four graphs below presenting the incidence of cancer by type. The upper left graph unnecessarily uses bars, which take up a lot of ink. This layout also ends up making the fonts for the types of cancer too small. Small font is also a problem for the dot plot at the upper right, and this one also has unnecessary grid lines across the entire width.

The graph at the lower left has more readable labels and uses a simple dot plot, but the rank order is difficult to figure out.

The graph at the lower right is clearly the best, since the labels are readable, the magnitude of incidence is shown clearly by the dot plots, and the cancers are sorted by frequency.

Single Continuous Numeric Variable

In this situation a cumulative distribution function conveys the most information and requires no grouping of the variable. A box plot will show selected quantiles effectively, and box plots are especially useful when stratifying by multiple categories of another variable.

Histograms are also possible. Consider the examples below.
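To make the idea of a cumulative distribution function concrete, here is a minimal sketch of an empirical CDF computed without any grouping of the variable. The function name `ecdf` and the sample values are assumptions for illustration only.

```python
from bisect import bisect_right

def ecdf(values):
    """Pair each distinct value with the fraction of observations <= it."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, bisect_right(ordered, x) / n) for x in sorted(set(ordered))]

# Hypothetical measurements, used only for illustration.
sample = [4.1, 2.3, 5.8, 3.3, 2.9]
for x, p in ecdf(sample):
    print(f"{x:4.1f}  {p:.2f}")
```

Plotting these pairs as a step function gives the cumulative distribution; any quantile can then be read directly from the vertical axis.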

Two Variables

 The two graphs below summarize BMI (Body Mass Index) measurements in four categories, i.e., younger and older men and women. The graph on the left shows the means and 95% confidence interval for the mean in each of the four groups. This is easy to interpret, but the viewer cannot see that the data is actually quite skewed. The graph on the right shows the same information presented as a box plot. With this presentation method one gets a better understanding of the skewed distribution and how the groups compare.
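The difference between the two displays can be sketched numerically: a mean summarizes the center, while the quartiles used by a box plot reveal skew. The BMI values below are hypothetical and chosen to be right-skewed; they are not the data behind the figures.

```python
import statistics

# Hypothetical right-skewed BMI values; not the data behind the figures.
bmi = [21, 22, 22, 23, 24, 25, 26, 28, 33, 41]

mean = statistics.mean(bmi)
# quantiles(n=4) returns the three cut points: Q1, median, Q3.
q1, median, q3 = statistics.quantiles(bmi, n=4)

print(f"mean={mean}, median={median}, quartiles=({q1}, {q3})")
print(f"min={min(bmi)}, max={max(bmi)}")
```

In this sample the mean lies above the median because of the long upper tail, which is the kind of asymmetry a box plot shows and a mean with a confidence interval hides.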

The next example is a scatter plot with a superimposed smoothed line of prediction. The shaded region embracing the blue line is a representation of the 95% confidence limits for the estimated prediction. This was created using "ggplot" in the R programming language.

Source: Frank E. Harrell Jr. on graphics:  http://biostat.mc.vanderbilt.edu/twiki/pub/Main/StatGraphCourse/graphscourse.pdf (page 121)

Multivariate Data

The example below shows the use of multiple panels.

Source: Cleveland WS. The Elements of Graphing Data. Hobart Press, Summit, NJ, 1994.

Displaying Uncertainty

  • Error bars showing confidence limits
  • Confidence bands drawn using two lines
  • Shaded confidence bands
  • Bayesian credible intervals
  • Bayesian posterior densities

Confidence Limits

Shaded Confidence Bands

Source: Frank E. Harrell Jr. on graphics:  http://biostat.mc.vanderbilt.edu/twiki/pub/Main/StatGraphCourse/graphscourse.pdf

Source: Tweedie RL and Mengersen KL. (1992) Br. J. Cancer 66: 700-705

Forest Plot

This is a Forest plot summarizing 26 studies of cigarette smoke exposure on risk of lung cancer. The sizes of the black boxes indicating the estimated odds ratio are proportional to the sample size in each study.

Data from Tweedie RL and Mengersen KL. (1992) Br. J. Cancer 66: 700-705

Summary Recommendations

  • In general, avoid bar plots
  • Avoid chart junk and the use of too much ink relative to the information you are displaying. Keep it simple and clear.
  • Avoid pie charts, because humans have difficulty perceiving relative angles.
  • Pay attention to scale, and make scales consistent.
  • Explore several ways to display the data!

12 Tips on How to Display Data Badly

Adapted from Wainer H.  How to Display Data Badly.  The American Statistician 1984; 38: 137-147. 

  • Show as few data as possible
  • Hide what data you do show; minimize the data-ink ratio
  • Ignore the visual metaphor altogether
  • Only order matters
  • Graph data out of context
  • Change scales in mid-axis
  • Emphasize the trivial;  ignore the important
  • Jiggle the baseline
  • Alphabetize everything.
  • Make your labels illegible, incomplete, incorrect, and ambiguous.
  • More is murkier: use a lot of decimal places and make your graphs three dimensional whenever possible.
  • If it has been done well in the past, think of another way to do it

Additional Resources

  • Stephen Few: Designing Effective Tables and Graphs. http://www.perceptualedge.com/images/Effective_Chart_Design.pdf
  • Gary Klass: Presenting Data: Tabular and graphic display of social indicators. Illinois State University, 2002. http://lilt.ilstu.edu/gmklass/pos138/datadisplay/sections/goodcharts.htm (Note: The web site will be discontinued and replaced by the Just Plain Data Analysis site).

Data Presenting for Clearer Reference

Statistical data without a definite presentation is burdensome to work with. Data presentation is one of the important aspects of Statistics, because presenting the data helps users study and explain the statistics thoroughly. We are going to discuss the presentation of data and how information is laid down methodically.

In this context, we present the topic Presentation of Data for students to refer to, and study it with regard to the types of data presentation.

Presentation of Data and Information

Statistics is all about data. Presenting data effectively and efficiently is an art. You may have uncovered many truths that are complex and need long explanations while writing. This is where the importance of the presentation of data comes in. You have to present your findings in such a way that the readers can go through them quickly and understand each and every point that you wanted to showcase. As time progressed and new and complex research started happening, people realized the importance of the presentation of data to make sense of the findings.

Define Data Presentation

Data presentation is defined as the process of using various graphical formats to visually represent the relationship between two or more data sets so that an informed decision can be made based on them.

Types of Data Presentation

Broadly speaking, there are three methods of data presentation:

  • Textual
  • Tabular
  • Diagrammatic

Textual Ways of Presenting Data

Out of the different methods of data presentation, this is the simplest one. You just write your findings in a coherent manner and your job is done. The demerit of this method is that one has to read the whole text to get a clear picture, although an introduction, summary, and conclusion can help condense the information.

Tabular Ways of Data Presentation and Analysis

To avoid the complexities involved in the textual way of data presentation, people use tables and charts to present data. In this method, data is presented in rows and columns, just like a cricket scorecard showing who made how many runs. Each row and each column has an attribute (name, year, sex, age, and other things like these). It is against these attributes that data is written within a cell.
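As a small illustration of rows, columns, and attributes, the sketch below renders a scorecard as an aligned text table. The function name `format_table` and the player names and numbers are hypothetical, made up for this example.

```python
def format_table(headers, rows):
    """Render rows as fixed-width text columns under a header line."""
    table = [headers] + [[str(cell) for cell in row] for row in rows]
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]
    lines = ["  ".join(cell.ljust(w) for cell, w in zip(row, widths)) for row in table]
    lines.insert(1, "  ".join("-" * w for w in widths))
    return "\n".join(lines)

# Hypothetical scorecard; names and numbers are made up.
headers = ["Batsman", "Runs", "Balls"]
rows = [["Player A", 72, 48], ["Player B", 15, 20], ["Player C", 101, 63]]
print(format_table(headers, rows))
```

Each column header names an attribute, and each cell holds the value of that attribute for the row it sits in.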

Diagrammatic Presentation: Graphical Presentation of Data in Statistics

This kind of data presentation and analysis method conveys a lot in a dramatically short amount of time.

Diagrammatic Presentation has been divided into further categories:

Geometric Diagram

When a Diagrammatic presentation involves shapes like a bar or circle, we call that a Geometric Diagram.

Examples of Geometric Diagram

Bar Diagram

Simple Bar Diagram

Simple Bar Diagram is composed of rectangular bars. All of these bars have the same width and are placed at an equal distance from each other. The bars are placed on the X-axis. The height or length of the bars is used as the means of measurement. So, on the Y-axis, you have the measurement relevant to the data. 

Suppose you want to present the runs scored by each batsman in a game in the form of a bar chart. Mark the runs on the Y-axis, in ascending order from the bottom. The lowest scorer will then be represented by the shortest bar and the highest scorer by the longest bar.

Multiple Bar Diagram

In many states of India, electricity bills have bar diagrams showing the consumption in the last 5 months. Alongside these bars, they also have bars that show the consumption in the same months of the previous year. This kind of Bar Diagram is called a Multiple Bar Diagram.

Component Bar Diagram

Sometimes, a bar is divided into two or more parts. For example, consider a Bar Diagram showing the percentage of male voters who voted and who didn’t, and of female voters who voted and who didn’t. Instead of creating separate bars for those who voted and those who did not, you can divide a single bar for each group into the two parts.

Pie Chart

A pie chart is a chart where you divide a pie (a circle) into different parts based on the data. Each data value is first transformed into a percentage, and that percentage figure is then multiplied by 3.6 degrees, since the full circle of 360 degrees corresponds to 100 percent. The result is the angle of the corresponding slice. For example, if the result is 30 degrees, you draw a 30-degree angle at the center of the pie chart.
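The percentage-to-angle step can be sketched as follows. The function name `pie_angles` and the budget categories are assumptions for illustration; the only rule used is the one stated above, percentage times 3.6 degrees.

```python
def pie_angles(values):
    """Convert raw values to (percentage, angle in degrees) per category."""
    total = sum(values.values())
    result = {}
    for name, value in values.items():
        percent = value / total * 100
        result[name] = (percent, percent * 3.6)  # 100 percent spans 360 degrees
    return result

# Hypothetical monthly budget; the categories are made up.
budget = {"Rent": 50, "Food": 30, "Travel": 20}
for name, (percent, angle) in pie_angles(budget).items():
    print(f"{name}: {percent:.1f}% -> {angle:.1f} degrees")
```

The angles of all slices should add up to 360 degrees, which is a quick check on the arithmetic.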

Frequency Diagram

Suppose you want to present data that shows how many students have 1 to 2 pens, how many have 3 to 5 pens, and how many have 6 to 10 pens (grouped frequency). You do that with the help of a Frequency Diagram. A Frequency Diagram can be of many kinds:

Histogram

In a histogram, the grouped frequency of pens (from the above example) is written on the X-axis and the number of students is marked on the Y-axis. The data is presented in the form of bars.
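Before any bars are drawn, the raw observations have to be counted into their classes. The sketch below does this for the pens example; the function name `grouped_frequency` and the list of twelve students are assumptions for illustration.

```python
def grouped_frequency(values, classes):
    """Count how many values fall in each inclusive (low, high) class."""
    return {
        f"{low}-{high}": sum(low <= v <= high for v in values)
        for low, high in classes
    }

# Hypothetical number of pens owned by twelve students.
pens = [1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 10, 1]
classes = [(1, 2), (3, 5), (6, 10)]
print(grouped_frequency(pens, classes))
```

The class labels go on the X-axis and the counts become the bar heights on the Y-axis.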

Frequency Polygon

When you join the midpoints of the upper sides of the rectangles in a histogram, you get a Frequency Polygon.

Frequency Curve

When you draw a freehand line that passes through the points of the Frequency Polygon, you get a Frequency Curve.

Ogive 

Suppose 2 students got 0-20 marks in Maths, 5 students got 20-30 marks, and 4 students got 30-50 marks. So how many students got less than 30 marks? Yes, 2+5=7. And how many students got more than 20 marks? 5+4=9. This type of less-than and more-than data is represented in the form of an ogive. The point where the less-than and more-than curves meet gives you the Median.
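The cumulative totals behind both ogives can be sketched with running sums. The class limits and frequencies are the ones from the marks example; the variable names are illustrative. With these frequencies, 7 students scored below 30 and 9 scored 20 or more.

```python
from itertools import accumulate

classes = [(0, 20), (20, 30), (30, 50)]
frequencies = [2, 5, 4]

# "Less than" ogive: running total plotted against each upper class limit.
less_than = list(zip((high for _, high in classes), accumulate(frequencies)))

# "More than" ogive: running total from the top class downwards,
# plotted against each lower class limit.
totals_from_top = list(accumulate(reversed(frequencies)))[::-1]
more_than = list(zip((low for low, _ in classes), totals_from_top))

print("less than:", less_than)
print("more than:", more_than)
```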

Arithmetic Line Graph

If you want to see the trend of Corona infection vs the number of recoveries from January 2020 to December 2020, you can do that in the form of an Arithmetic Line Graph. The months should be marked on the X-axis and the number of infections and recoveries are marked on the Y-axis. You can compare if the recovery is greater than the infection and if the recovery and infection are going at the same rate or not with the help of this Diagram.

Did You Know?

Sir Ronald Aylmer Fisher is known as the father of modern statistics.

FAQs on Presentation of Data

1. What are the 4 types of Tabular Presentation?

The tabular presentation method can be further divided into 4 categories:

  • Qualitative
  • Quantitative
  • Temporal
  • Spatial

Qualitative classification is done when the attributes in the table are some kind of ‘quality’ or feature.

Quantitative classification is done when the attributes are numeric. Suppose you want to make a table showing how many batsmen made half-centuries and how many made centuries in IPL 2020. Notice that the classification rests only on numbers; no descriptive attribute is needed. This type of tabulation is called quantitative tabulation.

Suppose you want to make a table showing which team won the World Cup in each year. The classifying variable here is the year, or time. This kind of classification is called Temporal classification.

Suppose you want to list the five coldest places in the world. The classifying variable here is place. This kind of classification is called Spatial classification.

2. Are bar charts and histograms the Same?

No, they are not the same. With a histogram, you measure the frequency of quantitative data. With bar charts, you compare categorical data.

3. What is the definition of Data Presentation?

When research work is completed, the data gathered from it can be quite large and complex. Organizing the data in a coherent, easy-to-understand, quick to read and graphical way is called data presentation.

10 Data Presentation Examples For Strategic Communication

By Krystle Wong , Sep 28, 2023

Knowing how to present data is like having a superpower. 

Data presentation today is no longer just about numbers on a screen; it’s storytelling with a purpose. It’s about captivating your audience, making complex stuff look simple and inspiring action. 

To help turn your data into stories that stick, influence decisions and make an impact, check out Venngage’s free chart maker or follow me on a tour into the world of data storytelling along with data presentation templates that work across different fields, from business boardrooms to the classroom and beyond. Keep scrolling to learn more! 

Click to jump ahead:

  • 10 essential data presentation examples + methods you should know
  • What should be included in a data presentation?
  • What are some common mistakes to avoid when presenting data?
  • FAQs on data presentation examples
  • Transform your message with impactful data storytelling

10 Essential data presentation examples + methods you should know

Data presentation is a vital skill in today’s information-driven world. Whether you’re in business, academia, or simply want to convey information effectively, knowing the different ways of presenting data is crucial. For impactful data storytelling, consider these essential data presentation methods:

1. Bar graph

Ideal for comparing data across categories or showing trends over time.

Bar graphs, also known as bar charts, are workhorses of data presentation. They’re like the Swiss Army knives of visualization methods because they can be used to compare data in different categories or display data changes over time.

In a bar chart, categories are displayed on the x-axis and the corresponding values are represented by the height of the bars on the y-axis. 

It’s a straightforward and effective way to showcase raw data, making it a staple in business reports, academic presentations and beyond.

Make sure your bar charts are concise with easy-to-read labels. Whether your bars go up or sideways, keep it simple by not overloading with too many categories.

2. Line graph

Great for displaying trends and variations in data points over time or continuous variables.

Line charts or line graphs are your go-to when you want to visualize trends and variations in data sets over time.

One of the best quantitative data presentation examples, they work exceptionally well for showing continuous data, such as sales projections over the last couple of years or supply and demand fluctuations. 

The x-axis represents time or a continuous variable and the y-axis represents the data values. By connecting the data points with lines, you can easily spot trends and fluctuations.

A tip when presenting data with line charts is to minimize the lines and not make it too crowded. Highlight the big changes, put on some labels and give it a catchy title.

3. Pie chart

Useful for illustrating parts of a whole, such as percentages or proportions.

Pie charts are perfect for showing how a whole is divided into parts. They’re commonly used to represent percentages or proportions and are great for presenting survey results that involve demographic data. 

Each “slice” of the pie represents a portion of the whole and the size of each slice corresponds to its share of the total. 

While pie charts are handy for illustrating simple distributions, they can become confusing when dealing with too many categories or when the differences in proportions are subtle.

Don’t get too carried away with slices — label those slices with percentages or values so people know what’s what and consider using a legend for more categories.

4. Scatter plot

Effective for showing the relationship between two variables and identifying correlations.

Scatter plots are all about exploring relationships between two variables. They’re great for uncovering correlations, trends or patterns in data. 

In a scatter plot, every data point appears as a dot on the chart, with one variable marked on the horizontal x-axis and the other on the vertical y-axis.

By examining the scatter of points, you can discern the nature of the relationship between the variables, whether it’s positive, negative or no correlation at all.

If you’re using scatter plots to reveal relationships between two variables, be sure to add trendlines or regression analysis when appropriate to clarify patterns. Label data points selectively or provide tooltips for detailed information.
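A trendline and a correlation coefficient can both be computed from the same sums. The sketch below fits an ordinary least-squares line and reports Pearson's r; the function name `fit_line` and the spend and sales figures are hypothetical, and least squares is only one of several ways to draw a trendline.

```python
def fit_line(xs, ys):
    """Return (slope, intercept, r) for a least-squares line through the points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5  # Pearson correlation coefficient
    return slope, intercept, r

# Hypothetical advertising spend vs. sales.
spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
slope, intercept, r = fit_line(spend, sales)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.2f}")
```

A positive r with a positive slope indicates a positive relationship; a value of r near zero suggests no linear relationship, in which case a trendline adds little.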

5. Histogram

Best for visualizing the distribution and frequency of a single variable.

Histograms are your choice when you want to understand the distribution and frequency of a single variable. 

They divide the data into “bins” or intervals and the height of each bar represents the frequency or count of data points falling into that interval. 

Histograms are excellent for helping to identify trends in data distributions, such as peaks, gaps or skewness.

Here’s something to take note of — ensure that your histogram bins are appropriately sized to capture meaningful data patterns. Using clear axis labels and titles can also help explain the distribution of the data effectively.
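One common rule of thumb for a starting bin count is Sturges' rule, which suggests ceil(log2(n)) + 1 bins for n observations. The sketch below applies it and counts values into equal-width bins. The rule is an assumption brought in for illustration, not something the article prescribes, and the data values are made up; the code also assumes the values are not all identical.

```python
import math

def sturges_bins(n):
    """Suggested bin count from Sturges' rule: ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n)) + 1

def bin_counts(values, k):
    """Split the data range into k equal-width bins and count values per bin."""
    low, high = min(values), max(values)
    width = (high - low) / k
    counts = [0] * k
    for v in values:
        index = min(int((v - low) / width), k - 1)  # put the maximum in the last bin
        counts[index] += 1
    return counts

# Hypothetical observations, used only for illustration.
data = [12, 15, 17, 18, 21, 22, 22, 25, 29, 30, 31, 35, 38, 41, 44, 52]
k = sturges_bins(len(data))
print(k, bin_counts(data, k))
```

Trying a few bin counts around the suggested value is a quick way to check whether a peak or gap is a feature of the data or an artifact of the binning.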

6. Stacked bar chart

Useful for showing how different components contribute to a whole over multiple categories.

Stacked bar charts are a handy choice when you want to illustrate how different components contribute to a whole across multiple categories. 

Each bar represents a category and the bars are divided into segments to show the contribution of various components within each category. 
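The segments of a stacked bar are positioned by running totals: each component starts where the previous one ended. A minimal sketch follows, with the function name `stack_segments` and the sales figures assumed for illustration.

```python
def stack_segments(components):
    """For each category, return (component, bottom, top) for every segment."""
    stacked = {}
    for category, parts in components.items():
        bottom = 0
        segments = []
        for name, value in parts.items():
            segments.append((name, bottom, bottom + value))
            bottom += value  # the next segment starts where this one ends
        stacked[category] = segments
    return stacked

# Hypothetical quarterly sales by product line.
sales = {
    "Q1": {"Books": 30, "Music": 20, "Games": 10},
    "Q2": {"Books": 25, "Music": 30, "Games": 15},
}
for category, segments in stack_segments(sales).items():
    print(category, segments)
```

The top of the last segment in each bar is the category total, so the chart shows both the parts and the whole.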

This method is ideal for highlighting both the individual and collective significance of each component, making it a valuable tool for comparative analysis.

Stacked bar charts are like data sandwiches—label each layer so people know what’s what. Keep the order logical and don’t forget the paintbrush for snazzy colors. Here’s a data analysis presentation example on writers’ productivity using stacked bar charts:

7. Area chart

Similar to line charts but with the area below the lines filled, making them suitable for showing cumulative data.

Area charts are close cousins of line charts but come with a twist. 

Imagine plotting the sales of a product over several months. In an area chart, the space between the line and the x-axis is filled, providing a visual representation of the cumulative total. 

This makes it easy to see how values stack up over time, making area charts a valuable tool for tracking trends in data.

For area charts, use them to visualize cumulative data and trends, but avoid overcrowding the chart. Add labels, especially at significant points and make sure the area under the lines is filled with a visually appealing color gradient.

8. Tabular presentation

Presenting data in rows and columns, often used for precise data values and comparisons.

Tabular data presentation is all about clarity and precision. Think of it as presenting numerical data in a structured grid, with rows and columns clearly displaying individual data points. 

A table is invaluable for showcasing detailed data, facilitating comparisons and presenting numerical information that needs to be exact. They’re commonly used in reports, spreadsheets and academic papers.

When presenting tabular data, organize it neatly with clear headers and appropriate column widths. Highlight important data points or patterns using shading or font formatting for better readability.

9. Textual data

Utilizing written or descriptive content to explain or complement data, such as annotations or explanatory text.

Textual data presentation may not involve charts or graphs, but it’s one of the most used qualitative data presentation examples. 

It involves using written content to provide context, explanations or annotations alongside data visuals. Think of it as the narrative that guides your audience through the data. 

Well-crafted textual data can make complex information more accessible and help your audience understand the significance of the numbers and visuals.

Textual data is your chance to tell a story. Break down complex information into bullet points or short paragraphs and use headings to guide the reader’s attention.

10. Pictogram

Using simple icons or images to represent data is especially useful for conveying information in a visually intuitive manner.

Pictograms are all about harnessing the power of images to convey data in an easy-to-understand way. 

Instead of using numbers or complex graphs, you use simple icons or images to represent data points. 

For instance, you could use face emojis to illustrate customer satisfaction levels, where each face represents a different level of satisfaction.

Pictograms are great for conveying data visually, so choose symbols that are easy to interpret and relevant to the data. Use consistent scaling and a legend to explain the symbols’ meanings, ensuring clarity in your presentation.
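Consistent scaling means every icon stands for the same number of units, and the legend says so. The sketch below builds a text pictogram under that rule; the function name `pictogram`, the asterisk icon, and the survey counts are assumptions for illustration.

```python
def pictogram(counts, per_icon, icon="*"):
    """Render each count as a row of icons, one icon per `per_icon` units."""
    width = max(len(label) for label in counts)
    lines = [f"Key: one {icon} = {per_icon} units"]  # legend explaining the scale
    for label, count in counts.items():
        lines.append(f"{label.ljust(width)}  {icon * round(count / per_icon)}")
    return "\n".join(lines)

# Hypothetical survey responses.
responses = {"Satisfied": 40, "Neutral": 20, "Unsatisfied": 10}
print(pictogram(responses, per_icon=10))
```

Counts that are not multiples of `per_icon` are rounded here, so a real pictogram might use partial icons or a smaller unit to avoid losing precision.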

Looking for more data presentation ideas? Use the Venngage graph maker or browse through our gallery of chart templates to pick a template and get started! 

What should be included in a data presentation?

A comprehensive data presentation should include several key elements to effectively convey information and insights to your audience. Here’s a list of what should be included in a data presentation:

1. Title and objective

  • Begin with a clear and informative title that sets the context for your presentation.
  • State the primary objective or purpose of the presentation to provide a clear focus.

2. Key data points

  • Present the most essential data points or findings that align with your objective.
  • Use charts, graphical presentations or visuals to illustrate these key points for better comprehension.

3. Context and significance

  • Provide a brief overview of the context in which the data was collected and why it’s significant.
  • Explain how the data relates to the larger picture or the problem you’re addressing.

4. Key takeaways

  • Summarize the main insights or conclusions that can be drawn from the data.
  • Highlight the key takeaways that the audience should remember.

5. Visuals and charts

  • Use clear and appropriate visual aids to complement the data.
  • Ensure that visuals are easy to understand and support your narrative.

6. Implications or actions

  • Discuss the practical implications of the data or any recommended actions.
  • If applicable, outline next steps or decisions that should be taken based on the data.

7. Q&A and discussion

  • Allocate time for questions and open discussion to engage the audience.
  • Address queries and provide additional insights or context as needed.

What are some common mistakes to avoid when presenting data?

Presenting data is a crucial skill in various professional fields, from business to academia and beyond. To ensure your data presentations hit the mark, here are some common mistakes that you should steer clear of:

Overloading with data

Presenting too much data at once can overwhelm your audience. Focus on the key points and relevant information to keep the presentation concise and focused. Here are some free data visualization tools you can use to convey data in an engaging and impactful way. 

Assuming everyone’s on the same page

It’s easy to assume that your audience understands as much about the topic as you do. But this can lead to either dumbing things down too much or diving into a bunch of jargon that leaves folks scratching their heads. Take a beat to figure out where your audience is coming from and tailor your presentation accordingly.

Misleading visuals

Using misleading visuals, such as distorted scales or inappropriate chart types can distort the data’s meaning. Pick the right data infographics and understandable charts to ensure that your visual representations accurately reflect the data.

Not providing context

Data without context is like a puzzle piece with no picture on it. Without proper context, data may be meaningless or misinterpreted. Explain the background, methodology and significance of the data.

Not citing sources properly

Neglecting to cite sources and provide citations for your data can erode its credibility. Always attribute data to its source and utilize reliable sources for your presentation.

Not telling a story

Avoid simply presenting numbers. If your presentation lacks a clear, engaging story that takes your audience on a journey from the beginning (setting the scene) through the middle (data analysis) to the end (the big insights and recommendations), you’re likely to lose their interest.

Infographics are great for storytelling because they mix cool visuals with short and sweet text to explain complicated stuff in a fun and easy way. Create one with Venngage’s free infographic maker to create a memorable story that your audience will remember.

Ignoring data quality

Presenting data without first checking its quality and accuracy can lead to misinformation. Validate and clean your data before presenting it.

Simplify your visuals

Fancy charts might look cool, but if they confuse people, what’s the point? Go for the simplest visual that gets your message across. Having a dilemma between presenting data with infographics vs. data design? This article on the difference between data design and infographics might help you out.

Missing the emotional connection

Data isn’t just about numbers; it’s about people and real-life situations. Don’t forget to sprinkle in some human touch, whether it’s through relatable stories, examples or showing how the data impacts real lives.

Skipping the actionable insights

At the end of the day, your audience wants to know what they should do with all the data. If you don’t wrap up with clear, actionable insights or recommendations, you’re leaving them hanging. Always finish up with practical takeaways and the next steps.

FAQs on data presentation examples

Can you provide some data presentation examples for business reports?

Business reports often benefit from data presentation through bar charts showing sales trends over time, pie charts displaying market share, or tables presenting financial performance metrics like revenue and profit margins.

What are some creative data presentation examples for academic presentations?

Creative data presentation ideas for academic presentations include using statistical infographics to illustrate research findings and statistical data, incorporating storytelling techniques to engage the audience or utilizing heat maps to visualize data patterns.

What are the key considerations when choosing the right data presentation format?

When choosing a chart format, consider factors like data complexity, audience expertise and the message you want to convey. Options include charts (e.g., bar, line, pie), tables, heat maps, data visualization infographics and interactive dashboards.

Knowing the type of data visualization that best serves your data is just half the battle. Here are some best practices for data visualization to make sure that the final output is optimized. 

How can I choose the right data presentation method for my data?

To select the right data presentation method, start by defining your presentation’s purpose and audience. Then, match your data type (e.g., quantitative, qualitative) with suitable visualization techniques (e.g., histograms, word clouds) and choose an appropriate presentation format (e.g., slide deck, report, live demo).

For more presentation ideas, check out this guide on how to make a good presentation or use presentation software to simplify the process.

How can I make my data presentations more engaging and informative?

To enhance data presentations, use compelling narratives, relatable examples and fun data infographics that simplify complex data. Encourage audience interaction, offer actionable insights and incorporate storytelling elements to engage and inform effectively.

The opening of your presentation holds immense power in setting the stage for your audience. To design a presentation that conveys your data in an engaging and informative way, try out Venngage’s free presentation maker to pick the right presentation design for your audience and topic.

What is the difference between data visualization and data presentation?

Data presentation typically involves conveying data reports and insights to an audience, often using visuals like charts and graphs. Data visualization, on the other hand, focuses on creating those visual representations of data to facilitate understanding and analysis.

Now that you’ve learned a thing or two about how to use these methods of data presentation to tell a compelling data story, it’s time to take these strategies and make them your own.

But here’s the deal: these aren’t just one-size-fits-all solutions. Remember that each example we’ve uncovered here is not a rigid template but a source of inspiration. It’s all about making your audience go, “Wow, I get it now!”

Think of your data presentations as your canvas – it’s where you paint your story, convey meaningful insights and make real change happen. 

So, go forth, present your data with confidence and purpose and watch as your strategic influence grows, one compelling presentation at a time.

COMMENTS

  1. Tabular Presentation of Data: Meaning, Objectives ...

    As a result of this, it is simple to remember the statistical facts. Cost-effective: Tabular presentation is a very cost-effective way to convey data. It saves time and space. Provides Reference: As the data provided in a tabular presentation can be used for other studies and research, it acts as a source of reference.

  2. Understanding Data Presentations (Guide + Examples)

    Step 1: Define Your Data Hierarchy. While presenting data on the budget allocation, start by outlining the hierarchical structure. The sequence will be like the overall budget at the top, followed by departments, projects within each department, and finally, individual cost categories for each project. Example:

  3. What is Tabular Data? (Definition & Example)

    In statistics, tabular data refers to data that is organized in a table with rows and columns. Within the table, the rows represent observations and the columns represent attributes for those observations. For example, the following table represents tabular data: This dataset has 9 rows and 5 columns. Each row represents one basketball player ...

  4. Data Presentation: A Comprehensive Guide

    Definition: Data presentation is the art of visualizing complex data for better understanding. Importance: Data presentations enhance clarity, engage the audience, aid decision-making, and leave a lasting impact. Types: Textual, Tabular, and Graphical presentations offer various ways to present data.

  5. Tabular Presentation of Data

    What is Tabular Presentation of Data? It is a table that helps to represent even a large amount of data in an engaging, easy to read, and coordinated manner. The data is arranged in rows and columns. This is one of the most popularly used forms of presentation of data as data tables are simple to prepare and read.

  6. Tabular Presentation of Data

    The objectives of tabular data presentation are as follows. The tabular data presentation helps in simplifying the complex data. It also helps to compare different data sets thereby bringing out the important aspects. The tabular presentation provides the foundation for statistical analysis. The tabular data presentation further helps in the ...

  7. PDF Tabular Display of Data

    Gary W. Oehlert. Tabular Display of Data. Or computer files. # Number of hawks responding to the "alarm" call # Variables are year (1999 or 2000), season (courtship, # nestling, fledgling), distance in meters between the # alarm call and the nest, number of hawks responding, # and number of. year season distance respond trials. 1 100 1 4.

  8. PDF Tabular and Graphical Presentation of Data

    TABULAR PRESENTATION OF DATA When to Use Tables • Written documents (reports, journal articles) typically present most results in tabular form. • Research Posters for conferences. • More concise format than graphs. • In oral presentations, only VERY simple tables should be presented.

  9. 7 Introduction to Tabular Data

    Many interesting data in computing are tabular (i.e., like a table) in form. First we'll see a few examples of them, before we try to identify what they have in common.

  10. Basic Statistics: Data and Its Tabular Representation

    This article offers tips to mine information from data efficiently using tabular representation. ... drawing interpretations and presentation. First thing in the definition is Data. It is a ...

  11. 7 thumb rules to optimize your tabular data presentation

    TL;DR: I'm going to show you some data-viz techniques for tabular data. Those techniques will help your audience focus more on the impact of your cells than the table definition itself. Feel free to jump to the conclusions for a quick bullet list. When to use a table. As tables are so risky, you must choose carefully when to use them.

  12. What is Tabular Data? (Definition & Example)

    In statistics, tabular data refers to data that is organized in a table with rows and columns. Within the table, the rows represent observations and the columns represent attributes for those observations. For example, the following table represents tabular data: This dataset has 9 rows and 5 columns. Each row represents one basketball player ...

  13. Statistical data presentation

    In this article, the techniques of data and information presentation in textual, tabular, and graphical forms are introduced. Text is the principal method for explaining findings, outlining trends, and providing contextual information. A table is best suited for representing individual information and represents both quantitative and ...

  14. Textual And Tabular Presentation Of Data

    Data Tables or Tabular Presentation. A table facilitates representation of even large amounts of data in an attractive, easy to read and organized manner. The data is organized in rows and columns. This is one of the most widely used forms of presentation of data since data tables are easy to construct and read.

  15. Chapter 10 Tabular Data

    Chapter 10. Tabular Data. Tabular data is data on entities that has been aggregated in some way. A typical example would be to count the number of successes and failures in an experiment, and to report those aggregate numbers rather than the outcomes of the individual trials. Another way that tabular data arises is via binning, where we count ...

  16. PDF Graphical and Tabular

    Steps for constructing a histogram are as follows. Step 1: Partition the data range into classes or bins. General guidelines are: Use between 6 and 15 bins. One suggested formula (Sturges) is: Number of Classes = 1 + 3.3 log(n), where n is the total number of observations. All bins should have the same width.

  17. Data Presentation

    Encourage the eye to compare different pieces of data. Reveal the data at several levels of detail, from a broad overview to the fine structure. Serve a clear purpose: description, exploration, tabulation, or decoration. Be closely integrated with the statistical and verbal descriptions of the data set. From E. R. Tufte.

  18. Presentation of Data

    Tabular Ways of Data Presentation and Analysis. To avoid the complexities involved in the textual way of data presentation, people use tables and charts to present data. In this method, data is presented in rows and columns, just like a cricket scorecard showing who made how many runs. Each row and column has an attribute (name, year ...

  19. What Is Data Presentation? (Definition, Types And How-To)

    Tabular: Tabular presentation is using a table to share large amounts of information. When using this method, you organise data in rows and columns according to the characteristics of the data. Tabular presentation is useful in comparing data, and it helps visualise information.

  20. (PDF) CHAPTER FOUR DATA PRESENTATION, ANALYSIS AND ...

    DATA PRESENTATION, ANALYSIS AND INTERPRETATION. 4.0 Introduction. This chapter is concerned with data presentation of the findings obtained through the study. The findings are presented in ...

  21. What Is Data Presentation? (With How to Present Data)

    In this article, we define data presentation, discuss how to present data, highlight some types of presentation, and outline important tips. ... Tabular presentation: An alternative way to represent data is through tables. You may detail your data in rows and columns depending on the type ...

  22. 10 Data Presentation Examples For Strategic Communication

    Tabular data presentation is all about clarity and precision. Think of it as presenting numerical data in a structured grid, with rows and columns clearly displaying individual data points. A table is invaluable for showcasing detailed data, facilitating comparisons and presenting numerical information that needs to be exact. They're commonly ...

  23. Tabular Presentation of Data: Meaning, Objectives, Features and Merits

    To make complex data simpler: The main aim of tabulation is to present the classified data in a systematic way. The purpose is to condense the bulk of information (data) under investigation into a simple and meaningful form. To save space: Tabulation tries to save space by condensing data in a meaningful form while maintaining the quality and quantity of the data.
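
The Sturges guideline quoted in item 16 above (Number of Classes = 1 + 3.3 log(n), with all bins the same width) can be tried out with a short sketch. This is only an illustration under stated assumptions: the logarithm is taken to be base 10, the result is rounded up to a whole number of classes, and the function names `sturges_classes` and `equal_width_bins` are hypothetical, not taken from any of the listed sources.

```python
import math


def sturges_classes(n):
    """Suggested number of classes: 1 + 3.3 * log10(n), rounded up.

    Assumptions: log is base 10 and the result is rounded up.
    """
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    return math.ceil(1 + 3.3 * math.log10(n))


def equal_width_bins(values, k):
    """Split the data range into k equal-width bins and count values in each.

    Returns a list of (lower, upper, frequency) tuples. The largest value
    is placed in the last bin so that every observation is counted once.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All observations are identical: a single bin holds everything.
        return [(lo, hi, len(values))]
    width = (hi - lo) / k
    counts = [0] * k
    for v in values:
        index = min(int((v - lo) / width), k - 1)
        counts[index] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(k)]


if __name__ == "__main__":
    marks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    k = sturges_classes(len(marks))
    for lower, upper, freq in equal_width_bins(marks, k):
        print(f"{lower:5.2f} - {upper:5.2f} : {freq}")
```

The resulting (lower, upper, frequency) rows are themselves a small frequency table, which is the tabular form the histogram is then drawn from. Note that the quoted guideline of 6 to 15 bins is a rule of thumb; for very small samples the formula suggests fewer classes.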