
  • Open access
  • Published: 31 May 2021

Converting tabular data into images for deep learning with convolutional neural networks

  • Yitan Zhu 1 ,
  • Thomas Brettin 1 ,
  • Fangfang Xia 1 ,
  • Alexander Partin 1 ,
  • Maulik Shukla 1 ,
  • Hyunseung Yoo 1 ,
  • Yvonne A. Evrard 2 ,
  • James H. Doroshow 3 &
  • Rick L. Stevens 1 , 4  

Scientific Reports volume 11, Article number: 11325 (2021)

39k Accesses

51 Citations

5 Altmetric


Subjects: Computational models, Machine learning, Virtual drug screening

A Publisher Correction to this article was published on 01 July 2021

This article has been updated

Abstract

Convolutional neural networks (CNNs) have been successfully used in many applications where important information about data is embedded in the order of features, such as speech and imaging. However, most tabular data do not assume a spatial relationship between features, and thus are unsuitable for modeling using CNNs. To meet this challenge, we develop a novel algorithm, image generator for tabular data (IGTD), to transform tabular data into images by assigning features to pixel positions so that similar features are close to each other in the image. The algorithm searches for an optimized assignment by minimizing the difference between the ranking of distances between features and the ranking of distances between their assigned pixels in the image. We apply IGTD to transform gene expression profiles of cancer cell lines (CCLs) and molecular descriptors of drugs into their respective image representations. Compared with existing transformation methods, IGTD generates compact image representations with better preservation of feature neighborhood structure. Evaluated on benchmark drug screening datasets, CNNs trained on IGTD image representations of CCLs and drugs exhibit better performance in predicting anti-cancer drug response than both CNNs trained on alternative image representations and prediction models trained on the original tabular data.


Introduction

Convolutional neural networks (CNNs) have been successfully used in numerous applications, such as image and video recognition 1,2,3,4, medical image analysis 5,6, natural language processing 7, and speech recognition 8. CNNs are inspired by visual neuroscience and possess key features that exploit the properties of natural signals, including local connections in receptive fields, parameter sharing via convolution kernels, and hierarchical feature abstraction through pooling and multiple layers 9. These features make CNNs suitable for analyzing data with spatial or temporal dependencies between components 10,11. A particular example is imaging, in which the spatial arrangement of pixels carries crucial information about the image content. When applied to images for object recognition, the bottom layers of CNNs detect low-level local features, such as oriented edges at certain positions. As the information flows through the layers, low-level features combine to form more abstract high-level features, assembling motifs and then parts of objects, until whole objects are identified.

Although CNNs have been applied for image analysis with great success, non-image data are prevalent in many fields, such as bioinformatics 12,13,14, medicine 15,16, finance, and others, for which CNNs might not be directly applicable to take full advantage of their modeling capacity. For some tabular data, the order of features can be rearranged in a 2-D space to explicitly represent relationships between features, such as feature categories or similarities 17,18,19. This motivates the transformation of tabular data into images, from which CNNs can learn and utilize the feature relationships to improve prediction performance compared with models trained on the tabular data. The transformation converts each sample in the tabular data into an image, in which features and their values are represented by pixels and pixel intensities, respectively. A feature is represented by the same pixel (or pixels) in the images of all samples, while the pixel intensities vary across images.

To our knowledge, three methods have been developed to transform non-image tabular data into images for predictive modeling using CNNs. Sharma et al. developed DeepInsight 17, which projects feature vectors onto a 2-D space using t-SNE 20, minimizing the Kullback–Leibler divergence between the feature distributions in the 2-D projection space and the original full-dimensional space. Then, on the 2-D projection, the algorithm identifies the minimum-area rectangle that includes all the projected feature points, which forms the image representation. Bazgir et al. developed REFINED (REpresentation of Features as Images with NEighborhood Dependencies) 18, which uses Bayesian multidimensional scaling as a global distortion minimizer to project the features onto a 2-D space while preserving the feature distribution from the original full-dimensional space. The features are then assigned to image pixels according to the projection, and a hill climbing algorithm is applied to locally optimize the arrangement of feature positions in the image 18. Ma and Zhang developed OmicsMapNet 19 to convert gene expression data of cancer patients into 2-D images for the prediction of tumor grade using CNNs. OmicsMapNet utilizes functional annotations of genes extracted from the Kyoto Encyclopedia of Genes and Genomes to construct images via TreeMap 21, so that genes with similar molecular functions are closely located in the image.

In this paper, we develop a novel method, Image Generator for Tabular Data (IGTD), to transform tabular data into images for subsequent deep learning analysis using CNNs. The algorithm assigns each feature to a pixel in the image. According to the assignment, an image is generated for each data sample, in which the pixel intensity reflects the value of the corresponding feature in the sample. The algorithm searches for an optimized assignment of features to pixels by minimizing the difference between the ranking of pairwise distances between features and the ranking of pairwise distances between the assigned pixels, where the distances between pixels are calculated based on their coordinates in the image. Minimizing the difference between the two rankings assigns similar features to neighboring pixels and dissimilar features to pixels that are far apart. The optimization is achieved through an iterative process of swapping the pixel assignments of two features. In each iteration, the algorithm identifies the feature that has not been considered for swapping for the longest time, and seeks a swap for that feature that best reduces the difference between the two rankings.

Compared with the three existing methods for converting tabular data into images, the proposed IGTD approach presents several advantages. Unlike OmicsMapNet, which requires domain knowledge about features, IGTD is a general method that can be used in the absence of domain knowledge. Because DeepInsight uses the t-SNE projection as the image representation, a significant portion of the image is usually left blank, composed of pixels that do not represent features. In contrast, IGTD provides compact image representations in which each pixel represents a unique feature. Thus, the DeepInsight images are usually much larger than the IGTD images and potentially require more memory and time to train CNNs in subsequent analysis. Compared with REFINED, IGTD generates image representations that better preserve the feature neighborhood structure. In the IGTD image representation, features close to each other in the image are indeed more similar, as will be shown later in the example applications of transforming gene expression profiles of cancer cell lines (CCLs) and molecular descriptors of drugs into images. Also, we take the prediction of anti-cancer drug response as an example and demonstrate that CNNs trained on IGTD images provide better prediction performance than both CNNs trained on alternative image representations and prediction models trained on the original tabular data. Moreover, IGTD provides a flexible framework that can be extended to accommodate diversified data and requirements. Various measures can be implemented to calculate feature and pixel distances and to evaluate the difference between rankings. The size and shape of the image representation can also be flexibly chosen.

IGTD algorithm

Let \({\varvec{X}}\) denote an \(M\) by \(N\) tabular data matrix to be transformed into images. Each row of \({\varvec{X}}\) is a sample and each column is a feature. Let \({{\varvec{x}}}_{i,:}\) , \({{\varvec{x}}}_{:,j}\) , and \({x}_{i,j}\) denote the \(i\) th row, the \(j\) th column, and the element in the \(i\) th row and \(j\) th column, respectively. Bold upper-case and lower-case letters are used to denote matrices and vectors, respectively. Scalars are denoted by upper-case or lower-case letters without bold. Our goal is to transform each sample \({{\varvec{x}}}_{i,:}\) into an \({N}_{r}\) by \({N}_{c}\) image (i.e. a 2-D array), where \({N}_{r}\times {N}_{c}=N\) . The pairwise distances between features are calculated according to a distance measure, such as the Euclidean distance. These pairwise distances are then ranked in ascending order, so that small distances receive small ranks and large distances receive large ranks. An \(N\) by \(N\) rank matrix denoted by \({\varvec{R}}\) is formed, in which \({r}_{i,j}\) at the \(i\) th row and \(j\) th column of \({\varvec{R}}\) is the rank value of the distance between the \(i\) th and \(j\) th features. The diagonal of \({\varvec{R}}\) is set to zero. Clearly, \({\varvec{R}}\) is a symmetric matrix. Fig. 1a shows an example of the feature distance rank matrix calculated based on the gene expression profiles of CCLs, with 2500 genes taken as features. Details regarding the data will be presented in the next section. Distances between genes are measured by the Euclidean distance based on their expression values. In Fig. 1a, the grey level indicates the rank value: the larger the distance, the larger the rank, and the darker the corresponding point in the plot.
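As a concrete illustration, the rank matrix \({\varvec{R}}\) can be computed along the following lines. This is a minimal sketch in Python using NumPy/SciPy; the function name and the tie-breaking choice in the ranking are ours and are not taken from the released IGTD package.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import rankdata

def feature_distance_rank_matrix(X):
    """Build the N x N rank matrix R from an M x N data matrix X (rows = samples,
    columns = features): pairwise Euclidean distances between features are ranked
    in ascending order, and the diagonal is set to zero."""
    D = squareform(pdist(X.T, metric="euclidean"))  # N x N feature distance matrix
    iu = np.triu_indices_from(D, k=1)               # rank the upper triangle only
    ranks = rankdata(D[iu], method="ordinal")       # small distance -> small rank
    R = np.zeros_like(D)
    R[iu] = ranks
    return R + R.T                                  # symmetric, zero diagonal
```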

Figure 1. An illustration of the IGTD strategy based on CCL gene expression data. (a) Rank matrix of Euclidean distances between all pairs of genes. The grey level indicates the rank value. The 2500 genes with the largest variations across CCLs are included for calculating the matrix. (b) Rank matrix of Euclidean distances between all pairs of pixels calculated based on their coordinates in a \(50\) by \(50\) image. The pixels are concatenated row by row from the image to form the order of pixels in the matrix. (c) Feature distance rank matrix after optimization and rearranging the features accordingly. (d) The error change in the optimization process. The horizontal axis shows the number of iterations and the vertical axis shows the error value.

On the other hand, for an \({N}_{r}\) by \({N}_{c}\) image, the distance between each pair of pixels can be calculated based on the pixel coordinates according to a distance measure, such as the Euclidean distance. Then, the pairwise pixel distances are ranked in ascending order. An \(N\) by \(N\) rank matrix of pixel distances is generated and denoted by \({\varvec{Q}}\) , in which \({q}_{i,j}\) is the rank of the distance between pixel \(i\) and pixel \(j\) . The main diagonal of \({\varvec{Q}}\) is set to zero, and \({\varvec{Q}}\) is also a symmetric matrix. The pixels in the image are concatenated row by row to form the order of pixels in \({\varvec{Q}}\) . Fig. 1b is an example of the pixel distance rank matrix, showing the ranks of Euclidean distances between all pairs of pixels calculated based on their coordinates in a \(50\) by \(50\) image. The plot presents two apparent patterns. First, the top right and bottom left corners of the plot are generally darker, indicating larger distances and rank values, while the region around the diagonal is generally brighter, indicating smaller distances and rank values. Second, the plot shows a mosaic pattern because the pixels are concatenated row by row from the image. Small tiles in the plot correspond to pairwise combinations between rows in the image, so there are \(50\times 50=\mathrm{2,500}\) tiles in total. Each tile shares the same pattern as the whole plot: the top right and bottom left corners of the tile are relatively darker and the region around the diagonal is relatively brighter.
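The pixel distance rank matrix \({\varvec{Q}}\) can be built analogously from the pixel coordinates. Again, this is an illustrative sketch rather than the package implementation, reusing the imports above.

```python
def pixel_distance_rank_matrix(n_rows, n_cols):
    """Build the N x N rank matrix Q of pairwise pixel distances for an
    n_rows x n_cols image, with pixels ordered row by row."""
    coords = np.array([(r, c) for r in range(n_rows) for c in range(n_cols)])
    D = squareform(pdist(coords, metric="euclidean"))
    iu = np.triu_indices_from(D, k=1)
    ranks = rankdata(D[iu], method="ordinal")
    Q = np.zeros_like(D)
    Q[iu] = ranks
    return Q + Q.T
```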

To transform tabular data into images, each feature needs to be assigned to a pixel position in the image. A simple way is to assign the \(i\) th feature (the \(i\) th row and column) in the feature distance rank matrix \({\varvec{R}}\) to the \(i\) th pixel (the \(i\) th row and column) in the pixel distance rank matrix \({\varvec{Q}}\) . But comparing Fig. 1a with Fig. 1b, we can see a significant difference between the two matrices. An error function is defined to measure the difference:

\(\mathrm{err}\left({\varvec{R}},{\varvec{Q}}\right)={\sum }_{i=2}^{N}{\sum }_{j=1}^{i-1}\mathrm{diff}\left({r}_{i,j},{q}_{i,j}\right)\)

where \(\mathrm{diff}\left(\cdot ,\cdot \right)\) is a function that measures the difference between \({r}_{i,j}\) and \({q}_{i,j}\) , for which there are various options, such as the absolute difference \(\left|{r}_{i,j}-{q}_{i,j}\right|\) or the squared difference \({\left({r}_{i,j}-{q}_{i,j}\right)}^{2}\) . The error function measures the difference between the lower triangles of the two symmetric matrices. At this stage, the task of assigning each feature to a suitable pixel position, so that features similar to each other are close in the image, can be converted into reordering the features (rows and columns in \({\varvec{R}}\) ) so that \(\mathrm{err}\left({\varvec{R}},{\varvec{Q}}\right)\) becomes small. Notice that the reordering of rows and columns in \({\varvec{R}}\) needs to be synchronized, which means the orders of features along the rows and columns of \({\varvec{R}}\) must always be the same. A basic operation for reordering the features is to swap the positions of two features, because any feature reordering can be implemented by a sequence of feature swaps. Thus, we can reduce the error iteratively by searching for suitable feature swaps. Based on this idea, we design the IGTD algorithm.
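Under these definitions, the lower-triangle error can be evaluated as in the following sketch (the helper name and the option of returning either absolute or squared differences are ours):

```python
def err(R, Q, diff="abs"):
    """Error between the lower triangles of the two symmetric rank matrices."""
    il = np.tril_indices_from(R, k=-1)
    d = R[il] - Q[il]
    return float(np.sum(np.abs(d)) if diff == "abs" else np.sum(d ** 2))
```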

The IGTD algorithm takes four input parameters \({S}_{\mathrm{max}}\) , \({S}_{\mathrm{con}}\) , \({t}_{\mathrm{con}}\) , and \({t}_{\mathrm{swap}}\) . \({S}_{\mathrm{max}}\) and \({S}_{\mathrm{con}}\) are two positive integers, and \({S}_{\mathrm{max}}\gg {S}_{\mathrm{con}}\) . \({S}_{\mathrm{max}}\) is the maximum number of iterations that the algorithm will run if it does not converge. \({S}_{\mathrm{con}}\) is the number of iterations used for checking algorithm convergence. \({t}_{\mathrm{con}}\) is a small positive threshold to determine whether the algorithm converges. \({t}_{\mathrm{swap}}\) is a threshold on the error reduction rate to determine whether a feature swap should be performed. The IGTD algorithm proceeds in the following four steps.

Step 1 initializes some variables used in the algorithm. Initialize the iteration index \(s=0\) . Calculate the initial error \({e}_{0}=\mathrm{err}\left({\varvec{R}},{\varvec{Q}}\right)\) . Initialize \({\varvec{h}}\) , a vector of negative infinities with a length of \(N\) . \({\varvec{h}}\) will be used to record the latest iterations in which the features have been considered for feature swap, i.e. in the optimization process \({h}_{n}\) will be the latest iteration in which the \(n\) th feature in \({\varvec{R}}\) has been considered for feature swap. Let \({{\varvec{k}}}_{0}\) be \(\left[\begin{array}{ccc}1& \cdots & N\end{array}\right]\) , which indicates the ordering of features at the beginning before optimization.

Step 2 identifies the feature that has not been considered for feature swap for the longest time and searches for a feature swap for it that results in the largest error reduction. In this step, the iteration index is updated, \(s=s+1\) . We identify the feature that has not been considered for feature swap for the longest time by identifying the smallest element in \({\varvec{h}}\) :

\({n}^{*}={\mathrm{argmin}}_{n}\ {h}_{n}\)

Then we identify the feature whose swap with feature \({n}^{*}\) results in the largest error reduction:

\({l}^{*}={\mathrm{argmax}}_{l\ne {n}^{*}}\left(\mathrm{err}\left({\varvec{R}},{\varvec{Q}}\right)-\mathrm{err}\left({{\varvec{R}}}_{{n}^{*}\sim l},{\varvec{Q}}\right)\right)\)

where \({{\varvec{R}}}_{{n}^{*}\sim l}\) is the matrix resulting from swapping features \({n}^{*}\) and \(l\) in \({\varvec{R}}\) , i.e. swapping the \({n}^{*}\) th and \(l\) th rows and the \({n}^{*}\) th and \(l\) th columns in \({\varvec{R}}\) . In this search, the algorithm repetitively calculates the error reduction resulting from swapping two features. The calculation involves only the rows and columns corresponding to the two features in the feature and pixel distance rank matrices. See Section 1 in the Supplementary Information for more discussion about the calculation.

Step 3 performs the identified feature swap if the error reduction rate is larger than \({t}_{\mathrm{swap}}\) . If \(\left(\mathrm{err}\left(R,Q\right)-\mathrm{err}\left({R}_{{n}^{*}\sim {l}^{*}},Q\right)\right)/\mathrm{err}\left(R,Q\right)>{t}_{\mathrm{swap}}\) , the algorithm does the following:

\({{\varvec{k}}}_{s}={{\varvec{k}}}_{s-1}\) and swap the \({n}^{*}\) th and \({l}^{*}\) th elements in \({{\varvec{k}}}_{s}\)

\({e}_{s}=\mathrm{err}\left({{\varvec{R}}}_{{n}^{*}\sim {l}^{*}},{\varvec{Q}}\right)\)

\({h}_{{n}^{*}}=s\) and \({h}_{{l}^{*}}=s\)

\({\varvec{R}}={{\varvec{R}}}_{{n}^{*}\sim {l}^{*}}\)

Otherwise, the algorithm does the following:

\({h}_{{n}^{*}}=s\)

\({e}_{s}={e}_{s-1}\)

\({{\varvec{k}}}_{s}={{\varvec{k}}}_{s-1}\)

When the identified feature swap is performed, the operations above (i) generate the feature reordering indices of iteration \(s\) that keep track of the swap; (ii) calculate the error after the swap; (iii) record that features \({n}^{*}\) and \({l}^{*}\) have been considered for feature swap in iteration \(s\) ; and (iv) update the feature distance rank matrix after the swap. When the feature swap is not performed, the operations (v) record that feature \({n}^{*}\) has been considered for feature swap in iteration \(s\) ; (vi) keep the error unchanged from the previous iteration; and (vii) keep the feature reordering indices unchanged from the previous iteration. Notice that if \({t}_{\mathrm{swap}}\) is set to be non-negative, the IGTD algorithm monotonically reduces the error. If \({t}_{\mathrm{swap}}\) is set to be negative, the algorithm has a chance to jump out of a local optimum and search for a potentially better solution.

Step 4 checks whether the algorithm should terminate or return to Step 2. The algorithm runs iteratively and terminates when reaching either the maximum number of iterations \({S}_{\mathrm{max}}\) or convergence, where the error reduction rate stays smaller than the threshold \({t}_{\mathrm{con}}\) for \({S}_{\mathrm{con}}\) consecutive iterations. So, if \(s={S}_{\mathrm{max}}\) or \(\frac{{e}_{s-{S}_{\mathrm{con}}}-{e}_{u}}{{e}_{s-{S}_{\mathrm{con}}}}<{t}_{\mathrm{con}}\) for all \(u\in \left\{s-{S}_{\mathrm{con}}+1,\dots ,s\right\}\) , the algorithm identifies the iteration with the minimum error

\({v}^{*}={\mathrm{argmin}}_{u\in \left\{1,\dots ,s\right\}}\ {e}_{u}\)

It then terminates and outputs \({{\varvec{k}}}_{{{\varvec{v}}}^{\boldsymbol{*}}}\) and \({e}_{{v}^{*}}\) , which are the optimized indices to reorder the features and the optimized error resulting from reordering the features according to \({{\varvec{k}}}_{{{\varvec{v}}}^{\boldsymbol{*}}}\) , respectively. If the termination criteria are not satisfied, the algorithm returns to Step 2.
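The four steps can be summarized in the following simplified Python sketch, which builds on the helper functions above. It is illustrative only: for readability the error of each candidate swap is recomputed in full, whereas, as noted in Step 2, only the rows and columns of the two swapped features actually need to be recalculated.

```python
import numpy as np

def swap_features(R, a, b):
    """Return a copy of R with features a and b swapped (rows and columns)."""
    Rs = R.copy()
    Rs[[a, b], :] = Rs[[b, a], :]
    Rs[:, [a, b]] = Rs[:, [b, a]]
    return Rs

def igtd_optimize(R, Q, s_max=30000, s_con=500, t_con=1e-6, t_swap=0.0):
    """Simplified sketch of Steps 1-4; uses err() defined earlier."""
    N = R.shape[0]
    R = R.copy()
    h = np.full(N, -np.inf)            # last iteration each feature was considered
    k = np.arange(N)                   # current ordering of features
    errors = [err(R, Q)]               # errors[s] is the error after iteration s
    best_err, best_k = errors[0], k.copy()

    for s in range(1, s_max + 1):
        n_star = int(np.argmin(h))     # feature unconsidered for the longest time
        # swap partner l* that yields the largest error reduction
        e_swap, l_star = min(
            (err(swap_features(R, n_star, l), Q), l) for l in range(N) if l != n_star
        )
        if (errors[-1] - e_swap) / errors[-1] > t_swap:
            R = swap_features(R, n_star, l_star)
            k[[n_star, l_star]] = k[[l_star, n_star]]
            h[n_star] = h[l_star] = s
            errors.append(e_swap)
        else:
            h[n_star] = s
            errors.append(errors[-1])
        if errors[-1] < best_err:
            best_err, best_k = errors[-1], k.copy()
        # converged if the reduction rate stays below t_con for s_con iterations
        if s >= s_con and all(
            (errors[s - s_con] - errors[u]) / errors[s - s_con] < t_con
            for u in range(s - s_con + 1, s + 1)
        ):
            break
    return best_k, best_err
```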

Applications on CCL gene expression profiles and drug molecular descriptors

We applied the IGTD algorithm to anti-cancer drug response prediction. Following existing works 22,23,24, we predicted the response of a CCL to a drug treatment using the gene expression profile of the CCL and the molecular descriptors of the drug. Two benchmark in vitro drug screening datasets, the Cancer Therapeutics Response Portal v2 (CTRP) 25 and the Genomics of Drug Sensitivity in Cancer (GDSC) 26, were used to train and evaluate the performance of the drug response prediction models. Supplementary Table 1 shows the numbers of CCLs, drugs, and treatments (i.e. pairs of drugs and CCLs) in the two datasets. The IGTD algorithm was used to transform CCL gene expression profiles and drug molecular descriptors into their respective images. A total of 882 CCLs from various cancer types were included in our analysis. Without loss of generality, we chose the 2500 genes with the largest expression variations across CCLs for analysis. The drugs were represented by chemical descriptors calculated using the Dragon (version 7.0) software package ( https://chm.kode-solutions.net/products_dragon.php ) based on the drug molecular structures. Molecular descriptors were calculated for a total of 651 drugs included in the two drug screening datasets. Without loss of generality, we also chose the 2500 drug descriptors with the largest variations across drugs for analysis. See Section 2 in the Supplementary Information for the details of data and data preprocessing.

We applied the IGTD algorithm to the CCL gene expression data and the drug molecular descriptors separately to generate their image representations. The IGTD algorithm was run with \({N}_{r}=50\) , \({N}_{c}=50\) , \({S}_{\mathrm{max}}=\mathrm{30,000}\) , \({S}_{\mathrm{con}}=500\) , \({t}_{\mathrm{con}}=0.000001\) , \({t}_{\mathrm{swap}}=0\) , the Euclidean distance for calculating pairwise feature distances and pixel distances, and the absolute difference as the \(\mathrm{diff}\left(\bullet \right)\) function. Fig. 1a and Fig. 1b show the feature distance rank matrix before optimization and the pixel distance rank matrix, respectively, for the transformation of CCL gene expression profiles into images. Fig. 1c shows the feature distance rank matrix after optimization and rearranging the features/genes accordingly. After optimization, the feature distance rank matrix is much more similar to the pixel distance rank matrix than it was originally. The optimized feature distance rank matrix shares the two important patterns of the pixel distance rank matrix: the top right and bottom left corners in Fig. 1c are relatively dark while the region around the diagonal is relatively bright, and the matrix also shows a mosaic pattern. The optimization error monotonically decreases and tends to converge after approximately 5000 iterations, as shown in Fig. 1d.
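Assembled with these hyper-parameter values, the transformation of the expression data would look roughly as follows. This is illustrative only: X_expr stands for the 882 by 2500 expression matrix, and the helpers are the sketches given earlier, not the API of the released package.

```python
# X_expr: hypothetical 882 x 2500 matrix of CCL gene expression values
R = feature_distance_rank_matrix(X_expr)
Q = pixel_distance_rank_matrix(50, 50)
k_opt, e_opt = igtd_optimize(R, Q, s_max=30000, s_con=500, t_con=1e-6, t_swap=0.0)
```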

Based on the optimization results, each gene or drug descriptor was assigned to a pixel in the destination images. The grey level of a pixel in the image indicates the expression value of the corresponding gene in a CCL or the value of the corresponding molecular descriptor of a drug. Fig. 2a shows an example image representation of a gene expression profile, that of the SNU-61 rectal adenocarcinoma cell line ( https://web.expasy.org/cellosaurus/CVCL_5078 ). Fig. 2d shows an example image representation of drug molecular descriptors, that of Nintedanib ( https://en.wikipedia.org/wiki/Nintedanib ), an inhibitor of multiple receptor and non-receptor tyrosine kinases. In Fig. 2a and Fig. 2d, some genes or drug descriptors have very small values and are thus shown in white or a color close to white.
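Given the optimized ordering, producing the image of one sample amounts to reordering its feature values and reshaping them row by row. The sketch below continues the hypothetical variables introduced above.

```python
def sample_to_image(x, ordering, n_rows=50, n_cols=50):
    """Reorder one sample's feature values and reshape them row by row."""
    return x[ordering].reshape(n_rows, n_cols)

# one 50 x 50 image per CCL, using the ordering optimized above
images = np.stack([sample_to_image(x, k_opt) for x in X_expr])
```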

Figure 2. Example image representations of CCL gene expression profiles and drug molecular descriptors. (a–c) are image representations of the gene expression profile of the SNU-61 cell line generated by IGTD, REFINED, and DeepInsight, respectively. (d–f) are image representations of molecular descriptors of Nintedanib, generated by IGTD, REFINED, and DeepInsight, respectively.

For comparison purposes, we also generated image representations using DeepInsight 17 and REFINED 18. Fig. 2c and Fig. 2f show the images generated using DeepInsight for the SNU-61 cell line and Nintedanib, respectively. Because the DeepInsight images were generated using the 2-D t-SNE projection, a significant portion of the images is blank, especially in the presence of outlier features. To include the 2500 features in the plots with a reasonable resolution, the DeepInsight images are much larger than the IGTD images: \(227\times 387=\mathrm{87,849}\) pixels (Fig. 2c) and \(380\times 387=\mathrm{147,060}\) pixels (Fig. 2f) vs. \(50\times 50=\mathrm{2,500}\) pixels (Fig. 2a and Fig. 2d). The large images generated by DeepInsight may require more memory and time to train CNNs in subsequent analysis.

Similar to IGTD, REFINED also generates compact image representations without any blank area. Fig. 2b and Fig. 2e show the images that REFINED generated for the SNU-61 cell line and Nintedanib, respectively. To investigate the difference between IGTD and REFINED images, we used the following local heterogeneity (LH) measure to quantitatively evaluate the preservation of feature neighborhood structure in image representations:

\(\mathrm{LH}\left({\varvec{Y}}\right)=\underset{i,j}{\mathrm{mean}}\left(\frac{1}{{p}^{2}-1}{\sum }_{{y}_{k,l}\in {\mathcal{N}}_{i,j}}\left|{y}_{i,j}-{y}_{k,l}\right|\right)\)

where \({y}_{i,j}\) is the intensity of the pixel in the \(i\) th row and \(j\) th column of an image (denoted by \({\varvec{Y}}\) ), and \({\mathcal{N}}_{i,j}\) is a \(p\times p\) neighborhood centered around \({y}_{i,j}\) but not including \({y}_{i,j}\) . In a \(p\times p\) neighborhood, the average absolute difference between the center pixel and the neighboring pixels measures the neighborhood heterogeneity, and the LH measure is the mean neighborhood heterogeneity obtained by moving the neighborhood window across the whole image. The LH measurements were calculated with multiple neighborhood sizes for both IGTD and REFINED image representations. A two-tailed paired t-test 27 was applied across CCLs or drugs to examine the LH difference between IGTD and REFINED images. For each CCL and drug, we also calculated the percentage by which IGTD reduced the local heterogeneity compared with REFINED, which is \(\left({\mathrm{LH}}_{\mathrm{REFINED}}-{\mathrm{LH}}_{\mathrm{IGTD}}\right)/{\mathrm{LH}}_{\mathrm{REFINED}}\times 100\%\) , where \({\mathrm{LH}}_{\mathrm{REFINED}}\) and \({\mathrm{LH}}_{\mathrm{IGTD}}\) are the LH measurements of the REFINED and IGTD images, respectively. Table 1 shows the results. For both CCLs and drugs and all neighborhood sizes considered (i.e. 3, 5, 7, and 9), the average LH of the IGTD images is always statistically significantly lower (p-values ≤ 0.05) than that of the REFINED images. This result indicates that the IGTD algorithm better preserves the neighborhood structure of features in the 2-D images, so that similar features are grouped more closely in IGTD images.
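A direct way to compute LH for one image is sketched below. How image borders are handled is not specified in the text, so this sketch simply restricts the window to positions where the full \(p\times p\) neighborhood fits inside the image.

```python
import numpy as np

def local_heterogeneity(Y, p=3):
    """Mean absolute difference between each pixel and its p*p - 1 neighbors,
    averaged over positions where the full neighborhood fits inside the image."""
    r = p // 2
    n_rows, n_cols = Y.shape
    vals = []
    for i in range(r, n_rows - r):
        for j in range(r, n_cols - r):
            nb = Y[i - r:i + r + 1, j - r:j + r + 1]
            vals.append(np.abs(nb - Y[i, j]).sum() / (p * p - 1))
    return float(np.mean(vals))
```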

We also compared the runtimes of IGTD, REFINED, and DeepInsight for converting tabular data into images. For the gene expression profiles of CCLs, IGTD, REFINED, and DeepInsight took 0.66, 7.69, and 0.04 hours, respectively, to convert them into images. For the drug descriptors, IGTD, REFINED, and DeepInsight took 0.74, 5.13, and 0.07 hours, respectively. Notice that both IGTD and DeepInsight were executed with one CPU processor, while REFINED was executed with parallel computing using 40 processors of the same specification. This result indicates that DeepInsight converts tabular data into images significantly faster, which is expected, because DeepInsight does not generate compact image representations and therefore does not require an optimization process to assign features to suitable pixel positions, as IGTD and REFINED do. Interestingly, for the two methods that do generate compact image representations, the runtimes of REFINED were much longer than those of IGTD, even though REFINED used parallel computing with 40 processors while IGTD used only a single processor.

Drug response prediction using CNNs based on image representations

We performed drug response prediction using CNN models trained on the IGTD image representations. See Section 2 in the Supplementary Information for the preprocessing of the drug screening datasets. The area under the dose response curve (AUC) was taken as the prediction target in a regression setting. Fig. 3 shows the architecture of the CNN model. For both CCLs and drugs, a subnetwork of three convolution layers, each with \(5\times 5\) kernels followed by batch normalization, ReLU activation, and maximum pooling layers, accepts the image representations as input. The output feature maps from the subnetworks are flattened, concatenated, and passed to a fully connected network to make predictions. The total number of trainable parameters in the model is 1,307,218. The mean square error was used as the loss function to be minimized during model training. Tenfold cross-validation was performed to train and evaluate the prediction models, in which eight data folds were used for model training, one fold was used for validation to select the dropout rate and for early stopping to avoid overfitting, and the remaining fold was used for testing the prediction performance. A total of 20 cross-validation trials were conducted. The prediction performance was measured by the coefficient of determination (R 2 ).

Figure 3. Architecture of the convolutional neural network (CNN) used for predicting drug response based on image representations.
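For orientation, the described two-subnetwork architecture can be sketched in Keras as follows. The filter counts, dense-layer width, dropout rate, and optimizer below are illustrative assumptions and are not values reported in the paper (which states only the layer types and the total of 1,307,218 trainable parameters).

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_subnetwork(inp, filters=(32, 64, 128)):
    """Three blocks of Conv(5x5) -> BatchNorm -> ReLU -> MaxPool, then Flatten."""
    x = inp
    for f in filters:
        x = layers.Conv2D(f, kernel_size=5, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        x = layers.MaxPooling2D(pool_size=2)(x)
    return layers.Flatten()(x)

ccl_in = layers.Input(shape=(50, 50, 1))    # IGTD image of a CCL expression profile
drug_in = layers.Input(shape=(50, 50, 1))   # IGTD image of drug descriptors
merged = layers.Concatenate()([conv_subnetwork(ccl_in), conv_subnetwork(drug_in)])
x = layers.Dense(256, activation="relu")(merged)
x = layers.Dropout(0.1)(x)                  # dropout rate selected on the validation fold
auc_out = layers.Dense(1)(x)                # predicted AUC (regression)

model = tf.keras.Model([ccl_in, drug_in], auc_out)
model.compile(optimizer="adam", loss="mse")
```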

To assess the utility of different image representations, the same CNN models were also trained with REFINED and DeepInsight images. The only difference was that, when training with DeepInsight images, the stride for moving the convolution kernels was changed from 1 to 2 to accommodate the larger input images. Due to the larger input images and consequently larger feature maps and concatenation layer, the number of trainable parameters in the model increased from 1,307,218 for IGTD and REFINED images to 2,715,218 for DeepInsight images. Because the larger input images consumed more memory, we always encountered out-of-memory errors when training models using static data of DeepInsight images. To avoid the error, a data generator mechanism had to be implemented to generate the training data batch by batch on the fly instead of using static data. The out-of-memory error never occurred in model training using static data of IGTD and REFINED images due to their smaller size, which demonstrates that the compact image representations of IGTD and REFINED indeed require less memory for model training.

We also compared CNNs trained on IGTD images with prediction models trained on the original tabular data. Four prediction models, including LightGBM 28, random forest 29, single-network DNN (sDNN), and two-subnetwork DNN (tDNN), were included in the comparison. LightGBM is an implementation of the gradient boosting decision tree algorithm that uses gradient-based one-side sampling and exclusive feature bundling to speed up model training 28. Random forest constructs multiple decision trees on random sub-samples of the data and uses the average of their outcomes as the prediction 29. sDNN was a fully connected neural network with six hidden layers. For LightGBM, random forest, and sDNN, the CCL gene expression profile and the drug molecular descriptors were concatenated to form the input vector. tDNN was also a neural network with dense hidden layers, but it included two subnetworks for the separate input of gene expression profiles and drug molecular descriptors. Each subnetwork included three hidden layers. The outputs of the two subnetworks were concatenated and passed to another three hidden layers to make the prediction. For a fair comparison, all prediction models were trained and tested through 20 tenfold cross-validation trials, with the same data partitions (i.e. training, validation, and testing sets) used for the cross-validation of CNNs with image representations. See Section 3 in the Supplementary Information for details of the prediction models and the model training process.
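As a rough sketch, the tDNN baseline could be expressed as below. The hidden-layer widths are illustrative assumptions, since the text above only specifies the number of hidden layers; the actual configuration is described in Section 3 of the Supplementary Information.

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_subnetwork(inp, widths=(1000, 500, 250)):
    """Three fully connected hidden layers."""
    x = inp
    for w in widths:
        x = layers.Dense(w, activation="relu")(x)
    return x

expr_in = layers.Input(shape=(2500,))    # tabular gene expression vector
desc_in = layers.Input(shape=(2500,))    # tabular drug descriptor vector
x = layers.Concatenate()([dense_subnetwork(expr_in), dense_subnetwork(desc_in)])
for w in (500, 250, 125):                # three further hidden layers after concatenation
    x = layers.Dense(w, activation="relu")(x)
tdnn = tf.keras.Model([expr_in, desc_in], layers.Dense(1)(x))
tdnn.compile(optimizer="adam", loss="mse")
```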

Table 2 shows the drug response prediction performance obtained using different data representations and prediction models. CNNs with IGTD images provide the highest average R 2 across cross-validation trials on both the CTRP and GDSC datasets. The average R 2 of CNN with REFINED images is similar to that of CNN with IGTD images, presumably because both IGTD and REFINED take a similar strategy of generating compact image representations with the intention of grouping similar features together in the image. CNN with DeepInsight images and tDNN with tabular data rank third and fourth on the CTRP dataset, while their ranks switch on the GDSC dataset. sDNN, LightGBM, and random forest with tabular data rank fifth, sixth, and seventh on both datasets. A two-tailed paired t-test is applied to evaluate the performance difference between CNN with IGTD images and the other combinations of prediction models and data representations. The result shows that CNNs trained with IGTD images statistically significantly outperform (p-values ≤ 0.05) all other combinations, except CNNs trained with REFINED images, for which the p-values do not reach the cutoff.

Because the DeepInsight images are much larger than the IGTD or REFINED images, the number of trainable parameters more than doubles (2,715,218 vs. 1,307,218) for CNN models trained on DeepInsight images. To investigate how the larger input image size and the consequent model size affect the model training speed, we compare the model training time (i.e. the time to train a prediction model to convergence) of CNNs with different image representations. For each cross-validation trial, we calculate the ratio between the model training time of CNN with DeepInsight or REFINED images and that of CNN with IGTD images. The ratio is then log2 transformed, so that a positive value indicates CNN with DeepInsight or REFINED images takes longer to train, while a negative value indicates CNN with IGTD images takes longer to train. See Table 3 for the mean and standard deviation of the log2 ratio obtained in cross-validation. A one-sample t-test is applied across the cross-validation trials to evaluate how significantly the log2 ratio differs from 0. The result indicates that CNNs take a statistically significantly shorter time (p-values ≤ 0.05) to train on IGTD images than on DeepInsight images for both datasets. CNNs with IGTD images also train statistically significantly faster than CNNs with REFINED images on the GDSC dataset, while their training speeds are similar on the CTRP dataset without a significant difference.

Discussion

We developed the Image Generator for Tabular Data (IGTD), a novel algorithm that transforms tabular data into images for deep learning with CNN models. To investigate its utility, we applied the algorithm to convert CCL gene expression profiles and drug molecular descriptors into images, and compared it with existing methods that also convert tabular data into images. Compared with DeepInsight, IGTD generates more compact image representations in which every pixel corresponds to a different feature. The compact images reduce the memory consumption and increase the training speed of the prediction model in subsequent analysis. Compared with REFINED, the image representations generated by IGTD better preserve the feature neighborhood structure by clustering similar features closer in the images. Based on two benchmark in vitro drug screening datasets, we trained CNNs with the image representations of CCLs and drugs to predict anti-cancer drug response. The prediction performance of CNNs trained on different image representations was compared, along with several other prediction models trained on the original tabular data. The results show that CNNs trained on IGTD images provide the highest average prediction performance in cross-validation on both datasets.

IGTD provides a flexible framework that can be easily extended to accommodate diversified data and requirements. Its flexibility can be seen from multiple aspects. First, various distance measures can be designed and used to calculate the feature and pixel distances. For example, besides the Euclidean distance, another feature distance measure is \(1-\rho\) , where \(\rho\) can be a correlation coefficient for continuous variables or the Jaccard index for binary variables. To measure the pixel distance, the Manhattan distance can also be used instead of the Euclidean distance. Second, various difference functions can be implemented to measure the deviation between the feature distance ranking and the pixel distance ranking. Different difference functions may emphasize distinct aspects of the data. For example, compared with the absolute difference function, the squared difference function puts larger weights on elements with large differences. Third, the number of dimensions, size, and shape of the images can be flexibly chosen. The IGTD framework can be extended in a straightforward manner to transform data vectors not only into 2-D matrices, but also into 1-D or multi-dimensional arrays with the features rearranged according to mutual similarities, or even into images of irregular shapes, such as a concave polygon. Fourth, the numbers of features and image pixels can be flexibly adjusted to match each other. If there are more features than image pixels, either larger images with more pixels can be used or a front-end feature selection can be performed to reduce the number of features. If there are fewer features than image pixels, either smaller images can be used or pseudo features with all zero elements can be padded to the data to match the feature and pixel numbers.
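For instance, swapping in such alternative measures only changes how the two distance matrices are computed before ranking. The helper names below are hypothetical and merely illustrate the substitution.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def correlation_feature_distance(X):
    """1 - Pearson correlation between features (columns of X)."""
    return 1.0 - np.corrcoef(X.T)

def manhattan_pixel_distance(n_rows, n_cols):
    """Manhattan distance between pixel coordinates of an n_rows x n_cols image."""
    coords = np.array([(r, c) for r in range(n_rows) for c in range(n_cols)])
    return squareform(pdist(coords, metric="cityblock"))
```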

Compared with existing studies, our IGTD work makes the following contributions. First, IGTD transforms tabular data into images using a novel approach, which minimizes the difference between the feature distance ranking and the pixel distance ranking. The optimization keeps similar features close in the image representation. Second, compared with existing approaches for transforming tabular data into images, IGTD does not require domain knowledge and provides compact image representations with better preservation of feature neighborhood structure. Third, using drug response prediction as an example, we demonstrate that CNNs trained on IGTD image representations provide better (or similar) prediction performance than CNNs trained on other image representations and prediction models trained on the original tabular data. Fourth, IGTD is a flexible framework that can be extended to accommodate diversified data and requirements, as described above.

Because both IGTD and REFINED generate compact image representations for tabular data, it is important to compare and summarize their differences. We have comprehensively compared the two methods from four aspects: the local heterogeneity of the generated images, the runtime to generate image representations, the prediction performance based on image representations, and the time for training the prediction model. IGTD outperforms REFINED significantly in terms of the preservation of feature neighborhood structure in the image and the speed of converting tabular data into images, while the benefit of IGTD is not very significant for improving the prediction performance and the model training speed. Although prediction modeling with CNNs is one of the most important purposes of converting tabular data into images, IGTD also provides a significantly better choice for applications that emphasize generating compact image representations quickly with good preservation of feature neighborhood structure.

To understand how sensitive the IGTD algorithm is to the hyper-parameters \({S}_{\mathrm{max}}\) , \({S}_{\mathrm{con}}\) , and \({t}_{\mathrm{con}}\) , we ran the IGTD algorithm with three different values for each parameter, spanning a reasonably large range. Specifically, we tried 10,000, 20,000, and 30,000 for \({S}_{\mathrm{max}}\) ; 200, 350, and 500 for \({S}_{\mathrm{con}}\) ; and 0.0001, 0.00001, and 0.000001 for \({t}_{\mathrm{con}}\) . In total, \(3\times 3\times 3=27\) different combinations of parameter settings were used to apply the IGTD algorithm to CCL gene expression profiles and drug molecular descriptors. Supplementary Table 2 shows the optimization results, i.e. the errors obtained after optimization. To evaluate the variation of the error across the 27 different parameter settings, we calculated the coefficient of variation of the error, which is the ratio of the standard deviation to the mean. The coefficient of variation of the error was 0.029% and 0.039% for the analyses of gene expressions and drug descriptors, respectively. Such small coefficients of variation indicate that the IGTD algorithm is not very sensitive to variations of the hyper-parameters within a relatively large range. This observation is also expected, because the optimization process reaches a plateau region fairly quickly. For example, in Fig. 1d the error does not change much after about 5000 iterations. As long as the hyper-parameters allow the optimization process to reach the plateau region, the optimization result is not very sensitive to the hyper-parameter setting.

A hypothesis supporting the transformation of data into images is that images may better represent the relationship between features that can be learned by CNNs to facilitate prediction. Apparently, this hypothesis is not universally true for all data. An extreme example can be a dataset including only independent features, where there is no meaningful feature relationship to be represented using images. We expect the IGTD algorithm to perform better for data with feature relationships that can be characterized by feature similarities, although there is not much existing knowledge regarding such relationships.

Data availability

The IGTD software package is available at https://github.com/zhuyitan/IGTD.

Change history

01 July 2021

A Correction to this paper has been published: https://doi.org/10.1038/s41598-021-93376-5

References

1. Hadsell, R. et al. Learning long-range vision for autonomous off-road driving. J. Field Robot. 26, 120–144 (2009).

2. Garcia, C. & Delakis, M. Convolutional face finder: A neural architecture for fast and robust face detection. IEEE Trans. Pattern Anal. Mach. Intell. 26, 1408–1423 (2004).

3. Tompson, J., Goroshin, R., Jain, A., LeCun, Y. & Bregler, C. Efficient object localization using convolutional networks. in IEEE Conference on Computer Vision and Pattern Recognition (2015).

4. Sermanet, P., Kavukcuoglu, K., Chintala, S. & LeCun, Y. Pedestrian detection with unsupervised multi-stage feature learning. in IEEE Conference on Computer Vision and Pattern Recognition (2013).

5. Kather, J. N. et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat. Med. 25, 1054–1056. https://doi.org/10.1038/s41591-019-0462-y (2019).

6. Schmauch, B. et al. A deep learning model to predict RNA-Seq expression of tumours from whole slide images. Nat. Commun. 11, 3877 (2020).

7. Collobert, R. et al. Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011).

8. Sainath, T., Mohamed, A. R., Kingsbury, B. & Ramabhadran, B. Deep convolutional neural networks for LVCSR. in IEEE International Conference on Acoustics, Speech and Signal Processing 8614–8618 (2013).

9. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444. https://doi.org/10.1038/nature14539 (2015).

10. Arel, I., Rose, D. C. & Karnowski, T. P. Deep machine learning: A new frontier in artificial intelligence research. IEEE Comput. Intell. Mag. 5, 13–18 (2010).

11. Fawaz, H. I., Forestier, G., Weber, J., Idoumghar, L. & Muller, P. A. Deep learning for time series classification: A review. Data Min. Knowl. Disc. 33, 917–963. https://doi.org/10.1007/s10618-019-00619-1 (2019).

12. Bayat, A. Science, medicine, and the future: Bioinformatics. BMJ 324, 1018–1022. https://doi.org/10.1136/bmj.324.7344.1018 (2002).

13. Zhu, Y., Qiu, P. & Ji, Y. TCGA-Assembler: Open-source software for retrieving and processing TCGA data. Nat. Methods 11, 599–600 (2014).

14. Zhu, Y. et al. Zodiac: A comprehensive depiction of genetic interactions in cancer by integrating TCGA data. J. Natl. Cancer Inst. 107, 129. https://doi.org/10.1093/jnci/djv129 (2015).

15. Topol, E. J. High-performance medicine: The convergence of human and artificial intelligence. Nat. Med. 25, 44–56. https://doi.org/10.1038/s41591-018-0300-7 (2019).

16. Rajkomar, A. et al. Scalable and accurate deep learning with electronic health records. NPJ Digital Med. 1, 18. https://doi.org/10.1038/s41746-018-0029-1 (2018).

17. Sharma, A., Vans, E., Shigemizu, D., Boroevich, K. A. & Tsunoda, T. DeepInsight: A methodology to transform a non-image data to an image for convolution neural network architecture. Sci. Rep. 9, 11399. https://doi.org/10.1038/s41598-019-47765-6 (2019).

18. Bazgir, O. et al. Representation of features as images with neighborhood dependencies for compatibility with convolutional neural networks. Nat. Commun. 11, 4391. https://doi.org/10.1038/s41467-020-18197-y (2020).

19. Ma, S. & Zhang, Z. OmicsMapNet: Transforming omics data to take advantage of deep convolutional neural network for discovery. https://arxiv.org/abs/1804.05283 (2018).

20. Van der Maaten, L. J. P. & Hinton, G. E. Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

21. Shneiderman, B. Tree visualization with tree-maps: 2-d space-filling approach. ACM Trans. Graph. 11, 92–99 (1992).

22. Zhu, Y. et al. Enhanced co-expression extrapolation (COXEN) gene selection method for building anti-cancer drug response prediction models. Genes 11, 1070. https://doi.org/10.3390/genes11091070 (2020).

23. Zhu, Y. et al. Ensemble transfer learning for the prediction of anti-cancer drug response. Sci. Rep. 10, 18040 (2020).

24. Partin, A. et al. Learning curves for drug response prediction in cancer cell lines. https://arxiv.org/abs/2011.12466 (2020).

25. Basu, A. et al. An interactive resource to identify cancer genetic and lineage dependencies targeted by small molecules. Cell 154, 1151–1161. https://doi.org/10.1016/j.cell.2013.08.003 (2013).

26. Yang, W. et al. Genomics of drug sensitivity in cancer (GDSC): A resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 41, D955–D961. https://doi.org/10.1093/nar/gks1111 (2013).

27. Goulden, C. H. Methods of Statistical Analysis 2nd edn, 50–55 (Wiley, 1956).

28. Ke, G. et al. LightGBM: A highly efficient gradient boosting decision tree. in 31st International Conference on Neural Information Processing Systems 3149–3157 (2017).

29. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).


Acknowledgements

This work has been supported in part by the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by the U.S. Department of Energy (DOE) and the National Cancer Institute (NCI) of the National Institutes of Health. This work was performed under the auspices of the U.S. Department of Energy by Argonne National Laboratory under Contract DE-AC02-06-CH11357, Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344, Los Alamos National Laboratory under Contract DE-AC5206NA25396, and Oak Ridge National Laboratory under Contract DE-AC05-00OR22725. This project has also been funded in whole or in part with federal funds from the National Cancer Institute, National Institutes of Health, under Contract No. HHSN261200800001E. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. We thank Prasanna Balaprakash and Rida Assaf for their critical review of the manuscript.

Author information

Authors and Affiliations

Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, 60439, USA

Yitan Zhu, Thomas Brettin, Fangfang Xia, Alexander Partin, Maulik Shukla, Hyunseung Yoo & Rick L. Stevens

Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., Frederick, MD, 21702, USA

Yvonne A. Evrard

Developmental Therapeutics Branch, National Cancer Institute, Bethesda, MD, 20892, USA

James H. Doroshow

Department of Computer Science, The University of Chicago, Chicago, IL, 60637, USA

Rick L. Stevens


Contributions

Y.Z. developed the algorithm, conducted the analysis, and led the writing of the article. F.X., A.P., M.S., and H.Y. collected and processed the data for analysis. R.L.S. and T.B. supervised and participated in the conceptualization of the project. J.H.D. and Y.A.E. participated in the validation of analysis results. All authors participated in writing the manuscript.

Corresponding author

Correspondence to Yitan Zhu .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

Zhu, Y., Brettin, T., Xia, F. et al. Converting tabular data into images for deep learning with convolutional neural networks. Sci Rep 11 , 11325 (2021). https://doi.org/10.1038/s41598-021-90923-y

Download citation

Received : 01 February 2021

Accepted : 17 May 2021

Published : 31 May 2021

DOI : https://doi.org/10.1038/s41598-021-90923-y


This article is cited by

Advances in AI and machine learning for predictive medicine

  • Alok Sharma
  • Artem Lysenko
  • Tatsuhiko Tsunoda

Journal of Human Genetics (2024)

A fast spatio-temporal temperature predictor for vacuum assisted resin infusion molding process based on deep machine learning modeling

  • Runyu Zhang
  • Yingjian Liu

Journal of Intelligent Manufacturing (2024)

Machine learning prediction models for in-hospital postoperative functional outcome after moderate-to-severe traumatic brain injury

  • Bao-qiang Song

European Journal of Trauma and Emergency Surgery (2024)

Visualizations for universal deep-feature representations: survey and taxonomy

  • Tomáš Skopal
  • Ladislav Peška
  • David Bernhauer

Knowledge and Information Systems (2024)

DeepInsight-3D architecture for anti-cancer drug response prediction with deep-learning on multi-omics

Scientific Reports (2023)



How to cite images and graphs in your research paper

Deeptanshu D

Table of Contents

How-to-cite-images-and-graphs-in-a-research-paper

If you are confused about whether you should include pictures, images, charts, and other non-textual elements in your research paper or not, I would suggest you must insert such elements in your research paper. Including non-textual elements like images and charts in the research paper helps extract a higher acceptance of your proposed theories.

An image or chart will make your research paper more attractive, interesting, explanatory, and understandable for the audience. In addition, when you cite an image or chart, it helps you describe your research and its parts with far more precision than simple, long paragraphs.

There are plenty of reasons why you should cite images in your research paper. However, most scholars and academicians avoid it altogether, losing the opportunity to make their research papers more interesting and garner higher readership.

Additionally, it has been observed that there are many misconceptions around the use or citation of images in research papers. For example, it is widely believed and practiced that using pictures or any graphics in the research papers will render it unprofessional or non-academic. However, in reality, no such legit rules or regulations prohibit citing images or any graphic elements in the research papers.

You will find it much easier once you know the appropriate way to cite images or non-textual elements in your research paper. But, it’s important to keep in mind some rules and regulations for using different non-textual elements in your research paper. You can easily upgrade your academic/ research writing skills by leveraging various guides in our repository.

In this guide, you will find clear explanations and guidelines that will teach you how to identify appropriate images and other non-textual elements and cite them in your research paper. So, cut the clutter; let’s start.

Importance of citing images in a research paper

Although it’s not mandatory to cite images in a research paper, however, if you choose to include them, it will help showcase your deep understanding of the research topic. It can even represent the clarity you carry for your research topic and help the audience navigate your paper easily.

Why-it-is-important-to-use-images-and-graphs-in-a-research-paper.

There are several reasons why you should cite images in your research paper:

(i) Better explanation of complex phenomena

While writing your research paper, certain topics will be more complex than others. When words alone do not provide the necessary explanation, you can illustrate the process with an image instead. For example, rather than writing several paragraphs describing climate change and its associated factors, you can cite a single illustration that captures the complete process and the factors embedded in it.

(ii) To simplify examples

An impeccable research paper includes evidence and examples that support your argument. Rather than always explaining supporting evidence in words, it is often better to depict it through images. For example, to demonstrate the effects of climate change on a region, you can showcase and cite "before and after" images.

(iii) Easier classification

If your research topic needs to be divided into sub-topics and further subdivisions, you can group and classify them in a classification tree or chart. Presenting such extensive information as a classification tree saves many words and conveys it in a more straightforward, understandable form.

(iv) Greater attention from the audience

Including images in research papers, theses, and dissertations helps you capture the audience's attention. Adding or citing images provides a better understanding and clarification of the topics covered in your research and makes the paper visually attractive.

Types of images you can use or cite in your research paper

As explained above, using and citing images can make your research paper easier to understand and more structured in appearance. You can use photos, drawings, charts, graphs, infographics, and similar elements. There are no mandatory regulations on using or citing images in a research paper, but individual journals do make recommendations in their style guides.

Before including any image in your research paper, ensure that it fits the research topic and suits your writing style. As already mentioned, there are no strict regulations around the usage of images, but each image should satisfy a few basic requirements:

  • Use high-resolution images so they remain clear in both print and electronic formats
  • The image should not be copyrighted; if it is, obtain a license to use it and credit the owner properly in the citation
  • The image should fit the context of the research topic

You can place images either at the end of the paper, between topics, or in a separate section collecting all the non-textual elements used in the paper. If you insert images between passages of text, you need to provide an in-text citation for every image used.

Additionally, attach a name, description, and image number to each image so that your research paper stays structured. If you borrow images from other platforms, cite them and include the copyright details to avoid any copyright infringement.

Graphs and Charts

Graphs and charts often convey a point better and more simply than wordy descriptions. There are several reasons to include or cite graphs and charts in your research paper:

  • To draw a comparison between two events, phenomena, or any two parameters
  • Illustrating statistics through charts and graphs is one of the most effective ways to draw the audience's attention to your research topic
  • Classification trees and pie charts are well suited to showing the degree of influence of a specific event or phenomenon

With graphs and charts, you can answer many of your readers' questions before they are even asked. They convey a large amount of information in a brief yet attractive manner and keep readers interested in your research topic.

Providing these non-textual elements increases the readability of your paper. Graphs and charts also draw the reader's attention more readily than text-heavy paragraphs.

You can reuse graphs or charts from previous research in your chosen domain, provided you cite them appropriately, or you can create your own with tools such as Canva, Excel, or MS PowerPoint. In either case, provide supporting statements for the graphs and charts so that readers can easily understand what the illustrations mean.

As with pictures, you can choose one of three placements in your research paper: after the text, on a separate page immediately following the corresponding paragraph, or inside the paragraph itself.

How to Cite Images and Graphs in a Research Paper?


Once you have decided which types of images you will use in your paper, learn the rules that various journals set for the fair use of these elements. Using pictures or graphs according to these rules helps your reader navigate and understand your research paper easily. If you borrow or cite previously published pictures or images, you need to follow the correct citation procedure.

No academic writing style prohibits the use or citation of pictures or graphs; the styles differ only in their citation formats.

Citing an image or graph in APA (American Psychological Association) style

Most scientific, social, and media-based research topics are presented in APA style, which is also commonly followed by museums, exhibitions, galleries, and libraries. If you write your research paper in APA style and cite previously used images or graphics, you need to provide complete information about the source.

In APA style, the information you must provide when citing an image is as follows:

  • Owner of the image (artist, designer, photographer, etc.)
  • Complete date of the image, in DD/MM/YYYY format; for historical images, the year alone is acceptable when the exact date or month is unknown
  • Country or city where the image was first published
  • Name or title of the image (optional: if it is not available, you can skip it)
  • Publisher name: the organization, association, or person to whom the image was first submitted

If you cite an image from the internet, provide its source link rather than only the name of the webpage.

Format/Example of Image Citation:

Johanson, M. (Photographer). (2017, September). Rescued bird. National Gallery, Vienna, Austria.

Citing an image or graph in MLA (Modern Language Association) style

MLA is another of the most widely used styles for research paper publication. You can use or cite images in this style provided the rights of the image owner are not violated. The information required for the citation is brief yet precise.

In MLA style, a cited image or graph must carry the following details:

  • Name of the creator or owner
  • Title, name, or description of the image
  • Website or source where it was first published
  • Contributors' names (if any)
  • Version or serial number (if any)
  • Publisher's details (at least the name)
  • Full date (DD/MM/YYYY) of first publication
  • Link to the original image

Format/Example of Image Citation:

Auteur, Henry. “Abandoned gardens, Potawatomi, Ontario.” Historical Museum, Reproduction no. QW-YUJ78-1503141, 1989, www.flickr.com/pictures/item/609168336/

Final Words

Citing images in your research paper is easy, and you should add different forms of non-textual elements to the paper. The rules for using or citing images vary by writing style, and following them ensures that your paper does not commit copyright infringement or violate the owner's rights.

Whichever writing style you choose, make sure you provide all the details in the appropriate format. Once you understand the required details and formats, feel free to use as many images as it takes to make your research paper intriguing and interesting.


How to Create Precise Graphs, Diagrams or Images in a Research Paper


According to the American psychologist Howard Gardner, human intelligence can be divided into seven categories: visual-spatial, bodily kinesthetic, musical, interpersonal, intrapersonal, linguistic, and logical-mathematical. This implies our intelligence strengths can be different in each (so-called) intelligence profile and that everybody can be intelligent in many different ways.

Gardner says these differences “challenge an educational system that assumes that everyone can learn the same materials in the same way and that a uniform, universal measure suffices to test student learning.” The truth is that we learn and understand things differently, and these differences affect the way we read academic papers. A research paper is usually a combination of written and visual information. We can assume that readers with predominant linguistic intelligence will focus on written information, whereas those with visual-spatial intelligence will feel more comfortable focusing on graphs, diagrams, or images. How can the two be combined to produce a paper that engages readers with different intelligence profiles equally well?

The Perfect Combination

The first thing to understand is that, no matter how much visual support they have, papers are written works. Filling pages with unnecessary images, graphs, diagrams, or any other kind of visual material is never a good idea. Remember that you are writing a professional academic paper, so your capacity to discern which material is important matters. Once this is clear, it is time to decide which information is best demonstrated visually.

A few guidelines help you decide when to use graphs. Choose only information that becomes clearer when explained visually, and only if it is important enough that you want the reader to focus on it more than on other parts. This information must also be qualitatively or quantitatively measurable.

Images can also be used to summarize; plenty of information can be perfectly summed up in a single graph. Lastly, another reason to use images is comparison: graphs and diagrams are great tools for indicating the differences between two agents.

Do not fill your images with too much information, because that complicates the reader's understanding. Images combine with or support the written words, but should not replace them. A good combination of words and images eases the paper's general understanding.

Thinking Visually: How to Choose?

It is important to know the possibilities each tool offers. Graphs, for example, are good for expressing the mathematical relationship or statistical correlation between data. Line graphs are useful for presenting an evolution over time, circular (pie) charts are better for indicating proportional parts, and column graphs are commonly used to compare different elements.
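As a quick illustration of matching the chart type to the message, here is a short Python/Matplotlib sketch (with invented numbers, not data from any study) that draws the three cases side by side: a line graph for an evolution, a column graph for comparing elements, and a pie chart for proportional parts.

    import matplotlib.pyplot as plt

    years = [2019, 2020, 2021, 2022]
    trend = [4.2, 4.8, 5.9, 6.4]        # evolution over time -> line graph
    groups = ["A", "B", "C"]
    magnitudes = [35, 50, 28]           # comparison of elements -> column graph
    shares = [55, 30, 15]               # proportional parts -> pie chart

    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3.5))
    ax1.plot(years, trend, marker="o")
    ax1.set_title("Evolution (line graph)")
    ax2.bar(groups, magnitudes)
    ax2.set_title("Comparison (column graph)")
    ax3.pie(shares, labels=groups)
    ax3.set_title("Proportions (pie chart)")
    fig.tight_layout()
    plt.show()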

Researchers and academics are expected to have a good command of graph usage. What makes the difference, however, is the capacity to select which data is best shown this way. Achieving a good command of these tools is difficult, but it comes with experience.

Last but not least, it is always helpful to consider the final goal of an academic paper: communication. Thus, if a graph clearly supports one of the research's main statements, do not hesitate to use it.


J Postgrad Med, v.69(3); Jul-Sep 2023; PMC10394528

Utilizing tables, figures, charts and graphs to enhance the readability of a research paper

Department of Pediatrics, College of Medicine and Health Sciences, National University of Science and Technology, Sohar, Sultanate of Oman

1 Department of Pediatrics, Seth G.S. Medical College and KEM Hospital, Mumbai, Maharashtra, India

Introduction

Every author aims to reach the maximum target audience through his/her research publication/s. Our previous editorials have touched upon the process of writing a quality research paper and its successful publication in an appropriate journal.[ 1 , 2 ] Journal-specific ”Instructions for Authors” generally have defined limits to the text and non-textual content for the benefit of space and presentation. Though the aim of a paper is to get its research point across through methodology, results, and discussion, readers often read the summary of data and analysis (only). Thus, the tables, figures, charts, and graphs are time and space-effective tools that not only help to understand the research presented in a simple manner but also engage and sustain the reader's interest.

Why use tables/figures/charts or graphs?

Reading text matter can often get monotonous – for the readers as well as the editors and reviewers. Using Tables/Figures/Charts or Graphs effectively provides a break from textual content monotony as well as provides an opportunity to process and connect information between text and images, promoting deeper learning. It is suggested that one non-textual element should be used for every 1000 words in a manuscript, which generally amounts to two for every three print pages.[ 3 ] The use of tables/figures/charts/graphs not only reduces the word count but also complements the text effectively. Although the text focuses on explaining findings, outlining trends, and providing contextual information, non-textual content allows readers to understand characteristics, distribution, and relationships between data, and visualize statistics/abstract concepts in a powerful manner. High-quality tables and figures also increase the likelihood of a manuscript being accepted for publication.[ 4 ] Note that the figures/artwork needs to be uploaded as separate files for most of the journals.

The CONSORT statement (www.equator-network.org) provides guidelines on how to report outcome-specific information in a published clinical trial report; however, there are no definite recommendations on how to present non-textual elements, and this varies from one journal to another. Authors tend to prepare them based on their own understanding, often without much thought, and repeat the information presented in the main text. Moreover, while some journals have dedicated editors and resources to redraw or edit figures/tables, others simply publish whatever the authors submit. Thus, to improve the readability of the paper, it is primarily the author's responsibility to submit clear and useful tables, figures, charts, and graphs.

The heart of any research lies in its data, and most readers only get a glimpse of the data via the results. The closest one can get to raw statistics is through data presented in tables, figures, graphs, and supplementary material. Tables, figures, and graphs also help to classify and interpret data, highlight key findings, and present maximum data in a concise space. The author should make a deliberate decision on the presentation of his data early in the writing process. Using a sentence as text is more efficient while presenting up to half a dozen numbers in data or if the information can be summarized in three or lesser sentences.[ 5 ] Figures give an overall picture of concept (but without exact numerical data), while tables present exact values (but are less engaging and less interesting).[ 5 ] The final choice of the presentation depends on the type of data, statistical analysis, and relevant message to be delivered.[ 6 ]

General methodology of design and submission

The general structure followed by most non-textual elements is caption/legend/title, content, and footnotes. All data should be verified thoroughly for errors (especially outliers or unexpected spikes) and data sources should be cited in the footnotes/references. The presentation should be simple and clear enough for the reader to understand without any assumptions.[ 7 ] Each exhibit should be labeled clearly with a title and numbers (usually Arabic numerals) that are separate, unique, and consecutive based on their appearance in the text. The title should be self-explanatory and explain the information presented (what, where, and when) briefly. Footnotes should refer to restrictions, assumptions, abbreviations, explanatory notes, and unusual annotations. The formatting should be consistent throughout (across all tables/graphs) for easy comparison.[ 7 ] Design the figures, tables, and graphs to fit in one page on a scale that will be readable in print.[ 8 ] Always use the insert -> (arrow) page break function to ensure that each new Table/Figure/Graph is seen in the document on a new page. Data from the figures and tables should not be repeated in the text. Although tables/figures are often submitted separately or at the end of manuscript based on journal instructions, they should be referred to in the text at appropriate points by location statements i.e. Figures 1 and 2 or Tables 1 and 2.[ 7 ] One should be careful during editing and proofreading, as contents and columns may get misplaced.[ 9 ] Ensure to follow the journal instructions regarding numbers and formats and glance through published examples in targeted journal. For additional data/tables/figures/graphs that do not fit into the journal's instructions or are still necessary to be displayed outside the word/Table limit, online appendages (or supplementary files) can be created. Do ask for feedback from experienced colleague/s (but not co-author) for the exhibit before final submissions.

Figure 1. A representative table already published in the JPGM earlier (reproduced from Shah S, Deshmukh CT, Tullu MS. The predictors of outcome and progression of pediatric sepsis and septic shock: A prospective observational study from western India. J Postgrad Med 2020;66:67-72)

Figure 2. Representative figure/s already published in the JPGM earlier (reproduced from Mondkar SA, Tullu MS, Sathe P, Agrawal M. Lane-Hamilton syndrome – Is it really a needle in a haystack? J Postgrad Med 2022;68:162-7)

Table 1. Do’s and Don’ts while creating effective Tables.[ 8 , 9 , 12 , 14 , 15 ]

Table 2. Types of graphical representations and their characteristics.[ 5 , 6 , 7 , 8 , 20 , 21 ]

Copyright issues

Material from government publications/public domain may be used without seeking permission; however, permission is required for all fully borrowed, adapted, or modified tables/figures/graphs not in the public domain, usually from the publishers, with appropriate credit notes in footnotes (as stated for the Journal of Postgraduate Medicine – JPGM).[ 9 , 10 ] All data sources should be identified for tables/figures created using information from other studies.[ 9 ] Authors should seek permissions from publishers early in their writing, as their research cannot be published until all written permissions have been submitted and approved.[ 9 ] It is good practice to maintain a copy of such permissions with the corresponding author in case a dispute arises later on.

Use of tables

Tables are meant to give a systematic overview of the results and provide a richer understanding/comprehension of study participant characteristics and principal research findings.[ 11 ] Since tables deal with larger groups of data, they are suitable when all data requires equal attention and readers can selectively scan the data of interest.[ 6 ] Tables can present precise numerical values and information with different units' side-by-side but may not analyze data trends.[ 6 ] However, due to the sheer amount of data, interpretation may take longer.[ 6 ]

Generally, the first table summarizes key characteristics of the study population allowing readers to assess the generalizability of the findings. Subsequent tables present details of associations/comparisons between variables, often crude findings followed by models adjusted for confounding factors.[ 11 ] Other tables include tables of lists, tables of inclusion/exclusion criteria for review, and summary of characteristics/results of study (systematic reviews) and characteristics of participants in narrative format (qualitative studies).[ 11 ]

A good table draws attention to the data and not the table itself; the reader should be able to express an opinion about results just by looking at it.[ 12 ] It should neither be too long nor wide; designing more rows than columns makes it easier to scan in portrait orientation.[ 9 , 11 ] JPGM guidelines permit a maximum of 10 columns and 25 rows in a table.[ 10 ] They are generally created from word documents as insert table and constructed before writing the content in text.[ 9 ] Most tables consist of five basic components: title, rows, columns, data fields, and footnotes. The title/legend should be concise but sufficiently informative.[ 13 ] The first column generally lists the independent variables in rows while subsequent columns present the dependent data. Column and row headings should include group sizes and measurement units (preferably an international system of units). Stubs (row headings) on the left side of a table describe row contents and should also list numerical definitions for the data i.e. the mean ± SD (normally distributed data), median with IQR (non-normally distributed data), or percentages (dichotomous data).[ 9 , 14 ] Use the fewest decimal points necessary for accurate reporting of data.[ 14 ] Columns should present statistical analysis and significance (P values) to highlight key findings.[ 14 ] Use well-labeled columns with the same format throughout (mean or percentiles).[ 3 ]

Each cell (data field) should contain only one numerical value and never be blank [use dash (-), ellipsis (…) or designate as “no data”]. Abbreviations should be limited; use abbreviations like “NA” very cautiously as it may be misinterpreted as not applicable/not available/not analyzed.[ 13 ] Combine tables when a single variable is cross-tabulated, or divide tables that contain too much data.[ 7 ]

Footnotes should be brief, define abbreviations, statistical results ( P values and level of significance) and explain restrictions/discrepancies in the data. Footnotes should be ordered starting with the title of the table and working downwards from left to right. Symbols applying to the entire table should be in the title and those applying to the entire row/column should be after the row/column heading.[ 13 ] Most journals prefer superscripted letters instead of numbers. Symbols recommended by JPGM for footnotes (in order) are: *, †, ‡, §, ||, ¶, **, ††, ‡‡.[ 10 ]

Alignment and formatting: All text should be aligned to the left and numbers to the right.[ 7 ] Data fields with decimal points, hyphens/slashes, plus/minus symbols, or parentheses are aligned to these elements. For stubs continuing onto a second line, the corresponding data field should be aligned to the top line of the stub.[ 13 ] Tables can be made more meaningful, by converting data to ratios/percentages and sorting data as per the significance of variables, generally from left to right and top to bottom.[ 7 ] Data included in tables should conform with those in the main text, and percentages in rows and columns should sum up accurately.

Most journals have specific instructions for gridlines – only the top and bottom horizontal lines are used, with no vertical lines as columns are inherently aligned.[ 7 ] If tables are used from other publications, copyright permission should be obtained to reproduce them, and they should be appropriately referenced in the legend. There may be limitations as to the number of tables allowed depending on the Journal instructions and the type of article. Some Do's and Don'ts while creating tables are summarized in Table 1 .[ 8 , 9 , 12 , 14 , 15 ] Also, a representative table already published in the JPGM earlier has been reproduced herewith for better understanding [ Figure 1 ].
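For authors who assemble such summary tables programmatically, the sketch below (hypothetical variable names and simulated values, not data from any study) shows one way of building a "Table 1"-style summary with pandas along the lines described above: stubs on the left, one value per data field, mean ± SD for normally distributed data, median [IQR] for skewed data, and n (%) for dichotomous data.

    import numpy as np
    import pandas as pd

    np.random.seed(0)
    df = pd.DataFrame({
        "age_years": np.random.normal(8, 2, 120),        # roughly normal -> mean ± SD
        "crp_mg_l": np.random.lognormal(1.5, 0.8, 120),  # skewed -> median [IQR]
        "male": np.random.binomial(1, 0.55, 120),        # dichotomous -> n (%)
    })

    rows = {
        "Age, years (mean ± SD)": f"{df.age_years.mean():.1f} ± {df.age_years.std():.1f}",
        "CRP, mg/L (median [IQR])": (f"{df.crp_mg_l.median():.1f} "
                                     f"[{df.crp_mg_l.quantile(0.25):.1f}-"
                                     f"{df.crp_mg_l.quantile(0.75):.1f}]"),
        "Male, n (%)": f"{int(df.male.sum())} ({100 * df.male.mean():.0f}%)",
    }

    table1 = pd.DataFrame.from_dict(rows, orient="index",
                                    columns=["Study population (n=120)"])
    print(table1)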

Use of figures

Figures are powerful communication tools that display patterns that are not visualized in the text or the tables. They can achieve a high educational impact by sustaining readers' interest and helping them understand trends, patterns, relationships among concepts and sequence of events.[ 3 ] Like tables, figures should be complete and self-explanatory. They should be designed thoughtfully, be relevant and be of good quality.[ 5 ] There may be limitations as to the number of figures allowed depending on the Journal instructions and type of the article. Figures can be statistical (graphs- as explained later) and non-statistical (clinical images, photographs, diagrams, illustrations and textual figures).[ 16 ] Non-statistical figures present visual information without data.[ 16 ] Clinical images and photographs [ultrasonograms, X-rays, computed tomography (CT) scans, magnetic resonance (MR) scans, images of patients, intraoperative photographs, tissue samples or microscopy findings] provide convincing and substantial information through illustrative examples from specific individuals and engage audiences, especially clinical professionals.[ 5 ] Illustrations help to explain structures, mechanisms, and relationships. Diagrams like “flowcharts”, “algorithms”, “pedigree charts”, and “maps” display complex relationships while “textual figures” describe steps of a procedure or summarize guidelines.

Structure: Figure legends (maximum of 40 words excluding credits) should be double-spaced and identified by consecutive Arabic numerals with the corresponding citation in the text. They reflect the data within, and consist of a brief title, experimental/statistical details, definitions of symbols/line or bar patterns and abbreviations/annotations.[ 15 ] Labels, numbers, and symbols should be clear, consistent, of uniform size and large enough to be legible after fitting figures to publication size.[ 15 ] Symbols, arrows, numbers, or letters used to identify parts of illustrations should be clearly identified, properly sized, placed, and explained in the legend. In case of photomicrographs, contrast the symbols/letters or arrows with background, and describe the internal scale (magnification) and method of staining.[ 10 ] If the figure has several parts (”collage”), they should be presented in order from left to right and top to bottom; this should be similarly followed for their description in the legend with labeling done as a, b, c, d, etc.[ 14 ]

Photos should have a minimum resolution of 300 dpi before digital manipulation, the acceptable formats for pictures/photos and figures in various journals being pdf, doc, ppt, jpg, gif, and tiff. Publication of color images may be chargeable which should be checked beforehand.[ 9 ] Often the print version of journal may present black and white images, with color images used in the online version.

Line diagrams: Black and white art with no shading often illustrates content better than a photograph, especially in the case of body anatomy or surgical techniques.[ 9 ] Their line weight should be consistent and not less than 0.25 pt. If scanned, they should be submitted as a tiff/jpeg image of at least 600 dpi and a width of 15 cm/6 inches.[ 14 ] Creating line diagrams may involve expensive professional help with issues of exclusive rights. Simple drawings can be scanned in a conventional office scanner at a minimum resolution of 600 dpi.[ 9 ] Drawings in shades of grey require a resolution of 1200 dpi or more, usually unavailable in regular office scanners.[ 9 ]

X-rays , which are photographic images, often lack good contrast, a problem magnified if the image must be enlarged. The quality of radiographs can be improved using Adobe Photoshop.[ 17 ] Figure captions in radiology should be utilized correctly and mention the modality, imaging plane and relevant technical information for images e.g. projection name on an x-ray, plane of a cross-sectional image, window setting of a CT section, and sequence name of an MR image.[ 17 ]

One may need to crop images to focus on the point of interest and maintain patient anonymity. Editing is usually done in tiff file format in software designed for image editing. Adjustments in brightness/contrast/color balance may help if raw image is not clear; however, it should not alter the meaning.[ 5 ] Colors should be easy to see (avoid yellow) and backgrounds should preferably be white. The tint should be no lower than 15%.[ 14 ] However, all digital modifications or enhancements of photographic images should be documented, step by step and the original raw files of unedited photographs/images should be available as supplementary files .[ 5 ]

Minimum resolution and design: Figures should be of high quality and resolution such that when final images are zoomed to 1600%, they should not blur or pixelate.[ 5 ] In case of reprints, care should be taken about picture quality.[ 3 ] JPGM requires a minimum resolution of 300 dpi or 1800 × 1600 pixels in TIFF format for digital images. Uploaded images should be within 4 MB in size and JPEG format. The JPGM also reserves the right to crop, rotate, reduce, or enlarge the photographs to an acceptable size.[ 10 ] One can use tools while creating figures and exporting data in another software; a few examples of open-source are Matplotlib (python plotting library), R, Inkscape, TikZ, PGF, GIMP, ImageMagick, D3 (Data-Driven-Documents), Cytoscape and Circos.[ 18 ]
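As a concrete illustration of these requirements, the following Matplotlib sketch (placeholder data and file names; saving to TIFF assumes the Pillow package is installed) exports the same figure at 300 dpi in TIFF and JPEG formats, roughly matching the resolution and format expectations described above.

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 4))   # 6 x 4 inches -> 1800 x 1200 pixels at 300 dpi
    ax.plot([0, 10, 20, 30], [2.1, 3.4, 4.0, 4.8], marker="o", color="black")
    ax.set_xlabel("Time after treatment (days)")
    ax.set_ylabel("Serum marker (mg/dL)")
    fig.tight_layout()

    fig.savefig("figure1.tiff", dpi=300)     # high-resolution TIFF for submission
    fig.savefig("figure1.jpg", dpi=300)      # JPEG copy for online upload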

Anonymity and Copyright: In the case of images, all unessential patient information or identifiers should be removed (masking or blurring only the eyes is no longer considered sufficient).[ 19 ] It is the author's responsibility to obtain written permission from the patient to use the photograph for educational purposes (whether the subject is identifiable or not) and archive it properly.[ 10 ] For images or descriptions that identify the patient, a statement about obtaining informed patient consent should be specified in the manuscript.[ 10 ] For figures published elsewhere, the original source should be acknowledged (via credit line in the figure legend) and the author should submit permission from copyright holder (usually the publisher) to reproduce the material before his/her manuscript is accepted.[ 3 , 19 ] Representative figure/s already published in the JPGM earlier have been reproduced herewith as an example [ Figure 2 ].

Use of graphs

Graphs allow the reader to visualize and compare data by highlighting patterns and relationships such as changes over time, frequency distribution, correlation, and relative share.[ 7 ] One should be precise with data values and presentation in graphs to avoid misinterpretation. Graphs can be created from data using the same software used for statistical analysis or by special programs. Depending on the results, data can be depicted in several different formats, such as line graphs, bar charts, data plots, maps, and pie charts.

What to use and when: The graphical format (bar graph, line graph, scatter plot, dot plot) can be decided based on the type of relationship to be shown. For example, line graphs demonstrate trends, bar graphs show magnitudes and pie charts show proportions.[ 9 , 16 ] The preferred graph also depends on the representative value of data – absolute value/fraction/average/median.[ 20 ] Graphs should accurately present findings, the scale should start at zero, and the axes should not be altered to make the data appear more meaningful than they are.[ 15 ] Pie charts and 3D graphs are generally not recommended.[ 5 ] Table 2 summarizes different graphical formats with their brief description and uses.[ 5 , 6 , 7 , 8 , 20 , 21 ]

How to draw/construct: Most statistical programs create graphs with statistical computations. Special programs such as Prism and Sigmaplot can also be used.[ 14 ] Different formats can be visualized in the statistical program, and the one that best depicts the data can be chosen.[ 3 ] Actual numbers from which graphs are drawn should be provided.[ 10 ] Components of graphs include axes, labels, scales, tick/reference marks, symbols, and legends.[ 21 ] Independent variables are plotted on the horizontal axis while dependent variables on vertical axis.[ 4 ] Axis labels should be short, clear and indicate measurement variable/result, units, and number of group subjects (if any).[ 7 ] The axis scale should be proportional to data range so that visual data is not exaggerated/missed and minimum space is wasted.[ 20 ] Length of axes should be visually balanced (ratio of X to Y axis should be 1.0 to 1.3).[ 21 ] Provide explanations if the axis starts from non-zero values, is non-linear (logarithmic/exponential/rate) or scales before and after a break are different.[ 7 , 20 ] Symbols/lines/curves inside the two axes should be the most prominent features, wording in axes labels next prominent and axes and tick mark (outside of axes) least prominent.[ 21 ] Numbers and marks should be large enough to be legible even when compressed for print.[ 5 ] Symbols should be uniform and effectively used to designate important strata in figures. All graphs should be consistent in style and formatting. Footnotes should indicate P values (with appropriate statistical test) and discrepancies in data/items.[ 8 ]

A clear and concise legend (inside/outside) should describe the variables in the graph. It should also include values of lines, symbols and diagrams, abbreviations, acronyms as well as statistical tests, their levels of significance, sampling size, stains used for analysis, and magnification rate.[ 4 , 20 ] Annotations can highlight specific values/statistically significant differences in graphs.[ 20 ]

All unnecessary background lines (such as gridlines) are distracting and should be removed. The background should be the palest possible (preferably white) for the highest contrast and readability. Remove all default pre-styling formats and avoid 3D effects.[ 7 ] Data presentation can be intensified by eliminating clutter and refined in a vector graph editing program (without altering the position of marks representing data).[ 5 ] It is essential to minimize meaningless noise and maximize meaningful signals.[ 5 ]
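This decluttering advice maps directly onto plotting-library settings. A minimal Matplotlib sketch (illustrative data and labels only, not a prescribed template) might look as follows: white background, no gridlines or 3D effects, axes labelled with units, a scale starting at zero, and a concise legend.

    import matplotlib.pyplot as plt

    days = [0, 7, 14, 21]
    control = [12, 11, 10, 10]
    treatment = [12, 9, 7, 5]

    fig, ax = plt.subplots(figsize=(5, 4), facecolor="white")
    ax.plot(days, control, "o-", color="black", label="Control")
    ax.plot(days, treatment, "s--", color="gray", label="Treatment")

    ax.set_xlabel("Time (days)")              # independent variable on the x axis
    ax.set_ylabel("Symptom score (points)")   # dependent variable on the y axis
    ax.set_ylim(bottom=0)                     # scale starts at zero
    ax.grid(False)                            # no background gridlines
    ax.spines["top"].set_visible(False)       # remove non-data ink
    ax.spines["right"].set_visible(False)
    ax.legend(frameon=False)
    fig.tight_layout()                        # export as shown in the earlier sketch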

Algorithms (combination of graph and table) are an excellent aid to demonstrate a decision tree. However, they can be difficult to construct due to decisions based on more than one variable. This presents clinical and technical difficulties in presenting all possible variations in a diagnosis or therapeutic decision.[ 9 ]

A representative graph and chart already published in the JPGM earlier have been reproduced herewith as examples [Figures 3 and 4].

Figure 3. A representative graph already published in the JPGM earlier (reproduced from Bhatia S, Tullu MS, Kannan S, Gogtay NJ, Thatte UM, Lahiri KR. An unusual recurrence of antitubercular drug induced hepatotoxicity in a child. J Postgrad Med 2011;57:147-152)

Figure 4. A representative chart already published in the JPGM earlier (reproduced from Agarwal S, Divecha C, Tullu MS, Deshmukh CT. A rare case of nephrotic syndrome: ‘Nailed’ the diagnosis. J Postgrad Med 2014;60:179-82)

Use of supplementary materials

Supplementary materials refer to additional content (tables/graphs/appendices/videos/audios) that are not published in the main article. Scientific publications often come with strict word limits. Additional text or data which lengthens the print version can be accessed via digital supplementary files. Besides overcoming word restrictions, supplementary material provides additional information that is not essential in the main manuscript, but enhances understanding of research. They are available to interested readers to explore or replicate (methods/formulae/statistical models/algorithms/intervention pathways) the study for secondary research or teaching.[ 22 ] Thus, they serve as an online companion, complementing the main text. The most common supplementary files are tables and figures. Some instances of their use in various sections are as follows.[ 23 ]

In introduction: Table of summary of literature from various studies, detailed description of research topic, illustrations of concepts discussed, and glossaries of terms used.

In methodology: Participant details (sources, inclusion/exclusion lists, demography), instrumentation of constructs and variables, data collection techniques (survey questionnaires, participant forms), and data analysis techniques (coding sheets for content analysis, checklists) mathematical formulae used in calculations, data collection schedule.

In results and discussion: Additional results (often tables and figures), detailed analysis of limitations of the study or possible alternative explanations, programming code.

Other material includes references for supplementary files, translations, errata, audio, and video files.[ 23 ]

Examples of video/audio files include echocardiography recordings and ultrasound images. Specific information on the preparation of audio and video clips is available in the author guidelines. Video formats usually used are MPEG-4, QuickTime, or Windows media video. Audio supplements include WAV or MP3 format. Video size should be reduced to <10 MB and clips limited to 15–25 s. The resolution should be optimized by using video frame dimensions of 480 × 360 pixels and 640 × 480 pixels.[ 14 ]
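For authors preparing such clips, one possible route is to call ffmpeg from a short script. The sketch below is only an illustration (the file names, start time, and compression settings are assumptions, and ffmpeg must be installed separately); it trims a recording to about 20 seconds and rescales it to 640 × 480 pixels so the clip stays well under typical size limits.

    import subprocess

    def prepare_supplementary_clip(src="echo_recording.mp4",
                                   dst="supplementary_video1.mp4",
                                   start="00:00:05", duration=20,
                                   width=640, height=480):
        """Trim and downscale a video with ffmpeg for use as a supplementary file."""
        cmd = [
            "ffmpeg", "-y",
            "-ss", start,                      # start time of the excerpt
            "-t", str(duration),               # clip length in seconds (15-25 s advised)
            "-i", src,
            "-vf", f"scale={width}:{height}",  # resize to 640 x 480 (or 480 x 360)
            "-c:v", "libx264", "-crf", "28",   # stronger compression to limit file size
            dst,
        ]
        subprocess.run(cmd, check=True)

    prepare_supplementary_clip()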

However, supplemental material is available only in the online version- limiting immediate access to many readers.[ 5 ] Moreover, only readers with a strong interest in the research topic will access the online supplementary material.[ 5 ] The information in these files is often very extensive and not integrated with the main text appropriately, thus finding and extracting specific points from a supplement can be tedious.[ 24 ]

The utility of supplementary material varies as per the audience – additional tables and figures are more useful to readers, information about study protocol/data collection to peer reviewers, and completed checklists to journal editors. Due to the lack of guidance from journals (to both authors and reviewers) regarding its necessity or accuracy and due to the extensive nature of the files, supplementary material is rarely read/reviewed (though all the supplementary files are to be uploaded for peer-review with the main article files at the time of submission).[ 24 ] This increases the likelihood of missing errors in methods/analysis (submitted as supplementary files), thus placing the scientific accuracy and validity of the published research at risk.[ 24 ] Moreover, the availability of raw data to third parties via supplementary files raises concerns about security and data permanence.[ 22 ] The supplementary files often describe methods crucial to published research but are not included in references, thus many researchers remain uncited/unrecognized. Citations within supplementary material are also not appropriately tracked by citation indices. This can be overcome by direct hyperlinking sections of supplementary materials with the main manuscript.[ 24 ] Thus, supplementary data can be an asset if used thoughtfully; however, its indiscriminate use can hinder its actual purpose and be detrimental to the peer review process.

Concluding remarks

Tables, figures, graphs, and supplementary materials are vital tools which, when skillfully used, make complex data simple and clear to understand, within journal word restrictions. They engage and sustain interest and provide a brief visual summary narrative of the study hypothesis, saving time and energy for readers, reviewers, and editors. They should be self-explanatory, complement the text and provide value to the paper. Producing clear, informative non-textual elements increases the chances of papers being published and read. Thus, the author should plan these elements early during the process of data collection/analysis and not as an afterthought. The author should have a good understanding of the types of data presentations and choose the simplest format that delivers the message best. They should be adapted to the journal's instructions to display and summarize essential content, without focusing too much on making them attractive or showcasing one's technical expertise. Titles should be clear and simple, data should be consistent with results, and footnotes should be used effectively. Copyright permissions, whenever necessary, should be obtained in advance and preserved appropriately.


MLA Style: Writing & Citation

Is It a Figure or a Table?


There are two types of material you can insert into your assignment: figures and tables.

A figure is a photo, image, map, graph, or chart.

A table is a table of information.

For visual examples of each, see the sample captions in the sections below.

Still need help?

For more information on citing figures in MLA, see Purdue OWL .

Figure (photo, image, graph, or chart) inserted into a research paper

The caption for a figure begins with a description of the figure, then the complete Works Cited list citation for the source the figure was found in. For example, if it was found on a website, cite the website. If it was in a magazine article, cite the magazine article.

Label your figures starting at 1.

Information about the figure (the caption) is placed directly below the image in your assignment.

If the image appears in your paper the full citation appears underneath the image (as shown below) and does not need to be included in the Works Cited List. If you are referring to an image but not including it in your paper you must provide an in-text citation and include an entry in the Works Cited List.

Fig. 1. Man exercising from: Green, Annie. "Yoga: Stretching Out." Sports Digest,  8 May 2006, p. 22. 


Fig. 2. Annakiki skirt from: Cheung, Pauline. "Short Skirt S/S/ 15 China Womenswear Commercial Update." WGSN.

Note: This is a Seneca Libraries recommendation.

Image reproduced from Google Maps

Fig. X. Description of the figure from: "City, Province." Map, Google Maps. Accessed Access Date.


Fig. 1. Map of Newnham Campus, Seneca College from: "Toronto, Ontario." Map,  Google Maps.  Accessed 23 Apr. 2014. 

Table inserted into a research paper

Source: [Complete citation for the source the table was found in.]

Above the table, label it beginning at Table 1, and add a description of what information is contained in the table.

The caption for a table begins with the word Source, then the complete Works Cited list citation for the source the table was found in. For example, if it was found on a website, cite the website. If it was in a journal article, cite the journal article.

Information about the table (the caption) is placed directly below the table in your assignment.

If the table is not cited in the text of your assignment, you do not need to include it in the Works Cited list.

Table 1. Variables in determining victims and aggressors

Source: Mohr, Andrea. "Family Variables Associated With Peer Victimization." Swiss Journal of Psychology,  vol .  65, no. 2, 2006, pp. 107-116, Psychology Collection , doi: http://dx.doi.org/10.1024/1421-0185.65.2.107.

Reproducing Figures and Tables

Reproducing happens when you copy or recreate a photo, image, chart, graph, or table that is not your original creation. If you reproduce one of these works in your assignment, you must create a note (or "caption") underneath the photo, image, chart, graph, or table to show where you found it. If you do not refer to it anywhere else in your assignment, you do not have to include the citation for this source in a Works Cited list.

Citing Information From a Photo, Image, Chart, Graph, or Table

If you refer to information from the photo, image, chart, graph, or table but do not reproduce it in your paper, create a citation both in-text and on your Works Cited list. 

If the information is part of another format, for example a book, magazine article, encyclopedia, etc., cite the work it came from. For example if information came from a table in an article in National Geographic magazine, you would cite the entire magazine article.

Figure Numbers

The word figure should be abbreviated to Fig. Each figure should be assigned a figure number, starting with number 1 for the first figure used in the assignment. E.g., Fig. 1.

Images may not have a set title. If this is the case, give a description of the image where you would normally put the title.


How to Extract Data from Graphs or Images in Scientific Papers?


Scientific results and analyses are often visualized in journals in the form of graphs. As researchers, we are interested in studying these published visuals and may want to analyze and build on the results. However, the raw data behind published graphs are not always available in the papers. Sometimes we may want to compare our new results with historical plots for which no numerical data were published.

This brings us to the point where we need to reverse-engineer graphs and extract their data in numerical form. This reverse-engineering process is called data digitization; in other words, we retrieve the information embedded in the graphic.

Extracting Data from Graphs or Images using PlotDigitizer

Extracting data from published graphs or images is not a simple process and could consume a significant amount of time without the right tools. PlotDigitizer is one such professional tool that is capable of extracting data from graphs.

PlotDigitizer is freemium software; the online version is free with limited functionality, while the offline version is paid. It is available for all operating systems.

How to use PlotDigitizer to extract data from Graphs?

The first step in digitization is to get the graph into an image file format. If the visual is in a document format such as PDF, you can capture a screenshot of the graph; if it exists only in physical form, you can scan the document and then crop out all unnecessary portions of the image.

PlotDigitizer has an in-built image editing tool kit with which you can crop, flip, rotate, or scale the image as required. For example, if the image is slightly tilted, you can use the rotate tool to align the image appropriately.

Besides standard XY plots, PlotDigitizer supports several other types of graphs, e.g., polar, ternary, bar, column, pie/doughnut, and map. You can also measure distances, angles, and areas on the image.

Here are simple steps for extracting data from XY graphs:

Step 1: Find the graph in an image format

The first, foremost step is to get a graph in image file formats, such as PNG, JPG, JPEG, SVG, GIF, TIFF.

Here, we have taken the following graphs as a sample image.

The curve below represents the solubility of oxygen (expressed on the y-axis) in water with temperature (expressed on the x-axis). The image is taken from ResearchGate .

The solubility of oxygen with temperature

Step 2: Upload the image to PlotDigitizer

Upload or drag-drop the image in PlotDigitizer.

Step 3: Adjusting the image

We can adjust the image to make it fit properly. For example, crop out the unwanted parts or rotate the image to make it align with the screen, or scale it to increase/decrease the resolution. In our case, the image is perfect; there is no need for any modifications.

Step 4: Calibrating the axes

Once the image is uploaded, four points will appear around the center of the view. These are the x and y reference coordinates of the individual axes. Drag these points to the extremes of the plot: here, x1 and x2 are dragged and placed at 0 and 90, while y1 and y2 are dragged and placed at 0 and 16. You can use the zoom panel to increase accuracy while calibrating the axes.

Both axes are linear scales. Now, enter the values of the points x1, x2, y1, and y2, i.e., 0, 90, 0, and 16.

Note: We have placed the points at the extremes to improve accuracy, but this is not required. The points also do not have to lie exactly on the axes; you can drop them anywhere within the plot.

Now the entire graph is calibrated. You can see the coordinates of the mouse cursor below the zoom panel. If you want, you can hide x1, x2, y1, and y2 by clicking the lock icon at the top of the window.
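For readers curious about what this calibration does behind the scenes, the following Python sketch reproduces the basic arithmetic (it is not PlotDigitizer code, and the pixel coordinates are invented for illustration): two reference points per linear axis define a pixel-to-data mapping, which is then applied to every point clicked on the curve, and the result is written to a CSV file.

    import csv

    def make_axis_map(pix_a, val_a, pix_b, val_b):
        """Return a function mapping a pixel coordinate to a data value (linear axis)."""
        scale = (val_b - val_a) / (pix_b - pix_a)
        return lambda pix: val_a + (pix - pix_a) * scale

    # Reference points picked on the image (pixel positions are illustrative only).
    x_map = make_axis_map(pix_a=62, val_a=0, pix_b=842, val_b=90)  # x1 = 0, x2 = 90
    y_map = make_axis_map(pix_a=512, val_a=0, pix_b=48, val_b=16)  # y1 = 0, y2 = 16 (pixel y grows downward)

    clicked_pixels = [(100, 470), (300, 380), (600, 250)]          # points marked on the curve
    data = [(x_map(px), y_map(py)) for px, py in clicked_pixels]

    with open("digitized_points.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["temperature", "oxygen_solubility"])
        writer.writerows(data)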

Step 5: Extracting the data points from the plot

Finally, we can extract data points from the curve. Mark various points on the curve and their respective values are recorded on the side panel.

You can collect as many points as you want. Also, instead of manual extraction, you can use the automatic extraction feature to collect a large number of data points.

Step 6: Exporting the extracted data points

In the end, we can export the extracted points into other formats, like CSV, MS Excel, JSON, MatLab, Array.

Here, we have discussed the XY graph, but for every other type of graph, the process of digitization is very similar.

On Graph Extraction from Image Data


  • Andreas Holzinger 23 ,
  • Bernd Malle 23 &
  • Nicola Giuliani 23  

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8609)

Included in the following conference series:

  • International Conference on Brain Informatics and Health

1838 Accesses

5 Citations

Hot topics in knowledge discovery and interactive data mining from natural images include the application of topological methods and machine learning algorithms. For any such approach one needs at first a relevant and robust digital content representation from the image data. However, traditional pixel-based image analysis techniques do not effectively extract, hence represent the content. A very promising approach is to extract graphs from images, which is not an easy task. In this paper we present a novel approach for knowledge discovery by extracting graph structures from natural image data. For this purpose, we created a framework built upon modern Web technologies, utilizing HTML canvas and pure Javascript inside a Web-browser, which is a very promising engineering approach. Following on a short description of some popular image classification and segmentation methodologies, we outline a specific data processing pipeline suitable for carrying out future scientific research. A demonstration of our implementation, compared to the results of a traditional watershed transformation performed in Matlab showed very promising results in both quality and runtime, despite some open problems. Finally, we provide a short discussion of a few open problems and outline some of our future research routes.
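The paper describes a browser-based pipeline built on HTML canvas and Javascript; purely as a rough, generic illustration of the idea of turning an image into a graph (not the authors' implementation), the following Python sketch over-segments a sample image with a watershed transformation and then links touching regions into a region adjacency graph, assuming scikit-image and networkx are installed.

    import numpy as np
    import networkx as nx
    from skimage import data, filters, measure, segmentation

    image = data.coins()                                  # sample grayscale image
    gradient = filters.sobel(image)                       # edge strength
    markers = measure.label(image > filters.threshold_otsu(image))
    labels = segmentation.watershed(gradient, markers)    # labelled regions

    graph = nx.Graph()
    for region in measure.regionprops(labels):
        graph.add_node(int(region.label), area=int(region.area))

    # Regions whose pixels touch horizontally or vertically become connected nodes.
    pairs = np.concatenate([
        np.stack([labels[:, :-1].ravel(), labels[:, 1:].ravel()], axis=1),
        np.stack([labels[:-1, :].ravel(), labels[1:, :].ravel()], axis=1),
    ])
    pairs = pairs[pairs[:, 0] != pairs[:, 1]]
    graph.add_edges_from((int(a), int(b)) for a, b in np.unique(pairs, axis=0))

    print(graph.number_of_nodes(), "regions,", graph.number_of_edges(), "adjacencies")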

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Unable to display preview.  Download preview PDF.

Cite this paper

Holzinger, A., Malle, B., Giuliani, N. (2014). On Graph Extraction from Image Data. In: Ślȩzak, D., Tan, A.H., Peters, J.F., Schwabe, L. (eds) Brain Informatics and Health. BIH 2014. Lecture Notes in Computer Science, vol 8609. Springer, Cham. https://doi.org/10.1007/978-3-319-09891-3_50

How to Use Tables & Graphs in a Research Paper

It might not seem very relevant to the story and outcome of your study, but how you visually present your experimental or statistical results can play an important role during the review and publication process of your article. A presentation that is in line with the overall logical flow of your story helps you guide the reader effectively from your introduction to your conclusion. 

If your results (and the way you organize and present them) don’t follow the story you outlined in the beginning, then you might confuse the reader and they might end up doubting the validity of your research, which can increase the chance of your manuscript being rejected at an early stage. This article illustrates the options you have when organizing and writing your results and will help you make the best choice for presenting your study data in a research paper.

Why does data visualization matter?

Your data and the results of your analysis are the core of your study. Of course, you need to put your findings and what you think your findings mean into words in the text of your article. But you also need to present the same information visually, in the results section of your manuscript, so that the reader can follow and verify that they agree with your observations and conclusions. 

The way you visualize your data can either help the reader to comprehend quickly and identify the patterns you describe and the predictions you make, or it can leave them wondering what you are trying to say or whether your claims are supported by evidence. Different types of data therefore need to be presented in different ways, and whatever way you choose needs to be in line with your story. 

Another thing to keep in mind is that many journals have specific rules or limitations (e.g., how many tables and graphs you are allowed to include, what kind of data needs to go on what kind of graph) and specific instructions on how to generate and format data tables and graphs (e.g., maximum number of subpanels, length and detail level of tables). In the following, we will go into the main points that you need to consider when organizing your data and writing your results section.

Table of Contents:

  • Types of Data
  • When to Use Data Tables
  • When to Use Data Graphs
  • Common Types of Graphs in Research Papers
  • Journal Guidelines: What to Consider Before Submission

Types of Data

Depending on the aim of your research and the methods and procedures you use, your data can be quantitative or qualitative. Quantitative data, whether objective (e.g., size measurements) or subjective (e.g., rating one’s own happiness on a scale), is what is usually collected in experimental research. Quantitative data are expressed in numbers and analyzed with the most common statistical methods. Qualitative data, on the other hand, can consist of case studies or historical documents, or it can be collected through surveys and interviews. Qualitative data are expressed in words and need to be categorized and interpreted to yield meaningful outcomes.

Quantitative data example: Height differences between two groups of participants
Qualitative data example: Subjective feedback on the food quality in the work cafeteria

Depending on what kind of data you have collected and what story you want to tell with it, you have to find the best way of organizing and visualizing your results.

When to Use Data Tables

When you want to show the reader in detail how your independent and dependent variables interact, then a table (with data arranged in columns and rows) is your best choice. In a table, readers can look up exact values, compare those values between pairs or groups of related measurements (e.g., growth rates or outcomes of a medical procedure over several years), look at ranges and intervals, and select specific factors to search for patterns.

Tables are not restrained to a specific type of data or measurement. Since tables really need to be read, they activate the verbal system. This requires focus and some time (depending on how much data you are presenting), but it gives the reader the freedom to explore the data according to their own interest. Depending on your audience, this might be exactly what your readers want. If you explain and discuss all the variables that your table lists in detail in your manuscript text, then you definitely need to give the reader the chance to look at the details for themselves and follow your arguments. If your analysis only consists of simple t-tests to assess differences between two groups, you can report these results in the text (in this case: mean, standard deviation, t-statistic, and p-value), and do not necessarily need to include a table that simply states the same numbers again. If you did extensive analyses but focus on only part of that data (and clearly explain why, so that the reader does not think you forgot to talk about the rest), then a graph that illustrates and emphasizes the specific result or relationship that you consider the main point of your story might be a better choice.

When to Use Data Graphs

Graphs are a visual display of information and show the overall shape of your results rather than the details. If used correctly, a visual representation helps your (or your reader’s) brain to quickly understand large amounts of data and spot patterns, trends, and exceptions or outliers. Graphs also make it easier to illustrate relationships between entire data sets. This is why, when you analyze your results, you usually don’t just look at the numbers and the statistical values of your tests, but also at histograms, box plots, and distribution plots, to quickly get an overview of what is going on in your data.

Common Types of Graphs in Research Papers

Line graphs

When you want to illustrate a change over a continuous range or time, a line graph is your best choice. Changes in different groups or samples over the same range or time can be shown by lines of different colors or with different symbols.

Example: Let’s collapse across the different food types and look at the growth of our four fish species over time.

[Figure: line graph showing the growth of four aquarium fish species over one month]

Bar graphs

You should use a bar graph when your data is not continuous but divided into categories that are not necessarily connected, such as different samples, methods, or setups. In our example, the different fish types or the different types of food are such non-continuous categories.

Example: Let’s collapse across the food types again and also across time, and only compare the overall weight increase of our four fish types at the end of the feeding period.

[Figure: bar graph showing the increase in weight of different fish species over one month]

Scatter plots

Scatter plots can be used to illustrate the relationship between two variables, but note that both have to be continuous. The following example displays “fish length” as an additional variable; none of the variables in our table above (fish type, fish food, time) are continuous, and they therefore cannot be used for this kind of graph.

[Figure: scatter plot showing the growth of aquarium fish over time, plotting weight versus length]

As you see, these example graphs all contain less data than the table above, but they lead the reader to exactly the key point of your results or the finding you want to emphasize. If you let your readers search for these observations in a big table full of details that are not necessarily relevant to the claims you want to make, you can create unnecessary confusion. Most journals allow you to provide bigger datasets as supplementary information, and some even require you to upload all your raw data at submission. When you write up your manuscript, however, matching the data presentation to the storyline is more important than throwing everything you have at the reader. 

Don’t forget that every graph needs clear x and y axis labels, a title above the figure that summarizes what is shown, and a descriptive legend/caption below. Since your caption needs to stand alone and the reader needs to be able to understand it without looking at the text, explain what you measured/tested and spell out all labels and abbreviations you use in any of your graphs once more in the caption (even if you think the reader “should” remember everything by now, make it easy for them and guide them through your results once more). Have a look at this article if you need help on how to write strong and effective figure legends.
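If you prepare your figures programmatically, the following matplotlib sketch (with made-up fish-growth numbers mirroring the running example) shows the elements just mentioned: labeled axes with units, a title, a legend, and an export at a resolution many journals accept.

    # Illustrative only: a labeled line graph exported at high resolution.
    # The values are invented to mirror the fish-growth example above.
    import matplotlib.pyplot as plt

    weeks = [0, 1, 2, 3, 4]
    mean_weight = {                      # mean weight in grams per species
        "Guppy":     [1.0, 1.2, 1.5, 1.9, 2.4],
        "Tetra":     [0.8, 1.0, 1.3, 1.6, 2.0],
        "Platy":     [1.5, 1.8, 2.2, 2.7, 3.3],
        "Swordtail": [2.0, 2.4, 2.9, 3.5, 4.2],
    }

    fig, ax = plt.subplots(figsize=(4, 3))
    for species, values in mean_weight.items():
        ax.plot(weeks, values, marker="o", label=species)

    ax.set_xlabel("Time (weeks)")                  # clear x axis label with unit
    ax.set_ylabel("Mean weight (g)")               # clear y axis label with unit
    ax.set_title("Growth of four fish species over one month")
    ax.legend(title="Species")

    fig.tight_layout()
    fig.savefig("fish_growth.png", dpi=300)        # resolution journals often require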

Journal Guidelines: What to Consider Before Submission

Even if you have thought about the data you have, the story you want to tell, and how to guide the reader most effectively through your results, you need to check whether the journal you plan to submit to has specific guidelines and limitations when it comes to tables and graphs. Some journals allow you to submit any tables and graphs initially, as long as tables are editable (for example, in Word format, not as an image) and graphs are of high enough resolution.

Some others, however, have very specific instructions even at the submission stage, and almost all journals will ask you to follow their formatting guidelines once your manuscript is accepted. The closer your figures are already to those guidelines, the faster your article can be published. This PLOS One Figure Preparation Checklist is a good example of how extensive these instructions can be – don’t wait until the last minute to realize that you have to completely reorganize your results because your target journal does not accept tables above a certain length or graphs with more than 4 panels per figure. 

Some things you should always pay attention to (and look at already published articles in the same journal if you are unsure or if the author instructions seem confusing) are the following:

  • How many tables and graphs are you allowed to include?
  • What file formats are you allowed to submit?
  • Are there specific rules on resolution/dimension/file size?
  • Should your figure files be uploaded separately or placed into the text?
  • If figures are uploaded separately, do the files have to be named in a specific way?
  • Are there rules on what fonts to use or to avoid and how to label subpanels?
  • Are you allowed to use color? If not, make sure your data sets are distinguishable.

If you are dealing with digital image data, then it might also be a good idea to familiarize yourself with the difference between “adjusting” for clarity and visibility and image manipulation, which constitutes scientific misconduct. And to fully prepare your research paper for publication before submitting it, be sure to receive proofreading services, including journal manuscript editing and research paper editing, from Wordvice’s professional academic editors.

Hybrid Attack Graph Generation with Graph Convolutional Deep-Q Learning

Published: April 13, 2024

google-gemini/cookbook

Welcome to the Gemini API Cookbook

This is a collection of guides and examples for the Gemini API, including quickstart tutorials for writing prompts and using different features of the API, and examples of things you can build.

Get started with the Gemini API

The Gemini API gives you access to Gemini models created by Google DeepMind. Gemini models are built from the ground up to be multimodal, so you can reason seamlessly across text, images, code, and audio. You can use these to develop a range of applications.

Start developing

  • Go to Google AI Studio.
  • Login with your Google account.
  • Create an API key.
  • Use a quickstart for Python, or call the REST API using curl (a minimal Python sketch follows below).
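For the Python route, a minimal sketch might look like the one below; it assumes the google-generativeai package is installed and that an API key from Google AI Studio is stored in the GOOGLE_API_KEY environment variable, and the model name shown is only an example.

    # Minimal sketch of the Python quickstart path (assumptions: the
    # google-generativeai package is installed and GOOGLE_API_KEY is set;
    # the model name is illustrative and may differ from current offerings).
    import os
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content("Write a two-sentence summary of what the Gemini API does.")
    print(response.text)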

Capabilities

Learn about the capabilities of the Gemini API by checking out the quickstarts for safety, embeddings, function calling, audio, and more.

Official SDKs

The Gemini API is a REST API. You can call the API using a command line tool like curl, or by using one of our official SDKs:

  • Dart (Flutter)

Open an issue on GitHub.

Contributing

Contributions are welcome. See contributing to learn more.

Thank you for developing with the Gemini API! We’re excited to see what you create.

Prestigious cancer research institute has retracted 7 studies amid controversy over errors

Dana-Farber Cancer Institute

Seven studies from researchers at the prestigious Dana-Farber Cancer Institute have been retracted over the last two months after a scientist blogger alleged that images used in them had been manipulated or duplicated.

The retractions are the latest development in a monthslong controversy around research at the Boston-based institute, which is a teaching affiliate of Harvard Medical School. 

The issue came to light after Sholto David, a microbiologist and volunteer science sleuth based in Wales, published a scathing post on his blog in January, alleging errors and manipulations of images across dozens of papers produced primarily by Dana-Farber researchers. The institute acknowledged errors and subsequently announced that it had requested six studies to be retracted and asked for corrections in 31 more papers. Dana-Farber also said, however, that a review process for errors had been underway before David’s post.

Now, at least one more study has been retracted than Dana-Farber initially indicated, and David said he has discovered an additional 30 studies from authors affiliated with the institute that he believes contain errors or image manipulations and therefore deserve scrutiny.

The episode has imperiled the reputation of a major cancer research institute and raised questions about one high-profile researcher there, Kenneth Anderson, who is a senior author on six of the seven retracted studies. 

Anderson is a professor of medicine at Harvard Medical School and the director of the Jerome Lipper Multiple Myeloma Center at Dana-Farber. He did not respond to multiple emails or voicemails requesting comment. 

The retractions and new allegations add to a larger, ongoing debate in science about how to protect scientific integrity and reduce the incentives that could lead to misconduct or unintentional mistakes in research. 

The Dana-Farber Cancer Institute has moved relatively swiftly to seek retractions and corrections. 

“Dana-Farber is deeply committed to a culture of accountability and integrity, and as an academic research and clinical care organization we also prioritize transparency,” Dr. Barrett Rollins, the institute’s integrity research officer, said in a statement. “However, we are bound by federal regulations that apply to all academic medical centers funded by the National Institutes of Health among other federal agencies. Therefore, we cannot share details of internal review processes and will not comment on personnel issues.”

The retracted studies were originally published in two journals: One in the Journal of Immunology and six in Cancer Research. Six of the seven focused on multiple myeloma, a form of cancer that develops in plasma cells. Retraction notices indicate that Anderson agreed to the retractions of the papers he authored.

Elisabeth Bik, a microbiologist and longtime image sleuth, reviewed several of the papers’ retraction statements and scientific images for NBC News and said the errors were serious. 

“The ones I’m looking at all have duplicated elements in the photos, where the photo itself has been manipulated,” she said, adding that these elements were “signs of misconduct.” 

Dr. John Chute, who directs the division of hematology and cellular therapy at Cedars-Sinai Medical Center and has contributed to studies about multiple myeloma, said the papers were produced by pioneers in the field, including Anderson.

“These are people I admire and respect,” he said. “Those were all high-impact papers, meaning they’re highly read and highly cited. By definition, they have had a broad impact on the field.” 

Chute said he did not know the authors personally but had followed their work for a long time.

“Those investigators are some of the leading people in the field of myeloma research and they have paved the way in terms of understanding our biology of the disease,” he said. “The papers they publish lead to all kinds of additional work in that direction. People follow those leads and industry pays attention to that stuff and drug development follows.”

The retractions offer additional evidence for what some science sleuths have been saying for years: The more you look for errors or image manipulation, the more you might find, even at the top levels of science. 

Scientific images in papers are typically used to present evidence of an experiment’s results. Commonly, they show cells or mice; other types of images show key findings like western blots — a laboratory method that identifies proteins — or bands of separated DNA molecules in gels. 

Science sleuths sometimes examine these images for irregular patterns that could indicate errors, duplications or manipulations. Some artificial intelligence companies are training computers to spot these kinds of problems, as well. 

Duplicated images could be a sign of sloppy lab work or data practices. Manipulated images — in which a researcher has modified an image heavily with photo editing tools — could indicate that images have been exaggerated, enhanced or altered in an unethical way that could change how other scientists interpret a study’s findings or scientific meaning. 

Top scientists at big research institutions often run sprawling laboratories with lots of junior scientists. Critics of science research and publishing systems allege that a lack of opportunities for young scientists, limited oversight and pressure to publish splashy papers that can advance careers could incentivize misconduct. 

These critics, along with many science sleuths, allege that errors or sloppiness are too common, that research organizations and authors often ignore concerns when they’re identified, and that the path from complaint to correction is sluggish.

“When you look at the amount of retractions and poor peer review in research today, the question is, what has happened to the quality standards we used to think existed in research?” said Nick Steneck, an emeritus professor at the University of Michigan and an expert on science integrity.

David told NBC News that he had shared some, but not all, of his concerns about additional image issues with Dana-Farber. He added that he had not identified any problems in four of the seven studies that have been retracted. 

“It’s good they’ve picked up stuff that wasn’t in the list,” he said. 

NBC News requested an updated tally of retractions and corrections, but Ellen Berlin, a spokeswoman for Dana-Farber, declined to provide a new list. She said that the numbers could shift and that the institute did not have control over the form, format or timing of corrections. 

“Any tally we give you today might be different tomorrow and will likely be different a week from now or a month from now,” Berlin said. “The point of sharing numbers with the public weeks ago was to make clear to the public that Dana-Farber had taken swift and decisive action with regard to the articles for which a Dana-Farber faculty member was primary author.” 

She added that Dana-Farber was encouraging journals to correct the scientific record as promptly as possible. 

Bik said it was unusual to see a highly regarded U.S. institution have multiple papers retracted. 

“I don’t think I’ve seen many of those,” she said. “In this case, there was a lot of public attention to it and it seems like they’re responding very quickly. It’s unusual, but how it should be.”

Evan Bush is a science reporter for NBC News. He can be reached at [email protected].

The online eclipse experience: People on X get creative, political and possibly blind

The 2024 total eclipse is caused by a rare alignment in celestial spheres that will send millions of people in the path of totality outside to peer at the sky.

People have booked their Airbnbs years in advance, eclipse glasses are selling out and forecasters have been warning of cloudy skies for weeks.

Regardless of how otherworldly this event is and how much planning people have dedicated to experiencing it in person, the internet is being the internet and providing an eclipse experience of its own.

GIFs, quips and skits are flooding social media platforms like X on Monday.

Here are some of the best social media reactions to the eclipse:

Solar eclipse 2024 live updates: See latest weather forecast, what time it hits your area

Forgot your eclipse glasses? So did the internet

Proper eye safety is recommended for looking at the sun during an eclipse, and several places like Warby Parker and public libraries have been giving them away for free.

But some poor souls didn't secure their pair in advance, and the internet knows it:

"I don’t have eclipse glasses and I don’t trust myself not to look at the sun," one user posted on X.

"During the eclipse, it’s important not to stare at the Sun directly, as it will take this as a provocation. Look away to the sides to indicate clearly that you are not a threat to the Sun. Do NOT run away; this will activate the Sun’s predation instinct, and then God help us all," another posted .

Viral moment: Looking back (but not directly at) Donald Trump's 2017 solar eclipse moment

Some people are making their own eclipses

And, of course, it's getting political.

Several supporters of former President Donald Trump have also made their own footage showing Trump eclipsing President Joe Biden, indicating their hope for the 2024 presidential election.

"Biggest Event of 2024," wrote one user, with a picture of Trump "eclipsing" Biden attached.

But many people are reminiscing about the 2017 eclipse, when Trump seemingly looked up at the sky without glasses.

Biden joined in poking fun at him in a post saying, "don't be silly, folks – play it safe and wear protective eyewear," a presumed nod to Trump's viral moment.

Contributing: Eric Lagatta, Natalie Alund

COMMENTS

  1. Review A key review on graph data science: The power of graphs in

    During the graph type selection phase, graph images suitable for the type, property, and size of the data can be determined. ... This paper: 2022: graph theory, graph types, special graphs. ... Operation research problems. Graph coloring technique is very important in many real-time applications of computer science. Depending on the ...

  2. Graph neural networks: A review of methods and applications

    The other motivation comes from graph representation learning (Cui et al., 2018a; Hamilton et al., 2017b; Zhang et al., 2018a; Cai et al., 2018; Goyal and Ferrara, 2018), which learns to represent graph nodes, edges or subgraphs by low-dimensional vectors. In the field of graph analysis, traditional machine learning approaches usually rely on hand engineered features and are limited by its ...

  3. The Review of Image Processing Based on Graph Neural Network

    Abstract. Convolutional neural networks have ushered in significant advancements in the field of image processing. Convolutional neural networks, however, operate well with Euclidean data, whereas graph neural networks function better with non-Euclidean data. This paper summarizes the application of image ...

  4. Graph-Based Image Retrieval: State of the Art

    This technique use the textual indexation of an image, its metadata or the textual elements attached to the image. A lot of research has been done on TBIR, but are very ancient due to the great importance given to the other types of ImR. ... The main purpose of the paper is to draw attention to graph-based approaches, and its vital role to ...

  5. Converting tabular data into images for deep learning with

    In this paper, we develop a novel method, Image Generator for Tabular Data (IGTD), to transform tabular data into images for subsequent deep learning analysis using CNNs. ... ACM Trans. Graph. 11 ...

  6. Graph Interpretation, Summarization and Visualization Techniques: A

    With the advancement in computation technology and the increase of multimedia, generation of massive amount of data gained more proliferation due to emerging applications in research domains such as machine learning techniques, artificial intelligence and mathematical modeling for analysis of these data [1, 2]. The graph has been a ubiquitous way of representing massive amounts of data which is ...

  7. Graph-Based Image Retrieval: State of the Art

    In this paper, we present the basic concepts of Image Retrieval and image retrieval techniques. Also, a comparative study of image retrieval techniques has been carried out to present their advantages, and drawbacks. The problematic of this work revolves around: the integration of semantic aspect in graph-based Images Retrieval approaches.

  8. Graph Neural Networks: A bibliometrics overview

    Recently, graph neural networks (GNNs) have become a hot topic in machine learning community. This paper presents a Scopus-based bibliometric overview of the GNNs' research since 2004 when GNN papers were first published. The study aims to evaluate GNN research trends, both quantitatively and qualitatively.

  9. PDF ChartOCR: Data Extraction from Charts Images via a Deep Hybrid Framework

    ChartOCR achieves state-of-the-art performance on chart data extraction task for all three major chart types. (2) We also design new evaluation metrics for these chart types. (3) We collect a fully annotated chart data set with 400K Excel chart images to enable the training of deep learning models.

  10. A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval

    April 2021. A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval. Manh-Duy Nguyen (School of Computing, Dublin, Ireland), Binh T. Nguyen (AISIA Research Lab; University of Science, Ho Chi Minh City, Vietnam; Vietnam National University Ho Chi Minh City, Vietnam) and Cathal Gurrin (School of Computing, Dublin, Ireland). Abstract: Conventional approaches to image-text retrieval mainly focus on index...

  11. How to Cite Images, Graphs & Tables in a Research Paper

    An image or chart makes your research paper more attractive, explanatory & understandable for the audience. To cite images, graphs, charts and other non-textual elements, you need to provide information like author's name, title, when and where it was first published, etc.

  12. Graph Neural Networks and Their Current Applications in Bioinformatics

    Graph neural networks (GNNs), as a branch of deep learning in non-Euclidean space, perform particularly well in various tasks that process graph structure data. With the rapid accumulation of biological network data, GNNs have also become an important tool in bioinformatics. In this research, a systematic survey of GNNs and their advances in ...

  13. (PDF) Image Segmentation Based on Graph Theory and Threshold

    Abstract: This paper presents an image segmentation technique using discrete tools from graph theory. The image segmentation incorporating graph theoretic methods makes the formulation of the ...

  14. Knowledge Graphs: A Practical Review of the Research Landscape

    Knowledge graphs (KGs) have rapidly emerged as an important area in AI over the last ten years. Building on a storied tradition of graphs in the AI community, a KG may be simply defined as a directed, labeled, multi-relational graph with some form of semantics. In part, this has been fueled by increased publication of structured datasets on the Web, and well-publicized successes of large-scale ...

  15. PDF Representation Learning of Histopathology Images using Graph Neural

    Graph convolution network is used to learn the representation of the graph, which is passed through a graph pooling layer to get a single feature vector representing the bag of instances. The single feature vector from the graph can be used for classification or other learning tasks.

  16. Extracting (digitising) data from plots in scientific papers or images

    The extraction of data from images is called digitization. This is the conversion of an analogue figure into a quantized digital (numerical) format — to be used for manipulation and analysis. The simplest process works by defining the range of data within a plot and calculating the value of the points on a plotted line within it.

  17. PDF Research Topics in Graph Theory and Its Applications

    This book includes a number of research topics in graph theory and its applications. The topics are in the form of research ... Among practical applications of the theory are image pro... ... perfect graphs can be found in the survey paper [22]. Unlike perfect graphs, strongly perfect graphs do not have a conjecture ...

  18. How to Create Precise Graphs, Diagrams or Images in a Research Paper

    Choose only information that can be clearer if explained visually, and only if it is so important that you desire the reader to keep focus on it more than in other parts. Besides, this piece of information must be qualitatively or quantitatively measurable. Images can also be used to summarize; plenty of information can be perfectly summed up ...

  19. Utilizing tables, figures, charts and graphs to enhance the readability

    Introduction. Every author aims to reach the maximum target audience through his/her research publication/s. Our previous editorials have touched upon the process of writing a quality research paper and its successful publication in an appropriate journal.[1,2] Journal-specific "Instructions for Authors" generally have defined limits to the text and non-textual content for the benefit of ...

  20. MLA Style: Writing & Citation

    Figure (Photo, Image, Graph, or Chart) Inserted Into a Research Paper: Fig. X. Description of the figure, from: citation for the source the figure was found in. The caption for a figure begins with a description of the figure, then the complete Works Cited list citation for the source the figure was found in.

  21. How to Extract Data from Graphs or Images in Scientific Papers?

    Step 1: Find the graph in an image format. The first, foremost step is to get a graph in image file formats, such as PNG, JPG, JPEG, SVG, GIF, TIFF. Here, we have taken the following graphs as a sample image. The curve below represents the solubility of oxygen (expressed on the y-axis) in water with temperature (expressed on the x-axis).

  22. On Graph Extraction from Image Data

    However, traditional pixel-based image analysis techniques do not effectively extract, hence represent the content. A very promising approach is to extract graphs from images, which is not an easy task. In this paper we present a novel approach for knowledge discovery by extracting graph structures from natural image data.

  23. How to Use Tables & Graphs in a Research Paper

    In a table, readers can look up exact values, compare those values between pairs or groups of related measurements (e.g., growth rates or outcomes of a medical procedure over several years), look at ranges and intervals, and select specific factors to search for patterns. Tables are not restrained to a specific type of data or measurement.

  24. Efficient and Accurate Graph Statistics with Adaptive ...

    We further design the triangle counting mechanism that only downloads a 2-hop subgraph instead of the whole graph when interacting with users, which significantly reduces communication costs. Moreover, we leverage a variance-based threshold edge-clipping strategy to reconstruct the noisy graph, leading to high data utility and practicability.

  25. Hybrid Attack Graph Generation with Graph Convolutional Deep-Q Learning

    Hybrid Attack Graph Generation with Graph Convolutional Deep-Q Learning. In IEEE International Conference on Big Data (BigData 2023), December 15-18, 2023, Sorrento, Italy, 3127-3133. Piscataway, New Jersey: IEEE.

  26. GitHub

    Get started with the Gemini API. The Gemini API gives you access to Gemini models created by Google DeepMind. Gemini models are built from the ground up to be multimodal, so you can reason seamlessly across text, images, code, and audio. You can use these to develop a range of applications.

  27. Cancer research institute retracts studies amid controversy over errors

    Scientific images in papers are typically used to present evidence of an experiment's results. Commonly, they show cells or mice; other types of images show key findings like western blots — a ...

  28. Eclipse memes, jokes, reactions: The internet is ready for totality

    The 2024 total eclipse is caused by a rare alignment in celestial spheres that will send millions of people in the path of totality outside to peer at the sky. People have booked their Airbnbs ...