Writing theoretical frameworks, analytical frameworks and conceptual frameworks

Three of the most challenging concepts for me to explain are the interrelated ideas of a theoretical framework, a conceptual framework, and an analytical framework. All three tend to be used interchangeably. While I find these concepts somewhat fuzzy and I sometimes struggle to explain the differences between them and clarify their usage for my students (and clearly I am not alone in this challenge), this blog post is an attempt to discern these analytical categories more clearly.

A lot of people (my own students included) have asked me if the theoretical framework is their literature review. That’s actually not the case. A theoretical framework, the way I define it, comprises the different theories and theoretical constructs that help explain a phenomenon. A theoretical framework sets out the various expectations that a theory posits, how they would apply to a specific case under analysis, and how one would use theory to explain a particular phenomenon. I like how theoretical frameworks are defined in this blog post. Dr. Cyrus Samii offers an explanation of what a good theoretical framework does for students.

For example, you can use framing theory to help you explain how different actors perceive the world. Your theoretical framework may be based on theories of framing, but it can also include others. For example, in this paper, Zeitoun and Allan explain their theoretical framework, aptly named hydro-hegemony. In doing so, they explain the role of each theoretical construct (Power, Hydro-Hegemony, Political Economy) and how each applies to transboundary water conflict. Another good example of a theoretical framework is the one posited by Dr. Michael J. Bloomfield in his book Dirty Gold, as I mention in this tweet:

In Chapter 2, @mj_bloomfield nicely sets his theoretical framework borrowing from sociology, IR, and business-strategy scholarship pic.twitter.com/jTGF4PPymn — Dr Raul Pacheco-Vega (@raulpacheco) December 24, 2017

An analytical framework is, the way I see it, a model that helps explain how a certain type of analysis will be conducted. For example, in this paper, Franks and Cleaver develop an analytical framework that draws on scholarship on poverty measurement to help us understand how water governance and poverty are interrelated. Other authors describe an analytical framework as a “conceptual framework that helps analyse particular phenomena”, as posited here (an ungated version can be read here).

I think it’s easy to conflate analytical frameworks with theoretical and conceptual ones because of the way in which concepts, theories and ideas are harnessed to explain a phenomenon. But I believe the most important element of an analytical framework is instrumental: its purpose is to help undertake analyses. You use elements of an analytical framework to deconstruct a specific concept, set of concepts, or phenomenon. For example, in this paper, Bodde et al. develop an analytical framework to characterise sources of uncertainty in strategic environmental assessments.

A robust conceptual framework describes the different concepts one would need to know to understand a particular phenomenon, without pretending to create causal links across variables and outcomes. In my view, theoretical frameworks set expectations, because theories are constructs that help explain relationships between variables and specific outcomes and responses. Conceptual frameworks, the way I see them, are like lenses through which you can see a particular phenomenon.

A conceptual framework should serve to illuminate and clarify fuzzy ideas, and fill lacunae. Viewed this way, a conceptual framework offers insight that would not otherwise be gained without a more profound understanding of the concepts explained in the framework. For example, in this article, Beck offers social movement theory as a conceptual framework that can help us understand terrorism. As I explained in my metaphor above, social movement theory is the lens through which you see terrorism, and you get a clearer understanding of how it operates precisely because you used this particular theory.

Dan Kaminsky offered a really interesting explanation connecting these topics to time; read his tweet below.

I think this maps to time. Theoretical frameworks talk about how we got here. Conceptual frameworks discuss what we have. Analytical frameworks discuss where we can go with this. See also legislative/executive/judicial. — Dan Kaminsky (@dakami) September 28, 2018

One of my CIDE students, Andres Ruiz, reminded me of this article on conceptual frameworks in the International Journal of Qualitative Methods. I’ll also be adding resources as I get them via Twitter or email. Hopefully this blog post will help clarify this idea!

By Raul Pacheco-Vega – September 28, 2018



A Comprehensive Guide to Data Analytics Framework

Data analytics frameworks provide a structured approach for making sense of data. They bring order to complex information environments, so organizations can gain actionable insights. With the right framework, companies can collaborate and transform disconnected data into innovation and strategic planning. In today’s data-driven world, analytics frameworks are essential for optimizing operations, understanding customers, and identifying opportunities. In short, they turn overwhelming data into an asset for learning, improving, and thriving.


Table of Contents

  • Understanding Data Analytics
  • Types of Data Analytics
  • Key Components of a Data Analytics Framework
  • Case Study on Data Analytics Framework
  • Popular Data Analytics Frameworks
  • Future Trends of Data Analytics Frameworks
  • FAQs on Data Analytics Framework

Understanding Data Analytics

Data analytics is the process of examining data to uncover useful information and support decision-making. It involves collecting raw data from different sources, cleaning and organizing it, and using tools and techniques to analyze it. The goal is to discover patterns, trends, and insights that would otherwise be hidden in the mass of data.

Some common data analytics approaches include:

  • Descriptive analytics: summarizing historical data to understand the past.
  • Predictive analytics : using statistical models to forecast future outcomes.
  • Prescriptive analytics: suggesting actions to take based on insights.
  • Data mining: exploring data to find new patterns and relationships.

The results of data analytics guide strategic decisions across an organization, from operations to marketing to finance. With the growth of big data, analytics has become essential for staying competitive. It enables data-driven decision-making based on evidence rather than gut instinct.
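
The descriptive end of this spectrum is easy to make concrete with pandas. Below is a minimal sketch; the monthly revenue figures are invented purely for illustration:

```python
import pandas as pd

# Hypothetical monthly sales figures (made-up data for illustration)
sales = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "revenue": [120_000, 95_000, 130_000, 110_000],
})

# Descriptive analytics: summarize historical data to understand the past
summary = sales["revenue"].describe()
print(summary)                   # count, mean, std, min, quartiles, max
print(sales["revenue"].mean())   # average monthly revenue
```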

Types of Data Analytics

  • Descriptive Analytics – This looks at past data to summarize and explain what happened. It provides insight into the reasons behind current business performance. Common techniques include data visualization, business reporting, and dashboards.
  • Diagnostic Analytics – This aims to understand why something happened by connecting data points and evaluating patterns. It helps identify issues and opportunities. Techniques involve drilling down data and data mining.
  • Predictive Analytics – This uses statistical models and forecasting techniques to understand future outcomes. It makes predictions based on current and historical data. Methods include regression analysis and machine learning.
  • Prescriptive Analytics – This suggests specific actions to take based on predictive modeling. It recommends data-driven decisions to achieve goals. Optimization, simulation, and decision modeling techniques are used.
Key Components of a Data Analytics Framework

  • Data Collection – This involves gathering relevant data from different sources like databases, apps, social media, etc. Both structured and unstructured data are collected.
  • Data Preparation – Here the raw data is cleaned, formatted, and made analysis-ready. Activities include data quality checks, merging data sources, handling missing values, etc.
  • Data Analysis – Appropriate analytical techniques are applied based on the business problem. Statistical modeling, data mining, machine learning methods can be used to analyze patterns.
  • Data Visualization – Data insights are visualized through charts, graphs and dashboards. This makes it easier to interpret results and identify trends.
  • Communication of Results – The key insights, trends, recommendations are compiled and presented to stakeholders. The analysis needs to connect back to core business goals.
  • Decision Making – The insights derived are used by leaders to steer strategy and operations. Data-driven decisions get incorporated into workflows.
  • Implementation – The insights are finally operationalized and executed across the organization through process changes, system updates, policy changes etc.
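
The component pipeline above can be sketched end to end in a few lines of Python. This is only an illustrative skeleton; the data, column names, and the simple group-wise summary standing in for "analysis" are all assumptions:

```python
import pandas as pd

# 1. Data collection: a hard-coded frame stands in for database/API pulls
raw = pd.DataFrame({
    "region": ["N", "S", "N", None],
    "sales":  [100.0, None, 150.0, 90.0],
})

# 2. Data preparation: drop rows missing a region, impute missing sales
prepared = raw.dropna(subset=["region"]).copy()
prepared["sales"] = prepared["sales"].fillna(prepared["sales"].mean())

# 3. Data analysis: group-wise summary as a stand-in for real modeling
analysis = prepared.groupby("region")["sales"].mean()

# 4-5. Visualization / communication: print in place of charts and dashboards
print(analysis)
```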

Case Study on Data Analytics Framework

A retailer was facing declining sales for the past few quarters. They wanted to understand what was causing this downturn.

They decided to follow a data analytics framework to gain insights:

  • Data was collected from their sales database, customer relationship management system, and surveys.
  • The data was prepared by cleaning, joining tables, handling missing values.
  • Exploratory analysis was done to see sales trends across regions, segments, channels. Statistical modeling identified factors influencing sales.
  • Visualizations like charts, graphs and heat maps were created to see patterns clearly.
  • The analysis revealed that sales dropped due to changing customer preferences, price competition, and supply chain issues.
  • These insights were presented to the management team.
  • It was decided to refresh the product portfolio, streamline pricing, and improve supplier relationships.
  • These strategic decisions were implemented across the organization.

Within a few quarters, the analytics-driven decisions helped reverse the declining sales trend. The framework provided a structured data-driven approach to understand business issues and respond effectively.

Popular Data Analytics Frameworks

Let us discuss some popular data analytics frameworks: their purpose, components, and use cases.

Apache Hadoop

  • Purpose: Hadoop is designed for distributed storage and processing of large datasets across clusters of computers. It’s particularly useful for batch processing tasks where data is stored across multiple nodes.
  • Components: It comprises two main components: Hadoop Distributed File System (HDFS) for storage and MapReduce for processing. HDFS breaks down large files into smaller blocks and distributes them across the cluster for redundancy and reliability. MapReduce is a programming model for processing and generating large datasets in parallel across distributed computing clusters.
  • Use Cases: Hadoop is commonly used in big data analytics, log processing, data warehousing, and applications requiring massive scalability.
Apache Spark

  • Purpose: Spark is an open-source distributed computing system that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. It’s designed to be faster and more flexible than Hadoop’s MapReduce.
  • Components: Spark offers a wide range of libraries, including Spark SQL for SQL queries, Spark Streaming for real-time data processing, MLlib for machine learning, and GraphX for graph processing. It utilizes an in-memory computing engine for improved performance.
  • Use Cases: Spark is often used in real-time analytics, iterative algorithms, machine learning, and interactive data analysis.
Pandas

  • Purpose: Pandas is a powerful open-source data analysis and manipulation library for Python. It provides high-performance, easy-to-use data structures and data analysis tools.
  • Features: Pandas offers DataFrame objects for handling structured data, Series objects for one-dimensional data, and a wide range of functions for data manipulation, cleaning, merging, reshaping, and more.
  • Use Cases: Pandas is commonly used for data cleaning, exploration, transformation, and analysis tasks in data science projects.
Scikit-learn

  • Purpose: Scikit-learn is a machine learning library for Python that provides simple and efficient tools for data mining and data analysis. It’s built on NumPy, SciPy, and matplotlib, and offers a wide range of machine learning algorithms.
  • Features: Scikit-learn includes algorithms for classification, regression, clustering, dimensionality reduction, model selection, and preprocessing. It also provides tools for model evaluation and validation.
  • Use Cases: Scikit-learn is widely used for building and deploying machine learning models for tasks such as classification, regression, clustering, and dimensionality reduction.
Dask

  • Purpose: Dask is a parallel computing library for analytics in Python. It’s designed to scale computations to datasets that don’t fit into memory by providing parallel versions of pandas DataFrames and NumPy arrays.
  • Features: Dask offers dynamic task scheduling and parallel execution, allowing users to work with larger-than-memory datasets using familiar APIs from libraries like pandas and NumPy.
  • Use Cases: Dask is commonly used for parallelizing data processing tasks, scaling computations on multi-core machines or distributed clusters, and handling large datasets efficiently in data science workflows.
SciPy

  • Purpose: SciPy is an open-source library for mathematics, science, and engineering in Python. It builds on NumPy and provides a wide range of functions for numerical integration, optimization, signal processing, linear algebra, and more.
  • Features: SciPy includes modules for optimization, interpolation, integration, linear algebra, signal and image processing, statistics, and more. It provides efficient implementations of many numerical algorithms.
  • Use Cases: SciPy is used in scientific and engineering applications including physics, chemistry, biology, bioinformatics, image processing, and signal analysis. It’s particularly useful for numerical computations and data analysis tasks requiring advanced mathematical functions and algorithms.
RapidMiner

  • Purpose: RapidMiner is a data science platform that provides an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analytics. It offers a visual interface for building and deploying analytics workflows.
  • Features: RapidMiner includes tools for data preparation, feature engineering, model building, evaluation, and deployment. It supports a wide range of machine learning algorithms and techniques, as well as advanced analytics tasks like text mining and deep learning.
  • Use Cases: RapidMiner is used by data scientists, analysts, and business users for tasks including predictive modeling, customer segmentation, fraud detection, and sentiment analysis. Its visual interface makes it accessible to users with varying levels of technical expertise.

Each of these frameworks/libraries serves different purposes and caters to different aspects of data analytics, from distributed processing to machine learning and statistical analysis.
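
To make the pandas/scikit-learn pairing concrete, here is a minimal sketch that prepares a frame with pandas and fits a scikit-learn regression. The dataset and column names are invented for illustration, and the data is deliberately simple:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic dataset: ad spend vs. revenue (made up for illustration)
df = pd.DataFrame({
    "ad_spend": [10, 20, 30, 40, 50],
    "revenue":  [25, 45, 65, 85, 105],
})

# pandas: select features and target
X = df[["ad_spend"]]
y = df["revenue"]

# scikit-learn: fit a simple predictive model
model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)  # slope and intercept of the fitted line
```

Since the synthetic data is exactly linear (revenue = 2 × ad_spend + 5), the fitted slope and intercept recover those values.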

Future Trends of Data Analytics Frameworks

  • Automation – More processes will become automated through AI/ML, including data preparation, analysis, and deployment, making frameworks faster and more efficient.
  • Real-time analytics – With technologies like streaming data, organizations can get insights continuously rather than waiting for reports. This enables quicker response.
  • Advanced analytics – Frameworks will incorporate more advanced techniques like predictive modeling, simulations, complex event processing etc.
  • Smart dashboards – Interactive visualizations with advanced features will enhance data communication and storytelling.
  • Democratization – Self-service tools will enable more people across teams to access and work with data without deep analytics skills.
  • Hybrid cloud – Frameworks will leverage a mix of on-premise and cloud resources for storage, processing, and analytics.
  • Data governance – As data grows, managing privacy, security, quality and metadata will become critical parts of frameworks.
  • Integration – Frameworks will need to integrate with more data sources and operational systems for end-to-end analytics.

Data analytics frameworks provide a structured approach to gain valuable insights from data. They help organizations collect, prepare, analyze, and interpret information in an efficient way. With the right framework, companies can unlock hidden patterns and trends to drive innovation, optimize operations, and make data-driven decisions. As data volumes grow, these frameworks become even more critical for competing in today’s data-driven world. Their automation, real-time capabilities, and ease of use will be key trends going forward. In short, data analytics frameworks turn complex data into actionable insights for learning, improving, and succeeding.

FAQs on Data Analytics Framework

Q. What is a data analytics framework?

It is a structured approach for collecting, organizing, analyzing, and interpreting data to gain valuable insights.

Q. Why are frameworks important for data analytics?

Frameworks provide standard processes so analytics work is consistent, efficient, and aligned with business goals.

Q. What are the key components of a framework?

Main components are data collection, preparation, analysis, visualization, communication and implementation.

Q. What are some types of data analytics?

Main types are descriptive, diagnostic, predictive, and prescriptive analytics. Each provides different insights.

Q. How can data analytics help my business?

It can optimize operations, improve customer engagement, and identify new opportunities through data-driven decisions.

Q. What skills are required for data analytics?

Math, statistics, programming, database, visualization, and communication skills are important.

Q. What are some future trends in this field?

Automation, real-time analytics, smart visualizations, advanced techniques and democratization.

Q. How can I get started with data analytics?

Start by identifying business goals, getting leadership buy-in, assembling a team, and rolling out a pilot project.


Analytical Approach and Framework

What Is an Analytical Approach and Framework?

An analytical framework is a structure that helps us make sense of data in an organized way. We take an analytical approach by dividing a complex problem into clear, manageable segments and then reintegrating the results into a unified solution.

Below, we will explore how and when to use three types of analytical frameworks:

  • A Framework for Qualitative Research: Translating problems into numbers.
  • Case Study 1: Banner ad strategy.
  • A Framework for Quantitative Research: Putting numbers in context.
  • Case Study 2: Marketing channel metrics.
  • Data Science Methodology: Step-by-step approach to gathering data and drawing conclusions.

Types of Analytical Frameworks

There are three main types of data analytics frameworks, each with its own strengths depending on what it helps us organize.

1 - Qualitative research frameworks: When dealing with categorical questions such as “Are our clients satisfied with our product?”, we need a way to translate that question into numbers in order to create data-based insights. A qualitative research framework does this by transforming “soft” problems into “hard” numbers.

The qualitative research framework also helps us translate abstract concepts into quantifiable data. It’s used for questions like “Would investing five more hours per week in research add more value to our product?”. In this case, we aim to quantify the concept of value to compare different strategies. A qualitative framework eases this process.

2 - Quantitative research frameworks: Let’s say that we are already dealing with well-defined numeric quantities. For example, the “daily active users” our application sees is a metric we have extensively defined and measured. This information helps us know how well the app is currently doing - but doesn’t say much about where to find improvements.

To improve, we need to understand which factors are driving our key metrics; we need to give our metrics context. Quantitative research analytics frameworks help us understand the relationships between different metrics to put our core metrics in context.

3 - Data science methodology: Let’s say we have defined our concepts and put all our metrics in context; even then, we’re just getting started. We still need to gather data to draw conclusions.

Numerous ways exist to do this, some prone to error or inconsistency. So we need an organized process to reduce risks and maintain organization. Data science methodology frameworks offer a reliable step-by-step approach to drawing conclusions from data.

Now, let’s examine how each of these analytical frameworks works.

A Framework for Qualitative Research

There are a few qualitative research analytical frameworks we could use depending on the context of the business environment. Specific situations and problems call for different approaches; we want to ensure that we are translating the business challenge into numerical measurements in the right way.

Two examples of these approaches include product metric frameworks for measuring success and diagnosing changes in metrics, as well as evaluating the impact of potential feature changes to our product. Another common business case for translating a problem into hard numbers is A/B testing, which has a framework of its own.

However, each of these specific frameworks follows the same four-step structure outlined below. They begin with a vaguely defined business problem and need to convert it into hard numbers to address it.

The framework to go about finding these solutions has four steps:

  • First, ask clarifying questions. Gather all the context you need to narrow down the scope of the problem and determine what requires further clarification.
  • Second, assess the requirements. Define the problem in terms of precise metrics that can be used to address gaps from the previous step.
  • Third, provide a solution. Each solution will vary depending on the type of problem you’re dealing with.
  • Fourth, validate the solution. Do this against your pre-existing knowledge and available data to minimize the likelihood of making mistakes.

Case Study 1: Banner Ad Strategy

Let’s go through each of those framework steps with a business example: an online media company that wants to monetize web traffic by embedding banner ads in its content. Our task is to measure the success of different banner ad strategies and select the best one to scale up.

1 - Clarifying Questions & Assumptions:

Initially, we need to gather context about our monetization method. Will revenue depend on ad impressions, clicks, or the number of users who buy the advertised products?

We also need to identify our audience type. Does it consist of stable (loyal) readers with regular engagement? Or is it primarily composed of click-bait article chasers with low rates of future engagement?

This information is necessary to define each strategy’s success and determine which strategies to test in the future. For example, if we have a click-bait audience, we can observe the revenue for each monetization strategy in the short term and then compare the results.

However, if we have a regular audience, we need to understand the customer lifetime value for each strategy. This is because strategies like filling the page with ad banners could make us more money in the short term - but contribute to the loss of loyal readers, hurting profits in the long term.

2 - Assessing Requirements:

Once we have gathered context and clarified assumptions, we need to define the solution requirements precisely. Let’s say our review reveals that our revenue depends on how many clicks the ads get and that our webpage has a stable user base who reads the webpage regularly.

Now we need to define the metric to optimize our banner ad strategy. As noted above, the average customer lifetime value (CLV), the total revenue the company expects to make from each of its readers, is a good choice. In this case, the average CLV would be the average number of clicks per session times the average number of sessions per user, computed for each banner strategy.

average CLV = average clicks per session × average sessions per user

The resulting metric helps us choose between a strategy that generates more clicks in the short term and a strategy that reduces reader churn. We also need to define the set of strategies we’ll evaluate. For simplicity, let’s say that we will only vary the number of banners we show to each user.
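
Under this definition, comparing two banner strategies reduces to simple arithmetic. The numbers below are invented purely to show the trade-off between short-term clicks and long-term retention:

```python
# Hypothetical numbers: strategy A shows fewer ads, strategy B shows more
strategies = {
    "A": {"clicks_per_session": 0.8, "sessions_per_user": 20},
    "B": {"clicks_per_session": 1.2, "sessions_per_user": 10},  # more clicks, more churn
}

# Average CLV proxy = clicks per session x sessions per user
clv = {name: s["clicks_per_session"] * s["sessions_per_user"]
       for name, s in strategies.items()}
print(clv)  # strategy A wins despite fewer clicks per session
```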

3 - Solution:

At this point, we’ve defined our problem numerically and can create a data-driven solution.

In general, solutions can involve running experiments , deciding on product features , or explaining metric changes. In this case, we’ll design an A/B test to identify the best banner ad strategy, based on our assessment requirements.

Based on our requirements, the A/B test should be user-based instead of session-based: we’ll divide users into two groups, showing each group a different number of ads during their visits. For example, Bucket A receives one banner ad per webpage, while Bucket B gets two. Over time, we will be able to capture how the number of ads shown impacts engagement.

To avoid confounding effects, we must ensure identical banner content for both groups. If Bucket B sees the same two banners, half of Bucket A should see one of those banners and the other half the other. We should also alternate the order of banners for Bucket B to avoid interference from the display order.

Lastly, decide on the experiment duration. To account for long-term effects, we should run the experiment for at least three months.
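
A common way to make a user-based split deterministic is to hash user IDs, so the same user always lands in the same bucket across sessions. A minimal sketch (the function name and 50/50 split are assumptions, not part of the case study):

```python
import hashlib

def assign_bucket(user_id: str) -> str:
    """Deterministically assign a user to bucket A (one banner) or B (two banners)."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

# The same user gets the same bucket on every visit
print(assign_bucket("user-42"), assign_bucket("user-42"))
```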

4 - Validation:

A useful first step is to re-check the numbers and perform a gut instinct check. If results seem odd, we should suspect a problem, investigate the cause, and revise our approach.

In this example, we tested a banner strategy hypothesis. The validation step involves evaluating differences between the test and control groups (users who didn’t receive the treatment over three months) and identifying any confounding factors that might have affected the results. We must also determine if the differences and observations are statistically significant or potentially spurious results.
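
Checking whether the difference between the test and control groups is statistically significant can be done with a two-sample t-test from SciPy. The per-user click counts below are invented for illustration:

```python
from scipy import stats

# Hypothetical per-user click counts over the three-month window
bucket_a = [3, 5, 4, 6, 5, 4, 5, 6, 4, 5]   # one banner per page
bucket_b = [2, 3, 2, 4, 3, 2, 3, 3, 2, 3]   # two banners per page

# Two-sample t-test: is the difference in means statistically significant?
t_stat, p_value = stats.ttest_ind(bucket_a, bucket_b)
if p_value < 0.05:
    print("Difference is statistically significant")
else:
    print("Difference may be spurious")
```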

A Framework for Quantitative Research

The second type of analytical approach comes from the quantitative research framework. After we define our key metrics clearly, this framework helps give them context. With it, teams can deepen their understanding of the key metric, making it easier to control and track it, assign responsibilities, and identify opportunities for improvement.


We do this by breaking down the key metric into lower-level metrics. Here’s a step-by-step guide:

Identify the key metric: Determine the main metric you want to focus on (e.g., revenue for a sales team).

Define first-level metrics: Break down the key metric into components that directly relate to it. For a sales team, first-level metrics would be the sales volume and the average selling price, because revenue is sales volume times average selling price.

Identify second-level metrics: Further refine your analysis by breaking down the first-level metrics into their underlying factors. For a sales team, second-level metrics could include:

  • Number of leads generated
  • Conversion rate
  • Average order value
  • Discounts and promotions
  • Competitor prices

Assign responsibility and track progress: With a better understanding of first- and second-level metrics, allocate responsibility for improving them to different team members. Track their progress to enhance the key metric.
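The steps above can be sketched as a tiny metric tree. The decomposition (revenue = volume × price, volume = leads × conversion rate) follows the sales-team example, while the helper names and numbers are illustrative:

```python
def sales_volume(leads: int, conversion_rate: float) -> float:
    """Second-level metrics rolled up into a first-level metric."""
    return leads * conversion_rate

def revenue(volume: float, avg_selling_price: float) -> float:
    """First-level metrics rolled up into the key metric."""
    return volume * avg_selling_price
```

With the tree in code, each team member can own one leaf (e.g., leads generated) and see directly how moving it moves the key metric.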

Case Study 2: Marketing Channel Metrics

Let’s explore an example where we apply the quantitative analytics framework to a company called Mode, which sells B2B analytics dashboards through a SaaS freemium subscription model (users can use the product for free but must pay monthly or annually for advanced features).

Step 1: Identify the key metric

Our key metric is marketing ROI (revenue over expenses) for each of our marketing channels.

Step 2: Define first-level metrics

Two first-level metrics stand out:

  • Revenue: Driven by our average Customer Lifetime Value (CLV) - the total revenue we make over the years for each new customer.
  • Expenses: Driven by our Customer Acquisition Cost (CAC) - the cost of gaining new customers.

Step 3: Identify second-level metrics

Now we need to identify the second-level metrics for each of our first-level metrics.

First-Level Metric: Customer Lifetime Value

CLV is calculated as the Average Revenue Per Customer (ARPC), the average amount a customer spends each month, divided by the churn rate (CR), the percentage of users who stop using the platform each month:

CLV = ARPC / CR

So ARPC and CR are the second-level metrics driving CLV.
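A minimal sketch of the CLV calculation; the $50 monthly ARPC and 5% monthly churn rate used in the example are assumptions, not figures from the case study:

```python
def customer_lifetime_value(arpc: float, churn_rate: float) -> float:
    """CLV = ARPC / churn rate.

    With a monthly churn rate of `churn_rate`, the average customer
    stays about 1 / churn_rate months, each contributing `arpc`.
    """
    return arpc / churn_rate
```

For instance, an ARPC of $50 and a 5% monthly churn rate give a CLV of about $1,000.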

First-Level Metric: Customer Acquisition Cost

On the other side of our marketing ROI equation, CAC is the average amount spent by the sales team in salaried time and equipment/software value to sign up one new customer.

There are quite a few second-level metrics we could investigate under CAC, mostly from looking at the customer acquisition funnel:

  • Cost per View (CPV): The amount it costs the company for each new person to see our landing page.
  • Free Sign-Ups per Total Number of Views (FSU/TNV): The percentage of landing page visitors who create a free account.
  • Paid Customers per Total Number of Views (PC/TNV): The percentage of landing page visitors who create a premium account directly.
  • Paid Customers per Free Sign-Ups (PC/FSU): The percentage of free account users who upgrade to a premium account.

With this information, we can define our CAC as:

CAC = CPV / (PC/TNV + (FSU/TNV) × (PC/FSU))

So the four metrics we identified serve as our second-level metrics.
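One way to combine these four funnel metrics into a CAC figure is to divide the cost per view by the share of views that eventually become paying customers, either directly or via a free sign-up that later upgrades. This derivation is our reading of the funnel, not a formula quoted from the article:

```python
def customer_acquisition_cost(cpv: float, pc_per_tnv: float,
                              fsu_per_tnv: float, pc_per_fsu: float) -> float:
    """CAC derived from the acquisition funnel.

    `cpv` is spend per landing-page view; the denominator is the share
    of views that become paying customers, either directly or through
    the free-signup path (signup rate times upgrade rate).
    """
    paid_per_view = pc_per_tnv + fsu_per_tnv * pc_per_fsu
    return cpv / paid_per_view
```

For example, a $0.40 cost per view, a 2% direct conversion rate, a 10% free-signup rate and a 30% upgrade rate imply a CAC of about $8 per paying customer.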

Step 4: Assign responsibility and track progress

With a clear understanding of first- and second-level metrics, the sales team can assign responsibilities for improving each metric and track their progress in enhancing the key metric of marketing ROI.

Data Science Methodology

Let’s say we’ve defined our concepts and metrics. We translated our business problem into hard numbers using a qualitative framework. Then we used the quantitative framework to get an analytical understanding of the metrics involved and their relationships. Now we want to draw conclusions from the data.

To do this, we need a reliable process that minimizes errors and keeps things organized. This is where our third analytical framework comes into play. The data science methodology provides a step-by-step approach for reaching conclusions from data, which is especially useful as questions become increasingly complex:

  • Data Requirements - Figure out the necessary data, formats, and sources to collect.
  • Data Collection - Gather and validate the data, ensuring it’s representative of the problem.
  • Data Processing - Clean and transform the data.
  • Modeling - Build models to predict or describe outcomes.
  • Evaluation - Check if the model meets business requirements and is high-quality.
  • Deployment - Prepare the model for real-world use.
  • Feedback - Refine the model based on its performance and impact.

Imagine you’re working at a company that wants to boost customer retention in its online store. They collect customer data through website analytics and a customer database. Here’s how they might follow the data science methodology:


  • Data Requirements: Identify data needed to improve customer retention, such as demographics, purchase history, website engagement, and feedback.
  • Data Collection: Gather data from sources like databases, website analytics, and surveys. Ensure data is accurate, complete, and relevant.
  • Data Processing: Clean and analyze the data to remove errors, duplicates, and missing values. Look for patterns and trends you could use for feature engineering.
  • Modeling: Create predictive models to find factors that impact customer retention using machine learning algorithms based on historical data.
  • Evaluation: Compare the model’s predictions to actual customer behavior, checking for accuracy, interpretability, and scalability.
  • Deployment: Implement the model in the online store’s retention strategies. This could include targeted marketing campaigns, personalized recommendations, or loyalty programs based on the model’s predictions. If you’re working on your own, ensure you showcase your projects and results in the best possible way.
  • Feedback: Keep an eye on the model’s performance and gather customer feedback to refine it. Update the model’s algorithms or adjust retention strategies based on its predictions. Continuously assess and improve the model to maintain its effectiveness.
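As a toy illustration of the data-processing step, here is a sketch that removes duplicate and incomplete customer records before modeling; the record fields are hypothetical:

```python
def clean_records(records):
    """Data processing: drop exact duplicates and records with
    missing (None) values, preserving the original order."""
    seen, cleaned = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key in seen or any(v is None for v in rec.values()):
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned
```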


  • Open access
  • Published: 09 September 2022

Machine learning in project analytics: a data-driven framework and case study

Shahadat Uddin, Stephen Ong & Haohui Lu

Scientific Reports volume 12, Article number: 15252 (2022)


  • Applied mathematics
  • Computational science

The analytic procedures incorporated to facilitate the delivery of projects are often referred to as project analytics. Existing techniques focus on retrospective reporting and understanding the underlying relationships to make informed decisions. Although machine learning algorithms have been widely used in addressing problems within various contexts (e.g., streamlining the design of construction projects), limited studies have evaluated pre-existing machine learning methods within the delivery of construction projects. Due to this, the current research aims to contribute further to the convergence between artificial intelligence and the execution of construction projects through the evaluation of a specific set of machine learning algorithms. This study proposes a machine learning-based data-driven research framework for addressing problems related to project analytics. It then illustrates an example of the application of this framework. In this illustration, existing data from an open-source data repository on construction projects and cost overrun frequencies was studied, and several machine learning models (Python’s Scikit-learn package) were tested and evaluated. The data consisted of 44 independent variables (from materials to labour and contracting) and one dependent variable (project cost overrun frequency), which was categorised for processing under several machine learning models. These models include support vector machine, logistic regression, k-nearest neighbour, random forest, stacking (ensemble) model and artificial neural network. Feature selection and evaluation methods, including univariate feature selection, recursive feature elimination, SelectFromModel and the confusion matrix, were applied to determine the most accurate prediction model. This study also discusses the generalisability of using the proposed research framework in other research contexts within the field of project management.
The proposed framework, its illustration in the context of construction projects and its potential to be adopted in different contexts will significantly contribute to project practitioners, stakeholders and academics in addressing many project-related issues.


Introduction

Successful projects require the presence of appropriate information and technology 1 . Project analytics provides an avenue for informed decisions to be made through the lifecycle of a project. Project analytics applies various statistics (e.g., earned value analysis or Monte Carlo simulation) among other models to make evidence-based decisions. They are used to manage risks as well as project execution 2 . There is a tendency for project analytics to be employed due to other additional benefits, including an ability to forecast and make predictions, benchmark with other projects, and determine trends such as those that are time-dependent 3 , 4 , 5 . There has been increasing interest in project analytics and how current technology applications can be incorporated and utilised 6 . Broadly, project analytics can be understood on five levels 4 . The first is descriptive analytics which incorporates retrospective reporting. The second is known as diagnostic analytics , which aims to understand the interrelationships and underlying causes and effects. The third is predictive analytics which seeks to make predictions. Subsequent to this is prescriptive analytics , which prescribes steps following predictions. Finally, cognitive analytics aims to predict future problems. The first three levels can be applied with ease with the help of technology. The fourth and fifth steps require data that is generally more difficult to obtain as they may be less accessible or unstructured. Further, although project key performance indicators can be challenging to define 2 , identifying common measurable features facilitates this 7 . It is anticipated that project analytics will continue to experience development due to its direct benefits to the major baseline measures focused on productivity, profitability, cost, and time 8 . The nature of project management itself is fluid and flexible, and project analytics allows an avenue for which machine learning algorithms can be applied 9 .

Machine learning within the field of project analytics falls into the category of cognitive analytics, which deals with problem prediction. Generally, machine learning explores the possibilities of computers to improve processes through training or experience 10 . It can also build on the pre-existing capabilities and techniques prevalent within management to accomplish complex tasks 11 . Due to its practical use and broad applicability, recent developments have led to the invention and introduction of newer and more innovative machine learning algorithms and techniques. Artificial intelligence, for instance, allows for software to develop computer vision, speech recognition, natural language processing, robot control, and other applications 10 . Specific to the construction industry, it is now used to monitor construction environments through a virtual reality and building information modelling replication 12 or risk prediction 13 . Within other industries, such as consumer services and transport, machine learning is being applied to improve consumer experiences and satisfaction 10 , 14 and reduce the human errors of traffic controllers 15 . Recent applications and development of machine learning broadly fall into the categories of classification, regression, ranking, clustering, dimensionality reduction and manifold learning 16 . Current learning models include linear predictors, boosting, stochastic gradient descent, kernel methods, and nearest neighbour, among others 11 . Newer and more applications and learning models are continuously being introduced to improve accessibility and effectiveness.

Specific to the management of construction projects, other studies have also been made to understand how copious amounts of project data can be used 17 , the importance of ontology and semantics throughout the nexus between artificial intelligence and construction projects 18 , 19 as well as novel approaches to the challenges within this integration of fields 20 , 21 , 22 . There have been limited applications of pre-existing machine learning models on construction cost overruns. They have predominantly focussed on applications to streamline the design processes within construction 23 , 24 , 25 , 26 , and those which have investigated project profitability have not incorporated the types and combinations of algorithms used within this study 6 , 27 . Furthermore, existing applications have largely been skewed towards one type or another 28 , 29 .

In addition to the frequently used earned value method (EVM), researchers have been applying many other powerful quantitative methods to address a diverse range of project analytics research problems over time. Examples of those methods include time series analysis, fuzzy logic, simulation, network analytics, and network correlation and regression. Time series analysis uses longitudinal data to forecast an underlying project's future needs, such as the time and cost 30 , 31 , 32 . Few other methods are combined with EVM to find a better solution for the underlying research problems. For example, Narbaev and De Marco 33 integrated growth models and EVM for forecasting project cost at completion using data from construction projects. For analysing the ongoing progress of projects having ambiguous or linguistic outcomes, fuzzy logic is often combined with EVM 34 , 35 , 36 . Yu et al. 36 applied fuzzy theory and EVM for schedule management. Ponz-Tienda et al. 35 found that using fuzzy arithmetic on EVM provided more objective results in uncertain environments than the traditional methodology. Bonato et al. 37 integrated EVM with Monte Carlo simulation to predict the final cost of three engineering projects. Batselier and Vanhoucke 38 compared the accuracy of the project time and cost forecasting using EVM and simulation. They found that the simulation results supported findings from the EVM. Network methods are primarily used to analyse project stakeholder networks. Yang and Zou 39 developed a social network theory-based model to explore stakeholder-associated risks and their interactions in complex green building projects. Uddin 40 proposed a social network analytics-based framework for analysing stakeholder networks. Ong and Uddin 41 further applied network correlation and regression to examine the co-evolution of stakeholder networks in collaborative healthcare projects. 
Although many other methods have already been used, as evident in the current literature, machine learning methods or models are yet to be adopted for addressing research problems related to project analytics. The current investigation is derived from the cognitive analytics component of project analytics. It proposes an approach for determining hidden information and patterns to assist with project delivery. Figure  1 illustrates a tree diagram showing different levels of project analytics and their associated methods from the literature. It also illustrates existing methods within the cognitive component of project analytics to where the application of machine learning is situated contextually.

figure 1

A tree diagram of different project analytics methods. It also shows where the current study sits. Although earned value analysis is commonly used in project analytics, we do not include it in this figure since it is used in the first three levels of project analytics.

Machine learning models have several notable advantages over traditional statistical methods that play a significant role in project analytics 42 . First, machine learning algorithms can quickly identify trends and patterns by simultaneously analysing a large volume of data. Second, they are more capable of continuous improvement: machine learning algorithms can improve their accuracy and efficiency for decision-making through subsequent training on new data. Third, machine learning algorithms efficiently handle multi-dimensional and multi-variety data in dynamic or uncertain environments. Fourth, they are compelling for automating various decision-making tasks; for example, machine learning-based sentiment analysis can easily flag a negative tweet and automatically trigger further necessary steps. Last but not least, machine learning has been helpful across various industries, from defence to education 43 . Current research has seen the development of several different branches of artificial intelligence (including robotics, automated planning and scheduling, and optimisation) within safety monitoring, risk prediction, cost estimation and so on 44 . This has progressed from the application of regression to project cost overruns 45 to the current deep-learning implementations within the construction industry 46 . Despite this, the uses remain largely limited and are still in a developmental state. The benefits of applications are noted, such as optimising and streamlining existing processes; however, high initial costs form a barrier to accessibility 44 .

The primary goal of this study is to demonstrate the applicability of different machine learning algorithms in addressing problems related to project analytics. Limitations in applying machine learning algorithms within the context of construction projects have been explored previously. However, preceding research has mainly been conducted to improve the design processes specific to construction 23 , 24 , and studies investigating project profitability have not incorporated the types and combinations of algorithms used within this study 6 , 27 . For instance, preceding research has incorporated a different combination of machine learning algorithms in predicting construction delays 47 . This study first proposes a machine learning-based data-driven research framework for project analytics to contribute to the proposed study direction. It then applies this framework to a case study of construction projects. Although there are three different categories of machine learning (supervised, unsupervised and semi-supervised), supervised machine learning models are the most commonly used due to their efficiency and effectiveness in addressing many real-world problems 48 . Therefore, we will use machine learning to refer to supervised machine learning throughout the rest of this article. The contribution of this study is significant in that it considers the applications of machine learning within project management. Project management is often thought of as very fluid in nature, and because of this, applications of machine learning are often more difficult 9 , 49 . Further to this, existing implementations have largely been limited to safety monitoring, risk prediction, cost estimation and so on 44 .
Through the evaluation of machine-learning applications, this study further demonstrates a case study for which algorithms can be used to consider and model the relationship between project attributes and a project performance measure (i.e., cost overrun frequency).

Machine learning-based framework for project analytics

When and why machine learning for project analytics

Machine learning models are typically used for research problems that involve predicting the classification outcome of a categorical dependent variable. Therefore, they can be applied in the context of project analytics if the underlying objective variable is a categorical one. If that objective variable is non-categorical, it must first be converted into a categorical variable. For example, if the objective or target variable is the project cost, we can convert this variable into a categorical variable by taking only two possible values. The first value would be 0 to indicate a low-cost project, and the second could be 1 for showing a high-cost project. The average or median cost value for all projects under consideration can be considered for splitting project costs into low-cost and high-cost categories.
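The median split described above can be sketched in a few lines; the cost values are illustrative:

```python
from statistics import median

def to_cost_class(costs):
    """Convert a continuous project-cost variable into a categorical
    one: 0 = low-cost (below the median), 1 = high-cost (at or above)."""
    cutoff = median(costs)
    return [0 if cost < cutoff else 1 for cost in costs]
```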

For data-driven decision-making, machine learning models are advantageous. This is because traditional statistical methods (e.g., ordinary least square (OLS) regression) make assumptions about the underlying research data to produce explicit formulae for the objective target measures. Unlike these statistical methods, machine learning algorithms figure out patterns on their own directly from the data. For instance, for a non-linear but separable dataset, an OLS regression model will not be the right choice due to its assumption that the underlying data must be linear. However, a machine learning model can easily separate the dataset into the underlying classes. Figure  2 (a) presents a situation where machine learning models perform better than traditional statistical methods.

figure 2

( a ) An illustration showing the superior performance of machine learning models compared with the traditional statistical models using an abstract dataset with two attributes (X 1 and X 2 ). The data points within this abstract dataset consist of two classes: one represented with a transparent circle and the second class illustrated with a black-filled circle. These data points are non-linear but separable. Traditional statistical models (e.g., ordinary least square regression) will not accurately separate these data points. However, any machine learning model can easily separate them without making errors; and ( b ) Traditional programming versus machine learning.

Similarly, machine learning models are compelling if the underlying research dataset has many attributes or independent measures. Such models can identify features that significantly contribute to the corresponding classification performance regardless of their distributions or collinearity. Traditional statistical methods, by contrast, are prone to biased results when correlations exist between independent variables. Current machine learning-based studies specific to project analytics remain largely limited. Despite this, there have been tangential studies on the use of artificial intelligence to improve cost estimation as well as risk prediction 44 . Additionally, models have been implemented in the optimisation of existing processes 50 .

Machine learning versus traditional programming

Machine learning can be thought of as a process of teaching a machine (i.e., computers) to learn from data and adjust or apply its present knowledge when exposed to new data 42 . It is a type of artificial intelligence that enables computers to learn from examples or experiences. Traditional programming requires some input data and some logic in the form of code (program) to generate the output. Unlike traditional programming, the input data and their corresponding output are fed to an algorithm to create a program in machine learning. This resultant program can capture powerful insights into the data pattern and can be used to predict future outcomes. Figure  2 (b) shows the difference between machine learning and traditional programming.

Proposed machine learning-based framework

Figure  3 illustrates the proposed machine learning-based research framework of this study. The framework starts with breaking the project research dataset into the training and test components. As mentioned in the previous section, the research dataset may have many categorical and/or nominal independent variables, but its single dependent variable must be categorical. Although there is no strict rule for this split, the training data size is generally more than or equal to 50% of the original dataset 48 .

figure 3

The proposed machine learning-based data-driven framework.

Machine learning algorithms can handle variables that have only numerical outcomes. So, when one or more of the underlying categorical variables have a textual or string outcome, we must first convert them into the corresponding numerical values. Suppose a variable can take only three textual outcomes (low, medium and high). In that case, we could consider, for example, 1 to represent low , 2 to represent medium , and 3 to represent high . Other statistical techniques, such as the RIDIT (relative to an identified distribution) scoring 51 , can also be used to convert ordered categorical measurements into quantitative ones. RIDIT is a parametric approach that uses probabilistic comparison to determine the statistical differences between ordered categorical groups. The remaining components of the proposed framework have been briefly described in the following subsections.
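The simple integer encoding just described (low → 1, medium → 2, high → 3) might look like this; RIDIT scoring is not shown:

```python
def encode_ordinal(values, order=("low", "medium", "high")):
    """Map ordered textual outcomes onto 1, 2, 3, ... so that machine
    learning algorithms receive numerical inputs."""
    mapping = {label: rank + 1 for rank, label in enumerate(order)}
    return [mapping[value] for value in values]
```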

Model-building procedure

The next step of the framework is to follow the model-building procedure to develop the desired machine learning models using the training data. The first step of this procedure is to select suitable machine learning algorithms or models. Among the available machine learning algorithms, the commonly used ones are support vector machine, logistic regression, k-nearest neighbours, artificial neural network, decision tree and random forest 52 . One can also select an ensemble machine learning model as the desired algorithm. An ensemble machine learning method uses multiple algorithms or the same algorithm multiple times to achieve better predictive performance than could be obtained from any of the constituent learning models alone 52 . Three widely used ensemble approaches are bagging, boosting and stacking. In bagging, the research dataset is divided into different equal-sized subsets, and the underlying machine learning algorithm is then applied to these subsets for classification. In boosting, a random sample of the dataset is selected and then fitted and trained sequentially with different models, each compensating for the weaknesses of the model immediately before it. Stacking combines different weak machine learning models in a heterogeneous way to improve predictive performance. For example, the random forest algorithm is an ensemble of different decision tree models 42 .

Second, each selected machine learning model will be processed through the k-fold cross-validation approach to improve predictive efficiency. In k-fold cross-validation, the training data is divided into k folds. In each iteration, (k−1) folds are used to train the selected machine learning models, and the remaining fold is used for validation. This process repeats until each of the k folds has been used once for validation. The final predictive efficiency of the trained models is based on the average of the outcomes of these iterations. In addition to this average value, researchers use the standard deviation of the results from different iterations as the predictive training efficiency. Supplementary Fig 1 shows an illustration of the k-fold cross-validation.
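The k-fold procedure can be sketched as an index generator; real projects would typically use an existing library implementation instead:

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, validation_indices) pairs: each of the k
    folds serves once as the validation set while the remaining
    (k - 1) folds are used for training."""
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation
```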

Third, most machine learning algorithms require pre-defined values for several of their parameters, known as hyperparameters; the process of finding good values is called hyperparameter tuning. The settings of these parameters play a vital role in the achieved performance of the underlying algorithm. For a given machine learning algorithm, the optimal values for these parameters can differ from one dataset to another, so the same algorithm needs to run multiple times with different parameter values to find its optimal parameter values for a given dataset. Many algorithms are available in the literature, such as Grid search 53 , to find the optimal parameter values. In Grid search, hyperparameters are laid out on a discrete grid, where each grid point represents a specific combination of the underlying model parameters. The parameter values of the point that results in the best performance are the optimal parameter values 53 .
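A bare-bones Grid search can be sketched as an exhaustive sweep over the hyperparameter grid; the scoring function here is a stand-in for training and validating a model with those parameter values:

```python
from itertools import product

def grid_search(score_fn, param_grid):
    """Score every combination of hyperparameter values in the grid
    and return the best-scoring combination."""
    names = sorted(param_grid)
    best_score, best_params = None, None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if best_score is None or score > best_score:
            best_score, best_params = score, params
    return best_params
```

In practice, `score_fn` would run the cross-validated training described above and return the average validation performance for the given parameter values.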

Testing of the developed models and reporting results

Once the desired machine learning models have been developed using the training data, they need to be tested using the test data. The underlying trained model is then applied to predict its dependent variable for each data instance. Therefore, for each data instance, two categorical outcomes will be available for its dependent variable: one predicted using the underlying trained model, and the other is the actual category. These predicted and actual categorical outcome values are used to report the results of the underlying machine learning model.

The fundamental tool to report results from machine learning models is the confusion matrix, which consists of four integer values 48 . The first value represents the number of positive cases correctly identified as positive by the underlying trained model (true-positive). The second value indicates the number of positive instances incorrectly identified as negative (false-negative). The third value represents the number of negative cases incorrectly identified as positive (false-positive). Finally, the fourth value indicates the number of negative instances correctly identified as negative (true-negative). Researchers also use a few performance measures based on the four values of the confusion matrix to report machine learning results. The most used measure is accuracy, which is the ratio of the number of correct predictions (true-positive + true-negative) to the total number of data instances (the sum of all four values of the confusion matrix). Other measures commonly used to report machine learning results are precision, recall and F1-score. Precision refers to the ratio between true-positives and the total number of positive predictions (i.e., true-positive + false-positive), often used to indicate the quality of a positive prediction made by a model 48 . Recall, also known as the true-positive rate, is calculated by dividing true-positive by the number of data instances that should have been predicted as positive (i.e., true-positive + false-negative). F1-score is the harmonic mean of the last two measures, i.e., (2 × Precision × Recall)/(Precision + Recall), and the error rate equals (1 − Accuracy).
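These measures follow directly from the four confusion-matrix counts; a small sketch with illustrative counts:

```python
def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1-score from the four values
    of the confusion matrix."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

The error rate is then simply 1 minus the accuracy.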

Another essential tool for reporting machine learning results is variable or feature importance, which identifies the independent variables (features) that contribute most to classification performance. The importance of a variable reflects how much a given machine learning algorithm relies on that variable in making accurate predictions 54 . A widely used companion technique is principal component analysis, which reduces the dimensionality of the data while minimising information loss, thereby increasing the interpretability of the underlying machine learning outcome. It also helps in identifying the important features in a dataset and in plotting them in 2D and 3D 54 .
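As a sketch of the dimensionality-reduction step, the snippet below projects a synthetic dataset (sized like the case study's: 139 instances, 44 features) onto two principal components with scikit-learn; the data are random, so the projection itself is illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative PCA sketch on synthetic data, not the study's dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(139, 44))      # 139 instances, 44 features, as in the case study

pca = PCA(n_components=2)           # project onto 2 components for a 2D plot
X_2d = pca.fit_transform(X)

print(X_2d.shape)                   # (139, 2): coordinates for plotting
# The loadings indicate how strongly each original feature
# contributes to each principal component.
print(pca.components_.shape)        # (2, 44)
```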

Ethical approval

Ethical approval is not required for this study since it used publicly available data for research investigation purposes. All research was performed in accordance with relevant guidelines and regulations.

Informed consent

Due to the nature of the data sources, informed consent was not required for this study.

Case study: an application of the proposed framework

This section illustrates an application of this study’s proposed framework (Fig.  2 ) in a construction project context. We apply the framework to classify projects into two classes based on their cost overrun experience: projects that rarely experience a cost overrun belong to the first class (Rare), and projects that often experience one belong to the second class (Often). In doing so, we consider a list of independent variables or features.

Data source

The research dataset is taken from an open-source data repository, Kaggle 55 . This survey-based dataset was collected to explore the causes of cost overrun in Indian construction projects 45 and consists of 44 independent variables (features) and one dependent variable. The independent variables cover a wide range of cost overrun factors, from materials and labour to contractual issues and the scope of the work. The dependent variable is the frequency of experiencing project cost overrun (rare or often). The dataset contains 139 instances; 65 belong to the rare class, and the remaining 74 to the often class. To prepare the dataset for machine learning analysis, we converted each categorical variable with a textual or string outcome into an appropriate numerical value range; for example, we used 1 and 2 to represent the rare and often classes, respectively. The correlation matrix among the 44 features is presented in Supplementary Fig 2 .
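The label-conversion step can be sketched with pandas as below; the column name is hypothetical, not the actual Kaggle column header.

```python
import pandas as pd

# Hedged sketch of converting the textual class labels to numbers;
# "overrun_frequency" is a hypothetical column name for illustration.
df = pd.DataFrame({
    "overrun_frequency": ["rare", "often", "often", "rare"],
})
df["overrun_frequency"] = df["overrun_frequency"].map({"rare": 1, "often": 2})
print(df["overrun_frequency"].tolist())  # [1, 2, 2, 1]
```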

Machine learning algorithms

This study considered six machine learning algorithms to explore the causes of project cost overrun using the research dataset mentioned above: support vector machine, logistic regression, k- nearest neighbours, random forest, artificial neural network and a stacking ensemble.

Support vector machine (SVM) is a supervised learning method for classifying data. For instance, if one wants to determine which projects are likely to be programmatically successful based on precedent data, SVM provides a practical approach for prediction. SVM works by assigning labels to objects 56 . The comparison attributes are used to separate these objects into different groups or classes by maximising their marginal distances and minimising classification errors. The attributes are plotted multi-dimensionally, and a separating line known as a hyperplane (see Supplementary Fig 3 (a)) distinguishes between the underlying classes or groups 52 . Support vectors are the data points that lie closest to the decision boundary on both sides; in Supplementary Fig 3 (a), they are the circles (both transparent and shaded) close to the hyperplane. Support vectors play an essential role in deciding the position and orientation of the hyperplane. Various computational methods, including kernel functions that create additional derived attributes, accommodate this process 56 . Support vector machines are not limited to binary classes but can be generalised to a larger variety of classifications; this is accomplished by training separate SVMs 56 .

Logistic regression (LR) builds on the linear regression model and predicts the outcome of a dichotomous variable 57 , for example, the presence or absence of an event. It models the relationship between one dependent variable and one or more independent variables (see Supplementary Fig 3 (b)). The LR model fits the data to a sigmoidal curve instead of a straight line, and the natural logarithm is used when developing the model. It provides a value between 0 and 1 that is interpreted as the probability of class membership. Best estimates are obtained by refining approximate estimates until a level of stability is reached 58 . Generally, LR offers a straightforward approach for determining and observing interrelationships and is more efficient than ordinary regression 59 .

The k -nearest neighbours (KNN) algorithm plots prior information and applies a specific sample size ( k ) to determine the most likely class 52 . It finds the nearest training examples using a distance measure, and the final classification is made by counting the most common class (votes) within the specified sample. As illustrated in Supplementary Fig 3 (c), the four nearest neighbours in the small circle are three grey squares and one white square; the majority class is grey, so KNN predicts the instance (i.e., Χ ) as grey. If we instead consider the larger circle of the same figure, the nearest neighbours comprise ten white squares and four grey squares; the majority class is white, so KNN classifies the instance as white. KNN’s advantages lie in its ability to produce a simplified result and to handle missing data 60 . In summary, KNN utilises similarities (as well as differences) and distances between instances when developing models.

Random forest (RF) is a machine learning method that consists of many decision trees. A decision tree is a tree-like structure in which each internal node represents a test on an input attribute; a tree may have multiple internal nodes at different levels, and the leaf or terminal nodes represent the decision outcomes. Each tree produces a classification outcome for a given input vector. For numerical outcomes, the forest takes the average of the trees’ predictions, and for categorical outcomes, it takes a majority vote 52 . Supplementary Fig 3 (d) shows three decision trees to illustrate how a random forest works: the outcomes from trees 1, 2 and 3 are class B, class A and class A, respectively, so by majority vote the final prediction is class A. Because the method samples specific attributes, it can tend to emphasise some attributes over others, which may result in uneven weighting 52 . Advantages of random forest include its ability to handle multidimensionality and multicollinearity in data, despite its sensitivity to sampling design.

Artificial neural network (ANN) simulates the way human brains work by modelling logical propositions and incorporating weighted inputs, a transfer function and one output 61 (Supplementary Fig 3 (e)). It is advantageous because it can model non-linear relationships and handle multivariate data 62 . ANNs learn through three major avenues: error back-propagation (supervised), the Kohonen network (unsupervised) and counter-propagation (supervised) 62 . ANN has been used in a myriad of applications ranging from pharmaceuticals 61 to electronic devices 63 . It also possesses great fault tolerance 64 and learns by example and through self-organisation 65 .

Ensemble techniques are a machine learning methodology in which numerous base classifiers are combined to generate an optimal model 66 . An ensemble considers many models and combines them into a single model that offsets the weaknesses of each individual learner, resulting in a powerful model with improved performance. The stacking model is a general architecture comprising two classifier levels: base classifiers and a meta-learner 67 . The base classifiers are trained on the training dataset, and their outputs form a new dataset on which the meta-learner (meta-classifier) is then trained. This study uses four models (SVM, LR, KNN and RF) as base classifiers and LR as the meta-learner, as illustrated in Supplementary Fig 3 (f).
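The stacking architecture described above can be sketched with scikit-learn's `StackingClassifier`, which combines the four base classifiers and an LR meta-learner. Synthetic data stand in for the case-study dataset, so the resulting score is illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the case-study dataset (139 instances, 44 features).
X, y = make_classification(n_samples=139, n_features=44, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# SVM, LR, KNN and RF as base classifiers; LR as the meta-learner.
stack = StackingClassifier(
    estimators=[
        ("svm", SVC(probability=True, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
        ("rf", RandomForestClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_tr, y_tr)
print(stack.score(X_te, y_te))   # test-set accuracy of the stacked model
```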

Feature selection

Feature selection is the process of selecting the optimal feature subset that significantly influences the predicted outcomes; it can increase model performance and save running time. This study considers three feature selection approaches: univariate feature selection (UFS), recursive feature elimination (RFE) and SelectFromModel (SFM). UFS examines each feature separately to determine the strength of its relationship with the response variable 68 . This method is straightforward to use and comprehend and helps build a deeper understanding of the data; in this study, we calculate the chi-square statistic between each feature and the target variable. RFE is a form of backwards feature elimination in which the model is first fit using all features in the dataset and the least important features are then removed one by one 69 . The model is refit until only the desired number of features, set by a parameter, remains. SFM chooses effective features based on the feature importance of the best-performing model 70 . It selects features by establishing a threshold on the feature importances computed by the model on the training set: features whose importance exceeds the threshold are kept, while the rest are discarded. In this study, we apply SFM after comparing the performance of the machine learning methods, and then retrain the best-performing model using the features selected by SFM.
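The three selectors map directly onto scikit-learn classes, as sketched below on synthetic data; the choice of ten features and the random-forest estimator are illustrative assumptions, not the study's exact settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, chi2

# Synthetic stand-in data; shifted to be non-negative because chi2 requires it.
X, y = make_classification(n_samples=139, n_features=44, random_state=0)
X_pos = X - X.min(axis=0)

# UFS: score each feature independently against the target with chi-square.
ufs = SelectKBest(chi2, k=10).fit(X_pos, y)
# RFE: repeatedly refit, dropping the least important features.
rfe = RFE(RandomForestClassifier(random_state=0), n_features_to_select=10).fit(X, y)
# SFM: keep features whose importance exceeds a threshold (the mean, by default).
sfm = SelectFromModel(RandomForestClassifier(random_state=0)).fit(X, y)

print(ufs.get_support().sum())   # 10 features kept
print(rfe.get_support().sum())   # 10 features kept
print(sfm.get_support().sum())   # however many clear the importance threshold
```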

Findings from the case study

We split the dataset 70:30 into training and test sets for the selected machine learning algorithms and used Python’s Scikit-learn package to implement them 70 . Using the training data, we first developed six models based on the six algorithms, applying five-fold cross-validation with accuracy as the target measure. We then applied these models to the test data. We also carried out all required hyperparameter tuning for each algorithm to obtain the best possible classification outcome. Table 1 shows the performance outcomes for each algorithm during the training and test phases. The hyperparameter settings for each algorithm are listed in Supplementary Table 1 .
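The evaluation protocol (70:30 split, five-fold cross-validation, hyperparameter tuning) can be sketched as below; the parameter grid here is a hypothetical example, since the actual settings are those in Supplementary Table 1.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data, split 70:30 as in the study.
X, y = make_classification(n_samples=139, n_features=44, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Five-fold cross-validated grid search, targeting accuracy.
# The grid below is illustrative, not the study's actual hyperparameter space.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 200], "max_depth": [None, 5]},
    cv=5,
    scoring="accuracy",
)
grid.fit(X_tr, y_tr)
print(grid.best_params_)          # best hyperparameters found on the training set
print(grid.score(X_te, y_te))     # accuracy of the tuned model on the test set
```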

As revealed in Table 1 , random forest outperformed the other algorithms in terms of accuracy in both the training and test phases, with accuracies of 78.14% and 77.50%, respectively. The second-best performer in the training phase is k- nearest neighbours (76.98%); in the test phase, the support vector machine, k- nearest neighbours and artificial neural network tie for second (72.50%).

Since random forest showed the best performance, we explored it further. We applied the three feature optimisation approaches (UFS, RFE and SFM) to the random forest; the results are presented in Table 2 . SFM shows the best outcome among the three, with an accuracy of 85.00%, whereas the accuracies of UFS and RFE are 77.50% and 72.50%, respectively. As can be seen in Table 2 , the test-phase accuracy increases from 77.50% in Table 1 (b) to 85.00% with SFM feature optimisation. Table 3 lists the 19 features selected by SFM: out of 44 features, SFM found that 19 play a significant role in predicting the outcomes.

Further, Fig.  4 illustrates the confusion matrix when the random forest model with the SFM feature optimiser was applied to the test data. There are 18 true-positive, five false-negative, one false-positive and 16 true-negative cases. Therefore, the accuracy for the test phase is (18 + 16)/(18 + 5 + 1 + 16) = 85.00%.

figure 4

Confusion matrix results based on the random forest model with the SFM feature optimiser (1 for the rare class and 2 for the often class).

Figure  5 illustrates the top-10 most important features or variables based on the random forest algorithm with the SFM optimiser. We used feature importance based on the mean decrease in impurity to identify this list of important variables. The mean decrease in impurity computes each feature’s importance as the sum over the number of splits that include the feature, in proportion to the number of samples it splits 71 . According to this figure, the delays in decision making attribute contributed most to the classification performance of the random forest algorithm, followed by the cash flow problem and construction cost underestimation attributes. The current construction project literature also highlights these top-10 factors as significant contributors to project cost overrun. For example, using construction project data from Jordan, Al-Hazim et al. 72 ranked 20 causes of cost overrun, including several similar to those identified here.

figure 5

Feature importance (top-10 out of 19) based on the random forest model with the SFM feature optimiser.
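The mean-decrease-in-impurity ranking behind a figure like Fig. 5 comes directly from a fitted random forest's `feature_importances_` attribute in scikit-learn, as sketched below on synthetic data (so the ranking itself is illustrative).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in with 19 features, mirroring the SFM-selected subset.
X, y = make_classification(n_samples=139, n_features=19, random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X, y)

# feature_importances_ holds the mean decrease in impurity per feature;
# sort descending and keep the top ten, as in Fig. 5.
top10 = np.argsort(rf.feature_importances_)[::-1][:10]
print(top10)                           # indices of the ten most important features
print(rf.feature_importances_.sum())   # MDI importances are normalised to sum to 1
```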

Further, we conducted a sensitivity analysis of the model’s ten most important features (from Fig.  5 ) to explore how a change in each feature affects the cost overrun. We used the partial dependence plot (PDP), a typical visualisation tool for non-parametric models 73 , to display the outcomes of this analysis. A PDP can demonstrate whether the relationship between the target and a feature is linear, monotonic or more complex. The results of the sensitivity analysis are presented in Fig.  6 . For the ‘delays in decision making’ attribute, the PDP shows that the probability stays below 0.4 until the rating value reaches three and increases thereafter; a higher value for this attribute indicates a higher risk of cost overrun. In contrast, no significant differences can be seen for the remaining nine features as their values change.

figure 6

The result of the sensitivity analysis from the partial dependency plot tool for the ten most important features.

Summary of the case study

We illustrated an application of the proposed machine learning-based research framework in classifying construction projects. RF showed the highest accuracy in predicting the test dataset: for a new data instance with information on its 19 features but no information on its classification, RF can identify its class ( rare or often ) correctly with a probability of 85.00%. If more data are provided to the machine learning algorithms, in addition to the 139 instances of the case study, their accuracy and efficiency in classifying projects will improve with subsequent training. For example, 100 additional data instances would give the algorithms roughly 70 more instances for training under a 70:30 split. This capacity for continuous improvement puts machine learning algorithms in a superior position over other traditional methods. In the current literature, some studies explore the factors contributing to project delay or cost overrun, in most cases applying factor analysis or other related statistical methods 72 , 74 , 75 . In addition to identifying important attributes, the proposed machine learning-based framework ranked the factors and showed how eliminating less important factors affects prediction accuracy when applied to this case study.

We shared the Python software developed to implement the machine learning algorithms considered in this case study on GitHub 76 , a software hosting site. A user-friendly version of this software can be accessed at https://share.streamlit.io/haohuilu/pa/main/app.py . The accuracy findings from this link could differ slightly from one run to another due to the hyperparameter settings of the corresponding machine learning algorithms.

Due to their robust prediction ability, machine learning methods have already gained wide acceptance across a broad range of research domains. EVM, on the other hand, remains the most commonly used method in project analytics due to its simplicity and ease of interpretability 77 , and substantial research efforts have been made to improve its generalisability over time. For example, Naeni et al. 34 developed a fuzzy approach to earned value analysis to make it suitable for project scenarios with ambiguous or linguistic outcomes, and Acebes 78 integrated Monte Carlo simulation with EVM for project monitoring and control for a similar purpose. Another prominent method frequently used in project analytics is time series analysis, which is compelling for the longitudinal prediction of project time and cost 30 . As evident in the current literature, however, little effort has been made to bring machine learning into project analytics to address project management research problems. This research makes a significant attempt to help fill this gap.

Our proposed data-driven framework includes only the fundamental model development and application components for machine learning algorithms. It omits some advanced-level machine learning methods, which this study intentionally did not consider since they are required only in particular designs of machine learning analysis. For example, the framework does not contain any methods or tools to handle data imbalance. Data imbalance refers to a situation where the research dataset has an uneven distribution of the target class 79 ; for example, a binary target variable causes a data imbalance issue if one of its class labels has far more observations than the other. Commonly used techniques to address this issue are undersampling and oversampling: undersampling decreases the size of the majority class, whereas oversampling randomly duplicates the minority class until the class distribution becomes balanced 79 . The class distribution of the case study did not produce any data imbalance issues.
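Random oversampling of the minority class can be sketched with scikit-learn's `resample` utility, as below on a small artificial example (dedicated libraries such as imbalanced-learn offer richer tools).

```python
import numpy as np
from sklearn.utils import resample

# Small artificial example: 8 majority-class vs 2 minority-class instances.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)

# Oversample the minority class with replacement up to the majority size.
X_min, y_min = X[y == 1], y[y == 1]
X_up, y_up = resample(X_min, y_min, replace=True, n_samples=8, random_state=0)

X_bal = np.vstack([X[y == 0], X_up])
y_bal = np.concatenate([y[y == 0], y_up])
print(np.bincount(y_bal))   # [8 8]: the class distribution is now balanced
```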

This study considered only six fundamental machine learning algorithms for the case study, although many other algorithms are available in the literature. For example, it did not consider the extreme gradient boosting (XGBoost) algorithm, which is based on the decision tree algorithm, similar to random forest 80 , and has become dominant in applied machine learning due to its performance and speed. Naïve Bayes and convolutional neural networks are other popular machine learning algorithms that were not considered when applying the proposed framework to the case study. In addition to the three feature selection methods, multi-view learning could be adopted; it is another direction in machine learning that learns from multiple views of the existing data with the aim of improving predictive performance 81 , 82 . Similarly, although we considered five performance measures, other candidates exist, such as the area under the receiver operating characteristic curve, which measures the ability of the underlying classifier to distinguish between classes 48 . We leave these as potential application scope when applying our proposed framework in other project contexts in future studies.

Although this study used only one case study for illustration, our proposed research framework can be used in other project analytics contexts. In such contexts, the underlying research goal should be to predict outcome classes and to find the attributes that play a significant role in making correct predictions. For example, by considering two types of projects based on the time required for completion (e.g., on-time and delayed ), the proposed framework can develop machine learning models that predict the class of a new data instance and identify the attributes contributing most to this prediction performance. The framework can also be used at any stage of a project. For example, its results allow project stakeholders to screen projects for excessive cost overruns and to forecast budget loss at bidding, before contracts are signed. In addition, the various factors that contribute to project cost overruns, which emerge at each stage of a project’s life cycle, can be identified at an earlier stage, and the framework’s feature importance helps project managers locate the critical contributors to cost overrun.

This study has made an important contribution to the current project analytics literature by considering the applications of machine learning within project management. Project management is often thought of as very fluid in nature, which makes applications of machine learning more difficult, and existing implementations have largely been limited to safety monitoring, risk prediction and cost estimation. Through its evaluation of machine learning applications, this study further demonstrates how algorithms can be used to model the relationship between project attributes and cost overrun frequency.

The applications of machine learning in project analytics are still undergoing constant development. Within construction projects, its applications have been largely limited and focused on profitability or the design of structures themselves. In this regard, our study made a substantial effort by proposing a machine learning-based framework to address research problems related to project analytics. We also illustrated an example of this framework’s application in the context of construction project management.

Like any other research, this study has a few limitations that provide scope for future research. First, the framework does not include some advanced machine learning techniques, such as methods for handling data imbalance or kernel density estimation. Second, we considered only one case study to illustrate the application of the proposed framework; illustrations using case studies from different project contexts would confirm its robustness. Finally, this study did not consider all machine learning models and performance measures available in the literature; for example, we did not consider the Naïve Bayes model or the precision measure in applying the proposed research framework to the case study.

Data availability

This study obtained research data from publicly available online repositories, with sources cited appropriately. The data are available at https://www.kaggle.com/datasets/amansaxena/survey-on-road-construction-delay .

Venkrbec, V. & Klanšek, U. In: Advances and Trends in Engineering Sciences and Technologies II 685–690 (CRC Press, 2016).

Damnjanovic, I. & Reinschmidt, K. Data Analytics for Engineering and Construction Project Risk Management (Springer, 2020).

Singh, H. Project Management Analytics: A Data-driven Approach to Making Rational and Effective Project Decisions (FT Press, 2015).

Frame, J. D. & Chen, Y. Why Data Analytics in Project Management? (Auerbach Publications, 2018).

Ong, S. & Uddin, S. Data Science and Artificial Intelligence in Project Management: The Past, Present and Future. J. Mod. Proj. Manag. 7 , 26–33 (2020).

Bilal, M. et al. Investigating profitability performance of construction projects using big data: A project analytics approach. J. Build. Eng. 26 , 100850 (2019).

Radziszewska-Zielina, E. & Sroka, B. Planning repetitive construction projects considering technological constraints. Open Eng. 8 , 500–505 (2018).

Neely, A. D., Adams, C. & Kennerley, M. The Performance Prism: The Scorecard for Measuring and Managing Business Success (Prentice Hall Financial Times, 2002).

Kanakaris, N., Karacapilidis, N., Kournetas, G. & Lazanas, A. In: International Conference on Operations Research and Enterprise Systems. 135–155 Springer.

Jordan, M. I. & Mitchell, T. M. Machine learning: Trends, perspectives, and prospects. Science 349 , 255–260 (2015).

Shalev-Shwartz, S. & Ben-David, S. Understanding Machine Learning: From Theory to Algorithms (Cambridge University Press, 2014).

Rahimian, F. P., Seyedzadeh, S., Oliver, S., Rodriguez, S. & Dawood, N. On-demand monitoring of construction projects through a game-like hybrid application of BIM and machine learning. Autom. Constr. 110 , 103012 (2020).

Sanni-Anibire, M. O., Zin, R. M. & Olatunji, S. O. Machine learning model for delay risk assessment in tall building projects. Int. J. Constr. Manag. 22 , 1–10 (2020).

Cong, J. et al. A machine learning-based iterative design approach to automate user satisfaction degree prediction in smart product-service system. Comput. Ind. Eng. 165 , 107939 (2022).

Li, F., Chen, C.-H., Lee, C.-H. & Feng, S. Artificial intelligence-enabled non-intrusive vigilance assessment approach to reducing traffic controller’s human errors. Knowl. Based Syst. 239 , 108047 (2021).

Mohri, M., Rostamizadeh, A. & Talwalkar, A. Foundations of Machine Learning (MIT press, 2018).

Whyte, J., Stasis, A. & Lindkvist, C. Managing change in the delivery of complex projects: Configuration management, asset information and ‘big data’. Int. J. Proj. Manag. 34 , 339–351 (2016).

Zangeneh, P. & McCabe, B. Ontology-based knowledge representation for industrial megaprojects analytics using linked data and the semantic web. Adv. Eng. Inform. 46 , 101164 (2020).

Akinosho, T. D. et al. Deep learning in the construction industry: A review of present status and future innovations. J. Build. Eng. 32 , 101827 (2020).

Soman, R. K., Molina-Solana, M. & Whyte, J. K. Linked-Data based constraint-checking (LDCC) to support look-ahead planning in construction. Autom. Constr. 120 , 103369 (2020).

Soman, R. K. & Whyte, J. K. Codification challenges for data science in construction. J. Constr. Eng. Manag. 146 , 04020072 (2020).

Soman, R. K. & Molina-Solana, M. Automating look-ahead schedule generation for construction using linked-data based constraint checking and reinforcement learning. Autom. Constr. 134 , 104069 (2022).

Shi, F., Soman, R. K., Han, J. & Whyte, J. K. Addressing adjacency constraints in rectangular floor plans using Monte-Carlo tree search. Autom. Constr. 115 , 103187 (2020).

Chen, L. & Whyte, J. Understanding design change propagation in complex engineering systems using a digital twin and design structure matrix. Eng. Constr. Archit. Manag. (2021).

Allison, J. T. et al. Artificial intelligence and engineering design. J. Mech. Des. 144 , 020301 (2022).

Dutta, D. & Bose, I. Managing a big data project: The case of ramco cements limited. Int. J. Prod. Econ. 165 , 293–306 (2015).

Bilal, M. & Oyedele, L. O. Guidelines for applied machine learning in construction industry—A case of profit margins estimation. Adv. Eng. Inform. 43 , 101013 (2020).

Tayefeh Hashemi, S., Ebadati, O. M. & Kaur, H. Cost estimation and prediction in construction projects: A systematic review on machine learning techniques. SN Appl. Sci. 2 , 1–27 (2020).

Arage, S. S. & Dharwadkar, N. V. In: International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud)(I-SMAC). 594–599 (IEEE, 2017).

Cheng, C.-H., Chang, J.-R. & Yeh, C.-A. Entropy-based and trapezoid fuzzification-based fuzzy time series approaches for forecasting IT project cost. Technol. Forecast. Soc. Chang. 73 , 524–542 (2006).

Joukar, A. & Nahmens, I. Volatility forecast of construction cost index using general autoregressive conditional heteroskedastic method. J. Constr. Eng. Manag. 142 , 04015051 (2016).

Xu, J.-W. & Moon, S. Stochastic forecast of construction cost index using a cointegrated vector autoregression model. J. Manag. Eng. 29 , 10–18 (2013).

Narbaev, T. & De Marco, A. Combination of growth model and earned schedule to forecast project cost at completion. J. Constr. Eng. Manag. 140 , 04013038 (2014).

Naeni, L. M., Shadrokh, S. & Salehipour, A. A fuzzy approach for the earned value management. Int. J. Proj. Manag. 29 , 764–772 (2011).

Ponz-Tienda, J. L., Pellicer, E. & Yepes, V. Complete fuzzy scheduling and fuzzy earned value management in construction projects. J. Zhejiang Univ. Sci. A 13 , 56–68 (2012).

Yu, F., Chen, X., Cory, C. A., Yang, Z. & Hu, Y. An active construction dynamic schedule management model: Using the fuzzy earned value management and BP neural network. KSCE J. Civ. Eng. 25 , 2335–2349 (2021).

Bonato, F. K., Albuquerque, A. A. & Paixão, M. A. S. An application of earned value management (EVM) with Monte Carlo simulation in engineering project management. Gest. Produção 26 , e4641 (2019).

Batselier, J. & Vanhoucke, M. Empirical evaluation of earned value management forecasting accuracy for time and cost. J. Constr. Eng. Manag. 141 , 05015010 (2015).

Yang, R. J. & Zou, P. X. Stakeholder-associated risks and their interactions in complex green building projects: A social network model. Build. Environ. 73 , 208–222 (2014).

Uddin, S. Social network analysis in project management–A case study of analysing stakeholder networks. J. Mod. Proj. Manag. 5 , 106–113 (2017).

Ong, S. & Uddin, S. Co-evolution of project stakeholder networks. J. Mod. Proj. Manag. 8 , 96–115 (2020).

Khanzode, K. C. A. & Sarode, R. D. Advantages and disadvantages of artificial intelligence and machine learning: A literature review. Int. J. Libr. Inf. Sci. (IJLIS) 9 , 30–36 (2020).

Loyola-Gonzalez, O. Black-box vs. white-box: Understanding their advantages and weaknesses from a practical point of view. IEEE Access 7 , 154096–154113 (2019).

Abioye, S. O. et al. Artificial intelligence in the construction industry: A review of present status, opportunities and future challenges. J. Build. Eng. 44 , 103299 (2021).

Doloi, H., Sawhney, A., Iyer, K. & Rentala, S. Analysing factors affecting delays in Indian construction projects. Int. J. Proj. Manag. 30 , 479–489 (2012).

Alkhaddar, R., Wooder, T., Sertyesilisik, B. & Tunstall, A. Deep learning approach’s effectiveness on sustainability improvement in the UK construction industry. Manag. Environ. Qual. Int. J. 23 , 126–139 (2012).

Gondia, A., Siam, A., El-Dakhakhni, W. & Nassar, A. H. Machine learning algorithms for construction projects delay risk prediction. J. Constr. Eng. Manag. 146 , 04019085 (2020).

Witten, I. H. & Frank, E. Data Mining: Practical Machine Learning Tools and Techniques (Morgan Kaufmann, 2005).

Kanakaris, N., Karacapilidis, N. I. & Lazanas, A. In: ICORES. 362–369.

Heo, S., Han, S., Shin, Y. & Na, S. Challenges of data refining process during the artificial intelligence development projects in the architecture engineering and construction industry. Appl. Sci. 11 , 10919 (2021).

Bross, I. D. How to use ridit analysis. Biometrics 14 , 18–38 (1958).

Uddin, S., Khan, A., Hossain, M. E. & Moni, M. A. Comparing different supervised machine learning algorithms for disease prediction. BMC Med. Inform. Decis. Mak. 19 , 1–16 (2019).

LaValle, S. M., Branicky, M. S. & Lindemann, S. R. On the relationship between classical grid search and probabilistic roadmaps. Int. J. Robot. Res. 23 , 673–692 (2004).

Abdi, H. & Williams, L. J. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2 , 433–459 (2010).

Saxena, A. Survey on Road Construction Delay , https://www.kaggle.com/amansaxena/survey-on-road-construction-delay (2021).

Noble, W. S. What is a support vector machine?. Nat. Biotechnol. 24 , 1565–1567 (2006).

Article   CAS   PubMed   Google Scholar  

Hosmer, D. W. Jr., Lemeshow, S. & Sturdivant, R. X. Applied Logistic Regression Vol. 398 (John Wiley & Sons, 2013).

LaValley, M. P. Logistic regression. Circulation 117 , 2395–2399 (2008).

Article   PubMed   Google Scholar  

Menard, S. Applied Logistic Regression Analysis Vol. 106 (Sage, 2002).

Batista, G. E. & Monard, M. C. A study of K-nearest neighbour as an imputation method. His 87 , 48 (2002).

Agatonovic-Kustrin, S. & Beresford, R. Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research. J. Pharm. Biomed. Anal. 22 , 717–727 (2000).

Zupan, J. Introduction to artificial neural network (ANN) methods: What they are and how to use them. Acta Chim. Slov. 41 , 327–327 (1994).

CAS   Google Scholar  

Hopfield, J. J. Artificial neural networks. IEEE Circuits Devices Mag. 4 , 3–10 (1988).

Zou, J., Han, Y. & So, S.-S. Overview of artificial neural networks. Artificial Neural Networks . 14–22 (2008).

Maind, S. B. & Wankar, P. Research paper on basic of artificial neural network. Int. J. Recent Innov. Trends Comput. Commun. 2 , 96–100 (2014).

Wolpert, D. H. Stacked generalization. Neural Netw. 5 , 241–259 (1992).

Pavlyshenko, B. In: IEEE Second International Conference on Data Stream Mining & Processing (DSMP). 255–258 (IEEE).

Jović, A., Brkić, K. & Bogunović, N. In: 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). 1200–1205 (Ieee, 2015).

Guyon, I., Weston, J., Barnhill, S. & Vapnik, V. Gene selection for cancer classification using support vector machines. Mach. Learn. 46 , 389–422 (2002).

Article   MATH   Google Scholar  

Pedregosa, F. et al. Scikit-learn: Machine learning in python. J. Mach. Learn. Res. 12 , 2825–2830 (2011).

MathSciNet   MATH   Google Scholar  

Louppe, G., Wehenkel, L., Sutera, A. & Geurts, P. Understanding variable importances in forests of randomized trees. Adv. Neural. Inf. Process. Syst. 26 , 431–439 (2013).

Al-Hazim, N., Salem, Z. A. & Ahmad, H. Delay and cost overrun in infrastructure projects in Jordan. Procedia Eng. 182 , 18–24 (2017).

Breiman, L. Random forests. Mach. Learn. 45 , 5–32. https://doi.org/10.1023/A:1010933404324 (2001).

Shehu, Z., Endut, I. R. & Akintoye, A. Factors contributing to project time and hence cost overrun in the Malaysian construction industry. J. Financ. Manag. Prop. Constr. 19 , 55–75 (2014).

Akomah, B. B. & Jackson, E. N. Contractors’ perception of factors contributing to road project delay. Int. J. Constr. Eng. Manag. 5 , 79–85 (2016).

GitHub: Where the world builds software , https://github.com/ .

Anbari, F. T. Earned value project management method and extensions. Proj. Manag. J. 34 , 12–23 (2003).

Acebes, F., Pereda, M., Poza, D., Pajares, J. & Galán, J. M. Stochastic earned value analysis using Monte Carlo simulation and statistical learning techniques. Int. J. Proj. Manag. 33 , 1597–1609 (2015).

Japkowicz, N. & Stephen, S. The class imbalance problem: A systematic study. Intell. data anal. 6 , 429–449 (2002).

Chen, T. et al. Xgboost: extreme gradient boosting. R Packag. Version 0.4–2.1 1 , 1–4 (2015).

Guarino, A., Lettieri, N., Malandrino, D., Zaccagnino, R. & Capo, C. Adam or Eve? Automatic users’ gender classification via gestures analysis on touch devices. Neural Comput. Appl. 1–23 (2022).

Zaccagnino, R., Capo, C., Guarino, A., Lettieri, N. & Malandrino, D. Techno-regulation and intelligent safeguards. Multimed. Tools Appl. 80 , 15803–15824 (2021).

Download references

Acknowledgements

The authors acknowledge the insightful comments from Prof Jennifer Whyte on an earlier version of this article.

Author information

Authors and Affiliations

School of Project Management, The University of Sydney, Level 2, 21 Ross St, Forest Lodge, NSW, 2037, Australia

Shahadat Uddin, Stephen Ong & Haohui Lu


Contributions

S.U.: Conceptualisation; Data curation; Formal analysis; Methodology; Supervision; and Writing (original draft, review and editing). S.O.: Data curation; and Writing (original draft, review and editing). H.L.: Methodology; and Writing (original draft, review and editing). All authors reviewed the manuscript.

Corresponding author

Correspondence to Shahadat Uddin.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and Permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Uddin, S., Ong, S. & Lu, H. Machine learning in project analytics: a data-driven framework and case study. Sci Rep 12, 15252 (2022). https://doi.org/10.1038/s41598-022-19728-x


Received: 13 April 2022

Accepted: 02 September 2022

Published: 09 September 2022

DOI: https://doi.org/10.1038/s41598-022-19728-x






Original Research Article

An Indicator Measuring the Influence of the Online Public Food Environment: An Analytical Framework and Case Study


  • 1 Department of Earth System Science, Institute for Global Change Studies, Ministry of Education Ecological Field Station for East Asian Migratory Birds, Tsinghua University, Beijing, China
  • 2 Vanke School of Public Health, Tsinghua University, Beijing, China
  • 3 Department of Geography and Resource Management and Institute of Space and Earth Information Science, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
  • 4 Ministry of Education Ecological Field Station for East Asian Migratory Birds, Tsinghua University, Beijing, China
  • 5 Department of Geography and Earth Sciences, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China

The online public food environment (OPFE) has had a considerable impact on people's lifestyles over the past decade; however, research on its exposure is sparse, and existing research on the impact of the food environment on human health has produced inconsistent results. In response to the absence of food elements in the definition of the food environment and the lack of a clear method for assessing its health attributes and degree of impact, we proposed a new analytical framework from the perspective of environmental science, based on the latest disease-burden research and the characteristics of China's current food environment. We redefined the food environment and proposed that food and its physical space are its two core elements. Accordingly, we extracted four domains of characteristics to describe the basic components of the food environment. Using sales records, and following the standard process for constructing environmental health indicators, we designed an approach to measure the OPFE of takeaway food outlets, covering both health attributes and degree of impact. Further, we conducted a case study, extracting three domains of characteristics for more than 18,000 effective takeaway meals from 812 takeaway food outlets located in 10 administrative subdivisions of the Haidian and Xicheng Districts of Beijing Municipality. The results showed that more than 60% of single meals sold by takeaway food outlets were healthy, while only 15% of takeaway food outlets sold healthy meals exclusively. Additionally, there were significant differences in health effects among different types of food environments, and high-risk areas of different types of food environments could be identified spatially.
Compared with the counting method used for food-environment availability, the proposed approach can depict food environment characteristics not only at the macro scale, as the counting method does, but also at the meal scale. The indicators could be useful for large-scale, long-term monitoring of food-environment change, owing to their simple calculation and their design around food delivery platform data.

Introduction

Currently, many countries face multiple forms of malnutrition, from the individual to the national scale ( 1 ). The Global Nutrition Report 2018 notes that about 88% of the 141 countries it analyzed have been experiencing more than one form of malnutrition, and 29% of countries have high levels of all three forms ( 2 ). Among the forms of malnutrition, the prevalence of overweight and obesity has increased in many developing countries over the past four decades ( 3 ), and the situation in China is worrying: according to a recent official report, more than half of Chinese adults are overweight or obese ( 4 ). Obesity is not only a disease but also a symptom ( 5 ) and a cause ( 6 , 7 ) of many chronic diseases. Previous practical interventions, such as educational, behavioral, and pharmacological ones ( 8 ), have been ineffective, as no country has successfully prevented the obesity pandemic ( 9 ). Additionally, undernutrition is a pressing concern, especially in low- and middle-income countries (LMICs) ( 10 ). Although undernutrition- and micronutrient malnutrition-related health problems have been mitigated in China ( 4 ), the interweaving of these forms of malnutrition with overweight and obesity ( 10 ) makes further prevention more difficult. As one of the shared underlying causes of malnutrition, the food environment has become an object of study in the search for environmental interventions to mitigate malnutrition.

In its original definition, food environments comprise all "collective physical, economic, policy and socio-cultural surroundings, opportunities and conditions that influence people's food and beverage choices and nutritional status, such as food composition, food labeling, food promotion, food prices, food provision in schools and other settings, food availability and trade policies affecting food availability, price and quality" [( 11 ), p. 8]. This definition broadly delimited the boundary of the food environment but did not pinpoint the relationships among its components, and some food features, such as nutrition, were considered beyond the scope of food environment research ( 12 ). Although one study reorganized the measurements into two domains ( 13 ), parts of the personal domain, such as desirability, fall outside the scope of the food environment ( 14 ). Many studies in high-income countries have examined different dimensions of food environment measurement within different conceptual frameworks ( 13 , 15 – 18 ), but no unified measurement framework has emerged. A recent study summarized eight measurement dimensions for low- and middle-income countries: accessibility, affordability, desirability, convenience, availability, price, product characteristics, and promotion/marketing. Of these, only four (availability, accessibility, affordability, and price) are consistent across frameworks ( 14 ). The availability and accessibility dimensions are the most frequently studied ( 19 ). In these two measurements, the physical food environment is characterized by different types of food outlets, such as fast-food outlets, supermarkets, and grocery stores, which are defined directly as healthy or unhealthy.
A recent study evaluated differences in the characteristics of food supplied by different retail food outlets, such as supermarkets, grocery stores, and convenience stores, across different types of neighborhoods ( 20 ); in this study, we focus on fast-food outlets.

Most studies on the impact of the fast-food environment on human health, especially in Europe and America, rest on a latent hypothesis: that fast-food outlets are unhealthy because they sell energy-dense foods and drinks ( 21 ). However, a recent study that compared the food provided by different food outlets found that not all fast food is unhealthy ( 22 ). Two other analyses found that many other types of food outlets provide fast food, including supermarkets and grocery stores, which are typically considered healthy ( 23 , 24 ). A further latent hypothesis is that all food outlets have the same impact on human health; only one study has weighted the food environment, based on expert ratings ( 25 ). This hypothesis does not reflect reality and has been questioned in a recent study ( 26 ), and another study discussed how the error it introduces affects results from models associating food access with human health ( 27 ). These issues may explain the discrepant conclusions among studies of the association between the food environment and human health. Therefore, Cobb et al. ( 28 ) suggested exploring additional measurements.

Research on the food environment in China is still in its early stages; several studies on the impact of the neighborhood food environment on people's diet and obesity have been conducted, but with limited results ( 29 ). The abovementioned issues thus cannot be solved in China owing to the lack of a systematic framework and standard tools ( 30 ). Meanwhile, the way food is acquired in China is changing tremendously: takeaway is currently the most popular form of food in China, a trend driven by the development of the mobile internet and increasingly efficient delivery systems over the past decade ( 31 ). Although this has made the food environment in China more complex ( 32 ), the accumulated internet catering big data have also created better opportunities to explore human dietary behavior and the characteristics of the food environment. The food delivery platform (FDP) is a potential tool for monitoring the food environment and mitigating overweight and obesity in China for four reasons. First, takeaway food outlets provide cooked food products that can be consumed immediately upon reception, so every single package may represent an individual's total potential food consumption per meal (food order). These food orders are recorded by the FDP ( 31 ), along with other food-related attributes such as price, promotions, packaging fees, distribution fees, and the type and weight of ingredients. Second, FDPs cover most cities in China and have accumulated data in metropolitan areas for nearly 10 years; this broad spatial and longitudinal monitoring of food orders provides an unprecedented amount of data for food environment research ( 33 ). Third, the cost of data collection is lower than for other types of food environment constructs ( Table 1 ). Finally, FDPs have had a considerable impact on contemporary lifestyles. An important characteristic is their convenience, which saves time and makes consumers' time use more diverse:
a qualitative study from Guangzhou found that at least 2 h a day could be "saved" by using FDPs ( 34 ). However, FDPs potentially promote a sedentary lifestyle and the consumption of unhealthy food ( 35 ), which is harmful to health, while the resulting food waste and packaging are harmful to the environment. Thus, to guide the food environment in a healthy and sustainable direction, FDPs will inevitably become the main arena for interventions promoting human health and environmental sustainability ( 31 ).


Table 1. Comparison of characteristics of different types of food outlets.

Within this context, this study designed a new indicator to measure the online public food environment for takeaway food (OPFE-TF) and developed an approach to define the healthiness of food outlets and their varying degrees of impact. The indicators developed here are mainly intended to monitor the characteristics of the OPFE-TF, recognize areas at high risk of unhealthy food, and prepare for modeling the associations between the OPFE-TF and human health.

Specifically, we explored the following research questions: (1) How do the healthiness and nutrition of food differ across fast-food outlets? (2) How does the health impact differ across fast-food outlets? As the indicator measuring the OPFE-TF is an environmental health indicator (EHI), it was designed according to the standard steps for constructing EHIs ( 36 ). Supplementary Figure 1 shows how the structure of this study maps onto those construction steps. In the section Analytical Framework and Measurement of OPFE-TF, we define the exposure-effect relationship between the food environment and human health, the target point in the Driving force-Pressure-State-Exposure-Effect-Action (DPSEEA) chain ( 37 ), and the parameters on which the indicator is based. In the Case Study section, we prepare the data to test the indicator, evaluate its performance, and address the two research questions.

Analytical Framework and Measurement of OPFE-TF

The Relationship Between Food Environment and Populations

In a systematic analysis of the global burden of disease, many risk factors for attributable deaths were related to food, such as dietary risk, high systolic blood pressure, high fasting plasma glucose, high low-density lipoprotein cholesterol, and child and maternal malnutrition ( 38 ). The food environment includes the main physical places for food storage and distribution in the food system ( 17 ), and it affects human health through its interactions with people via food. Figure 1 shows that the food supplied by food outlets is consumed by customers, making food the key element connecting people and the food environment; hence, food can be considered the core path through which the food environment affects people's health. On this basis, food can be regarded as the boundary of the food environment, connecting it with the human population. The food environment can include food attributes but not personal domains such as desires, tastes, or attitudes. Thus, when discussing the relationship between the food environment and human health, the characteristics of the food environment should be depicted by food attributes, consistent with the methods used to assess human dietary quality in nutrition science. All subsequent design of the food environment indicators adheres to this principle.


Figure 1. The fundamental fact of the relationship between food environment and populations.

The Analytical Framework for OPFE-TF

According to the relationship between the food environment and populations, food can be considered the boundary of the food environment, linking it with the human population. This indicates that the food environment can include food attributes but not personal domains such as desires, tastes, or attitudes. Based on this relationship, we constructed the analytical framework for OPFE-TF ( Figure 2 ). The framework maps the correspondence between food environment, food, and people and each operating system of the delivery platform, illustrating how food connects the food environment and people during the transfer process between them. Following the analytical framework and the above principle, we propose a new definition of the food environment:


Figure 2. An analytical framework of the online public food environment.

The food environment is a new type of environment composed of the food and the physical space carrying the food, as the interface between humans and the broader food system, interacting with the covered population constantly through economic, social, and cultural factors and influencing human health .

This is a systematic definition from an environmental perspective. In the new definition, the food environment has two key elements: the food and the physical space carrying it. The measurement of the food environment is therefore separated into two levels, and at each level the dimensions describing the element should be consistent. The only physical space that cannot be a food environment is one without food in it; thus, food is the precondition, and food attributes are fundamental for depicting food environment characteristics.

The dietary risk factors identified in the global burden of disease research include 15 level III factors: diet low in fruits, diet low in vegetables, diet low in legumes, diet low in whole grains, diet low in nuts and seeds, diet low in milk, diet high in red meat, diet high in processed meat, diet high in sugar-sweetened beverages, diet low in fiber, diet low in calcium, diet low in seafood omega-3 fatty acids, diet low in polyunsaturated fatty acids, diet high in trans fatty acids, and diet high in sodium ( 7 , 38 – 40 ). In reality, however, food affects human health through steady accumulation, one meal at a time. Furthermore, most takeaway food on FDPs is sold as one meal or a group of meals. We therefore took the single meal as the basic unit and designed the indicator on this unit at the environmental scale.

Among the dietary risk factors, trans fatty acids and sodium typically come from oil and salt, which are used as condiments in cooking; sugar is also a popular ingredient, used in sugar-sweetened beverages, which may aggravate excessive energy intake. Therefore, we used oil, salt, and sugar to describe the health attributes of food. Fruits, vegetables, legumes, whole grains, and the remaining factors are associated with food groups, and different combinations of foods from these groups form different types of meals. Quantitative calculation of food groups can evaluate dietary quality, a nutrition attribute; therefore, food groups and their combinations can be used to describe the category and nutritional characteristics of the food environment. Further, food price, an important social and economic indicator, also affects the interactions between people and the food environment ( 13 ); therefore, price should also be a dimension of food environment measurement.

In sum, the characteristics of the food element comprise four dimensions: nutrition, healthiness, category, and price, the factors people consider most when choosing food in the analytical framework. To clarify the differences and similarities between the traditional food environment and the new one, the correspondence among these measurement dimensions is mapped in Figure 3. Of the four dimensions, price is a well-defined socioeconomic indicator but is not suitable for depicting the physical characteristics of the food environment; thus, we designed measurements only for the other three dimensions.


Figure 3. The transformational relationship among dimensions of the original food environment and the new food environment.

The indicator proposed in this study is mainly used to identify unhealthy food outlets with a high degree of impact. Thus, its target point is the state of the environment in the DPSEEA framework chain ( 37 ).

Parameters of the Indicator

In various studies ( 41 – 45 ), food outlets were tagged using the Standard Industrial Classification or the North American Industry Classification System (NAICS) ( 46 ), or by the researchers themselves ( 47 , 48 ). However, these systems classify economic entities by their economic activities, not by the nutritional characteristics of the food they sell. Take "Restaurants and Other Eating Places" in NAICS, for instance: apart from snack and non-alcoholic beverage bars, its other three subcategories are distinguished mainly by the degree of service provided. Among full-service and limited-service catering enterprises, some food outlets may serve similar food, such as pizzerias and steakhouses. Hence, by the basic rules of taxonomy ( 49 ), the NAICS classification system cannot effectively distinguish the characteristics of the food served by outlets. The Industrial Classification for National Economic Activities in China ( 50 ) has the same problem. Therefore, a new classification system is needed for fast-food outlets.

The balanced diet plate in the 2016 edition of the Dietary Guidelines for Chinese Residents is a simplified way to describe the food composition and approximate proportions of a person's meal according to the principle of a balanced diet, without considering cooking oil and salt. As oil, salt, and sugar are already separated out as the dimension describing the health attributes of the food environment in the above definition, the balanced diet plate can serve as a reference for classification. Drawing on a previous food classification method ( 51 ) and popular meal categories, we proposed a classification system for fast-food outlets and meals, following the prior dietary-pattern approach ( Table 2 ). The two-level system contains 3 Level I categories (Meal, Snacks and Beverage, and Other) and 14 Level II categories, with corresponding Code I and Code II.


Table 2. The classification system for food outlets and meals.

The Meal category contains eight Level II categories, many of which are composed in advance by the food outlets, such as staple foods, set meals, noodles and dumplings, western fast-food, and healthy light recipes. These types can be easily recognized in a single meal. However, most pot meals (POT) and fried and barbecue meals (F_BBQ) are assembled by the consumer, and seafood (SF) is sold by weight, so these three types are difficult to recognize from a single meal. One possible solution is to determine the category from the consumer side, because the order can be viewed as the basic unit of sales.

Moreover, the Unknown type in the Other category is mainly used to describe features of the food outlet, not the food. The classification system should be revised if meals emerge that do not belong to any existing category. Additionally, western fast-food rushed into the Chinese catering market after the reform and opening-up; its standardized food processing and cooking modes were adopted by other Chinese catering enterprises, which boosted the supply of western-style fast food considerably. The western fast-food category was set up to compare the characteristics of Chinese and western fast food and, in the future, to explore their different impacts on Chinese citizens' health.

The Rules for Categorizing Meals and Food Outlets

Rules for Classification of Meals

a. When a meal is composed of a single Level II food in the classification system, the meal corresponds to that Level II type.

b. When a meal is composed of the same or similar quantities of multiple Level I foods, the type of the meal is decided by the priority of the category, which is given by its code (Code 1 represents the highest priority). The food category is then defined according to the number of Level II meal items.

c. When a meal is composed of multiple Level I food items in different quantities, the food category is defined according to the most numerous Level II food items.
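As an illustration, the three meal rules above might be sketched as follows. The item tagging (a Level I code plus a Level II label per item), the label names, and the treatment of "similar quantities" as exact ties are all assumptions, since the paper gives no pseudocode:

```python
from collections import Counter

# Hypothetical tagging: each meal item carries a Level I code
# (1 = Meal, 2 = Snacks and Beverage, 3 = Other; Code 1 = highest
# priority) and an illustrative Level II label from Table 2.

def classify_meal(items):
    """Classify a meal per rules (a)-(c).

    `items` is a list of (level1_code, level2_label) tuples,
    one tuple per food item in the meal.
    """
    level2 = Counter(lbl for _, lbl in items)
    if len(level2) == 1:                 # rule (a): a single Level II food
        return next(iter(level2))

    level1 = Counter(code for code, _ in items)
    top = max(level1.values())
    tied = [code for code, n in level1.items() if n == top]
    if len(tied) > 1:                    # rule (b): tied Level I quantities
        chosen = min(tied)               # Code 1 has the highest priority
    else:                                # rule (c): a clear Level I majority
        chosen = tied[0]

    # within the chosen Level I, take the most frequent Level II label
    within = Counter(lbl for code, lbl in items if code == chosen)
    return within.most_common(1)[0][0]
```

For example, a meal of two noodle items and one beverage is classified as "noodles", since the Meal category (Code 1) dominates and noodles is its most numerous Level II label.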

Rules for the Classification of Food Outlets

a. As with the meal classification rules, when a food outlet sells only one meal type, it is defined by that meal category.

b. When the food outlet sells various meal types, the type of the outlet is defined by the meal type it sells most.

c. When the food outlet sells the same or a similar number of several meal types, the type of the outlet is defined as Unknown.
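The outlet rules admit a similar sketch; again, the category labels are illustrative and "a similar number" is simplified to an exact tie:

```python
from collections import Counter

def classify_outlet(meal_types):
    """Classify a food outlet from the categories of the meals it sells.

    `meal_types` holds one category label per meal on the menu.
    """
    counts = Counter(meal_types)
    if len(counts) == 1:                 # rule (a): only one meal type sold
        return meal_types[0]
    top = max(counts.values())
    leaders = [t for t, n in counts.items() if n == top]
    if len(leaders) > 1:                 # rule (c): tied counts -> Unknown
        return "Unknown"
    return leaders[0]                    # rule (b): the most-sold meal type
```

An outlet selling two noodle meals and one set meal would be classified as "noodles"; one selling one of each would be "Unknown".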

Healthiness

In the definition of the food environment, oil, salt, and sugar are grouped to describe the healthiness attribute of food and food outlets. Most dietary guidelines across countries set limits for oil, salt, and sugar ( 52 – 55 ); ideally, the healthiness of a meal could be determined by whether the weight of its oil, salt, or sugar exceeds those limits. Pragmatically, however, it is difficult to obtain the actual content of oil, salt, and sugar in a meal, especially for takeaway food. Moreover, the intake standards for oil, salt, and sugar in dietary guidelines are usually calculated per day or per week, and no study has established an intake standard for a single meal. Therefore, the healthiness of a meal is at present characterized mainly by a proxy method:

l = Σ_{j=1..3} x_j, x_j ∈ {0, 1}

Here, l is the healthy score of a meal, j indexes the separate health labels (fried food, sugar-sweetened beverage, and high-salt sauces or pickles), and x_j equals 1 when label j applies to the meal. The maximum value of l is 3, indicating that fried food, a sugar-sweetened beverage, and high-salt sauces or pickles are all present in the meal: the unhealthiest meal. The minimum value of l is 0, implying that none of the three is present: the healthiest meal.
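A minimal sketch of this proxy score, with the three health labels passed as booleans (the function name and signature are assumptions, not from the paper):

```python
def healthy_score(has_fried, has_sugary_drink, has_high_salt):
    """Healthy score l of a meal: a count of the unhealthy labels present.

    Each boolean argument is one health label j; l ranges from 0
    (healthiest, no label present) to 3 (unhealthiest, all present).
    """
    return int(has_fried) + int(has_sugary_drink) + int(has_high_salt)
```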

Nutrition

The nutrition dimension was designed based on the principle proposed for the exposure-effect relationship between the food environment and human health: nutrition relates to people's dietary quality. Nutrition science offers many individual dietary assessment methods, such as the dietary diversity score (DDS) ( 56 ), the healthy eating index ( 57 ), and the diet quality index ( 58 ). Except for the DDS, most of these methods require at least two of three key variables: food groups, intakes, and reference intakes. For the OPFE-TF, however, it is difficult to obtain the weight of the raw food. Because the category dimension already considers the proportions of different food groups in a meal, and because the DDS is simple to calculate, the DDS became the prime candidate for assessing the nutrition characteristic of the OPFE-TF. There are three further reasons for choosing the DDS as an indicator construct. First, the DDS was designed by the Food and Agriculture Organization to assess whether an individual's or family's food intake is adequately nutritious ( 56 ); although there is some controversy concerning the association between DDS and population health ( 59 , 60 ), it remains a useful indicator of overall diet quality, especially in large-scale surveys ( 60 ). Second, the DDS characterizes the nutritional properties of food and provides an opportunity to assess the sustainability of the food environment as a link to the ecosystem ( 61 ). Finally, the data for calculating the DDS are easier to obtain from FDPs than those for other nutritional indicators. Although food outlets on FDPs have not yet fully labeled the type and weight of their food ( 31 ), this shortcoming can be easily overcome with incentives.

In sum, the DDS is currently the most suitable indicator for assessing the nutrition dimension of the food environment. DDS is calculated as follows:

DDS = Σ_{k=1}^{m} FG_k     (3)

Here, DDS is the dietary diversity score of a meal in the food outlet, and k is the index of food groups in a meal. Further, FG_k is the k-th food group in the meal, and m is the total number of food groups in the meal.
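As a minimal sketch (the function and group names are hypothetical), the DDS of a single meal is just the count of distinct food groups it contains:

```python
def dietary_diversity_score(food_groups_in_meal) -> int:
    """DDS of one meal: the number of distinct food groups it contains."""
    return len(set(food_groups_in_meal))

# A set meal with rice, stir-fried vegetables, egg, and pork:
print(dietary_diversity_score(["cereals", "vegetables", "eggs", "meat"]))  # -> 4
```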

Indicators for Measuring OPFE-TF

Among the three parameters, category is mainly used for the qualitative description of the food environment, while healthiness and nutrition are mainly used for quantitative measurement. For the indicator to clearly express a healthy or unhealthy food environment with a high impact, the actual meanings of these parameters must be analyzed in depth.

For the healthy score of a meal, a higher value indicates an unhealthier meal, so its direction of change is consistent with that of an unhealthy food environment indicator. When the healthy score of a meal is 0, the meal is healthy. The DDS is an indicator of overall dietary diversity: a larger DDS indicates a more nutritionally abundant meal, and a smaller DDS indicates a poorer nutritional profile. Thus, the direction of change of the DDS is opposite to that of unhealthy food environment indicators, and a transformation of the DDS was necessary. Because the original DDS is a positive integer (it is undefined only when the food does not exist), a reciprocal transformation was applied to make the direction of the DDS consistent with the other parameters.

Sales is the cumulative food consumption over a period. For the OPFE-TF, the sales of a meal are recorded for one month. The higher the sales value, the more the meal is consumed, indicating that the meal in this food outlet has a strong impact on the population. However, sales alone have no clear impact direction. The healthy score and DDS are therefore used to determine the healthiness and nutrition direction, giving the following indicator: the total unhealthy impact degree of the food environment (TUHII). It is calculated as follows:

TUHII = Σ_{i=1}^{n} l_i · (1/DDS_i)‾ · S_i     (4)

Here, i is the index of the meal in a food outlet, and n is the total number of meals in a food outlet. Further, l_i is the healthy score of the i-th meal, DDS_i is the dietary diversity score of the i-th meal, and S_i is the monthly sales of the i-th meal. Additionally, (1/DDS_i)‾ is the normalized reciprocal of DDS_i.
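Equation (4) can be sketched as follows. Note that the paper does not specify how the reciprocal of DDS is normalized, so the min-max scheme below is an assumption:

```python
def normalized_reciprocal_dds(dds_values):
    """Min-max normalize 1/DDS across the meals of an outlet.

    Assumption: the paper says 'normalized reciprocal' without naming the scheme.
    """
    recip = [1.0 / d for d in dds_values]
    lo, hi = min(recip), max(recip)
    if hi == lo:
        return [0.0 for _ in recip]  # degenerate case: all meals equally diverse
    return [(r - lo) / (hi - lo) for r in recip]

def tuhii(healthy_scores, dds_values, monthly_sales):
    """Equation (4): TUHII = sum_i l_i * normalized(1/DDS_i) * S_i."""
    nr = normalized_reciprocal_dds(dds_values)
    return sum(l * r * s for l, r, s in zip(healthy_scores, nr, monthly_sales))

# Three meals in one outlet: healthy scores, DDS values, and monthly sales.
print(tuhii([0, 1, 2], [5, 4, 2], [100, 50, 30]))  # approx. 68.33
```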

Equation (4) measures the total unhealthy impact degree of the food environment. The higher the value of TUHII, the stronger the impact of unhealthy food outlets. However, when l_i equals 0, healthy meals contribute nothing, so healthy food outlets have no presence in the total food environment. With the increasing health awareness of Chinese citizens, healthy food outlets generate an increasingly positive impact on human health. If this impact is ignored, models of the association between the food environment and human health would involve more confounders, and their results would be inconsistent and invalid for designing intervention strategies. Healthy food outlets should therefore be fully considered when the food environment is expected to benefit human health, and the impacts of healthy and unhealthy food environments should also be horizontally comparable. Hence, we divided food outlets into two groups by the health weight, which is calculated as follows:

W = N_h / N_t     (5)

Here, W is the health weight of the food outlet, N_h is the total number of healthy meals, and N_t is the total number of meals.

The value of W lies in [0, 1]. A food outlet with a health weight of 1 sold only healthy meals, while all meals sold by a food outlet with a health weight of 0 were unhealthy; that is, every meal sold by such an outlet included at least one unhealthy food. When the health weight is between 0 and 1, the food outlet sold both healthy and unhealthy meals. The higher the W, the healthier the meals sold by the food outlet.

According to W and the meaning of DDS and the reciprocal of DDS, we divided the food outlets into two groups and separately constructed the impact degree for healthy food environment (HII) and unhealthy food environment (UHII), which are calculated as follows:

When W is equal to 1,

HII = Σ_{i=1}^{n} DDS_i · S_i     (6)

When W is less than 1,

UHII = Σ_{i=1}^{n} (1/DDS_i)‾ · S_i     (7)

Here, i and n are the index and total number of meals in a food outlet, respectively. Further, DDS_i is the dietary diversity score of the i-th meal, and S_i is the monthly sales of the i-th meal. Additionally, (1/DDS_i)‾ is the normalized reciprocal of DDS_i.

In Equation (6), a higher value of HII corresponds to a stronger impact of the healthy food outlet. In Equation (7), a higher value of UHII corresponds to a stronger impact of the unhealthy food outlet.
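Equations (5) to (7) can be sketched together; as above, min-max normalization of the reciprocal DDS is an assumption, and all names are illustrative:

```python
def health_weight(healthy_scores):
    """Equation (5): W = N_h / N_t, the share of healthy meals (l = 0)."""
    return sum(1 for l in healthy_scores if l == 0) / len(healthy_scores)

def hii(dds_values, monthly_sales):
    """Equation (6), for outlets with W == 1: HII = sum_i DDS_i * S_i."""
    return sum(d * s for d, s in zip(dds_values, monthly_sales))

def uhii(dds_values, monthly_sales):
    """Equation (7), for outlets with W < 1, using the normalized reciprocal
    of DDS (min-max normalization is an assumption, not stated in the paper)."""
    recip = [1.0 / d for d in dds_values]
    lo, hi = min(recip), max(recip)
    norm = [0.0 if hi == lo else (r - lo) / (hi - lo) for r in recip]
    return sum(r * s for r, s in zip(norm, monthly_sales))

# An outlet where half the meals are healthy (l = 0):
print(health_weight([0, 0, 1, 2]))  # -> 0.5
```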

Other Steps of EHI

For the statistical analysis, there are two ways to assess the food environment: (1) calculate the cumulative impact of all food outlets within a given area, or (2) count the food outlets with the strongest unhealthy impact within a given area. The first is the basic analysis method used in this study. There are many ways to aggregate this indicator geographically, such as by administrative area, per unit area or population, or by a buffered area around the food environment itself. We used administrative and unit areas to evaluate geographic differences in the impact degree of the food environment. To the best of our knowledge, this study is the first to discuss fast-food environment assessment in China, so there were no baseline data for reference. Given the geographic aggregation method, maps are the most suitable way to express the results of the food environment assessment.

Data Description and Processing

The Meituan Group operates the largest FDP in China, with a market share of 68.2% in the second quarter of 2020 ( 62 ). The food outlets registered on this platform were collected using crawler technology in November 2020, yielding 42,002 food outlets in Beijing and their associated meals. According to the Business Information Database of the RESSET Enterprise Big Data Platform, there were approximately 64,000 catering enterprises in Beijing by the end of December 2020. The food outlet data we collected thus covered most of the online food services in Beijing, apart from professional food services at airports, schools, and some hotels providing only offline services. Data on food outlets selling only snacks and beverages could not be collected owing to feasibility and other limitations. Thus, we only examined the food environment constructed by food outlets in the meal category.

Most fast-food outlets are located in the central urban area, and the characteristics of the food environment are a long-term cumulative result of the interaction between food outlets and consumers. Thus, 10 subdistricts were selected from the Haidian and Xicheng districts in Beijing: Balizhuang (BLZ), Beixiaguan (BXG), Ganjiakou (GJK), Shuguang (SG), Yongdinglu (YDL), Yangfangdian (YFD), Zizhuyuan (ZZY), Xinjiekou (XJK), Yuetan (YT), and Zhanlanlu (ZLL). The study area is shown in Figure 4 . After spatial linking, more than 2,000 food outlets were selected in these 10 subdistricts. Details of the data processing are mapped in Supplementary Figure 2 . As the indicator was constructed at the unit of one meal, we excluded records covering more than one meal, as well as other items: (i) sauce ingredients; (ii) separate beverages or other drinks; (iii) concomitant food, such as food that cannot be delivered if ordered alone, or some packaged food; (iv) foods in the POT, fried and BBQ, or seafood categories; (v) set meals for multiple persons or group meals; (vi) separate stir-fried dishes, cold dishes, or separate staple foods (rice, steamed bread, etc.) that consumers can select independently; (vii) items sold separately by delicatessens; (viii) single soups; (ix) other items not in the Meal category, such as cakes, candies, dried fruits, and other snacks; and (x) other foods that could not be recognized as a meal. After screening, nearly 20,000 meal samples from 847 food outlets were selected for interpreting and extracting food features.

www.frontiersin.org

Figure 4 . The study area and the spatial distribution of interpreted food outlets. BLZ, Balizhuang; BXG, Beixiaguan; GJK, Ganjiakou; SG, Shuguang; XJK, Xinjiekou; YFD, Yangfangdian; YDL, Yongdinglu; YT, Yuetan; ZLL, Zhanlanlu; ZZY, Zizhuyuan.

The food groups used to calculate DDS were divided into 12 groups with reference to Chinese research ( 63 ) and food composition tables ( 64 , 65 ): (1) cereals; (2) roots and tubers; (3) vegetables; (4) mushroom and seafood (plant); (5) meat, poultry, and offal; (6) eggs; (7) fish and seafood; (8) pulses and legumes; (9) nuts; (10) dairy products; (11) fruits; and (12) miscellaneous, including condiments, snacks, and beverages. We identified food groups from food photographs and meal names ( Supplementary Figure 3 ). As it was difficult to identify oil, sugar, and salt content directly from a food photograph, we tagged a meal with an unhealthy label based on whether it contained food that was fried or high in sugar or salt. Fried food was easy to discern. For high-sugar or high-salt food, we tagged meals according to whether they included a sugar-sweetened beverage (identified by the food label in beverage products) or high-salt sauces or pickles.

Some meals were also excluded if sufficient information was not available to determine the food group. Finally, 18,435 valid meals from 812 food outlets ( Figure 4 ) were recognized after interpretation, including five types of meals: staple (ST), set meal (SET), noodles and dumplings (ND), western fast-food (WFF), and healthy light recipes (HLRs). Subsequently, we calculated the category of every food outlet according to the rules presented in Appendix 1 in the Supplementary Material , and calculated the TUHII, HII, UHII, and health weight for every food outlet according to the abovementioned equations. For comparison, we standardized the TUHII, HII, and UHII by Z-score.
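The Z-score standardization applied to TUHII, HII, and UHII can be sketched as follows (whether population or sample SD was used is an assumption; the paper does not specify):

```python
import statistics

def z_standardize(values):
    """Z-score: subtract the mean, divide by the standard deviation."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD; sample SD is equally plausible
    return [(v - mu) / sd for v in values]

# After standardization the values have mean 0 and SD 1:
print(z_standardize([2.0, 4.0, 6.0]))
```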

Statistical Analysis

To examine the differences in healthiness and nutrition among the different meals, a descriptive analysis was performed, and the healthy score and DDS were described by means and standard deviations. The homoscedasticity of the healthy score and DDS was tested by Levene's test. If Levene's test indicated homoscedasticity, the difference in healthiness and nutrition among meals was tested using analysis of variance (ANOVA); otherwise, Welch's ANOVA was used ( 66 ).
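In practice this test chain would use library routines (e.g. SciPy's `levene` and `f_oneway`, plus a Welch variant); as a self-contained illustration, the classic one-way ANOVA F statistic is:

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean squares.

    A pure-Python sketch for illustration; it does not replace Levene's test
    or Welch's correction used in the study.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # -> 13.5
```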

To study the differences in unhealthy impact among food outlets, a descriptive analysis was performed, and the TUHII, HII, UHII, and health weight were described by means and standard deviations. Considering the distributions of health weight, TUHII, HII, and UHII ( Figure 5 ), we divided the health weight into quartile groups labeled healthy (H, W ≥ 75% quantile), relatively healthy (rH, median ≤ W < 75% quantile), relatively unhealthy (ruH, 25% quantile ≤ W < median), and unhealthy (uH, W < 25% quantile). We likewise divided the standardized TUHII, HII, and UHII into quartile groups: Q1 (value < 25% quantile), Q2 (25% quantile ≤ value < median), Q3 (median ≤ value < 75% quantile), and Q4 (value ≥ 75% quantile). The differences in standardized TUHII, HII, UHII, and health weight among food outlets were analyzed by the same method as in the meal analysis ( 66 ).
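The quartile grouping above can be sketched as follows; the exact quantile estimator used by SPSSAU is an assumption (Python's default exclusive method is used here):

```python
import statistics

def quartile_labels(values, labels=("Q1", "Q2", "Q3", "Q4")):
    """Assign each value to a quartile group using the paper's cutoffs:
    Q1 < 25% quantile <= Q2 < median <= Q3 < 75% quantile <= Q4."""
    q25, q50, q75 = statistics.quantiles(values, n=4)
    def label(v):
        if v < q25:
            return labels[0]
        if v < q50:
            return labels[1]
        if v < q75:
            return labels[2]
        return labels[3]
    return [label(v) for v in values]

# Health-weight groups reuse the cutoffs with reversed labels (higher W = healthier):
# quartile_labels(weights, labels=("uH", "ruH", "rH", "H"))
print(quartile_labels([1, 2, 3, 4, 5, 6, 7, 8]))
```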

www.frontiersin.org

Figure 5 . The histogram of standardized TUHII, HII, UHII, and health weight.

The spatial patterns of TUHII, HII, and UHII were analyzed by spatial autocorrelation, using inverse distance to specify the neighborhood relationship, and tested by Global Moran's I. The regional patterns of TUHII, HII, and UHII were examined by kernel density analysis at a spatial resolution of 30 m. We then compared the aggregation results of TUHII, HII, and UHII across subdistricts, tested the correlation between the results of the counting method for food environment availability and TUHII, HII, and UHII in different subdistricts using Spearman's correlation analysis, and compared the differences between them. All statistical analyses were conducted using SPSSAU (Beijing, China), and parts of the data were processed in Python 3.6. A p-value of <0.05 was regarded as significant. The final maps were drawn using ArcGIS 10.6.
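Global Moran's I with inverse-distance weights can be sketched in pure Python (a minimal illustration; the study's analysis, including significance testing, was carried out in GIS and statistical software):

```python
import math

def global_morans_i(points, values):
    """Global Moran's I with inverse Euclidean-distance weights (w_ij = 1/d_ij)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = w_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            (xi, yi), (xj, yj) = points[i], points[j]
            w = 1.0 / math.hypot(xi - xj, yi - yj)  # inverse distance weight
            num += w * dev[i] * dev[j]
            w_sum += w
    return (n / w_sum) * (num / sum(d * d for d in dev))

# Two similar-valued clusters far apart -> positive spatial autocorrelation:
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(global_morans_i(pts, [1, 1, 10, 10]))  # > 0
```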

Differences in Healthiness and Nutrition Among Various Meals

The number distribution of the different meals by healthy score ( l ) and DDS is listed in Supplementary Table 1 . The table shows that the two most popular meal types were SET and ND, accounting for nearly 80% of all meals. The DDS of most meals (~69% of the total) was 4 or 5. More than 60% of meals were healthy (healthy score of 0), about half of which were ND meals. The healthy score of most unhealthy meals (at least one unhealthy food in the meal) was 1, the majority of which were SET meals. Among all meal types, ST and WFF had the highest proportions of unhealthy meals.

The result of Levene's test showed that the variances of the healthy score among the five types of meals were not equal (F = 711.16, p < 0.01). According to Welch's ANOVA test, there were significant differences in the healthy score among all types of meals ( Supplementary Table 2 ). Specifically, the average healthy score of WFF meals was the highest (0.79), and that of HLR meals was the lowest (0.14). Among the five types of meals, only HLR (0.14) and ND (0.30) had average healthy scores below the total average (0.45); the average healthy scores of ST (0.64) and WFF (0.79) were significantly higher than the total average.

The variances in DDS among the five types of meals were also unequal (F = 79.28, p < 0.01), and there were significant differences in DDS among all types of meals ( Supplementary Table 3 ). Specifically, the average DDS of HLR was the highest (6.34) and that of ST was the lowest (4.58). Among the five types of meals, only ND (4.70) and ST had average DDS values below the total average (4.82). Surprisingly, the average DDS of HLR was the highest, followed by WFF.

Differences in the Impact Degree Among Various Food Outlets

After aggregation, there were 123 healthy food outlets and 689 food outlets at different unhealthiness levels, spanning six types of food outlets: the unknown type (UN) and the five categories defined for meals. Whether in the total food outlet sample or in the healthy and unhealthy subsamples, the numbers of ND and SET food outlets were the highest ( Supplementary Tables 4–6 ).

Among the total food outlets, there were significant differences in standardized TUHII ( Supplementary Table 4 ). The average standardized TUHII of WFF (0.25), ST (0.22), and SET (0.13) food outlets were significantly higher than the total average standardized TUHII (0.00). Further, the average standardized TUHII of HLR (−0.28), ND (−0.11), and UN (−0.30) food outlets were considerably lower than the total average standardized TUHII. Specifically, the mean standardized TUHII in the Q1 (−0.39), Q2 (−0.37), and Q3 (−0.25) groups were significantly lower than the total average standardized TUHII. Moreover, the mean standardized TUHII of all types of food outlets were nearly equal to that in the corresponding Q1, Q2, and Q3 groups; however, those in the Q4 group were different. The mean standardized TUHII of HLR (0.24) and UN (0.33) food outlets were significantly lower than the total average standardized TUHII (1.02) in the Q4 group. The mean standardized TUHII of ND (0.76) and WFF (0.80) food outlets were considerably lower, and that of SET (1.34) food outlets was considerably higher, than the total average standardized TUHII in the Q4 group.

Regarding healthy food outlets, Welch's ANOVA test could not be conducted owing to a lack of food outlets in some categories. Thus, we simply listed the number of healthy food outlets and the mean and standard deviation of the standardized HII in each quartile group ( Supplementary Table 5 ). There were no Q1, Q2, or Q3 groups in HLR and ST food outlets and no Q1, Q3, or Q4 groups in WFF food outlets, and the numbers of HLR, ST, and WFF food outlets were each less than five. Thus, it was difficult to reasonably assess the position of the mean standardized HII of these food outlets relative to the total average standardized HII.

Among unhealthy food outlets, there were significant differences in standardized UHII ( Supplementary Table 6 ). In total, the mean standardized UHII of UN (−0.41) and WFF (−0.28) food outlets were significantly lower, and that of ST (0.13) food outlets was significantly higher, than the total average standardized UHII (0.00). Specifically, the mean standardized UHII of various food outlets in the Q1 and Q2 groups were similar to the total average standardized UHII in those groups. In the Q3 group, the average standardized UHII of ST (−0.06) and WFF (−0.05) food outlets was considerably higher than the total average standardized UHII (−0.12). In the Q4 group, the average standardized UHII of ST (1.83) and UN (1.60) food outlets were significantly higher, and those of HLR (0.68) and WFF (0.61) food outlets were significantly lower, than the total average standardized UHII (1.38).

There were significant differences in health weight across the various types of food outlets ( Supplementary Table 7 ). The mean health weight of HLR (0.85) food outlets was significantly higher, and those of WFF (0.42) and ST (0.46) food outlets were significantly lower, than the total average health weight (0.61). The average health weights of the various food outlets in the H, rH, and ruH groups were approximately equal to the total average health weight in the corresponding groups, and the differences in health weight among food outlets in the uH group were similar to the differences in the total sample.

Characteristics of the Spatial Distribution for TUHII, HII, and UHII

As shown in Supplementary Figure 4 , the Global Moran's I test for TUHII (Z = 0.12, p = 0.91) revealed no significant spatial autocorrelation, indicating that the TUHII of food outlets was distributed randomly. Moran's index of HII was 0.87 (Z = 6.67, p < 0.01), and that of UHII was 0.09 (Z = 2.02, p = 0.04), both indicating significant positive spatial autocorrelation.

For local spatial differences, the kernel density analysis identified a similar number and distribution of areas with high TUHII and UHII, and only one area with high HII, which coincided with a high-TUHII and high-UHII area (Panels B, D, and F in Supplementary Figure 5 ). However, the areas with high TUHII, HII, and UHII were influenced by different types of food outlets (Panels A, C, and E in Supplementary Figure 5 ).

For the aggregation results of standardized TUHII ( Supplementary Figure 6 ), different subdistricts were strongly influenced by different food outlets. Specifically, the sum of standardized TUHII of HLR food outlets was considerably higher in the ZLL subdistrict than elsewhere; the sums for ND and ST food outlets were relatively higher in the XJK subdistrict; the sum for SET food outlets was higher in the YDL subdistrict; the sum for UN food outlets was higher in the ZZY subdistrict; and the sum for WFF food outlets was comparatively higher in the BXG subdistrict.

For the aggregation results of standardized HII ( Supplementary Figure 7 ), different subdistricts were likewise influenced by different food outlets, except for ST and WFF, which were unhealthier than the other types in the indicator analysis among food outlets. Among the remaining food outlets, the sum of standardized HII of HLR outlets was considerably higher in the GJK subdistrict than elsewhere; the sum for ND outlets was relatively higher in the BXG subdistrict; the sum for SET outlets was markedly higher in the BLZ and BXG subdistricts; and the sum for UN outlets was higher in the BXG and XJK subdistricts.

Similar to TUHII and HII, for the aggregation results of standardized UHII ( Supplementary Figure 8 ), different subdistricts were influenced by different food outlets; however, more subdistricts were highly influenced than in the cases of TUHII and HII. For HLR food outlets, the ZZY, BXG, and ZLL subdistricts were highly influenced; for ND food outlets, the SG and BXG subdistricts; for SET food outlets, the XJK and YDL subdistricts; for ST food outlets, only the BLZ subdistrict; for UN food outlets, only the GJK subdistrict; and for WFF food outlets, the SG, BLZ, and YT subdistricts.

Differences of the Food Environment Measured by the Counting Method and TUHII, HII, and UHII in Different Sub-districts

The results of Spearman's correlation test ( Table 3 ) showed no significant correlations between the food environment measured by the counting method and the TUHII, HII, and UHII in the different sub-districts. This means that the availability measured by counting in each subdistrict was quite different from that measured by the corresponding new indicators.

www.frontiersin.org

Table 3 . Spearman's correlation test between results of food environment measured by the counting method and TUHII, HII, and UHII in different subdistricts.

Specifically, the two measurements were consistent for total food outlets (Panels A and B in Figure 6 ) in the BXG, ZZY, ZLL, and YT sub-districts; for healthy food outlets (Panels C and D in Figure 6 ) in the BXG, SG, and BLZ sub-districts; and for unhealthy food outlets (Panels E and F in Figure 6 ) in the SG, BLZ, and YT sub-districts. Compared with the result measured by standardized TUHII, the results measured by the counting method in the SG, BLZ, and YFD sub-districts were considerably overestimated, and the high risk in the YDL sub-district was not recognized by the counting method. Moreover, compared with the result measured by standardized HII, the result measured by the counting method in the GJK sub-district was overestimated, and those in the ZZY and XJK sub-districts were underestimated. For unhealthy food outlets, the high-risk areas recognized by the two methods differed (compare the red areas in Panels E and F of Figure 6 ): the high risk recognized by standardized UHII was in the XJK subdistrict, but that recognized by the counting method was in the BXG subdistrict. Additionally, compared with the results measured by standardized UHII, those measured by counting in the GJK and YFD subdistricts were overestimated, and the result in the YDL subdistrict was underestimated.

www.frontiersin.org

Figure 6 . Comparison between two measurements: (A, C, E) new method, and (B, D, F) old method (measuring the food environment by counting food outlets in an area). ND, Noodles and dumplings; WFF, Western fast-food; SET, Set meal; ST, Staple; HLR, Healthy and light recipes; UN, Unknown. BLZ, Balizhuang; BXG, Beixiaguan; GJK, Ganjiakou; SG, Shuguang; XJK, Xinjiekou; YFD, Yangfangdian; YDL, Yongdinglu; YT, Yuetan; ZLL, Zhanlanlu; ZZY, Zizhuyuan. (A) The standardized TUHII summed by subdistricts. (B) The standardized counts of total food outlets summed by subdistricts. (C) The standardized HII summed by subdistricts. (D) The standardized counts of healthy food outlets summed by subdistricts. (E) The standardized UHII summed by subdistricts. (F) The standardized counts of unhealthy food outlets summed by subdistricts.

To the best of our knowledge, this is the first study to examine the OPFE in China. We proposed an analytical framework in which food, as one of the key elements of the food environment, is placed at the center linking the food environment and consumers. We first explored the differences in healthiness and nutrition among different types of food, because takeaway food has typically been defined as unhealthy by default, as a type of fast-food. From a healthiness perspective, most takeaway foods for a single meal were healthy in the 10 subdistricts. The highest ratio of unhealthy food in a single meal was in ST meals, which is reasonable because citizens of Beijing are used to having at least one kind of fried food in ST meals at breakfast, such as fried bread sticks or rings. Although WFF meals had a higher ratio of unhealthy food in a single meal than other meals, the amount of healthy food in WFF meals was slightly greater than that of unhealthy food, which was unexpected. Further, surprisingly, there was unhealthy food in HLR meals, which normally feature healthy foods. From the perspective of nutrition, we found a significant difference among the meal types, but apart from HLR, the remaining meals did not differ markedly in mean DDS. Notably, the DDS of more than 90% of meals was >4. For populations (household, women's, or children's DDS), a DDS value <4 would be considered low dietary diversity ( 67 ). Thus, the DDS of most single-meal takeaway food in our study area met the daily population standard. Considering the distinctive characteristics of WFF and HLR meals, the parameters we selected depicted the features of meals correctly. These results answer the first research question and provide new evidence that takeaway food in China cannot simply be labeled unhealthy.

At the food outlet level, there were only 123 healthy food outlets, accounting for <15% of the total. Even among the HLR food outlets, which represent healthy food outlets, only one-fifth were healthy. This result matches the initial impression that most fast-food is unhealthy. Moreover, the healthy food outlets may have been healthy only at the time of data collection, which may also relate to the way we measured healthiness: we measured it by proxies, which are usually add-on sales. A reasonable explanation is that these healthy food outlets simply did not sell unhealthy food at the time of data collection. Similarly, even among unhealthy food outlets, only a few sold healthy food items ( Supplementary Table 7 ). Furthermore, the unhealthy impact degree differed across food outlet types. Interestingly, the mean standardized TUHII of WFF food outlets was higher than the total average standardized TUHII, whereas the mean standardized HII and UHII of WFF food outlets were lower than the corresponding total averages. This indicates the following: (1) the actual unhealthy impact of WFF food outlets was lower than expected, in both the healthy and unhealthy group samples; and (2) the number of unhealthy foods in WFF meals was greater than in any other type of meal ( Supplementary Table 1 ), especially for WFF meals with two unhealthy foods.

According to the results of kernel density analysis and aggregation by subdistricts, various high-risk areas were identified by different food outlet samples. Notably, the average number of high-risk areas identified by total food outlets was lower than that identified by healthy and unhealthy food outlets. This indicates that the actual number of high-risk areas might be underestimated in the mixed food outlet samples. The comparison between the aggregation results measured by the counting method and the method proposed in this study showed that the unhealthy impact of food outlets was incorrectly estimated by the counting method in approximately two-thirds of the subdistricts.

The analyses in the above two paragraphs answer the second research question, and the proposed indicators could be useful tools for monitoring the food environment. Moreover, many researchers have explored the reasons behind inconsistent conclusions in this field: data sources ( 68 – 74 ); the measurement or selection of the food environment ( 22 , 69 , 75 , 76 ); neighborhood effects ( 77 ); study design and quality ( 28 , 78 ); changes in the food environment ( 79 ); temporal and spatial uncertainty ( 80 ); and complexity ( 32 ). However, few studies have deconstructed the inner characteristics of the food environment. The current results shed light on these inconsistent conclusions.

Limitations

This study had several limitations. First, we could not assess the performance of the indicator with respect to the association between the food environment and human health, owing to a lack of population data. Although we compared the results of the food environment as measured by the counting method and by the indicators proposed in this study, we cannot conclude that our methods are better than the counting method for assessing that association. Thus, population health data should be collected in the future to verify indicator performance.

Second, the indicator was designed for single-meal foods, which means that we could not identify the food attributes of F_BBQ and POT foods and their corresponding food outlets, because their contents are self-selected by consumers, nor those of seafood and its corresponding food outlets, because seafood is sold by weight. Moreover, the characteristics of snacks and beverages were not identified owing to a lack of data. In calculating the healthy score, using proxies inevitably involves uncertainty. Thus, in future research, the indicator should be revised, or a new indicator developed, for the F_BBQ, POT, and SF types of food and food outlets. The indicators proposed in this study should also be applied to snacks and beverages in the future. FDPs and the government should cooperate to resolve the uncertainty caused by proxies.

Third, regarding the classification system of food outlets and meals, although we tried to cover all types of food, we cannot be sure that the system generalizes, because it was developed from data on a small area of Beijing. More work is needed to expand its scope to the whole food environment, comprising all varieties of food outlets and food services. Additionally, the cutoff of the maximum-minimum ratio (MMR, a parameter used to identify the category of a food outlet; Appendix 1 in the Supplementary Material ) in the rules for deriving the type of a food outlet from that of its foods was set based on the authors' observations of the data. A theoretical method for deriving the type of a food outlet from that of its foods should be developed in the future.

Fourth, the DDS was designed to assess an individual's or a family's access to nutrition, such as nutrient adequacy and overall diet quality, over the preceding days or a week. This is possibly the first time the DDS has been used to estimate the food diversity of takeaway food as part of an indicator for measuring the food environment. Nevertheless, we could not verify the relationship between the DDS over a day and that of a single meal. Hence, more research is needed to verify the association between single-meal DDS and individual DDS.

Finally, feature extraction by manual interpretation is inefficient. As this is the first study to examine the OPFE-TF in China, and the complexity of the Chinese food environment has been documented in other provinces ( 32 ), manual feature extraction was a good way to guarantee the accuracy of the analysis data; manual interpretation can be used at the beginning of a research program to help characterize the data. Automatic or semi-automated feature extraction, however, should be developed in the future to cope with big data.

The severity of obesity in China is undisputed ( 81 ). While intervening at the population level is not easy, seeking solutions from the food environment perspective is worth the effort. This trial study explored a new methodological framework for the OPFE in China by developing an analytical framework and a measurement indicator to define the healthiness and the health impact weight of food outlets. We built a new food environment research framework based on evidence from the latest disease-burden research, combined with the characteristics of China's current food environment, from the perspective of environmental science and with reference to the standard process of EHIs. We redefined the food environment and proposed that food and its physical space are its two core elements. From this definition, we extracted four domains of characteristics to describe the basic components of the food environment, drawing on existing methods of dietary quality evaluation in nutrition. Based on the single meal, a form common to both human eating habits and the way food is sold on FDPs, we designed an approach comprising three indicators: TUHII, HII, and UHII. As stated in the Introduction, the indicators developed in this study are mainly intended to monitor the characteristics of the OPFE-TF, recognize areas at high risk of unhealthy food, and prepare for modeling the associations between the OPFE-TF and human health.
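For illustration only (the actual TUHII, HII, and UHII formulas are defined in the Methods), the contrast between the traditional counting method and a healthiness-weighted index can be sketched as:

```python
# Illustrative sketch, not the paper's formulas: a plain outlet count treats
# every outlet equally, while a weighted index scales each outlet by a
# hypothetical health-impact weight in [0, 1].
def counting_method(outlet_weights):
    return len(outlet_weights)

def weighted_influence(outlet_weights):
    return sum(outlet_weights)

weights = [0.8, 0.3, 0.5]  # hypothetical health-impact weights
print(counting_method(weights))               # counting: 3 outlets, healthiness ignored
print(round(weighted_influence(weights), 2))  # weighting separates unhealthy exposure
```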

Our conclusions are twofold: first, takeaway food in China cannot be directly labeled unhealthy; second, the indicator constructed from the analytical framework proposed in this study depicts the food environment better than the traditional counting method and identifies high-risk areas of the OPFE-TF. The food environment characteristics measured by the new approach are closer to reality. Its simplicity, combined with this greater meaning, makes it useful for large-scale, long-term monitoring of food environment change. More importantly, under the new definition of the food environment, measuring the nutritional value of food available at food outlets in order to draw public health implications from an analysis of the food environment ( 12 ) is no longer beyond the scope of food environment research.

Our work in this area is in its initial phase, and more research is needed to verify the effectiveness of the measurement indicator in assessing the impact of the OPFE-TF on human health, with interdisciplinary support from nutrition, geography, environmental science, marketing management, and big data companies. Furthermore, future work should revise the measurement to better monitor changes in the OPFE and explore the associations between the OPFE-TF and population health across a broad spatial range and in a longitudinal cohort, in support of developing healthy cities in China ( 82 ).

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

Author Contributions

NC conducted the whole analyses and drafted the manuscript. AZ helped with the indicator design, M-PK helped with the spatial analysis, and JY helped with the framework. AZ, M-PK, and JY revised the manuscript. PG helped with the framework, supervised, funded the project, and revised the manuscript. All authors read and approved the final manuscript.

Funding

This research was supported by two grants from the National Natural Science Foundation of China (Nos. 42090015 and 42071400) and by donations from the Cyrus Tang Foundation.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnut.2022.818374/full#supplementary-material

1. ^ https://www.naics.com/naics-code-description/?code=72251

2. ^ Please see the introduction of the RESSET database at http://www.resset.cn/endatabases.

3. ^ The “healthy” and other labels here were only used to describe the healthiness of unhealthy food outlets.

1. Nugent R, Levin C, Hale J, Hutchinson B. Economic effects of the double burden of malnutrition. The Lancet. (2020) 395:156–64. doi: 10.1016/S0140-6736(19)32473-0


2. Development Initiatives. Global Nutrition Report: Shining a Light to Spur Action on Nutrition. Bristol: Development Initiatives (2018).

3. NCD Risk Factor Collaboration. Trends in adult body-mass index in 200 countries from 1975 to 2014: a pooled analysis of 1698 population-based measurement studies with 19·2 million participants. The Lancet . (2016) 387:1377–96. doi: 10.1016/S0140-6736(16)30054-X

4. NHFPC-BDC. Report on Chinese Residents' Chronic Diseases and Nutrition, 2020 . Beijing: People's Medical Publishing House (2020).

5. Haslam DW, James WP. Obesity. The Lancet . (2005) 366:1197–209. doi: 10.1016/S0140-6736(05)67483-1

6. Roth GA, Abate D, Abate KH, Abay SM, Abbafati C, Abbasi N. Global, regional, and national age-sex-specific mortality for 282 causes of death in 195 countries and territories, 1980–2017: a systematic analysis for the global burden of disease study 2017. The Lancet . (2018) 392:1736–88. doi: 10.1016/S0140-6736(18)32203-7

7. Stanaway JD, Afshin A, Gakidou E, Lim SS, Abate D, Abate KH. Global, regional, and national comparative risk assessment of 84 behavioural, environmental and occupational, and metabolic risks or clusters of risks for 195 countries and territories, 1990–2017: a systematic analysis for the global burden of disease study 2017. The Lancet . (2018) 392:1923–94. doi: 10.1016/S0140-6736(18)32225-6

8. Swinburn B. Dissecting obesogenic environments: the development and application of a framework for identifying and prioritizing environmental interventions for obesity. Prev Med . (1999) 29:563–570.


9. Swinburn BA, Sacks G, Hall KD, McPherson K, Finegood DT, Moodie ML, et al. The global obesity pandemic: shaped by global drivers and local environments. The Lancet . (2011) 378:804–14. doi: 10.1016/S0140-6736(11)60813-1

10. Fanzo J, Davis C. Global Food Systems, Diets, and Nutrition . Cham: Springer International Publishing (2021).

11. Swinburn B, Dominick CH, Vandevijvere S. Benchmarking Food Environments: Experts' Assessments of Policy Gaps and Priorities for the New Zealand Government . Auckland: University of Auckland (2014).

12. Raja S, Ma C, Yadav P. Beyond food deserts. J Planning Edu Res . (2008) 27:469–82. doi: 10.1177/0739456X08317461


13. Turner C, Aggarwal A, Walls H, Herforth A, Drewnowski A, Coates J. Concepts and critical perspectives for food environment research: a global framework with implications for action in low- and middle-income countries. Global Food Security . (2018) 18:93–101. doi: 10.1016/j.gfs.2018.08.003

14. Toure D, Herforth A, Pelto GH, Neufeld LM, Mbuya MNN. An emergent framework of the market food environment in low- and middle-income countries. Curr Develop Nutri . (2021) 5:nzab023. doi: 10.1093/cdn/nzab023

15. Herforth A, Ahmed S. The food environment, its effects on dietary consumption, and potential for measurement within agriculture-nutrition interventions. Food Security . (2015) 7:505–20. doi: 10.1007/s12571-015-0455-8

16. Global Panel. Improving Nutrition Through Enhanced Food Environments . London: Global Panel on Agriculture and Food Systems for Nutrition (2017).

17. HLPE. Nutrition and Food Systems. A report by the High Level Panel of Experts on Food Security and Nutrition of the Committee on World Food Security. Rome (2017).

18. Downs SM, Ahmed S, Fanzo J, Herforth A. Food environment typology: advancing an expanded definition, framework, and methodological approach for improved characterization of wild, cultivated, and built food environments toward sustainable diets. Foods . (2020) 9:532. doi: 10.3390/foods9040532

19. Lytle LA, Sokol RL. Measures of the food environment: a systematic review of the field, 2007–2015. Health Place . (2017) 44:18–34. doi: 10.1016/j.healthplace.2016.12.007

20. Jin H, Lu Y. Evaluating consumer nutrition environment in food deserts and food swamps. Int J Environ Res Public Health . (2021) 18:2675. doi: 10.3390/ijerph18052675

21. Swinburn BA, Caterson I, Seidell JC, James WP. Diet, nutrition, and the prevention of excess weight gain and obesity. Public Health Nutr. (2004). 7:123–46. doi: 10.1079/phn2003585

22. Thomson JL, Goodman MH, Landry AS. Measurement of nutrition environments in grocery stores, convenience stores, and restaurants in the lower Mississippi delta. Prev Chronic Dis . (2020) 17:E24. doi: 10.5888/pcd17.190293

23. Joseph R, Cassandra SMJ, Wesley RD, Scott AH. Association between proximity to and coverage of traditional fast-food restaurants and nontraditional fast-food outlets and fast-food consumption among rural adults. Int J Health Geograph . (2011) 10:37. doi: 10.1186/1476-072X-10-37

24. Zenk SN, Mentz G, Schulz AJ, Johnson-Lawrence V, Gaines CR. Longitudinal associations between observed and perceived neighborhood food availability and body mass index in a multiethnic urban sample. Health Educ Behav . (2017) 44:41–51. doi: 10.1177/1090198116644150

25. Thornton LE, Kavanagh AM. Association between fast food purchasing and the local food environment. Nutr Diabetes . (2012) 2:e53. doi: 10.1038/nutd.2012.27

26. Thornton LE, Lamb KE, White SR. The use and misuse of ratio and proportion exposure measures in food environment research. Int J Behav Nutr Phys Act . (2020) 17:118. doi: 10.1186/s12966-020-01019-1

27. Yenerall J, You W, Hill J. Investigating the spatial dimension of food access. Int J Environ Res Public Health . (2017) 14. doi: 10.3390/ijerph14080866

28. Cobb LK, Appel LJ, Franco M, Jones-Smith JC, Nur A, Anderson CAM. The relationship of the local food environment with obesity: a systematic review of methods, study quality, and results. Obesity . (2015) 23:1331–44. doi: 10.1002/oby.21118

29. An R, He L, Shen J. Impact of neighbourhood food environment on diet and obesity in China: a systematic review. Public Health Nutr . (2020) 23:457–73. doi: 10.1017/S1368980019002167

30. Turner C, Kalamatianou S, Drewnowski A, Kulkarni B, Kinra S, Kadiyala S. Food Environment research in low- and middle-income countries: a systematic scoping review. Adv Nutr . (2020) 11:387–97. doi: 10.1093/advances/nmz031

31. Maimaiti M, Ma X, Zhao X, Jia M, Li J, Yang M. Multiplicity and complexity of food environment in China: full-scale field census of food outlets in a typical district. Eur J Clin Nutr . (2020) 74:397–408. doi: 10.1038/s41430-019-0462-5

32. Cong N, Zhao A, Gong P. Food delivery platform: a potential tool for monitoring the food environment and mitigating overweight/obesity in China. Front Nutr. (2021) 8:703090. doi: 10.3389/fnut.2021.703090

33. Holden NM, White EP, Lange MC, Oldfield TL. Review of the sustainability of food systems and transition using the Internet of Food. NPJ Sci Food . (2018) 2:18. doi: 10.1038/s41538-018-0027-3

34. Liu C, Chen J. Consuming takeaway food: convenience, waste and Chinese young people's urban lifestyle. J Consumer Cult . (2021) 21:848–66. doi: 10.1177/1469540519882487

35. Li C, Mirosa M, Bremer P. Review of Online Food Delivery Platforms and their Impacts on Sustainability. Sustainability . (2020) 12, 5528. doi: 10.3390/su12145528

36. Briggs D, Corvalán C, Nurminen M. Linkage Methods for Environment and Health Analysis: General Guidelines . Geneva: World Health Organization (1996).

37. WHO. Making a Difference: Indicators to Improve Children's Environmental Health . Geneva: World Health Organization (2003).

38. Murray CJL, Aravkin AY, Zheng P, Abbafati C, Abbas KM, Abbasi-Kangevari M. Global burden of 87 risk factors in 204 countries and territories, 1990–2019: a systematic analysis for the global burden of disease study 2019. The Lancet . (2020) 396:1223–49. doi: 10.1016/S0140-6736(20)30752-2

39. Gakidou E, Afshin A, Abajobir AA, Abate KH, Abbafati C, Abbas KM. Global, regional, and national comparative risk assessment of 84 behavioural, environmental and occupational, and metabolic risks or clusters of risks, 1990–2016: a systematic analysis for the global burden of disease study 2016. The Lancet . (2017) 390:1345–422. doi: 10.1016/S0140-6736(17)32366-8

40. Afshin A, Sur PJ, Fay KA, Cornaby L, Ferrara G, Salama JS. Health effects of dietary risks in 195 countries, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. The Lancet . (2019) 393:1958–72. doi: 10.1016/S0140-6736(19)30041-8

41. Mazidi M, Speakman JR. Higher densities of fast-food and full-service restaurants are not associated with obesity prevalence. Am J Clin Nutr. (2017) 106:603–13.

42. Rummo PE, Guilkey DK, Ng SW, Meyer KA, Popkin BM, Reis JP. Does unmeasured confounding influence associations between the retail food environment and body mass index over time? The coronary artery risk development in young adults (CARDIA) study. Int J Epidemiol . (2017) 46:1456–64. doi: 10.1093/ije/dyx070

43. Wong MS, Chan KS, Jones-Smith JC, Colantuoni E, Thorpe RJ, Bleich SN. The neighborhood environment and obesity: understanding variation by race/ethnicity. Prev Med . (2018) 111:371–7. doi: 10.1016/j.ypmed.11029

44. Zenk SN, Tarlov E, Wing CM, Matthews SA, Tong H, Jones KK. Long-term weight loss effects of a behavioral weight management program: does the community food environment matter? Int J Environ Res Public Health . (2018) 15:211. doi: 10.3390/ijerph15020211

45. Dornelles A. Impact of multiple food environments on body mass index. PLoS ONE . (2019) 14:e0219365. doi: 10.1371/journal.pone.0219365

46. NAICS Association. NAICS Code: 722511 Full-Service Restaurants . (2021). Available online at: https://www.naics.com/naics-code-description/?code=7225 (accessed August 31, 2021).

47. Hobbs M, Green MA, Wilkins E, Lamb KE, McKenna J, Griffiths C. Associations between food environment typologies and body mass index: evidence from Yorkshire, England. Soc Sci Med . (2019) 239:112528. doi: 10.1016/j.socscimed.2019.112528

48. Wilkins E, Morris M, Radley D, Griffiths C. Methods of measuring associations between the Retail Food Environment and weight status: Importance of classifications and metrics. SSM Popul Health . (2019) 8:100404. doi: 10.1016/j.ssmph.2019.100404

49. Bailey K. Typologies and Taxonomies . Thousand Oaks, CA: SAGE Publications, Inc (1994).

50. Standardization Administration of the People's Republic of China. Industrial Classification for National Economic Activities (GB/T 4754-2017). General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China; Standardization Administration of the People's Republic of China (2017), June 30. Available online at: http://www.stats.gov.cn/tjsj/tjbz/hyflbz/201710/t20171012_1541679.html

51. Lennernas M, Andersson I. Food-based classification of eating episodes (FBCE). Appetite . (1999) 32:53–65.

52. National Health and Medical Research Council. Australian Dietary Guidelines . Canberra: National Health and Medical Research Council (2013)

53. Ministry of Health of Brazil, Secretariat of Health Care, Primary Health Care Department. Dietary Guidelines for the Brazilian Population . Brasília: Ministry of Health of Brazil (2015).

54. Chinese Nutrition Society. The Dietary Guidelines for Chinese (2016) . Beijing: People's Medical Publishing House (2016).

55. US Department of Agriculture and US Department of Health and Human Services. Dietary Guidelines for Americans 2020–2025 (2020). Available online at: DietaryGuidelines.gov

56. Kennedy G, Ballard T, Dop M. Guidelines for Measuring Household and Individual Dietary Diversity. Rome: FAO (2013).

57. Lin BH. Healthy eating index. Agric Inf Bull. (2005) 796:1–4. Available online at: https://www.ers.usda.gov/webdocs/publications/42596/30098_aib796-1.pdf?v=0

58. Kim S, Haines PS, Siega-Riz AM, Popkin BM. The diet quality index-international (DQI-I) provides an effective tool for cross-national comparison of diet quality as illustrated by China and the United States. J Nutr . (2003) 133:3476–84. doi: 10.1093/jn/133.11.3476

59. Salehi-Abargouei A, Akbari F, Bellissimo N, Azadbakht L. Dietary diversity score and obesity: a systematic review and meta-analysis of observational studies. Eur J Clin Nutr . (2016) 70:1–9. doi: 10.1038/ejcn.2015.118

60. Qorbani M, Mahdavi-Gorabi A, Khatibi N, Ejtahed HS, Khazdouz M, Djalalinia S. Dietary diversity score and cardio-metabolic risk factors: an updated systematic review and meta-analysis. Eat Weight Disord . (2021) 4:1090. doi: 10.1007/s40519-020-01090-4

61. Habte T, Krawinkel M. Dietary diversity score: a measure of nutritional adequacy or an indicator of healthy diet? J Nutri Health Sci . (2016) 3:303. doi: 10.15744/2393-9060.3.303

62. Trustdata. Analysis report on the development of China's food delivery industry in Q2 . (2020). Available online at: http://www.100ec.cn/detail-6573959.html (accessed August 27, 2021).

63. Zhao A, Li Z, Ke Y, Huo S, Ma Y, Zhang Y. Dietary diversity among Chinese residents during the COVID-19 outbreak and its associated factors. Nutrients . (2020) 12:1699. doi: 10.3390/nu12061699

64. Yang Y. China Food Composition (Standard) . Beijing: Peking University Medical Press (2018).

65. Yang Y. China Food Composition (Standard) . Beijing: Peking University Medical Press (2019).

66. McDonald JH. One-Way Anova - Handbook of Biological Statistics . (2016). http://www.biostathandbook.com/onewayanova.html (accessed January 15, 2022).

67. Ochieng J, Afari-Sefa V, Lukumay PJ, Dubois T. Determinants of dietary diversity and the potential role of men in improving household nutrition in Tanzania. (2017) 12:3569. doi: 10.7910/DVN/INRWQA

68. Cummins S, Macintyre S. Are secondary data sources on the neighbourhood food environment accurate? Case-study in Glasgow, UK. Prev Med . (2009) 49:527–8. doi: 10.1016/j.ypmed.10007

69. Bader MD, Ailshire JA, Morenoff JD, House JS. Measurement of the local food environment: a comparison of existing data sources. Am J Epidemiol . (2010) 171:609–17. doi: 10.1093/aje/kwp419

70. Burgoine T. Collecting accurate secondary foodscape data. A reflection on the trials and tribulations. Appetite . (2010) 55:522–7. doi: 10.1016/j.appet.08020

71. Lake AA, Burgoine T, Greenhalgh F, Stamp E, Tyrrell R. The foodscape: classification and field validation of secondary data sources. Health Place . (2010) 16:666–73. doi: 10.1016/j.healthplace.02004

72. Mendez DD, Kim KH, Hardaway CR, Fabio A. Neighborhood racial and socioeconomic disparities in the food and alcohol environment: are there differences by commercial data sources? J Racial Ethnic Health Disp . (2016) 3:108–16. doi: 10.1007/s40615-015-0120-0

73. Wilkins EL, Radley D, Morris MA, Griffiths C. Examining the validity and utility of two secondary sources of food environment data against street audits in England. Nutr J . (2017) 16:82. doi: 10.1186/s12937-017-0302-1

74. Lucan SC, Maroko AR, Abrams C, Rodriguez N, Patel AN, Gjonbalaj I. Government data v. ground observation for food-environment assessment: businesses missed and misreported by city and state inspection records. Public Health Nutr . (2020) 23:1414–27. doi: 10.1017/S1368980019002982

75. Moore LV, Diez Roux AV, Nettleton JA, Jacobs DR, Franco M. Fast-food consumption, diet quality, and neighborhood exposure to fast food: the multi-ethnic study of atherosclerosis. Am J Epidemiol . (2009) 170:29–36. doi: 10.1093/aje/kwp090

76. Bivoltsis A, Cervigni E, Trapp G, Knuiman M, Hooper P, Ambrosini GL. Food environments and dietary intakes among adults: does the type of spatial exposure measurement matter? A systematic review. Int J Health Geograph . (2018) 17:19. doi: 10.1186/s12942-018-0139-7

77. Kwan MP. The limits of the neighborhood effect: contextual uncertainties in geographic, environmental health, and social science research. Annals Am Assoc Geograp . (2018) 108:1482–90. doi: 10.1080/24694452.2018.1453777

78. Gamba RJ, Schuchter J, Rutt C, Seto EYW. Measuring the food environment and its effects on obesity in the United States: a systematic review of methods and results. J Community Health . (2015) 40:464–75. doi: 10.1007/s10900-014-9958-z

79. Bleich SN, Soto MJ, Jones-Smith JC, Wolfson JA, Jarlenski MP, Dunn CG. Association of chain restaurant advertising spending with obesity in US adults. JAMA Netw Open . (2020) 3:e2019519. doi: 10.1001/jamanetworkopen.2020.19519

80. Chen X, Kwan MP. Contextual uncertainties, human mobility, and perceived food environment: the uncertain geographic context problem in food access research. Am J Public Health . (2015) 105:1734–7. doi: 10.2105/ajph.2015.302792

81. Huang L, Wang Z, Wang H, Zhao L, Jiang H, Zhang B. Nutrition transition and related health challenges over decades in China. Eur J Clin Nutr . (2021) 75:247–52. doi: 10.1038/s41430-020-0674-8

82. Yang J, Siri JG, Remais JV, Cheng Q, Zhang H, Chan KKY. The Tsinghua–lancet commission on healthy cities in China: unlocking the power of cities for a healthy China. The Lancet . (2018) 391:2140–84. doi: 10.1016/S0140-6736(18)30486-0

Keywords: online public food environment, exposure, takeaway food, indicator/measurement, dietary diversity, analytical framework

Citation: Cong N, Zhao A, Kwan M-P, Yang J and Gong P (2022) An Indicator Measuring the Influence of the Online Public Food Environment: An Analytical Framework and Case Study. Front. Nutr. 9:818374. doi: 10.3389/fnut.2022.818374

Received: 19 November 2021; Accepted: 23 May 2022; Published: 30 June 2022.


Copyright © 2022 Cong, Zhao, Kwan, Yang and Gong. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Peng Gong, penggong@hku.hk

This article is part of the Research Topic

Advances in Food Environments


Data Analytics Case Study Guide 2024

by Sam McKay, CFA | Data Analytics


Data analytics case studies reveal how businesses harness data for informed decisions and growth.

For aspiring data professionals, mastering the case study process will enhance your skills and increase your career prospects.


So, how do you approach a case study?

Use these steps to process a data analytics case study:

Understand the Problem: Grasp the core problem or question addressed in the case study.

Collect Relevant Data: Gather data from diverse sources, ensuring accuracy and completeness.

Apply Analytical Techniques: Use appropriate methods aligned with the problem statement.

Visualize Insights: Utilize visual aids to showcase patterns and key findings.

Derive Actionable Insights: Focus on deriving meaningful actions from the analysis.
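Under hypothetical data, the five steps above can be condensed into a minimal sketch; every function name here is illustrative, and each stub would be replaced with real logic for your own case study:

```python
# Minimal sketch of the five-step flow on toy data.
def understand_problem(brief):
    return {"question": brief}

def collect_data(source_rows):
    # a basic completeness check: drop missing records
    return [r for r in source_rows if r is not None]

def analyze(rows):
    return {"mean": sum(rows) / len(rows)}

def visualize(insight):
    # stand-in for a chart or dashboard
    return f"mean={insight['mean']:.1f}"

def derive_actions(insight):
    return ["investigate further"] if insight["mean"] > 10 else ["no action"]

problem = understand_problem("Why did weekly orders spike?")
data = collect_data([12, None, 18, 9])
insight = analyze(data)
print(visualize(insight), derive_actions(insight))
```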

This article will give you detailed steps to navigate a case study effectively and understand how it works in real-world situations.

By the end of the article, you will be better equipped to approach a data analytics case study, strengthening your analytical prowess and practical application skills.

Let’s dive in!


Table of Contents

What is a Data Analytics Case Study?

A data analytics case study is a real or hypothetical scenario where analytics techniques are applied to solve a specific problem or explore a particular question.

It’s a practical approach that uses data analytics methods, assisting in deciphering data for meaningful insights. This structured method helps individuals or organizations make sense of data effectively.

Additionally, it’s a way to learn by doing, where there’s no single right or wrong answer in how you analyze the data.

So, what are the components of a case study?

Key Components of a Data Analytics Case Study


A data analytics case study comprises essential elements that structure the analytical journey:

Problem Context: A case study begins with a defined problem or question. It provides the context for the data analysis , setting the stage for exploration and investigation.

Data Collection and Sources: It involves gathering relevant data from various sources , ensuring data accuracy, completeness, and relevance to the problem at hand.

Analysis Techniques: Case studies employ different analytical methods, such as statistical analysis, machine learning algorithms, or visualization tools, to derive meaningful conclusions from the collected data.

Insights and Recommendations: The ultimate goal is to extract actionable insights from the analyzed data, offering recommendations or solutions that address the initial problem or question.

Now that you have a better understanding of what a data analytics case study is, let’s talk about why we need and use them.

Why Case Studies are Integral to Data Analytics


Case studies serve as invaluable tools in the realm of data analytics, offering multifaceted benefits that bolster an analyst’s proficiency and impact:

Real-Life Insights and Skill Enhancement: Examining case studies provides practical, real-life examples that expand knowledge and refine skills. These examples offer insights into diverse scenarios, aiding in a data analyst’s growth and expertise development.

Validation and Refinement of Analyses: Case studies demonstrate the effectiveness of data-driven decisions across industries, providing validation for analytical approaches. They showcase how organizations benefit from data analytics, which also helps in refining one's own methodologies.

Showcasing Data Impact on Business Outcomes: These studies show how data analytics directly affects business results, like increasing revenue, reducing costs, or delivering other measurable advantages. Understanding these impacts helps articulate the value of data analytics to stakeholders and decision-makers.

Learning from Successes and Failures: By exploring a case study, analysts glean insights from others’ successes and failures, acquiring new strategies and best practices. This learning experience facilitates professional growth and the adoption of innovative approaches within their own data analytics work.

Including case studies in a data analyst’s toolkit helps gain more knowledge, improve skills, and understand how data analytics affects different industries.

Using these real-life examples boosts confidence and success, guiding analysts to make better and more impactful decisions in their organizations.

But not all case studies are the same.

Let’s talk about the different types.

Types of Data Analytics Case Studies


Data analytics encompasses various approaches tailored to different analytical goals:

Exploratory Case Study: These involve delving into new datasets to uncover hidden patterns and relationships, often without a predefined hypothesis. They aim to gain insights and generate hypotheses for further investigation.

Predictive Case Study: These utilize historical data to forecast future trends, behaviors, or outcomes. By applying predictive models, they help anticipate potential scenarios or developments.

Diagnostic Case Study: This type focuses on understanding the root causes or reasons behind specific events or trends observed in the data. It digs deep into the data to provide explanations for occurrences.

Prescriptive Case Study: This type goes beyond analysis to provide actionable recommendations or strategies derived from the analyzed data, guiding decision-making by suggesting optimal courses of action based on the insights gained.

Each type has a specific role in using data to find important insights, helping in decision-making, and solving problems in various situations.

Regardless of the type of case study you encounter, here are some steps to help you process them.

Roadmap to Handling a Data Analysis Case Study


Embarking on a data analytics case study requires a systematic, step-by-step approach to derive valuable insights effectively.

Here are the steps to help you through the process:

Step 1: Understanding the Case Study Context: Immerse yourself in the intricacies of the case study. Delve into the industry context, understanding its nuances, challenges, and opportunities.


Identify the central problem or question the study aims to address. Clarify the objectives and expected outcomes, ensuring a clear understanding before diving into data analytics.

Step 2: Data Collection and Validation: Gather data from diverse sources relevant to the case study. Prioritize accuracy, completeness, and reliability during data collection. Conduct thorough validation processes to rectify inconsistencies, ensuring high-quality and trustworthy data for subsequent analysis.


Step 3: Problem Definition and Scope: Define the problem statement precisely. Articulate the objectives and limitations that shape the scope of your analysis. Identify influential variables and constraints, providing a focused framework to guide your exploration.

Step 4: Exploratory Data Analysis (EDA): Leverage exploratory techniques to gain initial insights. Visualize data distributions, patterns, and correlations, fostering a deeper understanding of the dataset. These explorations serve as a foundation for more nuanced analysis.
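A minimal sketch of this step using only Python's standard library (the data are hypothetical): summarize each column, then check whether two columns move together.

```python
import statistics

# hypothetical columns: daily order counts and the revenue they generated
orders = [4, 7, 5, 9, 6]
revenue = [40, 72, 49, 95, 58]

# summary statistics give a first feel for the distribution
print(statistics.mean(orders), statistics.stdev(orders))

# a simple Pearson correlation reveals whether revenue tracks order volume
def pearson(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print(pearson(orders, revenue))  # close to 1: the columns are tightly linked
```

In practice you would plot these relationships too; the numeric check is the foundation the visuals build on.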

Step 5: Data Preprocessing and Transformation: Cleanse and preprocess the data to eliminate noise, handle missing values, and ensure consistency. Transform data formats or scales as required, preparing the dataset for further analysis.
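A small sketch of this step on hypothetical data: impute missing values with the column mean, then min-max scale the result so features become comparable.

```python
# raw column with gaps (None marks a missing value)
raw = [3.0, None, 7.0, 5.0, None, 9.0]

# handle missing values by imputing the mean of the observed entries
observed = [v for v in raw if v is not None]
mean = sum(observed) / len(observed)
filled = [v if v is not None else mean for v in raw]

# min-max scale to [0, 1] for consistency across features
lo, hi = min(filled), max(filled)
scaled = [(v - lo) / (hi - lo) for v in filled]
print(scaled)
```

Mean imputation and min-max scaling are just two of many options; the right choices depend on the dataset and the model you plan to apply next.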


Step 6: Data Modeling and Method Selection: Select analytical models aligning with the case study’s problem, employing statistical techniques, machine learning algorithms, or tailored predictive models.

In this phase, it’s important to develop data modeling skills. This helps create visuals of complex systems using organized data, which helps solve business problems more effectively.

Understand key data modeling concepts, utilize essential tools like SQL for database interaction, and practice building models from real-world scenarios.

Furthermore, strengthen data cleaning skills for accurate datasets, and stay updated with industry trends to ensure relevance.


Step 7: Model Evaluation and Refinement: Evaluate the performance of applied models rigorously. Iterate and refine models to enhance accuracy and reliability, ensuring alignment with the objectives and expected outcomes.
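Evaluation usually means holding out test data and scoring predictions with a metric. Here is a minimal sketch: a one-variable least-squares fit on a training split, scored with mean absolute error (MAE) on a test split. The data are synthetic and noiseless (y = 2x + 1) so the fit recovers the line exactly; real data would show a nonzero error that refinement tries to reduce.

```python
# Evaluation sketch: holdout split, least-squares line fit, MAE score.

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2 * x + 1 for x in xs]          # noiseless y = 2x + 1

train_x, test_x = xs[:6], xs[6:]      # simple holdout split
train_y, test_y = ys[:6], ys[6:]

slope, intercept = fit_line(train_x, train_y)
preds = [slope * x + intercept for x in test_x]
print(round(slope, 3), round(intercept, 3), round(mae(test_y, preds), 3))
# 2.0 1.0 0.0
```

For noisier problems, cross-validation and comparing several candidate models on the same metric is the standard way to drive the iterate-and-refine loop this step describes.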

Step 8: Deriving Insights and Recommendations: Extract actionable insights from the analyzed data. Develop well-structured recommendations or solutions based on the insights uncovered, addressing the core problem or question effectively.

Step 9: Communicating Results Effectively: Present findings, insights, and recommendations clearly and concisely. Utilize visualizations and storytelling techniques to convey complex information compellingly, ensuring comprehension by stakeholders.


Step 10: Reflection and Iteration: Reflect on the entire analysis process and outcomes. Identify potential improvements and lessons learned. Embrace an iterative approach, refining methodologies for continuous enhancement and future analyses.

This step-by-step roadmap provides a structured framework for thorough and effective handling of a data analytics case study.

Now, after handling the data analytics comes a crucial step: presenting the case study.

Presenting Your Data Analytics Case Study


Presenting a data analytics case study is a vital part of the process. When presenting your case study, clarity and organization are paramount.

To achieve this, follow these key steps:

Structuring Your Case Study: Start by outlining relevant and accurate main points. Ensure these points align with the problem addressed and the methodologies used in your analysis.

Crafting a Narrative with Data: Start with a brief overview of the issue, then explain your method and steps, covering data collection, cleaning, stats, and advanced modeling.

Visual Representation for Clarity: Utilize various visual aids—tables, graphs, and charts—to illustrate patterns, trends, and insights. Ensure these visuals are easy to comprehend and seamlessly support your narrative.


Highlighting Key Information: Use bullet points to emphasize essential information, maintaining clarity and allowing the audience to grasp key takeaways effortlessly. Bold key terms or phrases to draw attention and reinforce important points.

Addressing Audience Queries: Anticipate and be ready to answer audience questions regarding methods, assumptions, and results. Demonstrating a profound understanding of your analysis instills confidence in your work.

Integrity and Confidence in Delivery: Maintain a neutral tone and avoid exaggerated claims about findings. Present your case study with integrity, clarity, and confidence to ensure the audience appreciates and comprehends the significance of your work.


By organizing your presentation well, telling a clear story through your analysis, and using visuals wisely, you can effectively share your data analytics case study.

This method helps people understand better, stay engaged, and draw valuable conclusions from your work.

We hope that by now you are feeling confident about working through a case study. But as with any process, there are challenges you may encounter.


Key Challenges in Data Analytics Case Studies


A data analytics case study can present various hurdles that necessitate strategic approaches for successful navigation:

Challenge 1: Data Quality and Consistency

Challenge: Inconsistent or poor-quality data can impede analysis, leading to erroneous insights and flawed conclusions.

Solution: Implement rigorous data validation processes, ensuring accuracy, completeness, and reliability. Employ data cleansing techniques to rectify inconsistencies and enhance overall data quality.

Challenge 2: Complexity and Scale of Data

Challenge: Managing vast volumes of data with diverse formats and complexities poses analytical challenges.

Solution: Utilize scalable data processing frameworks and tools capable of handling diverse data types. Implement efficient data storage and retrieval systems to manage large-scale datasets effectively.

Challenge 3: Interpretation and Contextual Understanding

Challenge: Interpreting data without contextual understanding or domain expertise can lead to misinterpretations.

Solution: Collaborate with domain experts to contextualize data and derive relevant insights. Invest in understanding the nuances of the industry or domain under analysis to ensure accurate interpretations.


Challenge 4: Privacy and Ethical Concerns

Challenge: Balancing data access for analysis while respecting privacy and ethical boundaries poses a challenge.

Solution: Implement robust data governance frameworks that prioritize data privacy and ethical considerations. Ensure compliance with regulatory standards and ethical guidelines throughout the analysis process.

Challenge 5: Resource Limitations and Time Constraints

Challenge: Limited resources and time constraints hinder comprehensive analysis and exhaustive data exploration.

Solution: Prioritize key objectives and allocate resources efficiently. Employ agile methodologies to iteratively analyze and derive insights, focusing on the most impactful aspects within the given timeframe.

Recognizing these challenges is key; it helps data analysts adopt proactive strategies to mitigate obstacles. This enhances the effectiveness and reliability of insights derived from a data analytics case study.

Now, let’s talk about the best software tools you should use when working with case studies.

Top 5 Software Tools for Case Studies


In the realm of case studies within data analytics, leveraging the right software tools is essential.

Here are some top-notch options:

Tableau: Renowned for its data visualization prowess, Tableau transforms raw data into interactive, visually compelling representations, ideal for presenting insights within a case study.

Python and R Libraries: These flexible programming languages provide many tools for handling data, doing statistics, and working with machine learning, meeting various needs in case studies.

Microsoft Excel: A staple tool for data analytics, Excel provides a user-friendly interface for basic analytics, making it useful for initial data exploration in a case study.

SQL Databases: Structured Query Language (SQL) databases assist in managing and querying large datasets, essential for organizing case study data effectively.

Statistical Software (e.g., SPSS, SAS): Specialized statistical software enables in-depth statistical analysis, aiding in deriving precise insights from case study data.

Choosing the best mix of these tools, tailored to each case study’s needs, greatly boosts analytical abilities and results in data analytics.

Final Thoughts

Case studies in data analytics are helpful guides. They give real-world insights, improve skills, and show how data-driven decisions work.

Using case studies helps analysts learn, be creative, and make essential decisions confidently in their data work.


Frequently Asked Questions

What are the key steps to analyzing a data analytics case study?

When analyzing a case study, you should follow these steps:

Clarify the problem: Ensure you thoroughly understand the problem statement and the scope of the analysis.

Make assumptions: Define your assumptions to establish a feasible framework for analyzing the case.

Gather context: Acquire relevant information and context to support your analysis.

Analyze the data: Perform calculations, create visualizations, and conduct statistical analysis on the data.

Provide insights: Draw conclusions and develop actionable insights based on your analysis.

How can you effectively interpret results during a data scientist case study job interview?

During your next data science interview, interpret case study results succinctly and clearly. Utilize visual aids and numerical data to bolster your explanations, ensuring comprehension.

Frame the results in an audience-friendly manner, emphasizing relevance. Concentrate on deriving insights and actionable steps from the outcomes.

How do you showcase your data analyst skills in a project?

To demonstrate your skills effectively, consider these essential steps. Begin by selecting a problem that allows you to exhibit your capacity to handle real-world challenges through analysis.

Methodically document each phase, encompassing data cleaning, visualization, statistical analysis, and the interpretation of findings.

Utilize descriptive analysis techniques and effectively communicate your insights using clear visual aids and straightforward language. Ensure your project code is well-structured, with detailed comments and documentation, showcasing your proficiency in handling data in an organized manner.

Lastly, emphasize your expertise in SQL queries, programming languages, and various analytics tools throughout the project. These steps collectively highlight your competence and proficiency as a skilled data analyst, demonstrating your capabilities within the project.

Can you provide an example of a successful data analytics project using key metrics?

A prime illustration is utilizing analytics in healthcare to forecast hospital readmissions. Analysts leverage electronic health records, patient demographics, and clinical data to identify high-risk individuals.

Implementing preventive measures based on these key metrics helps curtail readmission rates, enhancing patient outcomes and cutting healthcare expenses.

This demonstrates how data analytics, driven by metrics, effectively tackles real-world challenges, yielding impactful solutions.

Why would a company invest in data analytics?

Companies invest in data analytics to gain valuable insights, enabling informed decision-making and strategic planning. This investment helps optimize operations, understand customer behavior, and stay competitive in their industry.

Ultimately, leveraging data analytics empowers companies to make smarter, data-driven choices, leading to enhanced efficiency, innovation, and growth.



Open Access

Peer-reviewed

Research Article

A system dynamics-based synergistic model of urban production-living-ecological systems: An analytical framework and case study

Contributed equally to this work with: Jiawei Wu, Junlin Huang

Roles: Conceptualization, Data curation, Formal analysis, Funding acquisition, Methodology, Project administration, Software, Validation, Visualization

Affiliation: School of Geographical Sciences, Hunan Normal University, Changsha, China


Roles: Formal analysis, Funding acquisition, Methodology, Supervision, Writing – review & editing

* E-mail: [email protected]

  • Jiawei Wu, 
  • Junlin Huang


  • Published: October 19, 2023
  • https://doi.org/10.1371/journal.pone.0293207


Human-land coordination underpins urbanization and is a key component of urban modernization. In this study, system dynamics theory was applied to a "production-living-ecological" complex system built on the concept of human-land coordination. The system's characteristics of causal cycling, dynamic and sustainable development, human-land synergy, integrity and openness, and self-organization and adaptability were analyzed by dividing it into three subsystems: urban production, urban living, and urban ecology. Causal loop diagrams and system structure flow diagrams were designed to evaluate the causal relationships between variables, quantitatively analyze their interactions, and predict their future development. Changsha City, China was selected as the case study area, where a system dynamics equation model was constructed to determine the interactions between the subsystems. Our findings indicate that through the year 2035, factors influencing subsystem functions, such as population, GDP, and built-up area, show positively correlated increasing trends and interact with one another, and a mutual correlation was found among the production-living-ecological functions. This study therefore provides a novel perspective and exploratory practice for studying the synergistic coupling of the ecological, production, and living functions of cities and for evaluating high-quality urban development. The coupling and coordination of urban production, living, and ecological functions reflects the coupling and coordination of the "people-land" relationship, which is the key to high-quality urban development.

Citation: Wu J, Huang J (2023) A system dynamics-based synergistic model of urban production-living-ecological systems: An analytical framework and case study. PLoS ONE 18(10): e0293207. https://doi.org/10.1371/journal.pone.0293207

Editor: Xingwei Li, Sichuan Agricultural University, CHINA

Received: May 19, 2023; Accepted: October 7, 2023; Published: October 19, 2023

Copyright: © 2023 Wu, Huang. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the manuscript and Supporting Information files.

Funding: This research was funded by National Natural Science Foundation of China Grant Program [52008167]. The funders were involved in the research design.

Competing interests: The authors have declared that no competing interests exist.

Introduction

In 2022, the Chinese government proposed a people-centered approach to urbanization, which incorporates a regional economic and territorial space structure to support high-quality development. Urbanization promotes coordinated regional development and fosters an improved quality of life. It involves the spatial expansion of cities and towns and the modernization of systems and cultures [ 1 – 3 ]. Hence, the "human-land" relationship is considered an essential lens for identifying approaches to the high-quality development of urban areas [ 4 – 7 ]. The human-land relationship reflects the complementarity between human economic activities and environmental resource capacity [ 4 , 8 , 9 ]. In terms of urban function, it reflects the degree of rationalization and efficiency of the "production, living and ecology" functions of the city [ 10 , 11 ]. By understanding the coordinated development of the subsystems of the "production-living-ecological" system, proper urban development can be achieved [ 12 – 14 ]. However, the accelerated process of urbanization has long put serious pressure on the ecological space of cities, and human production activities, such as the uncontrolled expansion of construction land, have continuously had negative impacts on the ecological environment. This has resulted in trade-offs and incoherence between the "production-life-ecology" functions of cities and has attracted the attention of many scholars [ 15 , 16 ].

Since 2012, research on the "production-living-ecological" system has gained attention. The literature covers three primary aspects. The first is defining the connotation of the "production-living-ecological" functions, constructing an index system, and evaluating the functions based on that index system [ 17 – 20 ]. The second is analyzing the relationship between the coordination characteristics of the "production-living-ecological" functions and land planning [ 21 ]. Scholars such as Zhang Z, Shan Y, and Ni W quantitatively analyzed the mutual influence of transformation and coupling coordination among the "production-living-ecological" functions and proposed that production functions determine living functions and living functions influence ecological functions. For example, the alpine grassland ecosystem on the Qinghai-Tibet Plateau needs to realize the coordinated development of its ecological, production, and living functions by regulating population carrying capacity according to the mutual influence mechanism and a reasonable proportional structure of the production-living-ecological functions [ 22 – 25 ]. The third is identifying and analyzing the "production-living-ecological" space from the land-functions perspective [ 26 ]. For instance, Fu [ 27 ] and Heng et al. [ 28 , 29 ] used the "production-living-ecological" framework and found a poor overall spatial arrangement, in which spatial functions were neither complementary nor integrated.

Several studies, like those of Hu et al. [ 30 ], have adopted research methods from other disciplines to examine national spatial data. There, systems theory was used to understand the wetland production-living-ecological complex system and its synergies. Meanwhile, Gu et al. [ 31 ] used system dynamics theory to predict the urbanization rate of China over the next 50 years, while Yi et al. [ 32 ] adopted the same theory in municipal territorial spatial planning. Hence, system dynamics is a suitable approach for predicting future development changes and solving complex nonlinear system problems, using scenario simulations and models that combine qualitative and quantitative data across system levels and their interactions.

Generally, existing studies on the coordination of production-living-ecological functions have been performed from a systematic perspective. However, they have focused primarily on the internal mechanisms of human-land relationship coordination [ 33 , 34 ] and on strategies to promote the coordination, balance, and sustainable development of the geographical environment and human well-being [ 35 , 36 ]. Research specifically focused on the coordination between humans and the "production-living-ecological" functions remains lacking: only a few studies have attempted to combine even two of the "production-living-ecological" functions with urban ecology, production, and living functions. Projections of the interrelationships between a city's future "production-life-ecology" functions are limited, leaving no basis for recommendations on future urban development or alerts for risk avoidance. Moreover, urban ecological, production, and living functions, and their relation to people, have not been extensively explored under the system dynamics framework. This study uses the theoretical framework of the human-earth system. It applies system dynamics theory [ 37 ] with a holistic, systems-thinking approach to develop the complex "production-living-ecological" system, consisting of the urban production, living, and ecology subsystems. Using Changsha City, China, as a case study area, we employ system dynamics modeling to project the future development of production, living, and ecological functions within the city. By exploring the system layer by layer, from its external characterization to its internal structure, the operating mechanism of the integrated state of urban human-land coupling and coordination is obtained [ 38 ]. It is expected that this study may provide valuable insights and recommendations to improve the spatial governance of Changsha and develop a theoretical framework for new urbanization approaches.

Material and methods

Study area and data sources.

Changsha City is in the northeastern portion of Hunan Province, China ( Fig 1 , created using ARCGIS 10.2) and covers an area of 11,819 km2. It is an important node city of the central urban agglomeration along the Yangtze River and the Yangtze River Economic Belt. It includes six districts (Yuelu, Yuhua, Furong, Tianxin, Kaifu, and Wangcheng), one county (Changsha County), and two county-level cities (Liuyang and Ningxiang), giving it an outstanding locational advantage. In 2020, it had a population of approximately 10 million, an increase of 42.71% over the previous decade. Its gross domestic product (GDP) in 2020 was 1,214.252 billion yuan, a 4% increase over the previous year [ 39 , 40 ] and well above the provincial average. Additionally, Changsha serves as a pivotal grain production center in China and a testing ground for the comprehensive reform of the "two-oriented society" [ 41 ]. Its historical and cultural significance further underscores its research value. As the core growth pole of economic development in Hunan Province, Changsha is an especially important unit for qualitative and quantitative research predicting the future development of urban "production-life-ecology" functions. Many Chinese cities resemble Changsha in geographic location, resources, and economic development, notably Wuhan and Suzhou. Wuhan in particular shares many similarities with Changsha in geographic location, ecological environment, and economic development, and scholars have used system dynamics models to study the coordination of the "production-life-ecology" space in Wuhan and Suzhou [ 42 , 43 ]. Based on these previous studies, the importance of Changsha's geographic location and its rapid economic and social development make it a feasible research area for this study, and the study of Changsha can provide a reference for the development of Chinese cities.


https://doi.org/10.1371/journal.pone.0293207.g001

The model data were obtained from the Statistical Yearbook of Hunan Province and the Statistical Yearbook of Changsha City for the period 2010–2019.

Qualitative and quantitative analyses were used to analyze the dynamic characteristics of the production-living-ecology system, including its structure and subsystems and their interactions, and predict future development changes in the subsystems.

Qualitative analysis of the urban production-living-ecology complex system model

A composite system is a higher-level system generated from the coupling of two or more systems, in which each original system is considered one of its subsystems and various interactions between elements occur [ 44 , 45 ]. Coupled coordination represents not only the state of the system but also the role imposed on it. Coordination here refers to the harmonious relationship between various elements, including their cooperation, complementarity, and synergy, which allows the system to maintain an optimal overall effect or function [ 46 ]. Using this qualitative relationship, Fig 2 shows the relationship between human-land coordination and the development of prime urban functions.


https://doi.org/10.1371/journal.pone.0293207.g002

The production function involves the utilization of land as a means of labor to directly generate a wide array of products and services, spanning the primary, secondary, and tertiary industries that provide residents with the supplies and services needed for production and life, including the financial and insurance services industry as a subdivision of the production function. The life function refers to the various spatial carrying, material, and spiritual protection functions provided by the land in the process of human survival and development. It includes elements like leisure, which contributes to people's well-being by offering relaxation and enhancing their overall quality of life, and the consumption function, which facilitates shopping and consumption for city residents; both are subdivisions of the life function. Lastly, the ecological function relates to the ecosystems and ecological processes that sustain the natural conditions necessary for human survival and well-being. This encompasses ecological scenic areas, parks, and green spaces that maintain a vital ecological environment for both life and work [ 47 , 48 ].

1. "Causal cycle" characteristics of the urban "production-living-ecological" system

The urban system is composed of multiple elements that interact and overlap, forming complex causal relationships. These causal relationships are the basic law of urban systems and are considered the basic unit for analyzing urban complex systems. The economist Gunnar Myrdal stated that urban systems involve a process of continuous evolution in which technological, social, economic, and cultural factors, among others, interrelate and mutually influence one another as causal factors that can form a vicious cycle with cumulative effects [ 49 ]. In cities, production, living, and ecological factors likewise interact within the "production-living-ecological" system, and a vicious cycle may occur when one factor is mismanaged. For instance, uncoordinated development of urban living and production factors may destroy ecological factors, which may consequently hinder production development. Furthermore, poor environmental conditions may deter business investment. Poor development and utilization of land affects the overall quality of life of the population through reduced happiness and increased disease incidence. Consequently, this lowers the capacity for improving and protecting ecological spaces, as investments of financial and material resources are also reduced, forming a vicious circle. Hence, it is necessary to analyze the urban complex system in light of the mutual causality between urban subsystems under current urban development conditions.

2. Dynamics and sustainable development characteristics of the urban "production-living-ecological" system

Cities are inherently influenced by internal and external factors and are constantly moving and changing. In the process of development, negative impacts such as environmental pollution, pressure on urban infrastructure, and housing constraints arise from the disruption of urban system functions, which consequently leads to economic, social, and ecological imbalances and threatens sustainable urban development. These impacts are simultaneously coupled with the increasing intensity of anthropogenic activities, which further blurs the boundaries between the interacting subsystems [ 50 , 51 ]. Sustainable urban development requires the coordinated development of the urban production, living, and ecological elements, ensuring that immediate and long-term interests are accounted for and that the degree of development of each element and its interactions with the others are understood, so that synergy is achieved across the entire system.

3. People and place synergy in the urban "production-living-ecological" system

The urban "production-living-ecological" system consists of various synergies among its subsystems and elements. It is essentially a human-land relationship system: the "production-living-ecological" functions comprise the environmental elements, while the various activities occurring within them comprise the human elements. Overall stability depends on the level of coordination between and among these elements. For instance, environmental elements have a carrying capacity, and intensive human activities may trigger a potentially irreversible collapse. Hence, balancing these elements requires harmonious coexistence and interaction between humans and the land [ 52 ]. This relationship also comprises the various functions of the subsystems within the human-land system [ 53 ], in which the production, living, and ecological spatial subsystems form a feedback system essential to understanding how the complex urban system can be optimally managed. In addition, understanding its historical evolution is required, as it may serve as a basis for urban development strategies that account for the driving mechanisms of change through time. This also ensures an objective approach in which personal interests in urban development are set aside in pursuit of sustainable urban development.

4. Integrity and openness of the urban "production-living-ecological" system

A city is the center of human life and business production. It is an open system, as exchanges of elements between its internal and external areas occur constantly and continuously. In this process, unfavorable exchange conditions inevitably arise that make the entire urban system function in both order and disorder. An open system tends toward disorder as collisions, such as urban political territory and urban identity issues, occur and disrupt its equilibrium [ 54 ]. A disorderly state can be transformed into an orderly one by coordinating the various elements, particularly the production, living, and ecological elements; a greater decrease in the system's disorder requires a higher degree of coordination [ 54 ]. In this study, the urban area of Changsha was likewise treated as an open system, and we sought to coordinate its internal "production-living-ecological" functions to achieve an optimal orderly state while maintaining efficient material and energy exchange between its internal and external urban areas.

5. Self-adaptability and self-organization of the urban "production-living-ecological" system

Composite systems are a combination of natural and man-made systems and have the ability to self-organize and self-regulate while also being externally regulated and managed through various methods. Urban composite systems possess the characteristics of both, as they can self-organize as a result of various human activities [ 55 , 56 ]. Hence, coordinating and balancing human production activity with ecological buffers ensures the system's capacity to self-organize and self-regulate. However, when human interests prevail, such as activities prioritizing higher economic gains without considering resource carrying capacity, urban economic destabilization and ecological imbalance ensue [ 57 ], which in turn erodes the self-adaptive capacity of the urban system. To avoid this, urban systems must be allowed to recover and self-organize through appropriate land planning and management.

The qualitative analysis above shows that, from a systemic point of view, cities in the process of development exhibit a systemic "cause and effect cycle", dynamics and sustainable development, synergy between people and land, wholeness and openness, and self-adaptation and self-organization, characteristics that together delineate the fundamental problems of the city. To gain a clear understanding of these fundamental urban issues, analysis must begin from the city's internal structure, giving due consideration to all constituent elements of the "people" and "land" relationship.

Quantitative analysis of the urban complex production-living-ecology system

As an urban system is complex, with multiple components and dynamic interconnections, qualitative analysis alone cannot determine its behavioral and functional characteristics. This study explores the coupling and coordination of the "production-living-ecological" system in Changsha by constructing a composite system model to determine the dynamics of the production-living-ecology system. It also aims to quantitatively predict changes in model variables such as GDP, since changes in GDP affect fiscal revenues, which in turn determine the resources allocated to urban green space areas and other pertinent ecological indicators.

1. Causal feedback flow chart

The interaction between the living and production subsystems works as follows: the production subsystem provides the material goods necessary for living, while the living subsystem provides the labor necessary for production and also acts as the consumer base. Hence, satisfactory living conditions can in turn improve production efficiency.

Meanwhile, the production subsystem generates solid, gaseous, and liquid wastes that are discharged from production areas into the ecological environment, causing pollution. However, improving production benefits may also provide the economic support needed to protect the physical environment and its ecological functions: only with a developed economy and increased fiscal revenues does investment in ecological protection rise correspondingly. Among other things, a developed economy can also foster higher environmental awareness among the population.

The interaction between the ecological and living subsystems involves the provision of a good living environment for the people, resulting in improved quality of life and happiness. Conversely, the living subsystem affects the ecological subsystem through the generation of domestic garbage, which degrades the ecological function and, in turn, the quality of living.

Based on these feedback relationships, the Vensim PLE software was used to construct the causal feedback flow diagram of the production, living, and ecological subsystems in Changsha, as shown in Fig 3.


https://doi.org/10.1371/journal.pone.0293207.g003
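As an illustration of the feedback logic described above (not the authors' Vensim model), the signed causal links among the three subsystems can be encoded as a small digraph and the polarity of each loop checked programmatically. The link names and signs below are assumptions drawn from the text: production supplies goods to living, living supplies labor to production, ecology supplies resources and a good environment, while production waste and domestic garbage act negatively.

```python
# Signed causal links among the three subsystems, as described in the text.
# +1 = an increase in the source raises the target; -1 = it lowers it.
links = {
    ("production", "living"): +1,   # production supplies material goods
    ("living", "production"): +1,   # living supplies labor and consumption
    ("production", "ecology"): -1,  # production discharges waste
    ("ecology", "production"): +1,  # ecology supplies resources and raw materials
    ("ecology", "living"): +1,      # a good environment raises quality of life
    ("living", "ecology"): -1,      # domestic garbage pollutes the environment
}

def loop_polarity(cycle):
    """Product of link signs around a cycle: +1 reinforcing, -1 balancing."""
    product = 1
    for i in range(len(cycle)):
        product *= links[(cycle[i], cycle[(i + 1) % len(cycle)])]
    return product

print(loop_polarity(["production", "living"]))  # prints 1 (reinforcing loop)
print(loop_polarity(["living", "ecology"]))     # prints -1 (balancing loop)
```

A reinforcing (+1) loop, such as production-living, amplifies growth, whereas a balancing (-1) loop, such as living-ecology, pushes back against it; this is the order/disorder tension the qualitative analysis describes.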

2. System structure flow diagram

The causal feedback loop diagram depicts the fundamental linkages and causal interconnections between subsystems, whereas system dynamics flow diagrams are needed for a deeper analysis of the interactions among the system's components and for forecasting the system's future. The model variables below were selected with reference to previous studies. The Vensim PLE-based production-living-ecological system flow diagram for the city is shown in Fig 4. On this basis, the framework for the production-living-ecology complex system dynamics model was built.


https://doi.org/10.1371/journal.pone.0293207.g004

3. System dynamics equation

The urban production-living-ecology system is characterized by a complex and dynamic cause-effect cycle, integrity and openness, and self-organizing and self-adaptive dynamics. These complexities have a significant impact on development and human-land synergy, so general mathematical methods cannot describe and analyze the system quantitatively and accurately. For instance, the "production-living-ecological" coupling formula can only measure the level of reciprocal coupling among urban subsystems; it cannot forecast future system evolution. Instead, we employed a system dynamics model in which GDP, total population, landscaped area, and arable land were analyzed as the influencing factors, providing a more accurate representation of the complex relationships between the subsystems. The state variables of the production, living, and ecological subsystems were treated as interrelated but mutually exclusive, with their relationships specified in advance based on expert judgment; the correlation between comparable "production-living-ecology" measures in Chinese cities has been confirmed by a large body of literature. The system dynamics equations, detailed in Table 1, describe the quantitative interrelationships between the variables in the system's structural flow diagram. The variables were fitted to historical data for Changsha, and the fitted model was then used to predict and analyze conditions through 2035.


https://doi.org/10.1371/journal.pone.0293207.t001
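To make the mechanics concrete, the following is a minimal system dynamics sketch in plain Python, NOT the authors' Table 1 equations: three stocks (population, GDP, built-up area) advanced by Euler integration, with each flow depending on the other stocks as the text describes. All growth coefficients and the 80,000 ha land cap are hypothetical; the 2010 initial values are taken from the text (reading the 2010 GDP figure as 444.032 billion yuan).

```python
# Minimal three-stock system dynamics sketch with hypothetical coefficients.
def simulate(years=25, dt=1.0):
    pop = 704.07      # total population, 10^4 persons (2010)
    gdp = 444.032     # GDP, billion yuan (2010)
    built = 27239.0   # built-up area, ha (2010)
    for _ in range(int(years / dt)):
        # Flows: each rate of change depends on the other stocks.
        d_gdp = 0.08 * gdp * (pop / 704.07)             # labor force drives output (assumed rate)
        d_pop = 0.015 * pop * min(gdp / 444.032, 2.0)   # economy attracts residents (assumed rate)
        d_built = 0.05 * built * (1 - built / 80000.0)  # land expansion slows near the cap (assumed)
        gdp += d_gdp * dt
        pop += d_pop * dt
        built += d_built * dt
    return pop, gdp, built

pop_2035, gdp_2035, built_2035 = simulate()
```

The point of the sketch is structural: because every flow references another stock, no variable can be forecast in isolation, which is why the coupling formula alone cannot predict system evolution. Vensim PLE builds and integrates the same kind of stock-and-flow structure graphically.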

To test the model's reliability, the simulation results were compared with the existing historical data. Owing to the complexity of the model and the large number of variables, this study focused on the historical verification of changes in Changsha's total population, built-up area, arable land area, GDP, and landscaped area. The errors between the simulated and actual data were mostly within 10%, indicating a relatively reliable model (Table 2).


https://doi.org/10.1371/journal.pone.0293207.t002
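The historical-validation step described above reduces to computing relative errors between simulated and observed values and checking them against the 10% threshold. The numbers below are made-up placeholders, not the paper's Table 2 data.

```python
# Relative error between simulated and observed values for each variable.
def relative_errors(simulated, observed):
    return [abs(s - o) / o for s, o in zip(simulated, observed)]

sim = [710.2, 450.1, 27500.0]    # hypothetical simulated population, GDP, built-up area
obs = [704.1, 444.0, 27239.0]    # hypothetical observed values for the same year
errs = relative_errors(sim, obs)
print(all(e < 0.10 for e in errs))  # prints True: all errors within 10%
```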

Qualitative analysis of the complex "production-living-ecological" system structure in Changsha City

The concept of the "production-living-ecological" system proposed in this study is based on land use functions and derives from the "element-structure-function" scheme of systems theory, in which the function of a system depends on its structure. Each subsystem of the complex production-living-ecology system was developed and operated to obtain a coordinated structure. Fig 5 presents the internal structure analysis diagram used in this study.


https://doi.org/10.1371/journal.pone.0293207.g005

Productive subsystem of Changsha City.

The production subsystem comprises the industrial, commercial, and other economic activity spaces in the city. It provides the necessary material conditions for urban development and has a non-negligible influence on the other subsystems in the model. It encompasses all primary economic production functions, both direct and indirect: the former refers to all areas that directly generate goods and services, such as agricultural land, whereas the latter refers to all areas used for the intermediate transport and housing of these goods and services, such as urban construction areas. As Changsha City is mainly an industrial area, it is primarily composed of areas under indirect production [58, 59].

Human activities are fundamentally driven by human needs, making them integral to the indirect production process. In recent years, the production space in Changsha City has been shaped mainly by the secondary and tertiary industries, whose development relies on high-quality talent and a basic labor force, as well as on tourism consumption driven by population growth. These factors are closely related to living space and ecological space. Consequently, the internal structure of the production subsystem should be appropriately coordinated with the ecological and living subsystems to provide a better material basis for an optimal living function balanced with a green ecological function.

Living subsystem of Changsha City.

The living function refers to the sum of the various areas for daily human activities. According to Wang [60], the living function covers six aspects of daily human activity: living, working, leisure, consumption, public service, and social functions. Furthermore, the living subsystem often overlaps with the production and ecological subsystems; because its areas are designed to meet basic living needs, they cannot easily be reduced or transformed and tend to restrict the development of the other subsystems. Generally, the ultimate goal of urban development is to satisfy these basic needs to a high level; livability, comfort, and convenience form the core aspects of living space. However, when the resident population grows rapidly while public service resources are in short supply, resource limitations, traffic congestion, and inadequate public services result.

Consequently, addressing these challenges must take precedence in the coordinated development of Changsha City.

Ecological subsystem of Changsha City.

The ecological function refers to the functional characteristics of the physical environment, which provides the resources and inputs used for human activities, such as organic matter and raw materials for food and material production, as well as ecological services such as biodiversity, climate regulation and mitigation, natural disaster protection, pest control, and environmental purification, for overall human well-being [61]. Hence, the ecological environment is key to sustaining urban growth [62]. The ecological subsystem operates as an open system closely linked with the surrounding production and living subsystems, facilitating the exchange of materials within and between regions. For example, industrial pollution from the production function and domestic garbage from the living function negatively impact the ecological function; when the ecological function is compromised, the resulting unhealthy working and living environments are not conducive to the development of the city. To achieve an optimal ecological function, the ecological subsystem must have an appropriate internal and external structure, balanced with the production and living subsystems, to allow symbiotic interactions and meet the development needs of Changsha City [52].

Coupling and coordination quantitative analysis of the "production-living-ecological" space system in Changsha City

According to the Changsha Statistical Yearbook, the total population of Changsha City in 2010 was 7,040,700; the total GDP was 444.032 billion yuan; the per capita urban disposable income was 23,347 yuan; and the per capita rural disposable income was 10,640 yuan. The arable land area in 2010 was 276.79 thousand ha, and the built-up area was 27,239 ha. Based on the conditions from 2010–2019, this study measures the development of Changsha's production-living-ecological system over the next 15 years (Fig 6). Employment in the primary, secondary, and tertiary industries was derived from the total population, and the corresponding output values were expected to reach a GDP of 3,693.66 billion yuan in 2035. GDP and living standards also gradually increased, with the total population positively correlated with total GDP. Economic development promotes population growth: as more public service infrastructure is constructed, the local population is retained and outside residents are attracted. Population growth in turn promotes the development of the urban production function; for example, accelerated urbanization draws the rural population into urban areas, increasing the labor force and lowering labor costs. It likewise increases consumption, revitalizing economic production. This reflects the relationship between population development and economic development, and gradual upward trends were observed in both. The expansion of the built-up area is driven by the rise of GDP, and the built-up area gradually increases along with the development of production space. The built-up area was predicted to reach 70,821.8 ha in 2035, growing as a result of economic progress and population growth. However, because Changsha's total area is limited and the ecological function occupies a portion of it, the built-up area approaches a plateau and its rate of growth gradually slows.
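The 2010 baselines and 2035 projections quoted above imply average annual growth rates, which the short sketch below computes as compound annual growth rates (CAGR). It reads the 2010 GDP figure as 444.032 billion yuan; this is a back-of-envelope check, not part of the paper's model.

```python
# Compound annual growth rate implied by a start value, end value, and horizon.
def cagr(start, end, years):
    return (end / start) ** (1 / years) - 1

gdp_rate = cagr(444.032, 3693.66, 25)    # GDP, billion yuan, 2010 -> 2035
built_rate = cagr(27239.0, 70821.8, 25)  # built-up area, ha, 2010 -> 2035
print(f"GDP: {gdp_rate:.1%} per year")             # prints "GDP: 8.8% per year"
print(f"Built-up area: {built_rate:.1%} per year") # prints "Built-up area: 3.9% per year"
```

The implied built-up-area growth (about 3.9% per year on average) is well below the implied GDP growth, consistent with the plateau effect the text describes.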


https://doi.org/10.1371/journal.pone.0293207.g006

As shown in Fig 6, economic development notably increased green space, which improved environmental protection. On the one hand, people are more aware of environmental protection and have higher requirements for their living and working environments; on the other hand, economic development brings more investment in environmental protection. The reciprocal relationship between productive development and ecological protection is reflected here. Future scenarios indicate that Changsha City will continue to comply with the management requirements of the "three zones and three lines", implementing appropriate arable land protection measures, so only slight changes in Changsha's arable land area are expected. The overall development of Changsha City was also found to require the preservation of ecological security and basic food security for healthy and stable economic development, meaning that across the production, living, and ecological functions, the intensity of human activities must be kept below the carrying capacity of the geographic environment. Long-term urban planning projects shape the urban environment for decades, not just one- or two-year cycles, and accurate urban growth projections are necessary for a scientific city plan. The above predictions of Changsha's development over the next 15 years may provide urban planners, government agencies, and other stakeholders with a scientific basis.

Contribution of system dynamics theory to the study of urban system development

As a composite system, the "production-living-ecological" system of Changsha City was found to exhibit a significant causal cycle in which internal factors influence one another, potentially forming either a virtuous or a vicious cycle; which of the two emerges depends on the degree of coupling and coordination among the subsystems [63]. For example, in the coupled production and living systems, population growth within a reasonable range brings labor force and consumption power to Changsha's economic development, but population growth exceeding the carrying capacity of Changsha's resources leads to social problems such as traffic congestion, housing pressure, and urban villages. Therefore, the development of Changsha City needs to be approached from a systemic point of view, examining all aspects within the subsystems to explore in depth the problems encountered in its development. Given Changsha's dynamic and open character, efficient communication with external resources is required to achieve efficient coupling of resources within the system, derive its benefits, and ensure synergy and sustainability between people and land [62]. This provides new insights for urban planning approaches that incorporate a systemic perspective and move beyond traditional planning methods.
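The degree of coupling and coordination among subsystems is commonly quantified with the coupling coordination degree model. The sketch below uses a standard textbook form of that model, not necessarily the exact formula used in this paper, and the subsystem scores are hypothetical.

```python
# Standard coupling coordination degree model for n normalized subsystem scores.
def coupling_coordination(u, weights=None):
    """u: normalized subsystem scores in [0, 1]; returns (C, D)."""
    n = len(u)
    weights = weights or [1.0 / n] * n
    prod = 1.0
    for x in u:
        prod *= x
    mean = sum(u) / n
    c = (prod / mean ** n) ** (1.0 / n)          # coupling degree: 1 when all scores equal
    t = sum(w * x for w, x in zip(weights, u))   # comprehensive development index
    d = (c * t) ** 0.5                           # coupling coordination degree
    return c, d

# Hypothetical production, living, and ecology scores:
c, d = coupling_coordination([0.8, 0.7, 0.6])
```

The coupling degree C approaches 1 as the subsystems develop in balance, while D also rewards a high overall level of development, which is why a virtuous cycle requires both balance and growth.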

System dynamics modelling for urban system collapse risk avoidance

While a single analytical framework generally places the issue of "urban riskification" within a static system of categorization, with globalization and post-industrialization different types of urban risk have become intertwined, overlapping and compounding each other in a systemic way [64]. The case study of Changsha City demonstrates that the components of a city are coupled and interact, with the primary, secondary, and tertiary industries, the urban built-up area, and the landscaped area tightly linked. The entire system may be disrupted when one of these variables becomes overactive and uncontrolled. For instance, when human activities exceed the carrying capacity of the physical environment, such as under uncontrolled population growth, the urban ecology is threatened by the volume of waste disposed of in the environment. Further, an oversupply of labor and a shortage of labor products threaten the production function and create social problems through unemployment. When the ecological environment is damaged, the comfort of urban living declines, population flows out, and the shrinking labor force reduces consumption power. This, in turn, impairs the city's productive functions, leading to urban decline.

The generalizability of urban "production-life-ecology" prediction models in the process of urban development

a) Scientific prediction of urban land use change can, to a certain extent, avoid the potential risks of urban expansion and other development modes [65]. The model in this paper aims to provide a theoretical basis for assessing future urban land use risks.

b) The innovation of energy technology. The enhancement of the production function can provide sufficient financial support for energy technology innovation, forming a mutual feedback mechanism. As the convenience and comfort of residents' lives increase and the ecological environment improves, the city can attract high-quality talent and improve the efficiency of energy technology innovation [66].

c) The guidance of central government policy strongly determines changes in the city's "production-life-ecology" function. When optimizing the city's planning strategy with the model presented in this paper, all factors must therefore be weighed against the guidance of central government policies and the specificity of the model itself [67].

d) The reform and redesign of agricultural production methods, a significant aspect of the city's production function, is closely related to energy technology innovation and indicates the continuous improvement of the urban production-life-ecology model. The results of this study are similar to the findings of the "Literature of the City" studies [68–70].

Globally, regarding the "production-life-ecology" development of cities, substantial research has focused on the efficient development and utilization of land functions, establishing a comprehensive indicator system of 29 indicators and formulating a set of comprehensive assessment methods [71]. Previous studies have also revealed the importance of coordinating the integration of production and urbanization with carbon emission reduction for building a green economic system and participating deeply in global environmental governance [72].

Limitations and prospects

Although Changsha City is comparatively more developed than other regions in central China, coupled and coordinated development within the city alone is insufficient; the focus must extend to urban development at the regional level. Secondly, the analysis model needs further optimization, with a larger number of model variables and an improved causal feedback path. Data accessibility was also limited, which constrained data refinement; more accurate and comprehensive data on urban production, living, and ecology should be incorporated, as such data increase the generalizability and scientific validity of the models. Furthermore, variable selection and indicators must be defined more precisely to establish a more systematic framework. System models at different scales differ subtly in variable selection: the system dynamics model emphasizes the coordination of subsystems and the synthesis of the larger system, and at both township and city scales the coupling and coordination of internal subsystems are essential for the system to operate efficiently, but the internal subsystems and elements differ. For example, the township production function mainly refers to agriculture, whereas cities focus on the indirect production of products derived from the primary sector. A more representative urban "human-land" relationship and scenario simulation will be pursued in subsequent studies, and exploring micro scales such as towns and townships may yield more refined results.

Conclusions

The aim of this study was to investigate the mechanisms and coupling coordination of the "production-living-ecological" system of a city, using Changsha to determine the interactions among the production, living, and ecological space subsystems. A qualitative analysis of the system dynamics was conducted to understand the characteristics and structure of the system, and a system dynamics model of the production-living-ecology complex system was constructed with the Vensim PLE software to determine the internal structure of the complex urban system. Population, GDP, and built-up area all showed an upward trend, and a clear correlation among the three was found, with the mutual correlations within the "production-living-ecological" system significantly influencing urban development and quality of life. Furthermore, this study showed that the system dynamics model is highly applicable to investigating the coordination of urban production, living, and ecological functions: in the model, every variable is interconnected, and none exists independently. Our findings confirm that the production, living, and ecological functions of Changsha City interact with one another. The model's predictions of Changsha's changes over the next fifteen years can provide a scientific basis for government policy makers and urban planners to plan the future development of the production-living-ecology functions. It is recommended that more accurate data be used to build a more representative model. This study provides a new basis for decision makers to improve and ensure sustainable urban development.

  • 55. Lai S. K. Planning within complex urban systems. Routledge, 2020. https://doi.org/10.1016/j.jum.2021.12.001

Triple Helix Dynamics and Hybrid Organizations: An Analysis of Value Creation Processes

  • Open access
  • Published: 22 April 2024



  • Gabriel Linton   ORCID: orcid.org/0000-0002-9517-1333 1  

The Triple Helix model, focusing on interactions among academia, industry, and government, has been an influential model for promoting innovation and regional development. However, limited research explores the model’s micro-level dynamics, calling for further investigation into its effectiveness. This study seeks to fill this gap by critically examining the micro-level dynamics of the Triple Helix model, with an emphasis on the roles played by a hybrid organization in the value creation process. Utilizing a case study approach, this research examines Robotdalen—a successful Swedish applied research initiative in robotics—to answer the research questions: How do value-creation activities within a Triple Helix model evolve and how do hybrid organizations facilitate and shape value creation throughout the development stages of an organization? The analysis contributes a fine-grained view of value creation and development over time in a large Triple Helix innovation initiative, highlighting the importance of hybrid organizations in facilitating collaboration and coordinating resources among stakeholders. The results identify critical factors such as collaboration, commercialization, innovation, and adaptation. This research contributes new theoretical insights and practical implications for leveraging hybrid organizations within the Triple Helix framework, thereby providing valuable guidance for policymakers, practitioners, and scholars engaged in crafting strategies to stimulate regional growth, innovation, and value creation in today’s dynamic global landscape.


Introduction

The Triple Helix model, which explores the synergistic relationship between universities, industries, and governments, is a crucial model for understanding innovation and economic development in today’s globalized landscape (Amaral & Cai, 2023 ; Cai & Etzkowitz, 2020 ; Carayannis & Campbell, 2009 ; Etzkowitz & Brisolla, 1999 ; Etzkowitz & Leydesdorff, 2000 ). As knowledge and innovation have become increasingly critical factors for the sustainable development of any economy, the role of academic entrepreneurship has grown more significant (Cerver Romero et al., 2021 ; Feola et al., 2021 ; Guerrero & Urbano, 2012 ), positioning universities as central players in the economy (Audretsch, 2014 ). The Triple Helix model has been employed to explain innovation policies, knowledge transfer strategies, and addressing sustainable and inclusive growth (Carayannis & Rakhmatullin, 2014 ; de Lima Figueiredo et al., 2022 ; Farinha et al., 2016 ).

The Triple Helix model emphasizes potential synergies among wealth creation, knowledge production, and government regulations (Leydesdorff, 2012 ). According to Etzkowitz and Leydesdorff ( 2000 ) and Leydesdorff ( 2012 ), the university–industry–government interaction model has evolved, and frictions among the three domains (economics, science, and politics) can generate a plethora of opportunities for problem-solving and innovation (Ranga & Etzkowitz, 2013 ). The model also encourages identifying unevenness between institutional dimensions in arrangements and the social functions executed by such arrangements.

Although the Triple Helix model shows promise, previous studies have not effectively captured its full potential (Hasche et al., 2020 ; Miller et al., 2016 ). Researchers have identified several gaps in the literature, such as the insufficient focus on micro-level intricacies (McAdam and Debackere 2018 ), a lack of understanding of dynamic interactions between the three domains (universities, industries, and governments), and inadequate exploration of how value creation and knowledge transfer processes evolve within Triple Helix collaborations. This has prompted calls for research examining the Triple Helix from a micro perspective, concentrating on dynamic relationships, synergies, collaborations, coordinated environments, and value-creating activities (Cunningham et al., 2018 ; Edquist, 2011 ; Höglund & Linton, 2018 ; McAdam et al., 2012 ). Furthermore, existing research has yet to explore the specifics of knowledge transfer in the innovation process (Hakeem et al., 2023 ). Therefore, delving into Triple Helix’s micro aspects is crucial for a better understanding of value creation (Kriz et al.  2018 ) and its development over time (Pique et al.  2018 ), as well as the knowledge transfer process.

To address these gaps and gain a deeper understanding of the Triple Helix model, this study focuses on a micro-level investigation of the dynamics involved in value creation and knowledge transfer. The research questions are: (1) How do value-creation activities within a Triple Helix model evolve? and (2) What role do hybrid organizations play in value creation throughout an organization’s development stages? To address these questions, a case study analysis (Eisenhardt, 1989 ; Yin, 2009 ) of Robotdalen, an applied research initiative for robotics, is conducted. The Robotdalen case was chosen as it provides a unique opportunity to investigate a Triple Helix collaboration that has a long and rich history and has also been seen as a successful collaboration. The case allows analyzing the dynamics of value creation within a Triple Helix configuration, where the university, industry, and government collaborate (Etzkowitz & Leydesdorff, 2000 ) to create a supportive environment for robotic innovation. This research contributes to the Triple Helix literature by providing a fine-grained view of value creation development over time in a large Triple Helix innovation initiative with a focus on the role of the hybrid organization.

This paper is organized as follows: The introduction sets the context and presents the research question on the role of hybrid organizations in value creation within the Triple Helix model, motivating the case study of Robotdalen. The theoretical background reviews relevant literature on Triple Helix, value creation, and hybrid organizations. The method section details the case study approach and data collection process. The case analysis presents the case of Robotdalen over time. The discussion section focuses on key themes of collaboration, commercialization, innovation, and adaptation. The conclusion section explores the implications of the findings for the Triple Helix literature and the role of hybrid organizations in fostering innovation and value creation and summarizes the main findings, acknowledges limitations, and proposes future research directions.

Theoretical Background

The Triple Helix model has gained considerable traction as a central model for understanding and promoting innovation and regional development in knowledge-based societies. By emphasizing the importance of collaboration and networking among actors from academia, industry, and government, the model has inspired a wide range of applications and research directions. In particular, the development of entrepreneurial ecosystems, knowledge clusters, and regional and national innovation policies have all benefited from the Triple Helix model (Galvao et al., 2019 ).

As scholars have continued to develop the Triple Helix model, several key themes and concepts have emerged to further enrich our understanding of the dynamics and processes involved in collaboration, innovation, and value creation. For example, Carayannis and Campbell ( 2009 , 2010 ) introduced the concept of the Quadruple Helix, which adds a fourth sphere—civil society—into the mix, emphasizing the role of users, consumers, and communities in driving innovation and regional development. This perspective emphasizes how crucial user-driven and demand-oriented innovation is in today’s knowledge-based societies. Recent scholarship proposes that Quadruple and Quintuple Helix dynamics can be understood as interlinked Triple Helix configurations, facilitating a more agile governance approach to innovation (Leydesdorff & Smith, 2022 ; Xue & Gao, 2022 ). The neo-Triple Helix model further integrates societal and environmental dimensions into the innovation ecosystems (Cai, 2022 ).

In the Triple Helix literature, the concept of hybrid organizations refers to entities that combine the characteristics and functions of two or more institutional spheres, such as academia, industry, and government, to advance innovation and economic development. Hybrid organizations have become increasingly important in the Triple Helix literature due to their unique position in facilitating and managing interactions among the different institutional spheres of academia, industry, and government (Champenois & Etzkowitz, 2017 ; Hasche et al., 2020 ). These organizations serve as bridges between the various stakeholders, helping to navigate the complexities and inherent challenges of multi-stakeholder collaborations. Hybrid organizations can take on various forms, such as technology transfer offices, innovation intermediaries, public–private partnerships, and university-industry research centers. These organizations play a crucial role in aligning stakeholder interests, promoting knowledge exchange, and facilitating resource sharing among the different actors involved in Triple Helix collaborations (Howells, 2006 ; Ranga & Etzkowitz, 2013 ). One of the key capabilities of hybrid organizations is their ability to manage the inherent tensions and conflicts that may arise in multi-stakeholder collaborations (Aarikka-Stenroos & Ritala, 2017 ). This often involves finding a balance between competing interests, such as the need for open knowledge sharing and the protection of knowledge, such as intellectual property rights, or the pursuit of fundamental research versus the demands for commercialization and market-driven innovation.

In addition to balancing conflicting interests, hybrid organizations must also manage diverse stakeholder expectations (Mair et al., 2015). For example, academic institutions may prioritize knowledge production and the advancement of scientific understanding, while industry partners may be more focused on the development of marketable products and services. Hybrid organizations must effectively communicate and coordinate the needs and expectations of each stakeholder to ensure a mutually beneficial collaboration. Furthermore, hybrid organizations play a critical role in creating an environment conducive to collaboration and trust-building among Triple Helix stakeholders. This involves fostering a culture of openness, transparency, and mutual respect, as well as establishing clear governance structures and processes to guide the collaborative efforts (Hasche et al., 2020).

These hybrid organizations take several forms, including Technology Transfer Offices, Innovation Intermediaries, Public–Private Partnerships, and University-Industry Research Centers, each having unique roles and characteristics (see Table 1). Technology Transfer Offices, as defined by Siegel et al. (2003), facilitate the transfer of technology from universities to industry, linking research outputs with industry needs and managing intellectual property. Innovation Intermediaries, according to Howells (2006), act as brokers within the innovation system, facilitating collaborations and connecting various actors while providing innovation support services. Hodge and Greve (2007) describe Public–Private Partnerships as cooperative initiatives between government and private entities for the joint development and management of projects. Lastly, University-Industry Research Centers, as defined by Perkmann et al. (2013), are collaborative research units involving universities and industry for joint research initiatives, knowledge exchange, and student training.

Among these categories, Robotdalen best aligns with the definition of an Innovation Intermediary. It serves as a catalyst in the Swedish robotics innovation ecosystem, facilitating partnerships and collaborations among various actors such as universities, large corporations, SMEs, and different levels of government. By navigating these complex multi-stakeholder collaborations, Robotdalen fosters knowledge exchange, manages diverse stakeholder interests, and plays an instrumental role in supporting the commercialization of innovative robotic solutions with a particular emphasis on health, field, and industry applications. Additionally, it provides a range of support services, including market analysis, product development, and project management, further substantiating its role as an Innovation Intermediary within the Triple Helix model.

The Micro-level Perspective and Value Creation in the Triple Helix

The perspective taken in this research is that of the Triple Helix as a network of relationships, where public and private organizations interact in value-creating processes to transform various inputs into valuable outputs for themselves and others. As shown in Table 2, these interactions can manifest in a variety of value outcomes, ranging from job creation to interdisciplinary collaboration. In this setting, hybrid organizations act as the glue connecting the various actors within the network and as a catalyst between them. Drawing on social exchange theory (Cook & Emerson, 1978), the relationships discussed are not viewed as created and developed in isolation; rather, they are regarded as part of a broader context, that is, a network of interdependent relationships. In a Triple Helix setting, the actors, resources, and actions constitute the context that supports value creation. According to Payne et al. (2008), value is created when various players combine their efforts and resources to achieve a specific goal. Before beginning the process, these actors form expectations for the collaboration and its outcomes; these expectations and goals may be shared or distinct across the actors engaged. For instance, by combining resources and actions, actors can create something that none of them could accomplish alone (Hasche, 2013). Value can be difficult to define; in this research, value is understood as dependent on the viewpoint of the specific actor (van der Haar et al., 2001). The interaction between actors influences how the value-creating process develops over time. During the interaction, the actors relate present problems and challenges to how they perceive earlier interactions, and their decisions, attitudes, and conduct are also shaped by prior connections and their experiences within those relationships. The current interaction is further affected by the expectations that the actors hold for their future interactions (Hasche & Linton, 2018).

Value creation in the Triple Helix framework is a critical area of investigation, as it helps explain how the interactions between university, industry, and government actors can lead to the generation of economic, social, and environmental benefits. Researchers have sought to explore the mechanisms and processes through which value is created and how it evolves over time in complex innovation projects involving the Triple Helix actors. Kriz et al. (2018) conducted a comprehensive study on value creation in the Triple Helix framework, examining the role of collaboration, coordination, and resource exchange in generating value. They found that the interplay of these factors can lead to value co-creation, where multiple actors work together to generate novel solutions, products, or services that benefit all parties involved. The study also highlighted the importance of understanding the dynamic nature of value creation, as it evolves over time in response to changing market conditions, technological advancements, and policy shifts. Pique et al. (2018) emphasized the need to examine value creation in the Triple Helix framework from a knowledge transfer perspective, arguing that understanding the processes and mechanisms through which knowledge is transferred between the university, industry, and government actors is essential for fostering innovation and value creation. This focus on knowledge transfer can provide insights into how the Triple Helix framework can enhance the effectiveness of innovation projects and drive economic and social benefits. The micro-level perspective on the Triple Helix framework and research on value creation have thus emerged as important areas of investigation in the literature, helping to reveal the complex interactions, collaborations, and processes that drive innovation and value creation in the university-industry-government nexus.

The micro-level perspective on the Triple Helix framework has gained attention from scholars who believe that understanding the intricacies and interactions among the university, industry, and government actors is vital for fostering innovation and value creation. A key focus of this perspective is on dynamic relationships and collaborations between these actors, as well as the institutional arrangements that enable or hinder innovation and value creation. For instance, McAdam and Debackere (2018) explored the role of boundary-spanning individuals in university-industry-government collaborations, emphasizing the importance of personal relationships and informal networks in the Triple Helix framework. Their study demonstrated that boundary-spanning individuals play a crucial role in facilitating the flow of knowledge and resources between the helices, thus enhancing innovation potential. Another example is the work of Cunningham et al. (2018), which highlighted the importance of studying the Triple Helix framework at the micro-level to uncover the processes and mechanisms that drive innovation and value creation. They argued that examining individual actors, their interactions, and the institutional context in which they operate can provide valuable insights into the functioning of the Triple Helix framework and its effectiveness in promoting innovation. Höglund and Linton (2018) focused on the role of intermediaries in the Triple Helix, demonstrating their importance in facilitating collaborations, knowledge transfer, and resource mobilization between the university, industry, and government actors. The study showed that intermediaries can help bridge gaps and overcome barriers to cooperation, ultimately contributing to value creation in the Triple Helix.

The case study approach was chosen for this research because it allows for an in-depth analysis of the dynamic relationships among the various actors in a Triple Helix model, which is an area in need of further exploration. As Miller et al. (2016) argue, a more comprehensive understanding of the complex activities and interactions in a Triple Helix environment can only be achieved through detailed, micro-level, case-based research. Case studies have the potential to generate rich, context-specific insights that can help refine and advance existing scholarly understanding (Eisenhardt, 2021; Eisenhardt & Graebner, 2007; Siggelkow, 2007). Furthermore, a unique case study that is firmly grounded in prior research can contribute valuable conceptual and theoretical insights with broader implications beyond the specific case itself. Case-based research is particularly well-suited for examining the intricate relationships, processes, and dynamics within a Triple Helix collaboration, as it allows for a deeper understanding of the contextual factors, stakeholder perspectives, and evolving interactions that shape the collaboration’s outcomes. By adopting a qualitative case study approach, researchers can explore the interpretative aspects of the case, capturing the nuances, contingencies, and complexities that underlie Triple Helix collaborations, and addressing gaps in the existing literature (Siggelkow, 2007). Moreover, the case study method enables researchers to triangulate data from various sources, such as interviews, documents, and observations, providing a more robust and comprehensive understanding of the phenomena under investigation (Abdalla et al., 2018; Eisenhardt, 1989). This multi-faceted approach to data collection allows researchers to capture different stakeholder perspectives and experiences, offering a more holistic understanding of the Triple Helix collaboration in question. 
While the qualitative case study approach offers in-depth insights, its limitations include potential biases and a lower degree of generalizability. However, our rigorous process of data triangulation and engagement with multiple stakeholders helps to mitigate these concerns and enhances the validity of our findings.

The case examined is Robotdalen, an initiative in Sweden’s Mälardalen region that encompasses the three counties of Västmanland, Sörmland, and Örebro. Robotdalen was selected because it could offer additional information and deepen our understanding of how a Triple Helix initiative collaborates with its stakeholders. It represents a rather unique and successful partnership among various organizations, including universities, large corporations, SMEs, the national government, and regional and local governments. Robotdalen was established in 2003 in an area where large companies, such as the industrial robot manufacturer ABB, have long utilized robots.

To develop an understanding of how Robotdalen interacts with Triple Helix stakeholders, a qualitative research method was employed, adopting a case study approach that focuses on the interpretative aspects of the case and addresses gaps in the literature (cf. Siggelkow, 2007). The top management team members, including the general manager and deputy general manager, were interviewed annually between 2015 and 2019 to collect empirical data. In addition to formal interviews, several meetings were also held over the years with various Robotdalen employees, board members, funding organizations, and business partners.

The researcher conducted the interviews and participated solely as a researcher. A total of 46 interviews were conducted with various stakeholders involved with Robotdalen. These included the management team of Robotdalen, board members, funding bodies, top-level regional representatives, high-ranking local politicians, university representatives such as vice-chancellors and heads of departments, and industry representatives such as top-level managers and owners/CEOs of startups. Each interview session lasted between 45 and 90 minutes, and all interactions were recorded and subsequently transcribed for in-depth analysis.

Instead of a strict set of questions, the interviews revolved around central themes structured by an interview guide. For instance, participants were prompted with overarching queries such as “Describe your engagement with Robotdalen,” “How crucial is the collaboration with Robotdalen for your organization?” and “In the absence of Robotdalen, how would that impact your operations?” These themes made it possible to delve deeply into the dynamics of each participant’s collaboration with Robotdalen. Such open-ended, thematic questions allowed participants to provide detailed insights, recount experiences, and offer their perspectives on the evolving nature of their relationship with Robotdalen, yielding richer and more nuanced data that capture the essence of value creation in the context of Triple Helix collaborations.

In addition to these interviews, our research was further supplemented with a range of secondary data sources. These included previous research reports on Robotdalen, evaluations of the initiative, annual reports, PowerPoint presentations from board meetings and various other meetings and interactions, meeting agendas, and publicly available resources such as websites and press releases. The integration of both primary and secondary data sources ensured a holistic understanding of Robotdalen’s value creation dynamics within the Triple Helix model.

Analytical Process

The analytical process started with sorting the interviews and documents and constructing a timeline of Robotdalen. Once an overall timeline was constructed, more specific time periods with distinct foci or themes could be identified in the data, and four time periods were defined. Each time period was then analyzed separately, and the most important events in each were highlighted and written up. Here, empirical first-order concepts emerged, such as “international evaluation,” “shift towards commercialization,” and “change of core areas.” Once these first-order concepts were in place and the timeline seemed complete, the timeline was discussed with the management of Robotdalen, whose minor feedback led to minor revisions. The timeline was then also sent to and discussed with the major funder to obtain another perspective; at this stage, there were no suggestions for revisions. The next step in the analytical process was to move back and forth between the theoretical framework and the written-up case, in greater detail than before (Eisenhardt, 2021). This shifted the focus to include more about the hybrid organization of Robotdalen than had been anticipated, as it became apparent how important this hybrid organization was for the development of Robotdalen as an initiative. Second-order constructs began to form as the process of moving between theory and the empirical case continued; in addition to the hybrid organization, collaboration, commercialization, innovation, and adaptation emerged as important second-order constructs. This iterative process, and the ability to return to, for example, the management of Robotdalen, helped validate and triangulate data from different sources, and key informants were used to ensure the accuracy and credibility of the results.

Case Description: Robotdalen

Robotdalen stands as a symbol of structured collaboration, drawing inspiration from the Triple Helix model. Established in 2003 in Sweden’s Mälardalen region, this initiative converges academic, industry, and governmental entities, all striving for cutting-edge robotic advancements. Two regional universities, Mälardalen and Örebro, offer academic expertise. Global entities, like ABB and Volvo CE, alongside numerous SMEs, drive the industrial components. Governance is shaped by regional and local governments, complemented by multiple municipalities and hospitals that channel healthcare-oriented innovation.

Housed within Mälardalen University, Robotdalen spans three counties and numerous municipalities. It is financially sustained through the VINNVÄXT initiative of Vinnova, Sweden’s innovation agency, receiving an annual allocation of approximately 1 million euros over a 10-year period, which must be matched by funding from local and regional governments and from firms operating within the region, bringing total financing to roughly 2 million euros annually over the 10-year period. European alignments, notably with the European Union’s smart specialization strategy and the European Regional Development Fund, amplify Robotdalen’s reach. International affiliations with initiatives like euRobotics and SPARC further its global footprint.

Robotdalen operates as a non-profit initiative rather than a traditional company, with a CEO and deputy CEO overseeing daily operations. Governance resides with a board comprising representatives of key stakeholders such as local government, academia, and industry partners. This board determines strategic direction and investment priorities, reflecting the collaborative nature of the initiative. Ownership in the traditional sense does not apply; rather, Robotdalen is a collaboration of various actors, each contributing resources and expertise. By the close of 2017, Robotdalen had fostered the creation of 45 products and 28 firms, largely on the basis of its collaborations.

Case Analysis

Starting Up and Finding a Structure (2003–2007)

Robotdalen started its operations in 2003 with a vision of regional growth in Mälardalen and world-class research. The aims were, first and foremost, to bring out new products, start new companies, and work towards world-class research. The first years were largely devoted to building up the organization and its partner base, and a hybrid organization was formed. The intent was to have a free-standing organization, but in the end it became an entity under Mälardalen University. Robotdalen selected four core focus areas: industrial robotics, field robotics, robotics for health and care, and technology and knowledge dissemination. In the beginning, Robotdalen also searched for how such an organization could function effectively. The management and organization initially had difficulty finding the right structures and people, which led to some turnover in the management during the first few years.

The Robot to a Thousand Project

In 2004, Robotdalen started a project called Robot to a Thousand (Robot till tusen), which examined what could be automated and robotized at small and medium-sized companies. Such companies often lack knowledge about automation through robotics, but through the project, studies were carried out at companies on the possibilities for automation, with investment costs taken into account. The studies resulted in concrete proposals on how a company could invest and achieved an implementation rate of around 50%. The project delivered around 300 studies after 2004 and continued as part of Robotdalen, albeit on a somewhat smaller scale, under the name PILAR (pilot project Automation Challenge in the Robotliftet). Robot to a Thousand, and now PILAR, has not only benefited companies and the business world in general through robot solutions and investments but has also been an important way for Robotdalen to collaborate with, among others, Mälardalen University (MDH) and Örebro University. Through the project, students in specific courses in the education programs had the opportunity to convert theoretical knowledge into practical and important studies. Since 2004, the project has thus been an important means of creating collaboration in Mälardalen between higher education and business, with Robotdalen in a central and coordinating role, yielding direct positive effects for business and useful experience for the many students who worked on the pre-studies.

Focus on Research

In 2007, the Robotdalen Scientific Award was introduced, an award open to applications from young researchers and doctoral students from all over the world. The winning young researchers were expected to have ground-breaking ideas and exceptional talent; the 2007 winner received a prize of 20,000 euros, financed by Robotdalen. This initiative clearly reflects the focus on research that Robotdalen maintained during the first period, from the start until about 2010. During this period, research and collaboration with the universities were important and prioritized activities, and a large part of Robotdalen’s resources went to collaborative research at MDH and Örebro University. One example is the Friction Stir Robot Welding (FSW) project, carried out in collaboration with AASS at Örebro University, Esab, ABB, and Specma, which aimed to robotize friction stir welding so that copper and aluminum could be joined together. Another research-based project, during the years 2005–2007, was Navigation Systems for Automated Loaders (NSAL), which aimed to make mine loaders drive completely autonomously; it was a collaboration between Atlas Copco, Robotdalen, AASS (Örebro University), and the KK Foundation. It can therefore be stated that initiatives such as the Robotdalen Scientific Award and several major research projects were a distinct focus for Robotdalen in this period.

The Triple Helix Time Period (Approx. 2007–2011)

The Triple Helix time period is distinguished by an explicit focus on Triple Helix collaboration, particularly collaboration with the public sector.

Change of the Core Areas

Relatively early on, Robotdalen realized that the four core areas of industrial robotics, field robotics, robotics for health and care, and technology and knowledge dissemination were too broad and too difficult to manage well together. In particular, technology and knowledge dissemination, which was about engaging children and young people in robotics, stood comparatively far from the other core areas: it required entirely different skills, such as pedagogy, whereas the other three rested on technology and development competence. To be able to work in a more focused way, Robotdalen therefore decided to end this core area and concentrate on competencies within technology, research, and development.

Development of Cooperation

Although there was a strong focus on research (mainly applied research) during the period of approximately 2007–2010, the Triple Helix idea was also an important focus area in which resources were invested, and Robotdalen worked to develop the Triple Helix collaboration in different ways. Collaboration between industry and academia was achieved through research projects and initiatives such as Robot to a Thousand. Concrete cooperation with municipalities and regions/county councils was harder to establish, but Robotdalen tried in different ways. One example of this effort was Robotdalen’s participation in the 2008 technology fair (Teknikmässan), where not only robots and projects were displayed: the municipalities active in Robotdalen also took part and showcased projects together with Robotdalen.

Strong Brand

At this time, the Robotdalen brand began to grow strongly, gaining traction in Mälardalen and beyond. One reason was that the organization began to gain real momentum in its operations, and Robotdalen was now considered a well-functioning Triple Helix that made a difference in Mälardalen. The vision for Robotdalen during this time remained regional growth, indicating a continued regional focus.

During this time, the goals for Robotdalen were broadened beyond new products, companies, and world-class research: a new goal was to create and save jobs. Several factors contributed to this. Robotdalen was doing well, which created confidence to take on more; at the same time, it wanted to work closely with public actors, who are interested in job creation; and the financial crisis had cost many jobs. In addition, a further goal was to create an innovation system within robotics. The previously narrowed set of core areas was also expanded to include innovation support and logistics automation, bringing the total to five.

Regional Competence and Job Ventures

In line with the Triple Helix model and to strengthen cooperation with regions, county councils, and municipalities, a new goal was to create and save jobs. In 2009, a venture called Växthuset was created together with ABB, VINNOVA, and the city of Västerås. Through it, 12 people with important robotics competence could be retained when ABB had to lay off employees due to the financial crisis. It was important for ABB and the region not to lose this competence, and the retained staff worked on development projects that would otherwise not have been possible, resulting in three new products and scientific articles. At the same time, Robotdalen was involved in and contributed to the start of a new automation profile at Mälardalen University College, and a doctoral program in robotics was also created.

Innovation System for Robotics

Creating a separate innovation system for robotics sounds like a vast undertaking. In fact, the “innovation system” was more about Robotdalen becoming a hub in a robotics innovation system: Robotdalen created the “Robotdalen innovation process,” provided partial innovation support, and built a strong network of partners that collaborated with the already existing innovation system, including incubators.

Several startups were established during this time period. For example, Robcab, a logistics robot intended for use in healthcare, received a pilot installation. Another establishment, the first international one, was that of entrepreneur Steven Von Rump, who chose to establish his new company within Robotdalen’s network. Giraffe, the company’s robot, makes it possible to see and talk to, for example, elderly people, and to steer the robot around the home.

Extended Operation

This time period shows an expansion and broadening of targets and core areas, which may be a result of things going relatively well for Robotdalen; there was room to continue expanding the business in terms of targets and operational areas. The investment in research continued to distinguish this period as well. Robotdalen had many collaborative projects with AASS at Örebro University, and the two have several areas of focus that coincide, probably a result of having enriched each other through the many joint projects carried out.

The Internationalization Time Period (2011–2014)

The years 2011–2014 can be characterized by a major focus on internationalization, which largely has its background in the international evaluation carried out in 2010, although the changes in the business began first and foremost from 2011 onward. This period is also marked by the renewed financial trust that Robotdalen received from Vinnova and the regional and local partners.

The International Evaluation

In 2010, Robotdalen was evaluated by a group of international experts in the area (Cooke et al., 2010). Overall, Robotdalen received a good review, but the evaluators also found that its operations were diversified and that it needed to focus more on niche technologies and markets. The evaluation also suggested that Robotdalen needed collaborations outside the region and should work towards becoming an internationally recognized environment.

Changes After the Evaluation

In retrospect, the international evaluation can be seen to have had a relatively large impact on how Robotdalen continued to conduct its business. The focus areas were reduced and sharpened, while more was invested in national and international collaborations to achieve high international status. After the evaluation, it is also possible to discern a strategy that the evaluators did not directly present in their reports but that possibly resulted from analytical strategy work connected to the change process: Robotdalen changed its vision to “enable commercial success,” chose to focus more on products, and succeeded in commercializing various projects.

In line with the evaluation’s recommendation to concentrate on fewer and more niche areas, the core areas for Robotdalen changed once again. The focus now fell on three core areas: industrial robotics, field robotics, and health robotics.

Shift Toward Commercialization

The research that had been an important part of Robotdalen began to recede and no longer had such a central role. Robotdalen wanted to see more concrete results in the form of products on the market and new companies, and research had not delivered enough of either. Robotdalen therefore chose to focus more directly on the commercialization of products and the creation of new companies. A clear reflection of this period is how the Robotdalen Scientific Award, previously given to young researchers, changed its name to the Robotdalen Innovation Award: instead of a prize for researchers, it became a prize for entrepreneurs, innovators, startups, and people with solutions that could be commercialized. Broadly speaking, resources were shifted from research to commercialization.

Internationalization of Robotdalen

To increase its international impact, Robotdalen worked in different ways. Among other things, it created the Robotdalen Innovation Challenge, an international event for discussing challenges and opportunities in the commercialization of robotics and for positioning Robotdalen as a meeting place for robotics. Robotdalen also became more active in EU projects and more actively sought international business partners, including in Japan and the USA. One example is the Japanese company Cyberdyne, which moved to Västerås in 2012 to establish itself within Robotdalen.

The previous idea of creating a separate innovation system was now reformulated: it was emphasized that Robotdalen is an important part of the innovation system in collaboration with other actors, and that its role is to develop a well-functioning idea development and commercialization process. In-depth cooperation with regions, county councils, and municipalities led to several physical locations: the Ängen testbed in Örebro through Örebro Science Park, the Robot Application Center (RAC) in Munktell Science Park in Eskilstuna, and the Automation Center in Västerås. All these locations are results of collaboration in the regional innovation system. In addition, funding was secured for Technology for Independent Life (T4IL) to invest even more in health robotics.

The Time Period of Commercialization, 2014–2020

Although commercialization was already in focus in the previous period, it is emphasized even more strongly in this one. The commercialization aspect also changes: the focus is no longer only on commercializing products but also on commercializing Robotdalen as an organization. This period is also marked by the phase-out of the VINNVÄXT program, ending in 2019, which leads to several major changes to the organization.

New Board and the New Robotdalen

Robotdalen needs to renew the board and bring in other skills. The management argues for a board renewal aimed at getting the right competence onto the board; in the past, the board primarily had representatives from Robotdalen’s major actors and financiers. In the later part of the period, the strategic focus shifts to applying existing and new robotics to new areas of application. One project that reflects this focus is Våroffer (The Rite of Spring), a very different kind of project for Robotdalen: a performance in which the dancer and choreographer Fredrik “Benke” Rydman dances with an industrial robot at Kulturhuset in Stockholm. Robotdalen’s task is to program a large ABB industrial robot to dance.

A few years into this period, it becomes clear that Vinnova, which has already extended Robotdalen’s VINNVÄXT investment, will not extend the funding further but will instead phase it down, ending in June 2019. This message was clearly not what Robotdalen’s management wanted, and it placed great demands on changing an organization that had been built up around generous funding over many years. Coming up with a plan for how Robotdalen could continue became a major challenge for the management. Robotdalen’s survival strategy is to make the business more commercial; by doing so, Robotdalen can continue to use the resources, networks, and especially the knowledge that has been created. The management succeeded in reorganizing the business to survive without the base funding from Vinnova. The management also described the change brought about by the lost funding in a cautiously positive spirit, with a strong belief that Robotdalen will be able to live on in a good, if different, way in the future.

Research and Development

Robotdalen sees increased interest from external stakeholders in buying research and development assignments from Robotdalen. It is Robotdalen’s unique domain expertise that is sought after. Robotdalen starts several large collaborative projects with major industrial partners, such as Skanska and Volvo. These companies do not themselves have sufficient competence in robotics but make use of the knowledge available within Robotdalen and its network.

From Internationalization Back to Regionalization

If the previous period involved a strong focus on internationalization, the last period shows internationalization becoming less interesting. Robotdalen continued with internationalization, but it was no longer the same priority, even if, for example, cooperation with investors in China was initiated. The reduction of international initiatives within Robotdalen can be attributed to the fact that local and regional actors were not as interested in seeing Robotdalen invest in international projects and initiatives. These actors are more interested in ensuring that funding, and especially the funding they themselves contribute, stays within the region. During this time, the local and regional financiers become even more important than before, since Robotdalen will in the future get its basic funding from them once the VINNVÄXT investment ends. Another factor is that it was not easy to switch from a regional and national center to a European and international one. One example of the regional focus is the establishment of the Collaborative Robot Test Center (CRTC) in Västerås to find new processes and applications with new technology within the Swedish manufacturing industry. The operation is partly financed by Vinnova and ABB.

Örebro Going in its Own Direction

Örebro municipality, Region Örebro, and Örebro University choose to end their collaboration with Robotdalen. The obvious conclusion would be that Robotdalen failed in Örebro, but this defection can be interpreted in several ways. Robotdalen’s focus in Örebro has primarily been linked to Örebro University and the research group AASS. This group grew steadily stronger; in 2019, AASS consists of over 60 employees (professors, lecturers, doctoral students, and postdoctoral fellows), which means it is very large and can act more independently. It may be that AASS is no longer interested in being “under” Robotdalen but instead wants to become more independent, especially since Robotdalen no longer invests as much in academic research. This can in turn be interpreted as Robotdalen having played out its role, but it can also be argued that Robotdalen succeeded in creating, or at least contributing to, an AASS that can now continue to live on independently. Furthermore, it was already known that the larger funding from Vinnova would no longer be available to Robotdalen when the decision was made to go separate ways. In local and regional innovation systems, there is always fierce competition for funding between actors. Reducing the number of actors, especially when Robotdalen will no longer contribute as generously, makes it easier for the local and regional funders to focus their funding efforts.

The analysis of Robotdalen’s value creation, organized around key themes, and the examination of its hybrid organization provide a comprehensive understanding of the dynamic nature of the Triple Helix model and of the importance of collaboration, commercialization, innovation, and adaptation in fostering regional growth and value creation (Table 3). The case of Robotdalen extends the Triple Helix model by illustrating how hybrid organizations can act as central nodes in innovation networks. It highlights that a single entity can streamline collaboration, knowledge exchange, and commercialization processes among the three helices, acting as both a participant and a facilitator within the innovation ecosystem.

Throughout the various periods, the collaboration between academia, industry, and the government played a crucial role in Robotdalen’s success. This collaboration was facilitated by the hybrid organization, which served as a boundary-spanning entity, connecting the various actors and enabling the flow of knowledge and resources among them. Collaborative initiatives like the creation and saving of jobs, the establishment of new companies, and partnerships with academic institutions like AASS at Örebro University exemplify the power of collaboration in driving value creation. By leveraging the strengths of each sector and the coordination provided by the hybrid organization, Robotdalen was able to create a synergistic environment conducive to innovation and growth. This finding illustrates the evolution of value-creation activities within the Triple Helix model, showing how a strong collaboration between academia, industry, and government contributes to the development of a hybrid organization like Robotdalen.

The focus on commercialization proved to be vital in ensuring the practical application of research and development efforts. The hybrid organization’s role as an intermediary allowed Robotdalen to effectively connect research with industry partners, resulting in the successful translation of research into tangible products and services. By bringing new products and services to market, Robotdalen generated economic value for the region and contributed to its reputation as a leader in robotics and automation. Startups like Robcab and Giraff demonstrate the successful combination of research, collaboration and the facilitative role of the hybrid organization in commercialization efforts. The successful commercialization efforts of Robotdalen, facilitated by the hybrid organization, demonstrate how value creation within the Triple Helix model evolves over time, with the hybrid organization playing a critical role in connecting research and industry partners.

Central to Robotdalen’s value creation was the fostering of an innovation system for robotics, leveraging the Triple Helix model to drive advancements in research, products, and services. The hybrid organization played a critical role in coordinating the efforts of academia, industry, and government, creating a collaborative ecosystem that promoted the exchange of ideas, resources, and expertise. This ecosystem, combined with initiatives like the Robotdalen Scientific Award and various collaborative research projects, allowed Robotdalen to consistently produce world-class research and innovations.

Robotdalen’s ability to adapt to changing circumstances and recommendations throughout different time periods played a significant role in its sustained value creation. The hybrid organization’s adaptability and resilience enabled it to respond effectively to shifts in stakeholder expectations and the availability of funding. By focusing on specific market segments, increasing internationalization, and forging new partnerships outside Sweden, Robotdalen was able to maintain its competitive edge and continually deliver value despite shifting conditions. Similarly, in the 2014–2019 period, when national funding was ending, Robotdalen shifted its focus towards commercialization and consulting-based operations to finance its operations, further highlighting the importance of adaptability in the Triple Helix model. Robotdalen’s adaptability to changing circumstances highlights the significance of the hybrid organization’s role in value creation at different stages of development within the Triple Helix model.

By examining the key themes of collaboration, commercialization, innovation, and adaptation, and analyzing the role of the hybrid organization in facilitating and managing the Triple Helix collaboration, we gain a deeper understanding of the factors contributing to Robotdalen’s successful value creation within the Triple Helix model. This integrated approach underscores the importance of fostering strong relationships among the three sectors, driving innovation, and adapting to changing conditions to ensure long-term growth and success, with the hybrid organization playing an important role in this process.

The study contributes to the Triple Helix literature by providing a fine-grained analysis of the dynamics of value creation in a large Triple Helix innovation project, which was previously underexplored. The key themes of collaboration, commercialization, innovation, and adaptation that emerged from the Robotdalen case expand our understanding of how these factors interplay within the Triple Helix model. The emphasis on collaboration in the Robotdalen case supports the existing notion that synergistic interactions between academia, industry, and government are crucial for fostering innovation (Etzkowitz & Leydesdorff, 2000 ). Moreover, the findings on the role of the hybrid organization in facilitating collaboration and resource exchange provide empirical evidence that supports the theoretical proposition of boundary-spanning entities being essential for effective Triple Helix interactions (Carayannis & Campbell, 2009 ). The Robotdalen case sheds light on the importance of adaptation in sustaining value creation within the Triple Helix model, which has not been extensively discussed in previous studies. Our findings suggest that the ability of hybrid organizations to adapt to changing circumstances and recommendations is a crucial factor in ensuring their long-term success and relevance within the Triple Helix context.

Conclusions

This study examined the evolution of Robotdalen over 16 years, with a particular focus on the dynamics of value creation within a Triple Helix model, encompassing academia, industry, and government. The research identified how value-creation activities shifted and adapted over time, and how different types of capabilities were essential for the creation of expected value at different stages of the organization’s development. This study answers the research question by demonstrating the evolution of value-creation activities within the Triple Helix model and the role hybrid organizations, such as Robotdalen, play in value creation at various stages of an organization's development.

The study revealed that collaboration among academia, industry, and the government is essential for driving value creation and regional growth. The hybrid organization, Robotdalen, played a crucial role in facilitating these collaborations and enabling the flow of knowledge and resources among the different actors. By leveraging the strengths of each sector and the coordination provided by the hybrid organization, Robotdalen created a synergistic environment conducive to innovation and growth. Furthermore, the focus on commercialization, supported by the hybrid organization’s role as an intermediary, allowed for the successful translation of research and development efforts into tangible products and services, generating economic value for the region. The ability to adapt to changing circumstances and recommendations emerged as another critical factor in Robotdalen’s sustained value creation.

Despite the insights gained from this research, there are some limitations to consider. The study focused primarily on a single case, which may limit the generalizability of the findings. Additionally, the retrospective nature of the analysis might introduce potential biases in the interpretation of the data. Future research directions could include comparative studies of multiple hybrid organizations operating within the Triple Helix model to further explain the factors contributing to successful value creation. Moreover, future research could also explore the influence of cultural, political, and economic contexts on the functioning of the Triple Helix model and the effectiveness of hybrid organizations in different regions.

This research has contributed valuable insights into the role of the Triple Helix model and hybrid organizations in fostering innovation and value creation. By examining the key themes of collaboration, commercialization, innovation, and adaptation, and analyzing the role of the hybrid organization, Robotdalen, the study offers a deeper understanding of the factors contributing to successful value creation within the Triple Helix model. The findings of this research provide valuable insights for policymakers and managers seeking to enhance innovation and value creation by applying the Triple Helix model. For policymakers, the results emphasize the need to establish and support boundary-spanning hybrid organizations that can facilitate collaboration among academia, industry, and government, and effectively coordinate resources and knowledge flow between them. Policymakers should prioritize the development of policies and funding mechanisms that enable these hybrid organizations to thrive and adapt to changing circumstances.

Aarikka-Stenroos, L., & Ritala, P. (2017). Network management in the era of ecosystems: Systematic review and management framework. Industrial Marketing Management, 67 , 23–36. https://doi.org/10.1016/j.indmarman.2017.08.010


Abdalla, M. M., Oliveira, L. G. L., Azevedo, C. E. F., & Gonzalez, R. K. (2018). Quality in qualitative organizational research: Types of triangulation as a methodological alternative. Administração: ensino e pesquisa , 19 (1). https://www.redalyc.org/journal/5335/533556821002/533556821002.pdf . Accessed 15 December 2023.

Afonso, O., Monteiro, S., & Thompson, M. (2012). A growth model for the quadruple helix. Journal of Business Economics and Management, 13 (5), 849–865.

Amaral, M., & Cai, Y. (2023). A decade of Triple Helix journal – Achievements and challenges. Triple Helix, 9 (3), 239–243. https://doi.org/10.1163/21971927-12340008

Audretsch, D. B. (2014). From the entrepreneurial university to the university for the entrepreneurial society. The Journal of Technology Transfer, 39 (3), 313–321. https://doi.org/10.1007/s10961-012-9288-1

Bøllingtoft, A. (2012). The bottom-up business incubator: Leverage to networking and cooperation practices in a self-generated, entrepreneurial-enabled environment. Technovation, 32 (5), 304–315.

Bolzani, D., Munari, F., Rasmussen, E., & Toschi, L. (2021). Technology transfer offices as providers of science and technology entrepreneurship education. The Journal of Technology Transfer, 46 , 335–365.

Cai, Y. (2022). Neo-Triple Helix model of innovation ecosystems: Integrating triple, quadruple and quintuple helix models. Triple Helix, 9 (1), 76–106. https://doi.org/10.1163/21971927-bja10029

Cai, Y., & Etzkowitz, H. (2020). Theorizing the Triple Helix model: Past, present, and future. Triple Helix, 7 (2–3), 189–226. https://doi.org/10.1163/21971927-bja10003

Carayannis, E. G., & Campbell, D. F. J. (2009). “Mode 3” and “Quadruple Helix”: Toward a 21st century fractal innovation ecosystem. International Journal of Technology Management, 46 (3–4), 201–234. https://doi.org/10.1504/IJTM.2009.023374

Carayannis, E. G., & Campbell, D. F. J. (2010). Triple Helix, Quadruple Helix and Quintuple Helix and how do knowledge, innovation and the environment relate to each other?: A proposed framework for a trans-disciplinary analysis of sustainable development and social ecology. International Journal of Social Ecology and Sustainable Development (IJSESD), 1 (1), 41–69. https://doi.org/10.4018/jsesd.2010010105

Carayannis, E. G., & Rakhmatullin, R. (2014). The quadruple/quintuple innovation helixes and smart specialisation strategies for sustainable and inclusive growth in Europe and beyond. Journal of the Knowledge Economy, 5 (2), 212–239. https://doi.org/10.1007/s13132-014-0185-8

Cerver Romero, E., Ferreira, J. J. M., & Fernandes, C. I. (2021). The multiple faces of the entrepreneurial university: A review of the prevailing theoretical approaches. The Journal of Technology Transfer, 46 (4), 1173–1195. https://doi.org/10.1007/s10961-020-09815-4

Champenois, C., & Etzkowitz, H. (2017). From boundary line to boundary space: The creation of hybrid organizations as a Triple Helix micro-foundation. Technovation . https://doi.org/10.1016/j.technovation.2017.11.002

Cook, K. S., & Emerson, R. M. (1978). Power, equity and commitment in exchange networks. American Sociological Review, 43 (5), Article 5.  https://doi.org/10.2307/2094546

Cooke, P., Eickelpasch, A., & Ffowcs-Williams, I. (2010). From low hanging fruit to strategic growth: International evaluation of Robotdalen, Skåne Food Innovation Network and Uppsala BIO. VINNOVA report VR 2010:16 (p. 44). VINNOVA.

Cunningham, J. A., Menter, M., & O’Kane, C. (2018). Value creation in the quadruple helix: A micro level conceptual model of principal investigators as value creators. R&D Management, 48 (1), 136–147. https://doi.org/10.1111/radm.12310

de Lima Figueiredo, N., Fernandes, C. I., & Abrantes, J. L. (2022). Triple Helix Model: Cooperation in knowledge creation. Journal of the Knowledge Economy . https://doi.org/10.1007/s13132-022-00930-1

Dolan, B., Cunningham, J. A., Menter, M., & McGregor, C. (2019). The role and function of cooperative research centers in entrepreneurial universities: A micro level perspective. Management Decision, 57 (12), 3406–3425.

Edquist, C. (2011). Design of innovation policy through diagnostic analysis: Identification of systemic problems (or failures). Industrial and Corporate Change, 20 (6), 1725–1753. https://doi.org/10.1093/icc/dtr060

Eisenhardt, K. M. (1989). Building theories from case study research. Academy of Management Review, 14 (4), 532–550.

Eisenhardt, K. M. (2021). What is the Eisenhardt Method, really? Strategic Organization, 19 (1), 147–160. https://doi.org/10.1177/1476127020982866

Eisenhardt, K. M., & Graebner, M. E. (2007). Theory building from cases: Opportunities and challenges. Academy of Management Journal, 50 (1), 25–32.

Etzkowitz, H., & Brisolla, S. N. (1999). Failure and success: The fate of industrial policy in Latin America and South East Asia. Research Policy, 28 (4), 337–350. https://doi.org/10.1016/S0048-7333(98)00077-8

Etzkowitz, H., & Leydesdorff, L. (2000). The dynamics of innovation: From national systems and “Mode 2” to a Triple Helix of university–industry–government relations. Research Policy, 29 (2), 109–123. https://doi.org/10.1016/S0048-7333(99)00055-4

Farinha, L., Ferreira, J., & Gouveia, B. (2016). Networks of innovation and competitiveness: A Triple Helix case study. Journal of the Knowledge Economy, 7 (1), 259–275. https://doi.org/10.1007/s13132-014-0218-3

Feola, R., Parente, R., & Cucino, V. (2021). The Entrepreneurial University: How to develop the entrepreneurial orientation of academia. Journal of the Knowledge Economy, 12 (4), 1787–1808. https://doi.org/10.1007/s13132-020-00675-9

Galvao, A., Mascarenhas, C., Marques, C., Ferreira, J., & Ratten, V. (2019). Triple helix and its evolution: A systematic literature review. Journal of Science and Technology Policy Management, 10 (3), 812–833. https://doi.org/10.1108/JSTPM-10-2018-0103

Guerrero, M., & Urbano, D. (2012). The development of an entrepreneurial university. The Journal of Technology Transfer, 37 (1), 43–74. https://doi.org/10.1007/s10961-010-9171-x

Hackett, S. M., & Dilts, D. M. (2004). A systematic review of business incubation research. The Journal of Technology Transfer, 29 (1), 55–82.

Hakeem, M. M., Goi, H. C., Frendy, & Ito, H. (2023). Regional sustainable development using a Quadruple Helix approach in Japan. Regional Studies, Regional Science, 10 (1), 119–138. https://doi.org/10.1080/21681376.2023.2171313

Hasche, N. (2013). Value co-creating processes in international business relationships: Three empirical studies of cooperation between Chinese customers and Swedish suppliers. Örebro Universitet.

Hasche, N., & Linton, G. (2018). The value of failed relationships for the development of a Medtech start-up. Journal of Small Business & Entrepreneurship, 30 (1), Article 1. https://doi.org/10.1080/08276331.2017.1388953

Hasche, N., Höglund, L., & Linton, G. (2020). Quadruple helix as a network of relationships: Creating value within a Swedish regional innovation system. Journal of Small Business & Entrepreneurship, 32 (6), 523–544. https://doi.org/10.1080/08276331.2019.1643134

Hausberg, J. P., & Korreck, S. (2021). Business incubators and accelerators: A co-citation analysis-based, systematic literature review (pp. 39–63). Edward Elgar Publishing.

Hodge, G. A., & Greve, C. (2007). Public–private partnerships: An international performance review. Public Administration Review, 67 (3), 545–558.

Hodge, G. A., & Greve, C. (2017). On public–private partnership performance: A contemporary review. Public Works Management & Policy, 22 (1), 55–78.

Höglund, L., & Linton, G. (2018). Smart specialization in regional innovation systems: A quadruple helix perspective. R&D Management, 48 (1), 60–72. https://doi.org/10.1111/radm.12306

Howells, J. (2006). Intermediation and the role of intermediaries in innovation. Research Policy, 35 (5), 715–728. https://doi.org/10.1016/j.respol.2006.03.005

Hearn, G., & Pace, C. (2006). Value‐creating ecologies: Understanding next generation business systems. Foresight, 8 (1), 55–65.

Johannisson, B., & Nilsson, A. (1989). Community entrepreneurs: Networking for local development. Entrepreneurship & regional development, 1 (1), 3–19.

Kriz, A., Bankins, S., & Molloy, C. (2018). Readying a region: Temporally exploring the development of an Australian regional quadruple helix. R&D Management, 48 (1), Article 1. https://doi.org/10.1111/radm.12294

Lecluyse, L., Knockaert, M., & Spithoven, A. (2019). The contribution of science parks: A literature review and future research agenda. The Journal of Technology Transfer, 44 , 559–595.

Leydesdorff, L. (2012). The Triple Helix, Quadruple Helix, …, and an N-Tuple of helices: Explanatory models for analyzing the knowledge-based economy? Journal of the Knowledge Economy, 3 (1), 25–35. https://doi.org/10.1007/s13132-011-0049-4

Leydesdorff, L., & Meyer, M. (2006). Triple helix indicators of knowledge-based innovation systems: Introduction to the special issue. Research Policy, 35 (10), 1441–1449.

Leydesdorff, L., & Smith, H. L. (2022). Triple, quadruple, and higher-order helices: Historical phenomena and (neo-)evolutionary models. Triple Helix, 9 (1), 6–31. https://doi.org/10.1163/21971927-bja10022

Lin, M. W., & Bozeman, B. (2006). Researchers’ industry experience and productivity in university–industry research centers: A “scientific and technical human capital” explanation. The Journal of Technology Transfer, 31 , 269–290.

Mair, J., Mayer, J., & Lutz, E. (2015). Navigating institutional plurality: Organizational governance in hybrid organizations. Organization Studies, 36 (6), 713–739. https://doi.org/10.1177/0170840615580007

McAdam, M., & Debackere, K. (2018). Beyond ‘triple helix’ toward ‘quadruple helix’ models in regional innovation systems: Implications for theory and practice. R &D Management, 48 (1), Article 1. https://doi.org/10.1111/radm.12309

McAdam, R., Miller, K., McAdam, M., & Teague, S. (2012). The development of university technology transfer stakeholder relationships at a regional level: Lessons for the future. Technovation, 32 (1), 57–67. https://doi.org/10.1016/j.technovation.2011.08.001

Meyer, M., Kuusisto, J., Grant, K., De Silva, M., Flowers, S., & Choksy, U. (2019). Towards new triple helix organisations? A comparative study of competence centres as knowledge, consensus and innovation spaces. R&D Management, 49 (4), 555–573.

Miller, K., McAdam, R., Moffett, S., Alexander, A., & Puthusserry, P. (2016). Knowledge transfer in university quadruple helix ecosystems: An absorptive capacity perspective. R&D Management, 46 (2), 383–399. https://doi.org/10.1111/radm.12182

Ranga, M., & Etzkowitz, H. (2013). Triple Helix Systems: An analytical framework for innovation policy and practice in the knowledge society. Industry and Higher Education, 27 (4), 237–262.

Payne, A. F., Storbacka, K., & Frow, P. (2008). Managing the co-creation of value. Journal of the Academy of Marketing Science, 36 (1), Article 1. https://doi.org/10.1007/s11747-007-0070-0

Perkmann, M., Tartari, V., McKelvey, M., Autio, E., Broström, A., D’este, P., ... & Sobrero, M. (2013). Academic engagement and commercialisation: A review of the literature on university–industry relations. Research Policy, 42 (2), 423–442.

Pique, J. M., Berbegal-Mirabent, J., & Etzkowitz, H. (2018). Triple helix and the evolution of ecosystems of innovation: The case of silicon valley. Triple Helix, 5 (1), 1–21. https://doi.org/10.1186/s40604-018-0060-x

Siegel, D. S., Waldman, D. A., Atwater, L. E., & Link, A. N. (2003). Commercial knowledge transfers from universities to firms: Improving the effectiveness of university–industry collaboration. The Journal of High Technology Management Research, 14 (1), 111–133.

Siggelkow, N. (2007). Persuasion with case studies. Academy of Management Journal, 50 (1), 20–24.

van der Haar, J. W., Kemp, R. G. M., & Omta, O. (2001). Creating value that cannot be copied. Industrial Marketing Management, 30 (8), Article 8. https://doi.org/10.1016/S0019-8501(99)00128-5

Xue, L., & Gao, Y. (2022). From modeling the interactions among Institutions to modeling the evolution of an ecosystem: A Reflection on the Triple Helix Model and Beyond. Triple Helix, 9 (1), 54–64. https://doi.org/10.1163/21971927-bja10027

Yin, R. K. (2009). Case study research: Design and methods (4th ed.). Sage Publications.

Youtie, J., & Shapira, P. (2008). Building an innovation hub: A case study of the transformation of university roles in regional technological and economic development. Research policy, 37 (8), 1188–1204.


Open access funding provided by Inland Norway University Of Applied Sciences

Author information

Authors and Affiliations

CREDS - Center for Research on Digitalization and Sustainability, Inland Norway University of Applied Sciences, Innlandet, Norway

Gabriel Linton


Corresponding author

Correspondence to Gabriel Linton.

Ethics declarations

Conflict of Interest

The author declares no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Linton, G. Triple Helix Dynamics and Hybrid Organizations: An Analysis of Value Creation Processes. J Knowl Econ (2024). https://doi.org/10.1007/s13132-024-01911-2


Received: 29 March 2023

Accepted: 14 March 2024

Published: 22 April 2024


  • Triple Helix Model
  • Hybrid Organizations
  • Value Creation
  • Innovation Ecosystems
  • Academic Entrepreneurship


Research Highlights

How to Identify Good Coupling Methods with Error Analysis

The simulation of complex systems like the global atmosphere requires coupling of many physics processes. These include, for example, atmospheric motions, cloud and rain formation processes, as well as pollutant transport and transformation. Atmospheric models typically couple these processes by adopting methods found in earlier models or textbooks, but the implications of the methods and their impacts on the simulation results are often unclear. This study introduces a mathematically rigorous analysis framework that provides insight into the features and implications of coupling method choices. As an example, the framework is applied to the Energy Exascale Earth System Model version 1 (E3SMv1) to evaluate two coupling methods used for aerosols.

The framework enables scientists to evaluate the impacts of coupling method choices without implementing all possible methods or deriving lengthy mathematical expressions tailored to each method’s details. Furthermore, the framework provides an accuracy assessment for each interacting process. This can help avoid incidental error cancelation and increase confidence in estimating the overall accuracy of a numerical solution.

The need to choose, update, and evaluate coupling methods during the development of global atmospheric models for weather, climate, and Earth system prediction prompted this study. Many existing mathematical analyses focus on the details of specific physics problems and numerical models; others are more general but produce mathematical expressions that are hard for physical scientists to interpret. This study introduces an analysis framework that is both general and intuitive to use. The framework sets aside some details of the mathematical formulation and numerical algorithms of each interacting process and focuses on the coupling itself. The study first describes two fundamental sources of numerical error that arise when two processes are coupled, then demonstrates how these insights can be applied through simple arithmetic to derive numerical errors in multi-process problems. Two coupling methods used for the aerosol life cycle in E3SMv1 are analyzed as a concrete example of applying the tool.
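The idea of comparing coupling choices against a known solution can be illustrated with a toy problem. The following sketch is not the paper's actual framework: the model equation dq/dt = E - k*q (constant emission rate E, linear removal rate k) and both coupling schemes shown here are illustrative assumptions chosen because the exact solution is available for measuring error.

```python
# Toy illustration (assumed example, not the E3SMv1 schemes): two ways of
# coupling an emission process and a removal process for dq/dt = E - k*q.
import math

def exact(q0, E, k, t):
    # Exact solution of dq/dt = E - k*q with q(0) = q0.
    return E / k + (q0 - E / k) * math.exp(-k * t)

def coupled_simultaneous(q0, E, k, dt, nsteps):
    # Both processes see the same state: forward Euler on the summed tendency.
    q = q0
    for _ in range(nsteps):
        q = q + dt * (E - k * q)
    return q

def coupled_sequential(q0, E, k, dt, nsteps):
    # Operator splitting: emission is applied first, and removal then acts
    # on the already-updated (emitted) state within the same step.
    q = q0
    for _ in range(nsteps):
        q = q + dt * E        # process 1: emission
        q = q - dt * k * q    # process 2: removal
    return q

if __name__ == "__main__":
    q0, E, k, T = 1.0, 0.5, 2.0, 1.0
    for n in (100, 200):
        dt = T / n
        err_sim = abs(coupled_simultaneous(q0, E, k, dt, n) - exact(q0, E, k, T))
        err_seq = abs(coupled_sequential(q0, E, k, dt, n) - exact(q0, E, k, T))
        print(f"dt={dt:.4f}  simultaneous={err_sim:.2e}  sequential={err_seq:.2e}")
```

Both schemes converge at first order (halving dt roughly halves the error), but their error constants differ because the sequential scheme introduces an additional splitting term of size dt^2 * E * k per step. Isolating that per-process contribution, rather than only the total error, is the kind of accounting the study's framework formalizes.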

The computational resources used in this study were provided by the National Energy Research Scientific Computing Center, a U.S. Department of Energy (DOE) Office of Science user facility located at Lawrence Berkeley National Laboratory, and by the Compy supercomputer operated for DOE by Pacific Northwest National Laboratory.

Vogl, Christopher J., Hui Wan, Carol S. Woodward, and Quan M. Bui. 2024. "Numerical Coupling of Aerosol Emissions, Dry Removal, and Turbulent Mixing in the E3SM Atmosphere Model Version 1 (EAMv1) – Part 2: A Semi-Discrete Error Analysis Framework for Assessing Coupling Schemes." Geoscientific Model Development 17 (3). Copernicus GmbH: 1409–1428. doi:10.5194/gmd-17-1409-2024.
