
Practical Data Warehousing: Successful Cases


No matter how smooth a plan looks in theory, practice will make its own adjustments, because every real case has characteristics that no general scheme can fully anticipate. Let's see how the world's leading brands have adapted a well-known way of storing information — data warehousing — to their needs.

Figure: Global Data Warehousing Market By Application

The Reason for Making Decisions

The need to make business decisions based on data analysis has long been beyond doubt. But before that data can be analyzed, it has to be collected, sorted, and prepared for analytics.


This is what data warehousing specialists do. To see what top performance looks like, it is worth examining how high-quality custom builds have come out of this construction kit.

Data warehousing interacts with a huge amount of data

A data warehouse is a digital storage system that integrates and reconciles large amounts of data from different sources. It helps companies turn data into valuable information and make informed decisions based on it. A data warehouse combines current and historical data and acts as a single source of reliable information for the business.

Through an ETL process (extract, transform, load), data enters the warehouse from operational systems such as an enterprise resource planning (ERP) system or a customer relationship management (CRM) system. Sources also include databases, partner operational systems, IoT devices, weather apps, and social media. The infrastructure can be on-premises or cloud-based, with the latter option predominating in recent years.

Data warehousing is necessary not only for storing information but also for processing structured and unstructured data: video, photos, sensor readings. Some data warehouse options use built-in analytics and in-memory database technology (information is stored in RAM rather than on a hard drive). This makes reliable data accessible in real time.

After the data is sorted, it is sent to data marts for further analysis by BI or data science teams.
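To make the ETL flow above concrete, here is a minimal sketch in Python. The CRM export file, its column names, and the SQLite database standing in for the warehouse are all hypothetical, invented for illustration rather than taken from any system described in this article.

```python
import csv
import sqlite3

# SQLite stands in for the warehouse; crm_export.csv is a hypothetical source.
conn = sqlite3.connect("warehouse.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS orders (
        order_id    TEXT PRIMARY KEY,
        customer_id TEXT,
        amount_usd  REAL,
        order_date  TEXT
    )
""")

with open("crm_export.csv", newline="") as f:           # Extract
    for row in csv.DictReader(f):
        conn.execute(                                   # Transform + Load
            "INSERT OR REPLACE INTO orders VALUES (?, ?, ?, ?)",
            (row["id"], row["customer"], float(row["amount"]), row["date"]),
        )
conn.commit()
```

In a production pipeline the same three steps would run on a scheduler, with validation and error handling between extract and load.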

Why consider data warehousing cases

Studying known data warehousing implementations is worthwhile, first of all, to avoid repeating other people's mistakes. A working solution can also serve as a baseline for improving your own performance.

  • When using data warehouses, executives can access data from many sources in one place, so they do not have to make decisions blindly.
  • Data warehousing enables quick retrieval and analysis: large volumes of data can be queried quickly without dedicating staff to the task.
  • Before data is loaded into the warehouse, the system creates data cleansing tasks and queues them for processing, converting the data into a consistent format for subsequent analytical reports.
  • The warehouse holds large amounts of historical data, letting you study past trends and issues to predict events and improve the business structure.

Blindly copying other people's decisions is impossible anyway: your case is unique and probably requires a custom approach. At best, well-known storage solutions can be taken as a starting point. You can do the work yourself, or you can contact DATAFOREST specialists for professional services. We have positive experience and customer success stories from creating and operating data warehouses.

Data warehousing cases

Case 1: How the Amazon Service Does Data Warehousing

Amazon is one of the world's largest and most successful companies, with a diversified business spanning cloud computing, digital content, and more. As a company that generates vast amounts of data (and one that sells data warehousing services itself), Amazon needs to manage and analyze its data effectively.

Two main businesses

Amazon's data warehousing needs are driven by the company's vast and diverse data sources, which require sophisticated tools and technologies to manage and analyze effectively.

1. One of the main drivers of Amazon's business is its e-commerce platform, which allows customers to purchase a wide range of products through its website and mobile apps. Amazon's data warehousing needs in this area focus on collecting, storing, and analyzing data related to customer behavior, purchase history, and other metrics. This data is used to optimize Amazon's product recommendation engine, personalize the shopping experience for individual customers, and identify growth strategies.

2. Amazon's other primary business unit is Amazon Web Services (AWS), which offers cloud computing services to businesses and individuals. AWS generates significant amounts of data from its cloud infrastructure, including customer usage and performance data. To manage and analyze this data effectively, Amazon relies on data warehousing technologies like Amazon Redshift, which enables AWS to provide real-time analytics and insights to its customers.

3. Beyond these core businesses, Amazon also has significant data warehousing needs in digital content (e.g., video, music, and books). Amazon's advertising business relies on data analysis to identify key demographics and target ads more effectively to specific audiences.

By investing in data warehousing and analytics capabilities, Amazon can maintain its competitive edge and continue to grow and innovate in the years to come.


Obstacles on the way to the goal

Amazon faced several specific implementation details and challenges in its data warehousing efforts.

• The brand needed to integrate data from various sources into a centralized data warehouse, which required developing custom data pipelines to collect and transform the data into a standard format.

• Amazon's data warehousing needs are vast and constantly growing, requiring a scalable solution. The company built a distributed data warehouse architecture on technologies like Amazon Redshift, allowing petabyte-scale data storage and analysis.

• As a company that generates big data, Amazon needed to ensure that its data warehousing solution could provide real-time analytics and insights. Achieving this level of performance requires optimizing data storage, indexing, and querying processes.

• Amazon stores sensitive customer data in its warehouse, prioritizing data security. To protect against security threats, the brand implements various security measures, including encryption, access controls, and threat detection.

• Building and maintaining a data warehousing solution can be expensive. Amazon leverages cloud-based data warehousing (Redshift) to minimize costs, with its cost-effective, pay-as-you-go pricing model.

Amazon's data warehousing implementation required careful planning, significant investment in technology and infrastructure, and ongoing optimization and maintenance to ensure high performance and reliability.
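To make the scalability point concrete: Redshift spreads a table's rows across nodes according to a distribution key and orders them on disk by a sort key, which is what keeps petabyte-scale scans tractable. A hedged sketch follows; the cluster endpoint, credentials, and the page_views table are placeholders invented for illustration, not Amazon's actual schema.

```python
import psycopg2  # common PostgreSQL driver; Redshift uses the same wire protocol

# Placeholder endpoint and credentials; running this requires a real cluster.
conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="admin",
    password="REPLACE_ME",
)
with conn.cursor() as cur:
    # DISTKEY co-locates each customer's rows on one node, so joins on
    # customer_id stay local; SORTKEY speeds range scans over event time.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS page_views (
            customer_id BIGINT,
            url         VARCHAR(2048),
            viewed_at   TIMESTAMP
        )
        DISTKEY (customer_id)
        SORTKEY (viewed_at);
    """)
conn.commit()
```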

Change for the better

When Amazon considered all the needs, found the right tools, and implemented a successful data warehouse, the company got the following main business outcomes:

• Improved data-driven decision-making

• Better customer experience

• Cost-effective operations

• Improved performance

• Competitive advantage

• Scalability

Amazon's data warehousing implementation has driven the company's growth and success. Not surprisingly, a data storage service provider must understand data storage: in this case, the cobbler's children do have shoes.


Case 2: Data Warehousing Adventure with UPS

United Parcel Service (UPS) is an American parcel delivery and supply chain management company founded in 1907, with annual revenue of 71 billion dollars and logistics services in more than 175 countries. The brand also provides goods distribution, customs brokerage, postal, and consulting services. UPS processes approximately 300 million tracking requests daily. This scale was achieved, among other things, thanks to intelligent data warehousing.

One mile for $50 million

In 2013, UPS stated that it hosted the world's largest DB2 relational database, spread across two United States data centers, for its global operations. Over time, global operations grew, as did the amount of semi-structured data. The goal was to use different forms of stored data to make better business decisions.

One of the fundamental problems was route optimization. According to an interview with the UPS CTO, saving one mile per driver per day could save 1.5 million gallons of fuel per year, or $50 million in total.

However, the data was scattered: some of it lived in DB2 repositories, some in local systems, and some in spreadsheets. UPS needed to fix the data infrastructure first and only then optimize routes.

Four letters "V."

A big data ecosystem must efficiently handle the four "Vs": volume, velocity, variety, and veracity. UPS experimented with Hadoop clusters and integrated its storage and computing systems into this ecosystem. It upgraded its data warehousing and computing power to handle petabytes of data, one of UPS's most significant technological achievements.

The following Hadoop components were used:

• HDFS for storage

• MapReduce for fast processing

• Kafka for streaming

• Sqoop (SQL-to-Hadoop) for ingestion

• Hive and Pig for structured queries on unstructured data

• A monitoring system for the data nodes and name nodes

But that's just informed speculation: due to confidentiality, UPS has not declassified the exact tools and technologies it used in its big data ecosystem.
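Purely as an illustration of the MapReduce pattern named in that list (UPS's actual jobs are not public), here is a Hadoop Streaming-style mapper and reducer in one Python file that counts tracking events per package. The comma-separated input format is invented for the example.

```python
import sys
from itertools import groupby

def mapper(lines):
    # Each input line is assumed to look like "timestamp,package_id,event".
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) == 3:
            yield parts[1], 1

def reducer(pairs):
    # Hadoop delivers mapper output sorted by key; group and sum the counts.
    for package_id, group in groupby(pairs, key=lambda kv: kv[0]):
        yield package_id, sum(count for _, count in group)

if __name__ == "__main__":
    # In real Hadoop Streaming the two phases run as separate processes on the
    # cluster; sorting locally here simulates the shuffle for demonstration.
    for package_id, total in reducer(sorted(mapper(sys.stdin))):
        print(f"{package_id}\t{total}")
```

Run locally with, for example, `cat events.csv | python tracking_counts.py` (both file names are placeholders).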

Constellation of Orion

The result was ORION (On-Road Integrated Optimization and Navigation), a four-year route optimization project with costs of about one billion dollars a year. ORION combined the data stores with big data computation, drawing analytics from more than 300 million data points to optimize thousands of routes per minute based on real-time information. Beyond the economic benefits, the project cut approximately 100 million shipping miles and reduced carbon emissions by about 100,000 tons.
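The arithmetic behind "one mile per driver per day" is easy to demonstrate. The toy nearest-neighbor heuristic below is emphatically not ORION, whose algorithms are proprietary; it only shows how reordering stops shortens a route, using invented coordinates in place of real delivery stops.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def route_length(stops):
    return sum(dist(stops[i], stops[i + 1]) for i in range(len(stops) - 1))

def nearest_neighbor(depot, stops):
    # Greedy reordering: always drive to the closest remaining stop.
    remaining, route = list(stops), [depot]
    while remaining:
        nxt = min(remaining, key=lambda s: dist(route[-1], s))
        remaining.remove(nxt)
        route.append(nxt)
    return route

depot = (0.0, 0.0)
stops = [(2, 3), (9, 1), (1, 1), (5, 8), (3, 2)]   # invented coordinates
naive = [depot] + stops                            # visit in arbitrary order
optimized = nearest_neighbor(depot, stops)
print(f"naive: {route_length(naive):.1f} mi, reordered: {route_length(optimized):.1f} mi")
```

Multiply a saving like this across tens of thousands of drivers and the $50 million figure stops looking surprising.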


Case 3: 42 ERP Into One Data Warehouse

In general, specific data warehousing implementations are kept fairly quiet: contracts often contain confidentiality and legitimate-interest clauses. Open examples of such work exist, but the vast majority sit in paid libraries, because the subject is relevant enough to charge money for. As a result, "open" cases do appear from time to time, but with the brand name withheld.

Brand X needs help

A world leader in industrial pumps, valves, actuators, and controls needed help extracting data from disparate ERP systems. The company wanted data pulled from 42 ERP instances, standardized into flat files, and collected in one data warehouse. To complicate matters, the ERP systems came from different vendors (Oracle, SAP, BAAN, Microsoft, PRMS).

The client also wanted a core set of metrics and a central dashboard combining information from its locations worldwide. The project grew out of a surge in demand for corporate data. The company knew its data warehousing required a central repository for all data from its worldwide locations. Requests often came from the top down, and when an administrator needed access to the correct data, extraction became a logistical problem. So the project got started.


The foundation stone

The hired third-party development team drew up a roadmap under which ERP data was taken from 8 major databases and placed in a corporate data warehouse. This entailed integrating 5 Oracle ERP instances with 3 SAP ERP instances. Rapid Marts were also integrated with the Oracle ERP systems to speed the project's progress.

One of the main challenges was the lack of standardized fields and operational data definitions across the ERP systems. To solve this, the contractor developed a data service tool that accesses the back end of each database and presents the information in a suitable form. Since then, the customer has known which fields to use and how to set them whenever a new ERP instance is encountered. These data definition patterns were the project's foundation stone and completely changed how the customer's data is handled.
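A data-definition pattern of this kind can be as small as a per-vendor mapping from native ERP field names onto one canonical schema. The sketch below is a guess at the shape of such a tool; the field names are illustrative (MATNR and LABST are real SAP fields, the rest are stand-ins), not the client's actual definitions.

```python
# Canonical field -> native field name, per ERP vendor (illustrative only).
FIELD_MAPS = {
    "oracle": {"item_number": "SEGMENT1", "qty_on_hand": "TRANSACTION_QUANTITY"},
    "sap":    {"item_number": "MATNR",    "qty_on_hand": "LABST"},
    "baan":   {"item_number": "t_item",   "qty_on_hand": "t_qoh"},
}

def to_canonical(vendor: str, record: dict) -> dict:
    """Rename one ERP record's fields into the canonical warehouse schema."""
    mapping = FIELD_MAPS[vendor]
    return {canonical: record[native] for canonical, native in mapping.items()}

print(to_canonical("sap", {"MATNR": "PUMP-001", "LABST": 42}))
# -> {'item_number': 'PUMP-001', 'qty_on_hand': 42}
```

Each newly encountered ERP instance then costs one new mapping, not a new pipeline.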

All roads lead to data warehousing

The company now has one common, consistent way to obtain critical indicators. The long-term effect of the project is the ease of obtaining information: what was once a long and inconsistent process of gathering relevant information at an aggregate level is now streamlined, with all data stored in one central repository under one team's control.


Data Warehousing: Different Cases — General Conclusions

Each data warehouse implementation has unique methods and tools because business needs differ. In this sense, data warehousing can be compared to a mosaic or a children's construction set: you can build different figures from the same parts by arranging the elements differently. And if one part is lost or broken, you have to make a new one, or find a close match and "file it down with a rasp."

Generalities between different cases of data warehousing

There are several common themes and practices among successful data warehousing implementations, including:

• Successful data warehousing implementations start with clearly understanding the business objectives and how the warehouse (or data lake) can support those objectives.

• The data modeling process is critical to the success of data warehousing.

• The data warehouse is only as good as the data it contains.

• Successful data warehousing requires efficient data integration processes that can handle large volumes of data and ensure consistency and accuracy.

• Data warehousing needs ongoing performance tuning to optimize query performance.

• A critical factor in data warehousing is a user-friendly interface that makes it easy for end users to access the data and perform complex queries and analyses.

• Continuous improvement is essential to ensure the data warehouse remains relevant and valuable to the business.

Competent data warehousing implementations combine technical expertise and a deep understanding of business details and user needs.

Your case is not mentioned anywhere

When tackling the problem of organizing data warehousing, it would be convenient to find a description of an identical case and follow it step by step. But the probability of that is negligible — you will have to adapt to the specifics of the customer's business and weigh your knowledge and capabilities, as well as the technical and financial conditions of the project. Then you take the pieces of the puzzle, or the parts of the construction set, and build your own data warehouse. The minus: you have to do the work. The plus: the data storage decision, and its implementation, will be entirely yours.

Figure: Data Warehouse-as-a-Service Market Size Global Report, 2022 - 2030

Data Warehousing Is Like a Trampoline

Changes in data warehousing, like any technological and methodological changes, are made to raise the level of data collection, storage, and analysis. They take the customer to a new level in their business, and the contractor to a new level in theirs. It is like a jumper and a trampoline: separately, one is just a gymnast and the other just equipment; in combination, they produce a third quality — the possibility of a sharp rise.

If you are facing the task of organizing a new data warehousing system, or you are simply intrigued by what you have read, let's exchange views: get in touch with DATAFOREST.

What is the benefit of data warehousing for business?

A data warehouse is a centralized repository that contains integrated data from various sources and systems. Data warehousing provides several benefits for businesses: improved decision-making, increased efficiency, better customer insights, operational efficiency, and competitive advantage.

What is the definition of a successful data warehousing implementation?

The specific definition of a successful data warehouse implementation will vary depending on the goals of the organization and the particular use case for data warehousing. Some common characteristics are: meeting business requirements, high data quality, scalability, user adoption, and positive ROI.

What are the general considerations for implementing data warehousing?

Implementing data warehousing involves some general considerations: business objectives, data sources, quality and modeling, technology selection, performance tuning, user adoption, ongoing maintenance, and support.

What are the most famous examples of the implementation of data warehousing?

There are many famous examples of the implementation of data warehousing across industries:

• Walmart has one of the largest data warehousing implementations in the world

• Amazon's data warehousing solution is known as Amazon Redshift

• Netflix uses a data warehouse to store and analyze data from its streaming platform

• Coca-Cola has a warehouse to consolidate data from business units and analyze it

• Bank of America analyzes customer data by data warehousing to improve customer experience

What are the challenges while implementing data warehousing, and how to overcome them?

Based on the experiences of organizations that have implemented data warehousing, some common challenges and solutions are:

• Ensuring the quality of the data being stored and analyzed. Establish data quality standards and implement data validation and cleansing appropriate to each data type.

• Integrating disparate data sources. Establishing a clear data integration strategy that accounts for the different data sources, formats, and protocols involved is vital.

• As the amount of data stored in a data warehouse grows, performance issues may arise. A brand should regularly monitor query performance and optimize the data warehouse to ensure that it remains efficient and effective.

• Keeping sensitive data stored in the data warehouse secure. This involves implementing appropriate measures such as access controls, encryption, and regular security audits; these are the working details of privacy and security.

• Making significant changes to existing processes and workflows. This is solved by establishing a transparent change management process that involves decision-makers and users at all levels.

What is an example of how successful data warehousing has affected a business?

An example of how successful data warehousing has affected Amazon is its recommendation engine, which suggests products to customers based on their browsing and purchasing history. By using artificial intelligence and machine learning algorithms to analyze customer data, Amazon has improved the accuracy of its recommendations, resulting in increased sales and customer satisfaction.

What role does data integration play in data warehousing?

Data integration is critical to data warehousing, enabling businesses to consolidate and standardize data from multiple sources, ensure data quality, and establish effective data governance practices.

How are data quality and governance tracked in data warehousing?

Data quality and governance are tracked in data warehousing through a combination of data profiling, monitoring, and management processes, together with data governance frameworks that define policies and procedures for managing them. In this way, businesses can ensure that their data is accurate, consistent, and compliant with regulations, enabling effective decision-making and driving business success.
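As a small, hedged example of the profiling step, the sketch below computes a row count, per-column null rates, and a duplicate-row count for a warehouse table. The SQLite database, table, and column names are placeholders.

```python
import sqlite3

conn = sqlite3.connect("warehouse.db")  # placeholder database

def profile(table: str, columns: list[str]) -> dict:
    """Return basic data-quality metrics for one table."""
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    null_rates = {
        col: conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {col} IS NULL"
        ).fetchone()[0] / max(total, 1)
        for col in columns
    }
    distinct = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT DISTINCT * FROM {table})"
    ).fetchone()[0]
    return {"rows": total, "null_rate": null_rates, "duplicate_rows": total - distinct}

print(profile("orders", ["customer_id", "amount_usd"]))
```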

Can the benefits of data warehousing be measured?

The benefits of business data warehousing can be measured through improvements in data quality, efficiency, decision-making, revenue and profitability, and customer satisfaction. By tracking these metrics, businesses can assess the effectiveness of their data warehousing initiatives and make informed decisions about future investments in data management and analytics with cloud services.

How to avoid blunders when warehousing data?

By following best practices, businesses can avoid common mistakes, minimize the risk of blunders when warehousing data, and ensure their data warehousing initiatives are successful and readily usable for business intelligence analysis.


Real World Data Warehousing Examples: Use Cases and Applications

We’re really beginning to experience another industrial revolution. That is, we’re actively entering the ‘Age of Data.’ As you look at your own life, business, and the world around you, you’ll quickly notice that so much of it is now connected in some way. And, soon, our society will become persistently connected as we spread connectivity even further across the globe.

A recent report from IDC indicates these key trends around data:

  • The evolution of data from business background to life-critical. Once siloed, remote, inaccessible, and mostly underutilized, data has become essential to our society and our individual lives. In fact, IDC estimates that by 2025, nearly 20% of the data in the global datasphere will be critical to our daily lives and nearly 10% of that will be hypercritical.
  • Embedded systems and the Internet of Things (IoT). As standalone analog devices give way to connected digital devices, the latter will generate vast amounts of data that will, in turn, allow us the chance to refine and improve our systems and processes in previously unimagined ways. Big Data and metadata (data about data) will eventually touch nearly every aspect of our lives—with profound consequences. By 2025, an average connected person anywhere in the world will interact with connected devices nearly 4,800 times per day—basically one interaction every 18 seconds.
  • Mobile and real-time data. Increasingly, data will need to be instantly available whenever and wherever anyone needs it. Industries around the world are undergoing "digital transformation" motivated by these requirements. By 2025, more than a quarter of data created in the global datasphere will be real time in nature, and real-time IoT data will make up more than 95% of this.

That being said, it’s important to understand how you can gather, quantify, and actually analyze this information. Coupled with solutions around data analytics and big data processing, data warehousing allows you to take valuable information to an entirely new level. From there, powerful data warehouse solutions help you create data visualizations to make better decisions around your business and the market.

But, we’re getting a bit ahead of ourselves. Let’s define data warehousing, look at some use-cases, and discuss a few best practices.

  • What is a data warehouse? At a very high level, a data warehouse is a system that pulls together data from many different sources within an organization for reporting and analysis. From there, the reports created from complex queries within a data warehouse are used to improve business efficiency, make better decisions, and even introduce competitive advantages. It’s important to note that a data warehouse is definitely different from a traditional database. Sure, data warehouses and databases are both relational data systems, but they were definitely built to serve different purposes. A data warehouse is built to store large quantities of historical data and enable fast, complex queries across all the data, typically using Online Analytical Processing (OLAP). A database was built to store current transactions and enable fast access to specific transactions for ongoing business processes, known as Online Transaction Processing (OLTP).

So, data warehousing allows you to aggregate data from various sources. This data, typically structured, can come from Online Transaction Processing (OLTP) data such as invoices and financial transactions, Enterprise Resource Planning (ERP) data, and Customer Relationship Management (CRM) data. Finally, data warehousing focuses on data relevant for business analysis, organizing and optimizing it to enable efficient analysis.

  • How are data warehouses used?  Unlike databases and other systems which simply ‘store’ data, data warehousing takes an entirely different approach. Let me give you a few examples and uses. Data warehouses normally use a denormalized data structure , which uses fewer tables because it groups data and doesn’t exclude data redundancies. Denormalization offers better performance when reading data for analytical purposes. On that note, data warehouses are used for business analysis, data and market analytics, and business reporting. Data warehouses typically store historical data by integrating copies of transaction data from disparate sources. Data warehouses can also use real-time data feeds for reports that use the most current, integrated information.

Here’s the other cool part when it comes to use-cases: the structure of data warehouses makes analytical queries much simpler to perform. No advanced knowledge of database applications is required. Analytics in data warehouses is dynamic, meaning it takes into account data that changes over time.
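To see why the queries get simpler, compare a single flat aggregate on a denormalized fact table with the multi-join it replaces. A sketch using SQLite as a stand-in engine, with an invented sales table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales (      -- denormalized: region and product names are
        region    TEXT,       -- stored inline instead of behind join keys
        product   TEXT,
        sale_date TEXT,
        amount    REAL
    )
""")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("East", "Widget", "2024-01-05", 120.0),
     ("West", "Widget", "2024-01-06", 80.0),
     ("East", "Gadget", "2024-02-01", 200.0)],
)

# One flat GROUP BY instead of joins across region/product/date dimensions.
for row in conn.execute(
    "SELECT region, product, SUM(amount) FROM sales GROUP BY region, product"
):
    print(row)
```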

Finally, the cloud. While a traditional data warehouse implementation can sometimes be a very expensive project, SaaS solutions are taking data warehousing to a new level. New cloud-based tools allow enterprises to set up a data warehouse in days, with no upfront investment, and with much greater scalability, storage, and query performance.

A data warehouse typically has a three-tier architecture:

  • Bottom tier—database server used to extract data from multiple sources
  • Middle tier—OLAP server, which transforms data to enable analysis and complex queries
  • Top tier—tools used for high-level data analysis, querying, reporting, and data mining

So, when creating your own data warehousing architecture, follow these three tiers to help identify data points, how you'll analyze them, and what the visualization will look like.

From there, data warehouses are usually structured using one of the following models:

  • Virtual data warehouse—a set of separate databases, which can be queried together, forming one virtual data warehouse.
  • Data mart—small data warehouses set up for business-line specific reporting and analysis. An organization's data marts together comprise the organization's data warehouse.
  • Enterprise data warehouse (EDW)—a large data warehouse holding aggregated data that spans the entire organization.
  • Cloud-based data warehouse—imagine everything you need from a data warehouse, but hosted in the cloud. Cloud-based data warehouse architectures can typically perform complex analytical queries much faster because they are massively parallel processing (MPP).

As you take this all in, remember the one big point I made earlier in the blog. You don’t need to do this all alone. Good partners can help you establish a data baseline and really understand the type of data warehouse architecture you require. From there, you really begin to unleash the power of data as you analyze vast amounts of information and help visualize it for your business.



Case studies & examples

Articles, use cases, and proof points describing projects undertaken by data managers and data practitioners across the federal government

Agencies Mobilize to Improve Emergency Response in Puerto Rico through Better Data

Federal agencies' response efforts to Hurricanes Irma and Maria in Puerto Rico were hampered by imperfect address data for the island. In the aftermath, emergency responders gathered together to enhance the utility of Puerto Rico address data and share best practices for using the information currently available.

Federal Data Strategy

BUILDER: A Science-Based Approach to Infrastructure Management

The Department of Energy’s National Nuclear Security Administration (NNSA) adopted a data-driven, risk-informed strategy to better assess risks, prioritize investments, and cost effectively modernize its aging nuclear infrastructure. NNSA’s new strategy, and lessons learned during its implementation, will help inform other federal data practitioners’ efforts to maintain facility-level information while enabling accurate and timely enterprise-wide infrastructure analysis.

Department of Energy

data management , data analysis , process redesign , Federal Data Strategy

Business case for open data

Six reasons why making your agency's data open and accessible is a good business decision.

CDO Council Federal HR Dashboarding Report - 2021

The CDO Council worked with the US Department of Agriculture, the Department of the Treasury, the United States Agency for International Development, and the Department of Transportation to develop a Diversity Profile Dashboard and to explore the value of shared HR decision support across agencies. The pilot was a success and identified the potential impact of a standardized suite of HR dashboards, in addition to demonstrating the value of collaborative analytics between agencies.

Federal Chief Data Officer's Council

data practices , data sharing , data access

CDOC Data Inventory Report

The Chief Data Officers Council Data Inventory Working Group developed this paper to highlight the value proposition for data inventories and describe challenges agencies may face when implementing and managing comprehensive data inventories. It identifies opportunities agencies can take to overcome some of these challenges and includes a set of recommendations directed at Agencies, OMB, and the CDO Council (CDOC).

data practices , metadata , data inventory

DSWG Recommendations and Findings

The Chief Data Officer Council (CDOC) established a Data Sharing Working Group (DSWG) to help the council understand the varied data-sharing needs and challenges of all agencies across the Federal Government. The DSWG reviewed data-sharing across federal agencies and developed a set of recommendations for improving the methods to access and share data within and between agencies. This report presents the findings of the DSWG’s review and provides recommendations to the CDOC Executive Committee.

data practices , data agreements , data sharing , data access

Data Skills Training Program Implementation Toolkit

The Data Skills Training Program Implementation Toolkit is designed to provide both small and large agencies with information to develop their own data skills training programs. The information provided will serve as a roadmap to the design, implementation, and administration of federal data skills training programs as agencies address their Federal Data Strategy’s Agency Action 4 gap-closing strategy training component.

data sharing , Federal Data Strategy

Data Standdown: Interrupting process to fix information

Although not a true pause in operations, ONR’s data standdown made data quality and data consolidation the top priority for the entire organization. It aimed to establish an automated and repeatable solution to enable a more holistic view of ONR investments and activities, and to increase transparency and effectiveness throughout its mission support functions. In addition, it demonstrated that getting top-level buy-in from management to prioritize data can truly advance a more data-driven culture.

Office of Naval Research

data governance , data cleaning , process redesign , Federal Data Strategy

Data.gov Metadata Management Services Product-Preliminary Plan

Status summary and preliminary business plan for a potential metadata management product under development by the Data.gov Program Management Office

data management , Federal Data Strategy , metadata , open data


Department of Transportation Case Study: Enterprise Data Inventory

In response to the Open Government Directive, DOT developed a strategic action plan to inventory and release high-value information through the Data.gov portal. The Department sustained efforts in building its data inventory, responding to the President’s memorandum on regulatory compliance with a comprehensive plan that was recognized as a model for other agencies to follow.

Department of Transportation

data inventory , open data

Department of Transportation Model Data Inventory Approach

This document from the Department of Transportation provides a model plan for conducting data inventory efforts required under OMB Memorandum M-13-13.

data inventory


FEMA Case Study: Disaster Assistance Program Coordination

In 2008, the Disaster Assistance Improvement Program (DAIP), an E-Government initiative led by FEMA with support from 16 U.S. Government partners, launched DisasterAssistance.gov to simplify the process for disaster survivors to identify and apply for disaster assistance. DAIP utilized existing partner technologies and implemented a service-oriented architecture (SOA) that integrated the content management system and rules engine supporting Department of Labor’s Benefits.gov applications with FEMA’s Individual Assistance Center application. The FEMA SOA serves as the backbone for data sharing interfaces with three of DAIP’s federal partners and transfers application data to reduce duplicate data entry by disaster survivors.

Federal Emergency Management Agency

data sharing

Federal CDO Data Skills Training Program Case Studies

This series was developed by the Chief Data Officer Council’s Data Skills & Workforce Development Working Group to provide support to agencies in implementing the Federal Data Strategy’s Agency Action 4 gap-closing strategy training component in FY21.

FederalRegister.gov API Case Study

This case study describes the tenets behind an API that provides access to all data found on FederalRegister.gov, including all Federal Register documents from 1994 to the present.

National Archives and Records Administration


Fuels Knowledge Graph Project

The Fuels Knowledge Graph Project (FKGP), funded through the Federal Chief Data Officers (CDO) Council, explored the use of knowledge graphs to achieve more consistent and reliable fuel management performance measures. The team hypothesized that better performance measures and an interoperable semantic framework could enhance the ability to understand wildfires and, ultimately, improve outcomes. To develop a more systematic and robust characterization of program outcomes, the FKGP team compiled, reviewed, and analyzed multiple agency glossaries and data sources. The team examined the relationships between them, while documenting the data management necessary for a successful fuels management program.

metadata , data sharing , data access

Government Data Hubs

A list of Federal agency open data hubs, including USDA, HHS, NASA, and many others.

Helping Baltimore Volunteers Find Where to Help

Bloomberg Government analysts put together a prototype through the Census Bureau’s Opportunity Project to better assess where volunteers should direct litter-clearing efforts. Using Census Bureau and Forest Service information, the team brought a data-driven approach to their work. Their experience reveals how individuals with data expertise can identify a real-world problem that data can help solve, navigate across agencies to find and obtain the most useful data, and work within resource constraints to provide a tool to help address the problem.

Census Bureau

geospatial , data sharing , Federal Data Strategy

How USDA Linked Federal and Commercial Data to Shed Light on the Nutritional Value of Retail Food Sales

Purchase-to-Plate Crosswalk (PPC) links the more than 359,000 food products in a commercial company database to several thousand foods in a series of USDA nutrition databases. By linking existing data resources, USDA was able to enrich and expand the analysis capabilities of both datasets. Since there were no common identifiers between the two data structures, the team used probabilistic and semantic methods to reduce the manual effort required to link the data.

Department of Agriculture

data sharing , process redesign , Federal Data Strategy

How to Blend Your Data: BEA and BLS Harness Big Data to Gain New Insights about Foreign Direct Investment in the U.S.

A recent collaboration between the Bureau of Economic Analysis (BEA) and the Bureau of Labor Statistics (BLS) helps shed light on the segment of the American workforce employed by foreign multinational companies. This case study shows the opportunities of cross-agency data collaboration, as well as some of the challenges of using big data and administrative data in the federal government.

Bureau of Economic Analysis / Bureau of Labor Statistics

data sharing , workforce development , process redesign , Federal Data Strategy

Implementing Federal-Wide Comment Analysis Tools

The CDO Council Comment Analysis pilot has shown that recent advances in Natural Language Processing (NLP) can effectively aid the regulatory comment analysis process. The proof-of-concept is a standardized toolset intended to support agencies and staff in reviewing and responding to the millions of public comments received each year across government.

Improving Data Access and Data Management: Artificial Intelligence-Generated Metadata Tags at NASA

NASA’s data scientists and research content managers recently built an automated tagging system using machine learning and natural language processing. This system serves as an example of how other agencies can use their own unstructured data to improve information accessibility and promote data reuse.

National Aeronautics and Space Administration

metadata , data management , data sharing , process redesign , Federal Data Strategy

Investing in Learning with the Data Stewardship Tactical Working Group at DHS

The Department of Homeland Security (DHS) experience forming the Data Stewardship Tactical Working Group (DSTWG) provides meaningful insights for those who want to address data-related challenges collaboratively and successfully in their own agencies.

Department of Homeland Security

data governance , data management , Federal Data Strategy

Leveraging AI for Business Process Automation at NIH

The National Institute of General Medical Sciences (NIGMS), one of the twenty-seven institutes and centers at the NIH, recently deployed Natural Language Processing (NLP) and Machine Learning (ML) to automate the process by which it receives and internally refers grant applications. This new approach ensures efficient and consistent grant application referral, and liberates Program Managers from the labor-intensive and monotonous referral process.

National Institutes of Health

standards , data cleaning , process redesign , AI

FDS Proof Point

National Broadband Map: A Case Study on Open Innovation for National Policy

The National Broadband Map is a tool that provides consumers nationwide with reliable information on broadband internet connections. This case study describes how crowd-sourcing, open source software, and public engagement inform the development of a tool that promotes government transparency.

Federal Communications Commission

National Renewable Energy Laboratory API Case Study

This case study describes the launch of the National Renewable Energy Laboratory (NREL) Developer Network in October 2011. The main goal was to build an overarching platform to make it easier for the public to use NREL APIs and for NREL to produce APIs.

National Renewable Energy Laboratory

Open Energy Data at DOE

This case study details the development of the renewable energy applications built on the Open Energy Information (OpenEI) platform, sponsored by the Department of Energy (DOE) and implemented by the National Renewable Energy Laboratory (NREL).

open data , data sharing , Federal Data Strategy

Pairing Government Data with Private-Sector Ingenuity to Take on Unwanted Calls

The Federal Trade Commission (FTC) releases data from millions of consumer complaints about unwanted calls to help fuel a myriad of private-sector solutions to tackle the problem. The FTC’s work serves as an example of how agencies can work with the private sector to encourage the innovative use of government data toward solutions that benefit the public.

Federal Trade Commission

data cleaning , Federal Data Strategy , open data , data sharing

Profile in Data Sharing - National Electronic Interstate Compact Enterprise

The Federal CDO Council’s Data Sharing Working Group highlights successful data sharing activities to recognize mature data sharing practices as well as to incentivize and inspire others to take part in similar collaborations. This Profile in Data Sharing focuses on how the federal government and states use the National Electronic Interstate Compact Enterprise (NEICE) to support children who are being placed for adoption or foster care across state lines. NEICE greatly reduces the work and time required for states to exchange paperwork and information needed to process the placements. Additionally, NEICE allows child welfare workers to communicate and provide timely updates to courts, relevant private service providers, and families.

Profile in Data Sharing - National Health Service Corps Loan Repayment Programs

The Federal CDO Council’s Data Sharing Working Group highlights successful data sharing activities to recognize mature data sharing practices as well as to incentivize and inspire others to take part in similar collaborations. This Profile in Data Sharing focuses on how the Health Resources and Services Administration collaborates with the Department of Education to make it easier to apply to serve medically underserved communities - reducing applicant burden and improving processing efficiency.

Profile in Data Sharing - Roadside Inspection Data

The Federal CDO Council’s Data Sharing Working Group highlights successful data sharing activities to recognize mature data sharing practices as well as to incentivize and inspire others to take part in similar collaborations. This Profile in Data Sharing focuses on how the Department of Transportation collaborates with Customs and Border Protection and state partners to prescreen commercial motor vehicles entering the US and to focus inspections on unsafe carriers and drivers.

Profiles in Data Sharing - U.S. Citizenship and Immigration Service

The Federal CDO Council’s Data Sharing Working Group highlights successful data sharing activities to recognize mature data sharing practices as well as to incentivize and inspire others to take part in similar collaborations. This Profile in Data Sharing focuses on how the U.S. Citizenship and Immigration Service (USCIS) collaborated with the Centers for Disease Control to notify state, local, tribal, and territorial public health authorities so they can connect with individuals in their communities about their potential exposure.

SBA’s Approach to Identifying Data, Using a Learning Agenda, and Leveraging Partnerships to Build its Evidence Base

Through its Enterprise Learning Agenda, Small Business Administration’s (SBA) staff identify essential research questions, a plan to answer them, and how data held outside the agency can help provide further insights. Other agencies can learn from the innovative ways SBA identifies data to answer agency strategic questions and adopt those aspects that work for their own needs.

Small Business Administration

process redesign , Federal Data Strategy

Supercharging Data through Validation as a Service

USDA's Food and Nutrition Service restructured its approach to data validation at the state level using an open-source, API-based validation service managed at the federal level.

data cleaning , data validation , API , data sharing , process redesign , Federal Data Strategy

The Census Bureau Uses Its Own Data to Increase Response Rates, Helps Communities and Other Stakeholders Do the Same

The Census Bureau team produced a new interactive mapping tool in early 2018 called the Response Outreach Area Mapper (ROAM), an application that resulted in wider use of authoritative Census Bureau data, not only to improve the Census Bureau’s own operational efficiency, but also for use by tribal, state, and local governments, national and local partners, and other community groups. Other agency data practitioners can learn from the Census Bureau team’s experience communicating technical needs to non-technical executives, building analysis tools with widely-used software, and integrating efforts with stakeholders and users.

open data , data sharing , data management , data analysis , Federal Data Strategy

The Mapping Medicare Disparities Tool

The Centers for Medicare & Medicaid Services’ Office of Minority Health (CMS OMH) Mapping Medicare Disparities Tool harnessed the power of millions of data records while protecting the privacy of individuals, creating an easy-to-use tool to better understand health disparities.

Centers for Medicare & Medicaid Services

geospatial , Federal Data Strategy , open data

The Veterans Legacy Memorial

The Veterans Legacy Memorial (VLM) is a digital platform to help families, survivors, and fellow veterans to take a leading role in honoring their beloved veteran. Built on millions of existing National Cemetery Administration (NCA) records in a 25-year-old database, VLM is a powerful example of an agency harnessing the potential of a legacy system to provide a modernized service that better serves the public.

Veterans Administration

data sharing , data visualization , Federal Data Strategy

Transitioning to a Data Driven Culture at CMS

This case study describes how CMS announced the creation of the Office of Information Products and Data Analytics (OIPDA) to take the lead in making data use and dissemination a core function of the agency.

data management , data sharing , data analysis , data analytics


U.S. Department of Labor Case Study: Software Development Kits

The U.S. Department of Labor sought to go beyond merely making data available to developers and take ease of use of the data to the next level by giving developers tools that would make using DOL’s data easier. DOL created software development kits (SDKs), which are downloadable code packages that developers can drop into their apps, making access to DOL’s data easy for even the most novice developer. These SDKs have even been published as open source projects with the aim of speeding up their conversion to SDKs that will eventually support all federal APIs.

Department of Labor

open data , API

U.S. Geological Survey and U.S. Census Bureau collaborate on national roads and boundaries data

It is a well-kept secret that the U.S. Geological Survey and the U.S. Census Bureau were the original two federal agencies to build the first national digital database of roads and boundaries in the United States. The agencies joined forces to develop homegrown computer software and state of the art technologies to convert existing USGS topographic maps of the nation to the points, lines, and polygons that fueled early GIS. Today, the USGS and Census Bureau have a longstanding goal to leverage and use roads and authoritative boundary datasets.

U.S. Geological Survey and U.S. Census Bureau

data management , data sharing , data standards , data validation , data visualization , Federal Data Strategy , geospatial , open data , quality

USA.gov Uses Human-Centered Design to Roll Out AI Chatbot

To improve customer service and give better answers to users of the USA.gov website, the Technology Transformation and Services team at General Services Administration (GSA) created a chatbot using artificial intelligence (AI) and automation.

General Services Administration

AI , Federal Data Strategy



Real-Time Data Warehouse Examples (Real World Applications)

Discover how businesses are leveraging real-time data warehouses to gain actionable insights, make informed decisions, and drive growth.


Gone are the days when organizations had to rely on stale, outdated data for their strategic planning and operational processes. Now, real-time data warehouses process and analyze data as it is generated, helping overcome the limitations of their traditional counterparts. The impact of real-time data warehousing is far-reaching. From eCommerce businesses to healthcare providers, real-time data warehouse examples and applications span various sectors.

The significance of real-time data warehousing becomes even more evident when we consider the sheer volume of data being generated today. The global data sphere is projected to reach a staggering 180 zettabytes by 2025.

With these numbers, it’s no wonder every company is looking for solutions like real-time data warehousing for managing their data efficiently. However, getting the concept of a real-time data warehouse, particularly when compared with a traditional data warehouse, can be quite intimidating, even for the best of us. 

In this guide, with the help of a range of examples and real-life applications, we will explore how real-time data warehousing can help organizations across different sectors overcome the data overload challenge.

What Is A Real-Time Data Warehouse?


A Real-Time Data Warehouse (RTDW) is a modern tool for data processing that provides immediate access to the most recent data. RTDWs use real-time data pipelines to transport and collate data from multiple data sources into one central hub, removing the need for batch processing and the reliance on outdated information.

Despite similarities with traditional data warehouses, RTDWs are capable of faster data ingestion and processing speeds. They can detect and rectify errors instantly before storing the data, providing consistent data for an effective decision-making process.
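A minimal sketch of that validate-before-store step: the event shape, the validation rules, and SQLite standing in for the warehouse are all assumptions for illustration, not any vendor's actual pipeline.

```python
import sqlite3

conn = sqlite3.connect("rtdw.db")  # placeholder store
conn.execute(
    "CREATE TABLE IF NOT EXISTS readings (sensor_id TEXT PRIMARY KEY, value REAL, ts TEXT)"
)

def ingest(event: dict) -> bool:
    """Validate one incoming event, then upsert it so the row stays current."""
    if not event.get("sensor_id") or not isinstance(event.get("value"), (int, float)):
        return False  # rejected before it ever reaches storage
    conn.execute(
        "INSERT OR REPLACE INTO readings VALUES (?, ?, ?)",
        (event["sensor_id"], event["value"], event["ts"]),
    )
    conn.commit()
    return True

ingest({"sensor_id": "s-17", "value": 21.4, "ts": "2024-03-01T12:00:00Z"})
```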

Real-Time Data Warehouse Vs Traditional Data Warehouse

Traditional data warehouses act as storage centers for accumulating an organization’s historical data from diverse sources. They combine this varied data into a unified view and provide comprehensive insights into the past activities of the organization. However, these insights are often outdated by the time they are put to use, as the data could be days, weeks, or even months old.

On the other hand, real-time data warehousing brings a significant enhancement to this model by continuously updating the data it houses. This dynamic process provides a current snapshot of the organization’s activities at any given time, enabling immediate analysis and action.

Let’s look at some of the major differences between the two.

Complexity & Cost

RTDWs are more complex and costly to implement and maintain than traditional data warehouses. This is because they require more advanced technology and infrastructure to handle real-time data processing.

Decision-Making Relevance

Traditional data warehouses predominantly assist in long-term strategic planning. However, the real-time data updates in RTDWs make them suitable for both immediate, tactical decisions and long-term strategic planning.

Correlation To Business Results

Because of fresher data availability, RTDWs make it easier to connect data-driven insights with real business results and provide immediate feedback.

Operational Requirements

RTDWs demand constant data updates, a process that must be carried out without causing downtime in data warehouse operations. Traditional warehouses, which may load data only weekly, typically don't need this capability; it becomes crucial once updates arrive continuously.

Data Update Frequency

The lines between traditional and real-time data warehouses have blurred now that some data warehouses adopt streaming methods to load data. Traditionally, though, the former updated their data in batches on a daily, weekly, or monthly schedule, so the data they hold may not reflect the most recent state of the business. In contrast, real-time data warehouses update their data almost immediately as new data arrives.

3 Major Types Of Data Warehouses

Let's take a closer look at different types of data warehouses and explore how they integrate real-time capabilities.

Enterprise Data Warehouse (EDW)


An Enterprise Data Warehouse (EDW) is a centralized repository that stores and manages large volumes of structured and sometimes unstructured data from various sources within an organization. It serves as a comprehensive and unified data source for business intelligence, analytics, and reporting purposes. The EDW consolidates data from multiple operational systems and transforms it into a consistent and standardized format.

The EDW is designed to handle and scale with large volumes of data. As the organization's data grows over time, the EDW can accommodate the increasing storage requirements and processing capabilities. It also acts as a hub for integrating data from diverse sources across the organization, gathering information from operational systems, data warehouses, external sources, cloud-based platforms, and more.

Operational Data Store (ODS)


An Operational Data Store (ODS) is designed to support operational processes and provide real-time or near-real-time access to current and frequently changing data. The primary purpose of an ODS is to facilitate operational reporting, data integration, and data consistency across different systems.

An ODS collects data from various sources, like transactional databases and external feeds, and consolidates it in a more user-friendly and business-oriented format. It typically stores detailed and granular data that reflects the most current state of the operational environment.

Data Mart

A Data Mart is a specialized version of a data warehouse designed to meet the specific analytical and reporting needs of a particular business unit, like sales, marketing, finance, or human resources.

Data Marts provide a more targeted and simplified view of data. A mart contains a subset of data that is relevant to the specific business area, organized in a way that facilitates easy access and analysis.

Data Marts are created by extracting, transforming, and loading (ETL) data from the data warehouse or other data sources and structuring it to support analytical needs. They can include pre-calculated metrics, aggregated data, and specific dimensions or attributes that are relevant to the subject area.
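As a concrete miniature of that ETL step, the sketch below derives a monthly sales mart from a warehouse fact table. The connection, the orders fact table, and the mart name are illustrative placeholders, not a real schema.

```python
import sqlite3

conn = sqlite3.connect("warehouse.db")  # placeholder warehouse

# Extract, transform, and load in one statement: aggregate the warehouse's
# fact table into a small, subject-specific table for the sales team.
conn.executescript("""
    DROP TABLE IF EXISTS mart_sales_monthly;
    CREATE TABLE mart_sales_monthly AS
    SELECT strftime('%Y-%m', order_date) AS month,
           customer_id,
           SUM(amount_usd)               AS revenue_usd
    FROM orders
    GROUP BY month, customer_id;
""")
conn.commit()
```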

11 Applications Of Real-Time Data Warehouses Across Different Sectors

The use of RTDWs is now common across many sectors. The rapid access to information they provide significantly improves the operations of many businesses, from online retail to healthcare.

Let’s take a look at some major sectors that benefit from these warehouses for getting up-to-the-minute data.

eCommerce

In the dynamic eCommerce industry, RTDWs facilitate immediate data processing that is used to get insights into customer behavior, purchase patterns, and website interactions. This enables marketers to deliver personalized content, targeted product recommendations, and swift customer service. Additionally, real-time inventory updates help maintain optimal stock levels, minimizing overstock or stock-out scenarios.

AI & Machine Learning

RTDWs empower AI/ML algorithms with new, up-to-date data. This ensures models make predictions and decisions based on the most current state of affairs. For instance, in automated trading systems, real-time data is critical for making split-second buying and selling decisions.

Manufacturing & Supply Chain

RTDWs support advanced manufacturing processes such as real-time inventory management, quality control, and predictive maintenance, and they provide crucial support for business intelligence operations. You can make swift adjustments in production schedules based on instantaneous demand and supply data to optimize resource allocation and reduce downtime.

Healthcare

RTDWs in healthcare provide instant access to patient records, laboratory results, and treatment plans, improving care coordination. They also support real-time monitoring of patient vitals, enabling immediate responses to critical changes in patient conditions.

Banking & Finance 

In banking and finance, RTDWs give you the latest updates on customer transactions, market fluctuations, and risk factors. This real-time financial data analysis helps with immediate fraud detection, instantaneous credit decisions, and real-time risk management.

Financial Auditing

RTDWs enable continuous auditing and monitoring, giving auditors real-time visibility into financial transactions. This helps identify discrepancies and anomalies immediately, enhancing the accuracy of audits and financial reports.

Emergency Services

RTDWs can keep track of critical data like the location of incidents, available resources, and emergency personnel status. This ensures an efficient deployment of resources and faster response times, potentially saving lives in critical situations.

Telecommunications

RTDWs play a vital role in enabling efficient network management and enhancing overall customer satisfaction. They provide immediate analysis of network performance, customer usage patterns, and potential system issues. This improves service quality, optimizes resource utilization, and enables proactive problem resolution.

Online Gaming

RTDWs provide analytics on player behaviors, game performance, and in-game purchases to support online gaming platforms. This enables game developers to promptly adjust game dynamics, improve player engagement, and optimize revenue generation.

Energy Management

In the energy sector, RTDWs provide instantaneous data on energy consumption, grid performance, and outage situations. This enables efficient energy distribution, quick response to power outages, and optimized load balancing.

Cybersecurity

RTDWs are crucial for cybersecurity as they provide real-time monitoring of network activities and immediate detection of security threats. This supports swift countermeasures, minimizes damage, and enhances the overall security posture.

Real-Time Data Warehouse: 3 Real-Life Examples For Enhanced Business Analytics

To truly highlight the importance of real-time data warehouses, let’s discuss some real-life case studies.

Case Study 1: Beyerdynamic 

Beyerdynamic, an audio product manufacturer from Germany, was facing difficulties with its previous method of analyzing sales data. In this process, staff extracted data from legacy systems into a spreadsheet and then compiled reports, all manually. It was time-consuming and often produced inaccurate reports.

To overcome these challenges, Beyerdynamic developed a data warehouse that automatically extracted transactions from its existing ERP and financial accounting systems. This data warehouse was carefully designed to store standard information for each transaction, like product codes, country codes, customers, and regions.

They also implemented a web-based reporting solution that helped managers create their standard and ad-hoc reports based on the data held in the warehouse.

Supported by an optimized data model, the new system allowed the company to perform detailed sales data analyses and identify trends in different products or markets.

  • Production plans could be adjusted quickly based on changing demand, ensuring the company neither produced excessive inventory nor missed out on opportunities to capitalize on increased demand.
  • With the new system, the company could use real-time data for performance measurement and appraisal. Managers compared actual sales with targets by region, assessed the success of promotions, and quickly responded to any adverse variances.
  • Sales and distribution strategies could be quickly adapted according to changing demands in the market. For instance, when gaming headphone sales started increasing in Japan, the company promptly responded with tailored promotions and advertising campaigns.

Case Study 2: Continental Airlines 

Continental Airlines is a major player in the aviation world. It faced significant issues because of old, manual systems. Its outdated approach slowed down decision-making and blocked easy access to useful data from departments like customer service, flight operations, and financials. Also, the lack of real-time data meant that decisions were often based on outdated information.

They devised a robust plan that hinged on 2 key changes: the ‘Go Forward’ strategy and a ‘real-time data warehouse’.

  • Go Forward Strategy: This initiative focused on tailoring the airline’s services according to the customer’s preferences. The concept was simple but powerful – understand what the customer wants and adapt services to fit that mold. In an industry where customer loyalty can swing on a single flight experience, this strategy aimed to ensure satisfaction and foster brand loyalty.
  • Real-Time Data Warehouse: In tandem with the new strategy, Continental also implemented an RTDW. This technological upgrade gave the airline quick access to current and historical data. The ability to extract insights from this data served as a vital reference point for strategic decision-making, optimizing operations, and enhancing customer experiences.

The new strategy and technology led to critical improvements:

  • The airline could offer a personalized touch by understanding and acting on customer preferences. This raised customer satisfaction and made the airline a preferred choice for many.
  • The introduction of the RTDW brought simplicity and efficiency to the company’s operations. It facilitated quicker access to valuable data, which was instrumental in reducing the time spent on managing various systems. This, in turn, resulted in significant cost savings and increased profitability.

Case Study 3: D Steel 

D Steel, a prominent steel production company, faced a unique set of challenges when it aimed to set up a real-time data warehouse to analyze its operations. While it tried to use its existing streams package for synchronization operations, several obstacles emerged.

The system was near real-time, but it couldn't achieve complete real-time functionality. The load on the source server was significantly high, and synchronization tasks required manual intervention.

Moreover, it lacked automation for Data Definition Language (DDL) operations and compatibility with newer technologies, and it had difficulties with data consistency verification, recovery, and maintenance. These challenges pushed the steel company to seek a new solution.

The Solution

D Steel decided to implement real-time data warehouse solutions that enabled instant data access and analysis. 

The new RTDW system proved to be extremely successful as it resolved all previous problems. It provided:

  • Real-time synchronization
  • DDL automation
  • Automated synchronization tasks
  • Reduced load on the source server

The system also introduced a unique function that compared current-year data with that of the previous year, helping the company with annual comparison analysis.

Enhancing Real-Time Data Warehousing: The Role of Estuary Flow


Estuary’s Flow is our data operations platform that connects various systems through a central data pipeline. Flow works with diverse systems for storage and analysis, like databases and data warehouses, and it is pivotal in maintaining synchronization among these systems, ensuring that new data feeds into them continuously.

Flow utilizes real-time data lakes as an integral part of its data pipeline. This serves dual roles.

First, the data lake works as a transit route for data, facilitating an easy flow and swift redirection to distinct storage endpoints. This feature also helps in backfilling data from these storage points.

The secondary role of the data lake in Flow is to serve as a reliable storage backbone. You can lean on this backbone without the fear of it turning into a chaotic ‘data swamp.’

Flow assures automatic organization and management of the data lake. As data collections move through the pipeline, Flow applies different schemas to them as needed.

Remember that the data lake in Flow doesn’t replace your ultimate storage solution. Instead, it aims to synchronize and enhance other storage systems crucial for powering key workflows, whether they’re analytical or transactional.

As we have seen with real-time data warehouse examples, this solution transcends industry boundaries. Only those organizations that embrace real-time data warehousing to its fullest can unlock the true potential of their data assets. 

While it can be a little tough to implement, the benefits of real-time data warehousing far outweigh the initial complexities, and the long-term advantages it offers are indispensable in today's data-driven world.

If you’re considering setting up a real-time data warehouse, investing in a top-notch real-time data ingestion pipeline like Estuary Flow should be your first step. Designed specifically for real-time data management, Flow provides a no-code solution to synchronize your many data sources and integrate fresh data seamlessly. Sign up for Estuary Flow for free and seize the opportunity today.


10 Common Enterprise Use Cases for Data Warehouses

A data warehouse is a data management system used primarily for business intelligence (BI) and analytics. Data warehouses store large amounts of historical data from a wide range of sources and make it available for queries and analysis. These systems are capable of storing large amounts of unstructured data, unlike traditional relational databases, making them ideal for big data projects and real-time data processing. The value of the data in a warehouse grows over time, as the historical record of customer, product, and business process metrics can be analyzed to identify trends and behaviors.

This article looks at 10 common enterprise use cases for data warehouses.

Data Warehouses for Tactical Reporting

Data warehouses are great for storing data for reporting purposes. Because they’re optimized for high-performance queries, they’re perfect for ad-hoc or on-demand operations and performance reporting. Data warehouses are often used to consolidate data from multiple source systems, providing a holistic, global view of how particular factors interact across different areas of the business.

Because of their speed and built-in performance optimization, they’re ideal for grabbing information on the go or for urgent matters. They provide answers almost instantly, instead of making you wait hours or days for reports generated the traditional way. The reports are also more accurate, as they draw on information from across the organization rather than on piecemeal extracts, which can lead to silos or outdated information.

Data Warehouses for Big Data Integration

It’s estimated that about 80 percent of data generated by enterprises is unstructured—think emails, PDF documents, social media posts, and multimedia files. Unstructured data is notoriously difficult to house and use effectively, and most solutions are not comprehensive enough to integrate all of an organization’s sources of unstructured data. Without that integration, you’ll either miss important insights or get subpar results compared to what an enterprise-grade data warehouse could achieve.

With a data warehouse, the flow of data is more trustworthy because it has been verified at least once by multiple parties through on-demand data queries. A warehouse also lets you automate big data analysis, which gives analysts more time to focus on deep dives into specific problems rather than trying to wrangle disparate tools and solutions together. By gathering both structured and unstructured data from multiple sources across your organization and storing it in a data warehouse, you can create a more holistic view of your business’s data for processing and analysis.

Data Warehouses for Natural Language Processing (NLP)

Many organizations are looking to improve customer service through natural language processing (NLP), which allows for quick analysis and provides opportunities for growth in the support, sales, and marketing departments.

A data warehouse can store the massive amounts of structured and unstructured data submitted by customers and clients, which can then be analyzed using NLP models. Adequate analysis of this data leads to a real-time response by organization employees or bots, such as live chat assistance or responses based on past interactions with customers.

This kind of data mining is difficult without a stable data storage system like a data warehouse. It’s important to collect all information about your customers—including email, telephone calls, and social media posts—so it can be properly categorized and filed according to the products or services they use most often. This is essential for constructing a profile of each client: a unique digital identity in which all related information is stored within one instance.
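As a very rough sketch of that categorize-and-file step (the message records and keyword lists below are hypothetical, and a production system would use a trained NLP model rather than keyword matching):

```python
# Rough sketch: categorize customer messages into per-customer profiles.
# Message records and keyword lists are hypothetical; a production system
# would use a trained NLP model instead of keyword matching.
from collections import defaultdict

TOPIC_KEYWORDS = {"billing": ["invoice", "charge"], "mobile_app": ["app", "login"]}

messages = [
    {"customer_id": 17, "channel": "email", "text": "I was charged twice on my invoice"},
    {"customer_id": 17, "channel": "chat", "text": "The app rejects my login"},
]

profiles = defaultdict(lambda: defaultdict(int))  # customer -> topic -> mention count
for msg in messages:
    text = msg["text"].lower()
    for topic, words in TOPIC_KEYWORDS.items():
        if any(w in text for w in words):
            profiles[msg["customer_id"]][topic] += 1

print(dict(profiles[17]))  # -> {'billing': 1, 'mobile_app': 1}
```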

Data Warehouses for Auditing and Compliance

Auditing and compliance checks are both labor-intensive tasks. Auditors need to look over spreadsheets of data, while compliance officers need to read through legal documents—tedious exercises that make keeping up with regulator demands difficult.

Data warehouses store electronic copies of important documents, saving time and money, reducing error rates, and enabling more accurate analysis of the results. A good data warehouse will also have a structured storage format, so all relevant records can be retrieved instantly. This makes auditing faster and easier, while also simplifying compliance, because companies can quickly prove they’re in line with current regulations.


Data Warehouses for Data-Mining Analytics

Companies like Netflix base many business decisions on data-mining analytics, including which content is most popular, what promotional strategies work best, and which marketing campaigns resonate with subscribers. The data-mining analytics process stores massive amounts of data in a centralized location for easy analysis. Data warehouses are well-suited to data mining analytics, as they can store and make available the data necessary for insights as well as intellectual property and competitive intelligence.

Data Warehouses to Address Data Quality Issues

It’s important to promptly address errors and missed updates to avoid corrupt data and isolated silos, which can cause accuracy problems in analytics. One of data warehousing’s biggest benefits is that it enables business intelligence teams to act on errors in their databases.

Instead of manually correcting each error as it pops up, these tasks can be automated using extract, transform, load (ETL) tools like Informatica or Talend. For example, you could use SQL Server Integration Services (SSIS) to compare customer records with shipping records, and if a problem occurs—for instance, if one person receives multiple shipments from different addresses—you could fix it by adjusting an existing master record or creating a new one.
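The comparison logic itself is simple; the sketch below shows it in plain Python with hypothetical record layouts. ETL tools like SSIS, Informatica, or Talend wrap the same idea in managed, schedulable jobs.

```python
# Plain-Python sketch of the comparison step described above; record layouts
# are hypothetical. ETL tools wrap the same logic in managed, schedulable jobs.
customers = {101: {"name": "A. Smith", "address": "12 High St"}}
shipments = [
    {"customer_id": 101, "ship_to": "12 High St"},
    {"customer_id": 101, "ship_to": "98 River Rd"},  # disagrees with the master record
]

def find_address_mismatches(customers, shipments):
    """Flag shipments whose address disagrees with the master customer record."""
    return [s for s in shipments
            if s["ship_to"] != customers[s["customer_id"]]["address"]]

for bad in find_address_mismatches(customers, shipments):
    # The fix is a policy decision: adjust the master record, add a second
    # address record, or route the case to a data steward for review.
    print("mismatch:", bad)
```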

Data warehousing makes such fixes possible because it lets companies track and update data in large volumes over time, so errors don’t pile up and go unnoticed. And once a data warehouse is set up, IT departments can add functionality with minimal effort—no need to reinvent data systems when regulations change or when new uses arise for their data. By taking advantage of built-in data management features when necessary, IT professionals also spend less time trying to patch together ad hoc solutions.

Data Warehouses for RTDW Processing

Real-time data warehousing, or RTDW, refers to the instantaneous processing of all enterprise data for analysis as soon as it enters an organization’s information system. This reduces or eliminates the costly, time-consuming post-processing of long data backlogs. Here are the major benefits of RTDW that help enterprises derive better business results:

  • Instant decision-making support to line of business users and customer service personnel
  • More accurate predictions and forecasts
  • Better data governance and security with fewer updates and reconciliations
  • Improved data quality through real-time validation, quality assurance, and error checking—data is continuously cleaned, updated, and validated
  • Streamlined operations, which can help identify inefficiencies and improve process optimization
  • Reduced costs through predictive analytics and automated diagnostic reporting
  • Reduced manual processing errors through early detection and resolution
  • Increased operational efficiency with advanced high-speed data retrieval
  • Improved customer service and satisfaction through real-time responses to customer behavior and patterns 
  • Risk mitigation through faster issue response
  • Reduced capital expenditure through efficient resource usage
  • Augmented business agility and resiliency through reduced dependence on manual processing

Logistics and manufacturing are two industries where real-time data warehousing can have a big impact on operations. For example, a manufacturer may want to know about a faulty component as soon as it is installed to initiate a recall or initiate preventive measures, or logistics providers could analyze shipment data to better prepare for demand spikes and optimize routes.

Data Warehousing for Big Data Analysis

Organizations dealing with large volumes of data—internet-based businesses that process millions of credit card transactions every month, for example—need to manage all that information. Data warehouses are specifically designed to deal with massive amounts of data quickly and reliably, which makes them an essential tool for analysis purposes.

Traditional data processing systems like relational databases simply can’t cope with such quantities of data. They may also lack features such as security controls and indexing suited to this scale, which significantly increases latency during both write and read operations.


Data Warehouses for Data-Driven Decision-Making

Data warehouse solutions enable critical business decisions based on new insights from your company’s historical data. You can then use that knowledge to inform big-picture plans, such as where to focus marketing efforts or what products and services to develop.

For example, consider the University of St. Andrews, where student administrators relied heavily on data warehouse and reporting systems to generate insights into student data. Keeping data on more than 10,000 students created numerous problems with the school’s legacy systems. The university implemented a hybrid architecture, allowing staff to analyze student data on demand and building in the flexibility for future upgrades and developments.

Data Warehouses for Business Intelligence

For true end-to-end system visibility, enterprises need a BI platform that can act as a hub for all of their structured and unstructured data. An online transaction processing (OLTP) data warehouse is great for storing transactional data at high volumes, but not optimized for business intelligence—depending on how the OLTP and BI systems are designed, they may not even integrate. Alternatively, online analytical processing (OLAP) systems are optimized for fast data processing and analysis, enabling businesses to promptly and easily pull insights from large amounts of data, identifying patterns and trends in order to inform business decisions.

An OLAP data warehouse will provide better access to important information in real time and help simplify complex data queries by consolidating critical data in one place. If you already have an operational data store in place but want to go further with your big data strategy, then building out a scalable business intelligence platform is key to moving forward with information discovery efforts across an enterprise.
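The kind of query an OLAP warehouse is optimized for looks like the sketch below (Python with SQLite; the fact table and its dimensions are hypothetical): one pass aggregates sales across time and region, the pattern behind most BI dashboards.

```python
# Sketch of an OLAP-style rollup over a hypothetical fact table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (sale_date TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
    ("2023-01-05", "EU", 100.0), ("2023-01-20", "EU", 50.0), ("2023-02-02", "US", 75.0),
])

query = """
    SELECT substr(sale_date, 1, 7) AS month,  -- roll the date up to a month
           region,
           SUM(amount) AS revenue
    FROM fact_sales
    GROUP BY month, region
    ORDER BY month, region
"""
for row in conn.execute(query):
    print(row)  # ('2023-01', 'EU', 150.0) then ('2023-02', 'US', 75.0)
```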

Bottom Line: Data Warehouses for Enterprises

Data warehousing allows businesses to understand past data performance to develop effective plans and provides historical information they can refer back to later when making important business decisions. Data warehouses are designed to store massive amounts of structured and unstructured data for analysis and business intelligence, providing a holistic and historical record and serving as the enterprise’s “single source of truth.” A successful data warehouse strategy helps businesses understand exactly where they stand today and set measurable benchmarks that can drive long-term growth.



Good Practices for Clinical Data Warehouse Implementation: A Case Study in France

Matthieu Doutreligne, Adeline Degremont, Pierre-Alain Jachiet, Antoine Lamer, Xavier Tannier

PLOS Digital Health, published July 6, 2023. https://doi.org/10.1371/journal.pdig.0000298

Real-world data (RWD) holds great promise for improving the quality of care. However, specific infrastructures and methodologies are required to derive robust knowledge and bring innovations to the patient. Drawing upon a national case study of the 32 French regional and university hospitals, we highlight key aspects of modern clinical data warehouses (CDWs): governance, transparency, types of data, data reuse, technical tools, documentation, and data quality control processes. Semi-structured interviews, as well as a review of reported studies on French CDWs, were conducted from March to November 2022. Out of 32 regional and university hospitals in France, 14 have a CDW in production, 5 are experimenting, 5 have a prospective CDW project, and 8 did not have any CDW project at the time of writing. The implementation of CDWs in France dates from 2011 and accelerated in late 2020. From this case study, we draw some general guidelines for CDWs. The current orientation of CDWs towards research requires efforts in governance stabilization, standardization of data schemas, and development of data quality and data documentation. Particular attention must be paid to the sustainability of the warehouse teams and to multilevel governance. The transparency of the studies and of the data transformation tools must improve to allow successful multicentric data reuses as well as innovations in routine care.

Author summary

Reusing routine care data does not come free of charge. Attention must be paid to the entire life cycle of the data to create robust knowledge and develop innovation. Building upon the first overview of CDWs in France, we document key aspects of the collection and organization of routine care data into homogeneous databases: governance, transparency, types of data, main objectives of data reuse, technical tools, documentation, and data quality control processes. The landscape of CDWs in France dates from 2011 and accelerated in late 2020, showing a progressive but still incomplete homogenization. National and European projects are emerging, supporting local initiatives in standardization, methodological work, and tooling. From this sample of CDWs, we draw general recommendations aimed at consolidating the potential of routine care data to improve healthcare. Particular attention must be paid to the sustainability of the warehouse teams and to multilevel governance. The transparency of the data transformation tools and studies must improve to allow successful multicentric data reuses as well as innovations for the patient.


Introduction

Real-world data.

Health information systems (HIS) are increasingly collecting routine care data [1–7]. This source of real-world data (RWD) [8] holds great promise for improving the quality of care. On the one hand, the use of this data translates into direct benefits—primary uses—for the patient by serving as the cornerstone of developing personalized medicine [9,10]. On the other hand, it brings indirect benefits—secondary uses—by accelerating and improving knowledge production: on pathologies [11], on the conditions of use of health products and technologies [12,13], and on the measures of their safety [14], efficacy, or usefulness in everyday practice [15]. RWD can also be used to assess the organizational impact of health products and technologies [16,17].

In recent years, health agencies in many countries have conducted extensive work to better support the generation and use of real-life data [8,17–19]. Study programs have been launched by regulatory agencies: the DARWIN EU program by the European Medicines Agency and the Real World Evidence Program by the Food and Drug Administration [20].

Clinical data warehouse

In practice, the possibility of mobilizing these routinely collected data depends very much on their degree of concentration, on a gradient that goes from centralization in a single, homogeneous HIS to fragmentation across a multitude of HIS with heterogeneous formats. The structure of the HIS reflects the governance structure. Thus, the ease of working with these data depends heavily on the organization of the healthcare actors. The 2 main sources of RWD are insurance claims—more centralized—and clinical data—more fragmented.

Claims data is often collected by national agencies into centralized repositories. In South Korea, the government agency responsible for healthcare system performance and quality (HIRA) is connected to the HIS of all healthcare stakeholders. HIRA data consists of national insurance claims [21]. England has a centralized healthcare system under the National Health Service (NHS). Although the NHS does not hold detailed clinical data itself, this centralization allowed it to merge claims data with detailed data from 2 large urban medicine databases, corresponding to the 2 major software publishers [22]. This data is currently accessed through OpenSAFELY, a first platform focused on Coronavirus Disease 2019 (COVID-19) research [23]. In the United States, even if scattered between different insurance providers, claims are pooled into large databases such as Medicare, Medicaid, or IBM MarketScan. Lastly, in Germany, the distinct federal claims have been centralized only very recently [24].

Clinical data, on the other hand, tends to be distributed among many entities that made different choices, without common management or interoperability. But large institutional data-sharing networks are beginning to emerge. South Korea very recently launched an initiative to build a nationwide data network focused on intensive care. The United States is building Chorus4ai, an analysis platform pooling data from 14 university hospitals [25]. To unlock the potential of clinical data, the German Medical Informatics Initiative [26] created 4 consortia in 2018. They aim at developing technical and organizational solutions to improve the consistency of clinical data.

Israel stands out as one of the rare countries that pooled together both claims and clinical data at a large scale: half of the population depends on 1 single healthcare provider and insurer [27].

An infrastructure is needed to pool data from 1 or more medical information systems—whatever the organizational framework—into homogeneous formats, for management, research, or care reuses [28,29]. Fig 1 illustrates, for a CDW, the 4 phases of data flow (collection, transformation, provisioning, and usage) from the various sources that make up the HIS, broken down into the following steps (a schematic sketch of the standardization and pseudonymization steps follows the list):

  • Collection and copying of original sources.
  • Integration of sources into a unique database.
  • Deduplication of identifiers.
  • Standardization: a unique data model, independent of the software models, harmonizes the different sources in a common schema, possibly with common nomenclatures.
  • Pseudonymization: removal of directly identifying elements.
  • Provision of subpopulation data sets and transformed datamarts for primary and secondary reuse.
  • Usage via dedicated applications and tools accessing the datamarts and data sets.
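As a schematic illustration only (the source fields, the mapping table, and the salted-hash scheme below are hypothetical; real CDWs rely on full terminology mappings and certified pseudonymization services), the standardization and pseudonymization steps could look as follows in Python:

```python
# Schematic sketch of the standardization and pseudonymization steps; the
# source fields, mapping table, and salted-hash scheme are illustrative only.
import hashlib

SOURCE_TO_STANDARD = {"GLY": "glucose", "CREAT": "creatinine"}  # local -> common code

def standardize(record):
    """Map a source lab record onto the warehouse's common schema."""
    return {
        "patient_id": record["ipp"],                     # hospital-local identifier
        "lab_code": SOURCE_TO_STANDARD[record["code"]],  # common nomenclature
        "value": float(record["val"]),
    }

def pseudonymize(record, salt="per-warehouse-secret"):
    """Replace the directly identifying element with a stable pseudonym."""
    token = hashlib.sha256((salt + record["patient_id"]).encode()).hexdigest()[:16]
    return {**record, "patient_id": token}

raw = {"ipp": "7501234", "code": "GLY", "val": "5.4"}
print(pseudonymize(standardize(raw)))  # identifying field replaced by a token
```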

In France, the national insurer collects all hospital activity and city care claims into a unique reimbursement database [13]. However, clinical data is historically scattered across numerous HISs at each care site. Several hospitals have deployed efforts for about 10 years to create CDWs from electronic medical records [30–39]. This work has accelerated recently, with the beginning of CDW structuring at the regional and national levels. Regional cooperation networks are being set up—such as the Ouest Data Hub [40]. In July 2022, the Ministry of Health opened a 50 million euro call for projects to set up and strengthen a network of hospital CDWs coordinated with the national platform, the Health Data Hub, by 2025.


Fig 1. CDW: Four steps of data flow from the Hospital Information System: (1) collection, (2) transformations, (3) provisioning, and (4) usages. CDW, clinical data warehouse.

https://doi.org/10.1371/journal.pdig.0000298.g001

Based on an overview of university hospital CDWs in France, this study makes general recommendations for properly leveraging the potential of CDWs to improve healthcare. It focuses on governance, transparency, types of data, data reuse, technical tools, documentation, and data quality control processes.

Material and methods

Interviews were conducted from March to November 2022 with 32 French regional and university hospitals, both with existing and prospective CDWs.

Ethics statement

This work has been authorized by the board of the French High Authority of Health (HAS). Every interviewed participant was asked by email for their participation and informed of the possible forms of publication: a French official report and an international publication. Furthermore, at each interview, every participant was asked for their agreement before the interview was recorded. Only 1 participant declined the video recording.

Semi-structured interviews were conducted on the following themes: the initiation and construction of the CDWs, the current status of the project and the studies carried out, opportunities and obstacles, and quality criteria for observational research. S1 Table lists all interviewed people with their team titles. The complete form, with the precise questions, is available in S2 Table.

The interview form was sent to participants in advance and then used as a guide for conducting the interviews. The interviews lasted 90 minutes and were recorded for reference.

Quantitative methods

Three tables in S1 Text detail the structured answers. The first 2 tables deal with the characteristics of the actors and those of the data warehouses. We completed them based on the notes taken during the interviews, the recordings, and by asking the participants for additional information. The third table focuses on ongoing studies in the CDWs. We collected the list of these studies from the dedicated reporting portals, which we found for 8 out of 14 operational CDWs. We developed a classification of studies based on the typology of retrospective studies described by the OHDSI research network [41]. We enriched this typology by comparing it with the collected studies, resulting in the 6 following categories:

  • Outcome frequency: Incidence or prevalence estimation for a medically well-defined target population.
  • Population characterization: Characterization of a specific set of covariates. Feasibility and prescreening studies belong to this category [42].
  • Risk factors: Identification of the covariates most associated with a well-defined clinical target (disease course, care event). These studies look at associations without quantifying the causal effect of the factors on the outcome of interest.
  • Treatment effect: Evaluation of the effect of a well-defined intervention on a specific outcome target. These studies intend to show a causal link between these 2 variables [43].
  • Development of diagnostic and prognostic algorithms: Improving or automating a diagnostic or prognostic process based on the clinical data of a given patient. This can take the form of a risk or preventive score, or the implementation of a diagnostic assistance system. These studies are part of the individualized medicine approach, with the goal of inferring relevant information at the level of an individual patient's file.
  • Medical informatics: Methodological or tool-oriented studies. These aim to improve the understanding and capacity for action of researchers and clinicians. They include the evaluation of a decision support tool, the extraction of information from unstructured data, and automatic phenotyping methods.

Studies were classified according to this nomenclature based on their title and description.

Fig 2 summarizes the state of development of CDWs in France. Out of 32 regional and university hospitals in France, 14 have a CDW in production, 5 are experimenting, 5 have a prospective CDW project, and 8 did not have any CDW project at the time of writing. The results are described for all projects that are at least at the prospective stage, minus the 3 that we were unable to interview after multiple reminders (Orléans, Metz, and Caen), resulting in a denominator of 21 university hospitals.

Fig 2. Base map and data from OpenStreetMap and OpenStreetMap Foundation. Link to the base layer of the map: https://github.com/mapnik/mapnik . CDW, clinical data warehouse.

https://doi.org/10.1371/journal.pdig.0000298.g002

Fig 3 shows the history of the implementation of CDWs. A distinction must be made between the first works (in blue), which systematically precede the regulatory authorization (in green) from the French Commission on Information Technology and Liberties (CNIL).

Fig 3. CDW, clinical data warehouse.

https://doi.org/10.1371/journal.pdig.0000298.g003

The CDWs have so far been initiated by 1 or 2 people from the hospital world with an academic background in bioinformatics, medical informatics, or statistics. The sustainability of the CDW is accompanied by the construction of a cooperative environment between different actors: the Medical Information Department (MID), the Information Systems Department (IT), the Clinical Research Department (CRD), clinical users, and the support of the management or the Institutional Medical Committee. It is also accompanied by the creation of a team, or entity, dedicated to the maintenance and implementation of the CDW. More recent initiatives, such as those of the HCL (Hospitals of the city of Lyon) or the Grand-Est region, are distinguished by initial, high-level institutional support.

The CDW has a federating potential for the different business departments of the hospital with the active participation of the CRD, the IT Department, and the MID. Although there is always an operational CDW team, the human resources allocated to it vary greatly: from half a full-time equivalent to 80 people for the AP-HP, with a median of 6.0 people. The team systematically includes a coordinating physician. It is multidisciplinary with skills in public health, medical informatics, informatics (web service, database, network, infrastructure), data engineering, and statistics.

Historically, the first CDWs were based on in-house solution development. More recently, private actors have been offering their services for the implementation and operation of CDWs (15/21). These services range from technical expertise to build up the data flows and data cleaning, up to the delivery of a platform integrating the different stages of data processing.

Management of studies

Before starting, projects are systematically analyzed by a scientific and ethical committee. A local submission and follow-up platform is often mentioned (12/21), but its functional scope is not well defined. It ranges from simple authorization of the project to the automatic provision of data in a Trusted Research Environment (TRE) [44]. The processes for starting a new project on the CDW are always communicated internally but rarely documented publicly (8/21).

Transparency

Ongoing studies in CDWs are unevenly referenced publicly on hospital websites. Some institutions have comprehensive study portals, while others list only a dozen studies on their public site while mentioning several hundred ongoing projects during interviews. In total, we found 8 of these portals out of 14 CDWs in production. Uses other than ongoing scientific studies are very rarely documented. The publication of the list of ongoing studies is very heterogeneous and fragmented between several sources: clinicaltrials.gov, the mandatory project portal of the Health Data Hub [45], and the website of the hospital data warehouse.

Strong dependence on the HIS.

CDW data reflect the HIS used on a daily basis by hospital staff. Stakeholders point out that the quality of CDW data and the amount of work required for rapid and efficient reuse are highly dependent on the source HIS. The possibility of accessing data from an HIS in a structured and standardized format greatly simplifies its integration into the CDW and then its reuse.

Categories of data.

Although the software landscape varies across the country, the main functionalities of HIS are the same. We can therefore analyze the content of the CDWs according to the main categories of common data present in the HIS.

The common base for all CDWs is data from the Patient Administrative Management software (patient identification, hospital movements) and the billing codes. Then, data flows are progressively developed from the various software systems that make up the HIS. The goal is to build a homogeneous data schema, linking the sources together, controlled by the CDW team. The prioritization of sources is done through thematic projects, which feed the CDW construction process. These projects improve the understanding of the sources involved by confronting the CDW team with the quality issues present in the data.

Table 1 presents the proportions of the different data categories integrated in French CDWs. Structured biology results and texts are almost always integrated (20/21 each). The texts contain a large amount of information. They constitute unstructured data and are therefore more difficult to use than structured tables. Other integrated sources are the hospital drug circuit (prescriptions and administration, 16/21), Intensive Care Unit data (ICU, 2/21), and nurse forms (4/21). Imaging is rarely integrated (4/21), notably for reasons of volume. Genomic data are well identified but never integrated, even though they are sometimes considered important and included in the CDW work program.


https://doi.org/10.1371/journal.pdig.0000298.t001

Data reuse.

Today, the main use put forward for building CDWs is scientific research.

The studies are mainly observational (non-interventional). Fig 4 presents the distribution of the 6 categories defined in Quantitative methods for 231 studies collected on the study portals of 9 hospitals. The studies focus first on population characterization (25%), followed by the development of decision support processes (24%), the study of risk factors (18%), and treatment effect evaluations (16%).


https://doi.org/10.1371/journal.pdig.0000298.g004

The CDWs are used extensively for internal projects such as student theses (at least 9/21) and serve as an infrastructure for single-service research, their great interest being the de-siloing of different information systems. For most of the institutions interviewed, there is still a lack of resources and of mature methods and tools for conducting inter-institutional research (such as in the Grand-Ouest region of France) or research via European calls for projects (EHDEN). These 2 research networks are made possible by supra-local governance and a common data schema: respectively, eHop [46] and OMOP [47]. The Paris hospital group (AP-HP), thanks to its regional coverage and its choice of OMOP, is also well advanced in multicentric research. At the same time, the Grand-Est region is building a network of CDWs based on the model of the Grand-Ouest region, also using eHop.

CDWs are used for monitoring and management (16/21).

CDWs have sometimes been initiated to improve and optimize billing coding (4/21). The clinical texts gathered in the same database are queried using keywords to facilitate the structuring of information. The data are then aggregated into indicators, some of which are reported at the national level. The construction of indicators from clinical data can also be used for the administrative management of the institution. Finally, closer to the clinic, some actors state that the CDW could also be used to provide regular and appropriate feedback to healthcare professionals on their practices. This feedback would help increase the involvement and interest of healthcare professionals in CDW projects. The CDW is sometimes also of interest for health monitoring (e.g., during COVID-19) or pharmacovigilance (13/21).

Strong interest in CDWs in the context of care (13/21).

Some CDWs develop specific applications that provide new functionalities compared to care software. Search engines can be used to query all the hospital's data gathered in the CDW, without data compartmentalization between different software systems. Dedicated interfaces can then offer a unified view of the history of a patient's data, with inter-specialty transversality, which is particularly valuable in internal medicine. These cross-disciplinary search tools also enable healthcare professionals to conduct rapid searches in all the texts, for example, to find similar patients [32]. Uses for prevention, automation of repetitive tasks, and care coordination are also highlighted. Concrete examples are the automatic sorting of hospital prescriptions by order of complexity or the setting up of specialized channels for primary or secondary prevention.

Technical architecture

The technical architecture of modern CDWs has several layers:

  • Data processing: connection and export of source data, diverse transformations (cleaning, aggregation, filtering, standardization).
  • Data storage: database engines, file storage (on file servers or object storage), indexing engines to optimize certain queries.
  • Data exposure: raw data, APIs, dashboards, development and analysis environments, specific web applications.

Supplementary cross-functional components ensure the efficient and secure operation of the platform: identity and authorization management, activity logging, automated administration of servers and applications.

The analysis environment (JupyterHub or RStudio datalabs) is a key component of the platform, as it allows data to be processed within the CDW infrastructure. A few CDWs had such an operational datalab at the time of our study (6/21), and almost all of them have decided to provide one to researchers. Currently, clinical research teams are still often working on data extractions in less secure environments.

Data quality, standard formats

Quality tools.

Systematic data quality monitoring processes are being built in some CDWs. Often (8/21), scripts are run at regular intervals to detect technical anomalies in data flows. A few data quality investigation tools, in the form of dashboards, are beginning to be developed internally (3/21). Theoretical reflections are underway on the possibility of automating data consistency checks, for example, demographic or temporal. Some facilities randomly pull records from the EHR to compare them with the information in the CDW.
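Such scheduled checks can be as simple as the following Python sketch (the stay table and the two rules are hypothetical; frameworks such as OHDSI's Data Quality Dashboard industrialize the same idea):

```python
# Minimal sketch of scheduled data quality checks over a hypothetical table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stay (patient_id INTEGER, admit TEXT, discharge TEXT)")
conn.executemany("INSERT INTO stay VALUES (?, ?, ?)", [
    (1, "2022-03-01", "2022-03-04"),
    (2, "2022-03-10", "2022-03-08"),     # temporal inconsistency
    (None, "2022-03-11", "2022-03-12"),  # missing identifier
])

checks = {
    "missing_patient_id": "SELECT COUNT(*) FROM stay WHERE patient_id IS NULL",
    "discharge_before_admission": "SELECT COUNT(*) FROM stay WHERE discharge < admit",
}

# Run at a regular interval (cron, an orchestrator, ...) and alert on non-zero counts.
for name, sql in checks.items():
    count = conn.execute(sql).fetchone()[0]
    print(f"{name}: {count} offending rows")
```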

Standard format.

No single standard data model stands out as being used by all CDWs. All are aware of the existence of the OMOP model (a research standard) [47] and HL7 FHIR (a communication standard) [48]. Several CDWs consider the OMOP model to be a central part of the warehouse, particularly for research purposes (9/21). This tendency has been encouraged by the European call for projects EHDEN, launched by the OHDSI research consortium, the originator of this data model. In the Grand-Ouest region of France, the CDWs use the eHop warehouse software, which uses a common data model also named eHop. This model will be extended with the future warehouse network of the Grand-Est region, which has also chosen this solution. Including this grouping and the other establishments that have chosen eHop, this model covers 12 establishments out of the 32 university hospitals. This allows eHop adopters to launch ambitious interregional projects. However, eHop does not define a standard nomenclature to be used in its model and is not aligned with emerging international standards.

Documentation.

Half of the CDWs have put in place documentation, accessible within the organization, on data flows and on the meaning and proper use of qualified data (10/21 mentioned). This documentation is used by the team that develops and maintains the warehouse. It is also used by users to understand the transformations performed on the data. However, it is never publicly available, and no schema of the data as transformed and prepared for analysis is published.

Principal findings

We give the first overview of the CDWs in the university hospitals of France, with 32 hospitals reviewed. The implementation of CDWs dates from 2011 and accelerated in late 2020. Today, 24 of the university hospitals have an ongoing CDW project. From this case study, some general considerations can be drawn that should be valuable to all healthcare systems implementing CDWs on a national scale.

As the CDW becomes an essential component of data management in the hospital, the creation of an autonomous internal team dedicated to data architecture, process automation, and data documentation should be encouraged [44]. This multidisciplinary team should develop an excellent knowledge of the data collection process and potential reuses in order to qualify the different flows coming from the source IS, standardize them towards a homogeneous schema, and harmonize the semantics. It should have a sound knowledge of public health, as well as the technical and statistical skills to develop high-quality software that facilitates data reuse.

The resources specific to the warehouse are rare and often taken from other budgets or from project-based credits. While this is natural for an initial prototyping phase, it is not suited to the long-term, cross-cutting nature of the tool. As a research infrastructure of growing importance, the CDW must have the financial and organizational means to plan for the long term.

The governance of the CDW has multiple layers: local within the university hospital, interregional, and national/international. The first level ensures the quality of data integration as well as the pertinence of data reuse by clinicians themselves. The interregional level is well adapted for resource mutualization and collaboration. Finally, the national and international levels assure coordination, encourage consensus on committing choices such as metadata or interoperability, and provide financial, technical, and regulatory support.

Health technology assessment agencies advocate for public registration of comparative observational study protocols before the analysis is conducted [8,17,49]. They often refer to clinicaltrials.gov as a potential, though not ideal, registration portal for observational studies. The research community advocates for public registration of all observational studies [50,51]. More recently, it has emphasized the need for easier data access and the publication of study code [29,52,53]. We embrace these recommendations, and we point to the unfortunate duplication of these study reporting systems in France. One source could be favored at the national level and the second one fed automatically from the reference source, by agreeing on common metadata.

From a patient’s perspective, there is currently no way to know whether their personal data is included in a specific project. Better patient information about the reuse of their data is needed to build trust over the long term. A strict minimum is the establishment and updating of the declarative portals of ongoing studies at each institution.

Data and data usage

When using a CDW, the analyst has not defined the data collection process and is generally unaware of the context in which the information was logged. This new dimension of medical research requires a much greater development of data science skills, shifting the focus from the implementation of the statistical design to the data engineering process. Data reuse requires more effort to prepare the data and to document the transformations performed.

The more heterogeneous a HIS is, the lower the quality of the CDW built on top of it. There is a need for increasing interoperability to help EHR vendors interface the different hospital software systems, thus facilitating CDW development. One step in this direction would be the open-source publication of HIS data schemas and vocabularies. At the analysis level, international recommendations insist on the need for common data formats [52,54]. However, there is still a lack of adoption of research standards in hospital CDWs to conduct robust studies across multiple sites. Building open-source tools on top of these standards, such as those of OHDSI [41], could foster their adoption. Finally, in many clinical domains, a sufficient sample size is hard to obtain without international data-sharing collaborations. Thus, more incentives are needed to maintain and update the terminology mappings between local nomenclatures and international standards.

Many ongoing studies concern the development of decision support processes whose goal is to save time for healthcare professionals. These are often research projects not yet integrated into routine care. The analysis of study portals and the interviews revealed that data reuse oriented towards primary care is still rare and rarely supported by appropriate funding. The translation from research to clinical practice takes time and needs to be supported over the long run to yield substantial results.

The tools, methods, and data formats of CDWs lack harmonization due to rapid technical innovation and the presence of many actors. As suggested by the recent report on the use of data for research in the UK [44], it would be wise to focus on a small number of model technical platforms.

These platforms should favor open-source solutions to assure transparency by default, foster collaboration and consensus, and avoid technological lock-in of the hospitals.

Data quality and documentation

Quality is not sufficiently considered a relevant scientific topic in itself. However, it is the backbone of all research done within a CDW. In order to improve the quality of the data for research uses, it is necessary to conduct continuous studies dedicated to this topic [52,54–56]. These studies should contribute to a reflection on methodologies and standard tools for data quality, such as those developed by the OHDSI research network [41].

Finally, there is a need for open-source publication of research code to ensure quality retrospective research [55,57]. Recent research in data analysis has shown that innumerable biases can lurk in training data sets [58,59]. Open publication of data schemas is considered an indispensable prerequisite for all data science and artificial intelligence uses [58]. Inspired by data set cards [58] and data set publication guides, it would be interesting to define a standard CDW card documenting the main data flows.

Limitations

The interviews were conducted in a semi-structured manner within a limited time frame. As a result, some topics were covered more quickly, and only those explicitly mentioned by the participants could be recorded. The uneven existence of study portals introduces a bias in the recording of the types of studies conducted on CDWs: those with a transparency portal already have more maturity in their use cases.

For clarity, our results focus on the perimeter of university hospitals and do not cover the entire healthcare landscape in France. CDW initiatives also exist in primary care, in smaller hospital groups, and in private companies.

Conclusions

The French CDW ecosystem is beginning to take shape, benefiting from an acceleration thanks to national funding, the multiplication of industrial players specializing in health data and the beginning of a supra-national reflection on the European Health Data Space [ 60 ]. However, some points require special attention to ensure that the potential of the CDW translates into patient benefits.

The priority is the creation and perpetuation of multidisciplinary warehouse teams capable of operating the CDW and supporting the various projects. A combination of public health, data engineering, data stewardship, statistics, and IT competences is a prerequisite for the success of the CDW. The team should be the privileged point of contact for data exploitation issues and should collaborate closely with the existing hospital departments.

The constitution of a multilevel collaboration network is another priority. The local level is essential to structure the data and understand its possible uses. Interregional, national, and international coordination would make it possible to create thematic working groups in order to stimulate a dynamic of cooperation and mutualization.

A common data model should be encouraged, with precise metadata that maps the integrated data, in order to qualify the uses that can be developed from CDWs today. More broadly, open-source documentation of data flows and of the transformations performed for quality enhancement needs stronger incentives to unleash the potential for innovation for all health data reusers.

Finally, the question of expanding the scope of the data beyond the purely hospital domain must be asked. Many risk factors and patient follow-up data are missing from CDWs but are crucial for understanding pathologies. Combining community ("city") care data with hospital data would provide a complete view of patient care.

Supporting information

S1 Table. List of interviewed stakeholders with their teams.

https://doi.org/10.1371/journal.pdig.0000298.s001

S2 Table. Interview form.

https://doi.org/10.1371/journal.pdig.0000298.s002

S1 Text. Study data tables.

https://doi.org/10.1371/journal.pdig.0000298.s003

Acknowledgments

We want to thank all participants and experts interviewed for this study. We also want to thank the people who proofread the manuscript for external review: Judith Fernandez (HAS), Pierre Liot (HAS), Bastien Guerry (Etalab), Aude-Marie Lalanne Berdouticq (Institut Santé numérique en Société), Albane Miron de L’Espinay (ministère de la Santé et de la Prévention), and Caroline Aguado (ministère de la Santé et de la Prévention). We also thank Gaël Varoquaux for his support and advice.


Case Study: Cornell University Automates Data Warehouse Infrastructure

Cornell University is a privately endowed research university founded in 1865. Ranked in the top one percent of universities in the world, Cornell is made up of 14 colleges and schools serving roughly 22,000 students. This case study draws on conversations with Jeff Christen, data warehousing manager at Cornell University and adjunct faculty in Information Science, and Chris Stewart, a VP at WhereScape.

The Primary Issue

Cornell was using Cognos Data Manager to transform and merge data into an Oracle Data Warehouse. IBM purchased Data Manager and decided to end support for the product. “Unfortunately, we had millions of lines of code written in Data Manager, so we had to shop around for a replacement,” said Christen. He looked at it as an opportunity to add new functionality so that their data warehouse ran more efficiently.

The Assessment

Christen’s IT team had to confine processing to hours when the university was closed: nightly batch loads from the financial, PeopleSoft, and student-records systems could not begin until normal operations ended, and had to finish completely by 8:00 a.m., when staff arrived and needed access to the warehouse.

“It was getting really close. We were frequently bumping into that time,” said Christen. Because their processing window was so short, errors and issues could be very disruptive.

“Our old tool would just log it if there was an issue, but then we couldn’t load the warehouse, because some network glitch that probably took seconds was enough to take out our nightly ETL processing,” elaborated Christen.
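A common mitigation for this kind of fragility, sketched below as a minimal Python example with hypothetical step names (not Cornell's actual tooling), is to wrap each ETL step in retry logic so that a transient glitch of a few seconds does not take down the whole nightly load:

```python
import logging
import time

def with_retries(step, attempts=3, base_delay=30):
    """Run an ETL step, retrying transient failures with growing delays.

    `step` is any zero-argument callable; in a real pipeline it might load
    the financial or student-records feed (names here are hypothetical).
    """
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except ConnectionError as exc:  # treat network blips as retryable
            logging.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise  # out of retries: let the scheduler see the failure
            time.sleep(base_delay * attempt)

# Usage (hypothetical step function):
# with_retries(lambda: load_feed("student_records"))
```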

Outdated documentation was also a problem. Stewart said that they joke with their customers about documenting a data warehouse. “There are two types of documentation: nonexistent and wrong. People laugh, but nobody ever argues that point because it’s the thing that people don’t like to do, so it rarely gets done,” said Stewart.

Because it is an academic institution, licensing and staffing costs were important factors for Cornell. Stewart often sees this in government and in higher education organizations where the administration has increasing data needs, yet the pool of available people is small, like Christen’s staff of four.

Stewart said that automation can lift much of that workload so staff can get more accomplished in a shorter amount of time. “You can’t just go out and add two more people. If you have more work, you need to get more out of your existing staff,” said Stewart.

Finding a Solution

Christen started to shop around for ETL tools, with an eye to adding some improvements. He focused on several key areas when evaluating vendors: documentation, licensing costs, improving performance, and being able to work within existing staffing levels. In 2014, Christen attended the Higher Education Data Warehousing conference to research options.

WhereScape was one of the exhibitors at the conference and one of the features that caught his attention was its approach to documentation. “Our customers were used to having outdated and incomplete documentation, and that was something WhereScape definitely had a handle on,” he said.

Most of the products Cornell considered required licensing by CPU, which could prove cost-prohibitive as Cornell’s extensive data warehouse environment was scaled for end-user query performance.

“We have a ton of CPUs,” Christen said. CPU-based licensing costs would be significant, and they found themselves trying to figure out how to re-architect the entire system to reduce the CPU footprint enough so that the licensing could work, a process that would create other limitations. WhereScape’s license model is a developer seat license, so with four full-time warehouse developers, they only needed to purchase four named user licenses.

“There’s no separate license for the CPU run-time environment with WhereScape, so if we’re successful, we’ll get everything converted, but there’s no penalty for how we configure the warehouse for end-user performance or query performance,” Christen said.

Being able to integrate and use the product without increasing the number of developers was a clear advantage. “That has been a key driver for organizations evaluating automation for their teams,” Stewart added.

Cornell didn’t just rely on marketing material to make their decision. They ran an on-site proof of concept in which one of their developers worked with the product on a portion of their primary general ledger model. They discovered that WhereScape was intuitive enough that an ETL developer could build a parallel environment in the proof of concept with minimal assistance from WhereScape. The developer hadn’t gone through any formal training, which proved that the learning curve would be manageable.

The proof of concept allowed them to get a nearly apples-to-apples comparison, which showed “huge improvements” in load time performance compared to Data Manager. “So, it was a robust enough tool, but also intuitive enough that it could be mastered in a few weeks,” said Christen.

About WhereScape

WhereScape helps IT organizations of all sizes leverage automation to design, develop, deploy and operate data infrastructure faster.

“We realized long ago that there were patterns in data warehousing that really transcend any industry vertical or any size of company,” said Stewart.

Because the process of building out a data warehouse is primarily mechanical, and much of that work is common across data warehousing organizations, WhereScape automates both the design and modeling of the data warehouse, all the way through to the physical build.

“Even deployments, as you’re moving a project from development to quality assurance environment (QA), and then on to production, we’re scripting all that out as well,” said Stewart. These are all processes companies usually use multiple tools to address – a resource-heavy process that can create a silo for each tool.

“We have one tool suite that covers data warehousing end-to-end and it’s just one set of tools to learn,” said Stewart. Instead of licensing separate tools for each part of building a data warehouse, finding a place to install all those tools, and spending weeks on staff training and management, teams have just one tool to learn and use. Handing off the build to WhereScape’s automated process frees up time and energy so that the business can take advantage of its data and produce useful analytics.

The initial wins of the conversion from their traditional ETL tool to WhereScape allowed Cornell to cut their nightly refresh times in half, or better in some cases. Although they didn’t start that way, they are now a 100 percent WhereScape solution, fully hosted on Amazon as well.

“We did a major conversion which took a few years to get to WhereScape from our old tool, but that’s behind us. We’re running WhereScape on Amazon Web Services in their Oracle RDS service,” said Christen.

Although the conversion was only completed in the past year, all new development and enhancements have been done in WhereScape since its purchase in 2014.

“There’s actually an option to fix the problem, restart it, and still complete before business hours, which is a big win for our customers,” said Christen. “Essentially, we’ve cut our refresh times in half, so not only can the team complete all the processing they need with their batch windows, we’re not brushing up against business hours anymore.”

By automatically generating documentation, WhereScape solved the problem of outdated and incomplete documentation.

What’s Next?

To take full advantage of the automated documentation process, Cornell decided to build out some new subject areas, but the speed of the tool outstripped their internal modified-waterfall approval process. Christen believes they can speed up that process now that they can quickly put out a prototype: they can start receiving customer feedback within days rather than weeks and, from there, refine the model until it is ready for production.

“So, it’s changing our practices now that we have some new abilities with WhereScape,” said Christen. One of the next steps is to more fully leverage and market the documentation so they can start providing their customers with more information about the attributes that are available in the warehouse.

An unexpected benefit is that Christen’s Business Intelligence Systems students get to use WhereScape to learn Dimensional Data Modeling, ETL concepts, and Data Visualization hands-on with real datasets.

“We’re teaching the concepts of automation so they learn the hard way, with SQL statements, and then we use WhereScape and they can see how quickly they can create these structures to build out real dimensional model data warehouses,” explained Christen.

Stewart noted that they’ve had inquiries from other universities that have heard about Christen’s use of WhereScape in the classroom and are interested in incorporating WhereScape into their curriculum, so the students can get more work done in a semester.

“It’s a similar benefit to what our customers are receiving in their ‘real-world’ application of automation, and it is giving students the chance to understand the full data warehousing lifecycle,” said Stewart.



Coca-Cola Andina Builds Data Lake on AWS, Increases Analytics Productivity by 80% for More Data-Driven Decision-Making


  • Increased analytics team productivity by 80%
  • Unified 95% of the data from different areas of the business in a single data lake
  • Adopted artificial intelligence, machine learning, and other advanced capabilities
  • Enabled decision-making based on reliable, unified data
  • Streamlined business practices, increasing profitability

Coca-Cola Andina has the vision of promoting the profitable growth of its business, supporting its customers, and guaranteeing its more than 54 million consumers in Chile, Argentina, Brazil, and Paraguay the best possible experience. To achieve this, it develops world-class processes to increase its productivity and quality of service. One of the initiatives adopted to rise to this challenge was the development of a data lake on Amazon Web Services (AWS). By adopting storage, database, compute, and analytics capabilities backed by AWS technology, Coca-Cola Andina increased the productivity of its analytics team by 80 percent. This allows both the company and its customers to make decisions based on reliable data, promoting joint growth of the entire ecosystem, maintaining its competitive advantage, and increasing the company's revenue.


Opportunity  

Coca-Cola Andina produces and distributes products licensed by The Coca-Cola Company within South America. The company has 17,500 employees and a presence in parts of Chile, Argentina, and Brazil and the whole of Paraguay, serving more than 267,000 customers and refreshing 54 million consumers. “We understand that Coca-Cola Andina's vision goes beyond obtaining profitability, and that the benefits we generate must reach all of society, both for current and future generations. We are sure that, through innovation and incorporation of new capabilities, such as data lakes and analytics, we will achieve sustainable growth for the benefit of our customers, consumers, and the communities where we operate,” says Miguel Angel Peirano, executive vice president of Coca-Cola Andina.

As a consumer packaged goods (CPG) company, Coca-Cola Andina has a direct relationship with customers and consumers. “Our customers are our company’s partners, since they are a fundamental part of the distribution and sales chain. That is why we want them to grow with us: for them to have the necessary stock and to offer good service to consumers,” explains Luis Valderrama, regional CTO of Coca-Cola Andina.

The CPG industry generates massive volumes of data, often stored in disconnected systems, which makes the information difficult to analyze. Coca-Cola Andina uses SAP as a transactional core with data on customers, sales, and products. It also has RPA systems and B2B solutions with which it can engage with customers, and it interacts with consumers through its CRM, smartphone applications, and other channels. Both cases share a common concept: data. The company brings data closer to its partners so that its teams and partners can make decisions based on it. “However, having data in different systems or traditional data warehouses made it very complex,” says Valderrama. Coca-Cola Andina’s challenge was to collect all relevant information on the company, customers, logistics, coverage, and assets within a single accurate source. This led the company to decide to build a data lake.

Coca-Cola Andina wanted an architecture that was easy to access, with reliable data and no limits on storage, response, or processing capacity. “This was the layer that would allow us to unite our traditional world with the digital world, in addition to making it possible to bring cognitive technologies, such as computer vision, machine learning, natural language processing, voice processing, and robotics, to the business,” Valderrama says. The company chose Amazon Web Services (AWS) as the provider of all the technology and architecture for its data lake. “AWS was the cloud solution that would meet all the expectations defined for our data lake,” says Valderrama, adding that the architecture needed to include a platform as a service (PaaS) to allow solutions to be developed and dismantled quickly and economically, and that he was happy with the decision because the company has a culture of learning by doing.


Solution

The data lake became the single source for data generated by SAP ERP, CSV files, and legacy databases. Coca-Cola Andina implemented a technical architecture that covers the whole spectrum from data ingestion to exploitation, through analytics and machine learning tools. The data lake uses Amazon Simple Storage Service (Amazon S3) to securely store raw data for analytics, machine learning, and other applications. It also uses Amazon QuickSight and Amazon Athena in the consumption layer; cognitive technologies such as Amazon Personalize and Amazon SageMaker for machine learning; AWS Lambda for serverless compute; Amazon DynamoDB as a key-value and document database; and Amazon Redshift to create data warehouses when necessary. “The architecture we built on AWS fulfills the expectation of having a data lake based on a PaaS,” says Valderrama.

To ensure the best processes and the best use and integration of these solutions, the company had the support of the AWS Professional Services team. “During 2020, Coca-Cola Andina worked hard to incorporate the data lake and analytics knowledge shared by AWS Professional Services, managing to generate the tools and capabilities to become a data-driven decision company, focus on improving the experience and relationship with its consumers and customers, and generate productivity and efficiency in its processes,” says Nicolás Nazario Condado, digital transformation manager at Coca-Cola Andina. Additionally, as part of its overall digital strategy, Coca-Cola Andina implemented a wider cloud infrastructure on AWS beyond the data lake and began developing other digital products and solutions to address strategic verticals for customers, consumers, and internal processes.
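To make the consumption layer concrete, here is a minimal sketch of querying data-lake files in S3 through Athena with the boto3 SDK. The database, table, column, and bucket names are invented for illustration; they are not Coca-Cola Andina's actual schema:

```python
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Hypothetical external table "orders" in a hypothetical "sales" database,
# defined over files sitting in S3; Athena scans them with standard SQL.
query = """
    SELECT customer_id, SUM(order_total) AS revenue
    FROM orders
    GROUP BY customer_id
    ORDER BY revenue DESC
    LIMIT 10
"""

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "sales"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # invented bucket
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes (simplified; production code should time out).
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```

Because Athena is serverless and reads directly from S3, a sketch like this needs no cluster to be provisioned; results are also written back to S3 at the configured output location.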

Outcome  

Coca-Cola Andina created a multidisciplinary team with partners from the business and technology worlds to combine knowledge, supported by training from AWS Professional Services. With the new cloud structure and more than 300 hours of training provided by AWS, Coca-Cola Andina acquired the capabilities to become a data-driven decision company, increasing productivity and efficiency in decision-making across the different areas of the business.

The cloud infrastructure allowed Coca-Cola Andina to improve and launch new products and services, customizing value propositions for its more than 260,000 customers. This increased the company’s revenue by improving the efficiency of promotions, reducing stock shortages (and thus improving the customer shopping experience), and increasing the productivity of the analysis team by 80 percent. Coca-Cola Andina managed to ingest more than 95 percent of the data from its different areas of interest, which allows it to build high-quality reports in just a few minutes and implement advanced analytics. With all the resources and functionality that the data lake enables, Coca-Cola Andina ensures its partners and customers have access to reliable information for making strategic business decisions. In this way, the company united the traditional world with the digital world, allowing teams and partners to make decisions based on data.

Future Plans

Coca-Cola Andina plans to develop new applications and solutions on its AWS infrastructure, including self-management applications, dynamic pricing strategies, and machine learning models.

About Coca-Cola Andina

Coca-Cola Andina is a Chilean company with more than 17,500 employees, licensed to produce and market The Coca-Cola Company's products in parts of Argentina, Brazil, and Chile and all of Paraguay. It is one of the leading bottlers in Latin America and one of the seven largest Coca-Cola bottlers in the world.

AWS Services Used

Amazon Athena

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.


AWS Lambda

AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers, creating workload-aware cluster scaling logic, maintaining event integrations, or managing runtimes.
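As a minimal illustration (not Coca-Cola Andina's actual code), a Python Lambda handler looks like the sketch below; the event payload field is hypothetical, since Lambda simply passes along whatever the triggering service sends:

```python
import json

def lambda_handler(event, context):
    """Entry point AWS Lambda invokes; no servers to provision or manage.

    `event` carries the trigger payload (an S3 notification, an API
    Gateway request, etc.); `context` exposes runtime metadata.
    """
    name = event.get("name", "world")  # hypothetical payload field
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```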

Amazon SageMaker

Amazon SageMaker helps data scientists and developers to prepare, build, train, and deploy high-quality machine learning (ML) models quickly by bringing together a broad set of capabilities purpose-built for ML.

AWS Professional Services

The AWS Professional Services organization is a global team of experts that can help you realize your desired business outcomes when using the AWS Cloud.


Medical Big Data Warehouse: Architecture and System Design, a Case Study: Improving Healthcare Resources Distribution

  • Topical Collection: Transactional Processing Systems
  • Published: 19 February 2018
  • Journal of Medical Systems, Volume 42, article number 59 (2018)


Authors: Abderrazak Sebaa (ORCID: 0000-0002-8742-1240), Fatima Chikh, Amina Nouicer, and AbdelKamel Tari


The huge increase in medical devices and clinical applications generating enormous amounts of data has raised a big issue in managing, processing, and mining this data. Traditional data warehousing frameworks cannot effectively manage the volume, variety, and velocity of current medical applications; as a result, many data warehouses struggle with medical data and many challenges need to be addressed. New solutions have emerged, and Hadoop is one of the best examples: it can be used to process these streams of medical data. However, without an efficient system design and architecture, this performance will not be significant or valuable for medical managers. In this paper, we provide a short review of the literature on research issues of traditional data warehouses and present some important Hadoop-based data warehouses. In addition, a Hadoop-based architecture and a conceptual data model for designing a medical big data warehouse are given. In our case study, we provide implementation details of a big data warehouse based on the proposed architecture and data model on the Apache Hadoop platform, to ensure an optimal allocation of health resources.
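To give a flavor of how Hadoop processes such record streams, here is a minimal Hadoop Streaming sketch in Python that counts admissions per hospital; the tab-separated input layout and field positions are assumptions for illustration, not the schema from the paper:

```python
#!/usr/bin/env python3
# mapper.py - emits "hospital_id<TAB>1" for each input record.
# Assumed input lines: patient_id<TAB>hospital_id<TAB>admission_date
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 2:
        print(f"{fields[1]}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py - sums the counts per hospital_id.
# Hadoop sorts mapper output by key before the reducer sees it.
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key != current_key and current_key is not None:
        print(f"{current_key}\t{count}")
        count = 0
    current_key = key
    count += int(value)
if current_key is not None:
    print(f"{current_key}\t{count}")
```

A job like this is launched through the Hadoop Streaming jar, passing mapper.py and reducer.py along with the input and output paths; the framework handles splitting, shuffling, and parallel execution across the cluster.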



Acknowledgements

This work was partially supported by the Ministry of Higher Education and Scientific Research of Algeria and the University of Bejaia, under the project CNEPRU (Ref. B*00620140066/2015-2018).

Author information

Authors and Affiliations

LIMED Laboratory, Faculty of Exact Sciences, University of Bejaia, Bejaia, Algeria

Abderrazak Sebaa, Amina Nouicer & AbdelKamel Tari

Department of Computer Science, Faculty of Exact Sciences, University of Bejaia, Bejaia, Algeria

Fatima Chikh


Corresponding author

Correspondence to Abderrazak Sebaa.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

This article is part of the Topical Collection on Transactional Processing Systems


About this article

Sebaa, A., Chikh, F., Nouicer, A. et al. Medical Big Data Warehouse: Architecture and System Design, a Case Study: Improving Healthcare Resources Distribution. J Med Syst 42 , 59 (2018). https://doi.org/10.1007/s10916-018-0894-9


Received : 23 September 2016

Accepted : 08 January 2018

Published : 19 February 2018

DOI : https://doi.org/10.1007/s10916-018-0894-9


  • Data warehouse
  • Decision support
  • Medical resources allocation



Creating an Enterprise Data Warehouse for a World Leader in the Fluid Motions Industry


Standardizing 42 Disparate ERP Instances and Establishing an Enterprise Data Warehouse

“Comments have been made about how we haven’t uncovered any issues in the project thus far. That’s unusual for large software related initiatives. That speaks to the upfront architecture and planning and the subsequent execution and partnership from WCI.” 

– VP of IT, Fluid Motions Company

The company, a world leader in manufacturing industrial pumps, valves, actuators, controls, seals, and managed services for industries such as power, gas, chemical, and water, needed help extracting data from several disparate ERP systems. It wanted to extract data from 42 ERP instances, standardize flat files, and consolidate all of the information into one data warehouse. Complicating matters, the ERP systems came from many different vendors (Oracle, SAP, BAAN, Microsoft, PRMS). It was also vital to have a core set of metrics and a central dashboard that combined information from the company's locations around the world.

The project grew out of a surge in demand for enterprise data from the executive level. The company knew it needed a central repository encompassing the data from all of its locations around the world. Requests often came from the top down, and when data administrators needed to find pertinent information, coordinating the data extraction across systems was very time-intensive. The company knew it needed to standardize data accessibility.

With this increase in demand, along with a desire for consistent analytics, the company decided to search out a partner that could advise and guide their internal team. They also wanted to invest in an Enterprise Data Warehouse with someone who had knowledge of how to consolidate disparate ERP systems.  Due to WCI’s experience with large-scale projects of this nature as well as their delivery and strategy expertise within data warehousing, WCI was chosen for this project.

“We chose WCI to extract and map data for our company because they’ve been around for a while and have a good reputation. They know technology and the business. They speak our language and engage well with ERP system owners.”

– VP of IT, Fluid Motions Company

Planning & Strategy

WCI Consulting knew that achieving all of the company's goals would be a big project, so several phases were planned out over the course of two years. WCI architected a roadmap that would take ERP data from 8 main databases into the Enterprise Data Warehouse. This entailed integrating the 5 Oracle ERP instances with the 3 SAP ERPs. Rapid Marts were also implemented for the Oracle ERP systems to improve the flow of the project.

Creating a Team

This was a large undertaking, so alongside WCI's resources, WCI helped the client add their own capable people to the project. This was done to build firsthand knowledge within the company and ensure long-term success.

“In the last year, we’ve settled down and established a good team with the help of WCI. 3 years ago, we did not even have a BI team in the company.”

Standard Data Definition Templates

One of the main hurdles was that there was no standardization of fields or data definitions across the ERP systems. To fix this, WCI developed a data services tool to reach into the backend of each database and surface the data in a usable form. The company now knows which fields to go after and how to establish them each time a new ERP instance is encountered. These data definition templates have been the cornerstone of the project and have completely overhauled the way the client's data is treated.

“Now my team completely understands how to pull a new ERP instance with the help of these templates.”

By early 2015, the data from the Tier 1 ERP systems will be ready to be consumed by the business, giving the company one common and consistent way to obtain key metrics.

The long-term effect of the project, once completed, will be a much smoother flow of information. What was once a long, disjointed process for obtaining information at an aggregated level will be streamlined, with all pertinent data stored in one central data warehouse controlled by a single team.

“WCI has done such good work for us. Their data and business intelligence knowledge is extensive and their integrity and ethics just speak volumes. They’re just very easy and excellent to work with.”


“All data demands will go through one common process and team. It will be a streamlined process versus a fire drill that involves multiple teams. With a quick turnaround, executives can get a thorough level of data from the entire enterprise. Access to key metrics will be as easy as a few clicks.”



Case Study on Data Warehousing

Data Warehousing Case Study

A data warehouse is a database created for reporting and business analysis to support decision-making in an organization. The information that comes into the database is, as a rule, available only for reading. Data is copied into the system in a way that avoids problems during analysis and does not disturb the stability of the source systems. As a rule, information is loaded periodically, so its freshness can lag behind that of the online transaction processing system. There are several principles behind the organization of a data warehouse. First of all, data is grouped according to the field or subject it describes.

Then, data is integrated to satisfy the requirements of the organization as a whole, not of a single business function. Data that has entered the warehouse cannot be corrected: it comes from outside sources and cannot be changed or deleted. The information kept in the warehouse remains meaningful only when it is bound to a certain period of time. There are several sources of data: traditional transaction-registration systems, individual documents, external data sets, and so on. Typical warehouse operations include data extraction, processing (preparing the data for storage), loading, analysis, and presentation of the results, as sketched below.
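A toy sketch of these principles, using SQLite and invented table and column names: loads are periodic and append-only, and every row is stamped with its load time instead of being updated in place:

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales_fact (          -- subject-oriented: organized by theme
        order_id   INTEGER,
        amount     REAL,
        loaded_at  TEXT                -- time-variant: each row is time-stamped
    )
""")

def load_batch(rows):
    """Periodic, append-only load: existing rows are never updated or deleted."""
    ts = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO sales_fact (order_id, amount, loaded_at) VALUES (?, ?, ?)",
        [(order_id, amount, ts) for order_id, amount in rows],
    )
    conn.commit()

load_batch([(1, 19.99), (2, 5.50)])  # data is copied in, then read-only
```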


Writing a data warehousing case study is a difficult process, because one must first master the topic before completing a good case study on a specific problem. To research the problem in the case, a student should collect information about the case site. With background information in hand, the student can analyze the cause of the problem and weigh its consequences. One should demonstrate professional and critical-thinking skills and brainstorm effective solutions, which may be alternatives to those suggested in the case study.

Case study writing is a demanding process, because a successful paper must meet a list of special requirements. An inexperienced student is unlikely to succeed alone, so it is sensible to use the Internet and read a free sample case study on data warehousing. Many high-quality free sample case studies on data warehousing and data mining are prepared by professional writers online, so it is wise to learn from them.


Top 10 Data Warehouse Challenges and Solutions

December 13, 2023

In the ever-expanding realm of data engineering services, the importance of data warehouses cannot be overstated. As organizations strive to harness the power of their data for strategic decision-making, data warehouses serve as the backbone, housing and managing vast volumes of information. With those capabilities, however, come challenges. This guide explores the top 10 data warehouse challenges businesses face today and strategic solutions to them. Focusing on the critical aspect of data quality governance, we navigate the complexities of data warehousing for higher management, chief people officers, managing directors, and country managers.

Role of Data Warehousing

Data warehousing is pivotal in modern data management, serving as a centralized repository that consolidates and transforms data from diverse sources. Its primary function is to support informed decision-making by providing a unified view of organizational data. This facilitates efficient historical data analysis, enhances query performance, and supports strategic planning and reporting. 

Furthermore, data warehouses contribute to operational efficiency by alleviating the burden on transactional databases, ensuring a seamless balance between day-to-day operations and analytical processes. Data warehousing serves as a cornerstone for data quality governance, offering a standardized environment for implementing checks, validations, and governance frameworks to maintain high data quality standards aligned with organizational objectives.

Data Warehouse Challenges and Solutions

Data Quality Concerns

According to Gartner, poor data quality is a common issue for organizations; the research firm estimates that its average financial impact on businesses is $15 million per year.

Data quality is the bedrock of any successful data warehouse strategy. Inaccurate or inconsistent data undermines the integrity of analyses and decision-making processes. Poor data quality can lead to misguided insights, eroding stakeholders’ trust in the data warehouse.

Organizations must institute robust data quality governance practices to address data quality concerns. Regular data profiling, cleansing, and validation processes should be implemented to maintain high data accuracy and reliability. By establishing clear data quality standards, organizations can ensure the data within the warehouse is a trustworthy foundation for decision-making.
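As a minimal sketch of such profiling and validation, assuming a pandas DataFrame with invented column names, a pre-load check might flag rows that violate basic completeness, validity, and uniqueness rules:

```python
import pandas as pd

def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Profile and flag quality issues; return only rows passing all checks."""
    issues = pd.DataFrame(index=df.index)
    issues["missing_id"] = df["customer_id"].isna()           # completeness
    issues["bad_amount"] = df["order_total"] < 0              # validity
    issues["duplicate"] = df.duplicated(subset=["order_id"])  # uniqueness

    failed = issues.any(axis=1)
    print(f"{failed.sum()} of {len(df)} rows failed quality checks")
    return df[~failed]  # quarantine failures for review rather than loading them

# clean = validate(raw_orders)  # raw_orders is a hypothetical staging extract
```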

Scalability Issues

The global cloud-based data warehousing market is expected to grow at a CAGR of over 22.3% from 2020 to 2025, indicating a significant shift towards scalable cloud data warehousing solutions.

As data volumes grow exponentially, traditional on-premise data warehouses may struggle to scale effectively. This can result in performance bottlenecks, delays in data processing, and increased costs associated with hardware upgrades.

Cloud-based data warehousing solutions offer a scalable alternative. Leveraging the elasticity of cloud infrastructure, organizations can seamlessly scale their data warehouses based on demand. This addresses immediate scalability concerns and provides a cost-effective solution, allowing businesses to pay only for the resources they consume.

Integration Complexities

According to a survey, 94% of IT decision-makers reported that they faced data integration challenges, highlighting the prevalence of this issue.

The modern data landscape is diverse, with data coming from various sources in various formats and structures. Integrating this disparate data seamlessly into a data warehouse can be complex and time-consuming.

Implementing data integration tools and middleware is crucial to overcoming integration complexities. These tools facilitate the extract, transform, and load (ETL) processes, ensuring that data from different sources is harmonized and made compatible within the data warehouse; the sketch below illustrates the transform step. This streamlining of integration processes enhances overall efficiency and accuracy.
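As a small illustration of that transform step, the sketch below harmonizes records from two hypothetical sources (all field names invented) into one target schema before loading:

```python
# Source-specific mappings into one target schema (all field names invented).
def from_crm(rec: dict) -> dict:
    return {"customer_id": int(rec["CustomerID"]), "country": rec["Country"].upper()}

def from_erp(rec: dict) -> dict:
    return {"customer_id": int(rec["cust_no"]), "country": rec["ctry_code"].upper()}

TRANSFORMS = {"crm": from_crm, "erp": from_erp}

def transform(source: str, records: list) -> list:
    """Apply the source-specific mapping so the warehouse sees one format."""
    return [TRANSFORMS[source](r) for r in records]

# Example: two differently shaped inputs become one schema.
print(transform("crm", [{"CustomerID": "17", "Country": "cl"}]))
print(transform("erp", [{"cust_no": "17", "ctry_code": "cl"}]))
```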

Data Security and Privacy

IBM’s Cost of a Data Breach Report estimates the average total cost of a data breach at $3.86 million, a 15% increase over three years, underscoring the financial impact of inadequate data security.

Ensuring the security and privacy of sensitive data within the data warehouse is paramount. Unauthorized access, data breaches, or non-compliance with data protection regulations pose significant risks.

The solution lies in implementing robust security protocols. Encryption mechanisms should be employed to safeguard data in transit and at rest. Access controls must be rigorously enforced, limiting data access based on roles and responsibilities. Furthermore, compliance with data protection regulations, such as GDPR or HIPAA, is essential to mitigate legal and reputational risks.
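To make encryption at rest concrete, here is a minimal sketch using the Fernet recipe from the widely used Python cryptography package; key handling is deliberately simplified, and a production system would fetch keys from a managed secrets or key management service:

```python
from cryptography.fernet import Fernet

# In production, fetch the key from a secrets manager; never hard-code it.
key = Fernet.generate_key()
fernet = Fernet(key)

record = b'{"patient_id": 123, "diagnosis": "..."}'  # hypothetical sensitive row
ciphertext = fernet.encrypt(record)    # what actually lands on disk
plaintext = fernet.decrypt(ciphertext)
assert plaintext == record
```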

Lack of Data Governance Strategy

A survey by Collibra found that 87% of surveyed organizations identified data governance as a critical initiative, emphasizing the growing recognition of its importance.

The absence of a comprehensive data governance strategy can result in unstructured data management, leading to governance gaps, inconsistent practices, and a lack of accountability.

Developing and implementing a robust data governance framework is imperative. This involves defining clear policies, roles, and responsibilities for data management. Data stewardship and ownership should be established, ensuring accountability throughout the data lifecycle. A well-structured data governance strategy forms the backbone of effective data quality governance.

Performance Tuning Challenges

According to a study by Panoply, over 80% of data professionals reported performance challenges with their data warehouses, indicating a widespread concern.

Poorly tuned data warehouses may experience slow query performance, affecting real-time analytics and decision-making. Inefficient database designs, indexing, or suboptimal configurations contribute to this challenge.

Regular data warehouse performance tuning is essential to address these challenges. This includes optimizing queries, indexing, and partitioning strategies. Organizations can fine-tune their data warehouse to deliver optimal performance by understanding the data access patterns and workload demands, ensuring timely and efficient data processing.
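As a small, self-contained illustration of why indexing matters, the sketch below uses SQLite only to stay runnable; warehouse engines expose analogous DDL and query-plan inspection:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 1000, i * 1.5) for i in range(100_000)],
)

# Without an index, this filter scans the whole table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(total) FROM orders WHERE customer_id = 42"
).fetchall()
print(plan)  # plan shows a full SCAN of orders

# Indexing the filter column lets the engine seek instead of scan.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(total) FROM orders WHERE customer_id = 42"
).fetchall()
print(plan)  # plan now shows a SEARCH using idx_orders_customer
```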

Meeting Business Requirements

A survey by TDWI revealed that only 40% of organizations feel their data warehousing projects consistently deliver business value, highlighting the gap in meeting data warehouse business requirements.

Aligning the data warehouse with evolving business needs is an ongoing challenge. The static nature of some data warehouses may result in a mismatch between data capabilities and the dynamic requirements of the business.

Establishing clear communication channels between data and business teams is pivotal. Regular feedback loops involving stakeholders at various levels help continuously refine and update data warehouse requirements. This ensures that the data warehouse remains agile and responsive to the organization’s ever-changing needs.

Data Warehouse Strategy Alignment

According to a report by Nucleus Research, companies that align their data strategy with business goals achieve a 23% increase in ROI.

The data warehouse strategy may not align with broader organizational goals, resulting in a lack of synergy. This misalignment can lead to missed opportunities for leveraging data as a strategic asset.

Ensuring alignment between the data warehouse strategy and overall business objectives is paramount. Strategic insights should emphasize the impact of effective data warehousing on organizational success. This alignment fosters a data-driven culture, ensuring that data is leveraged as a valuable resource across all facets of the organization.

Adoption and User Training

A study found that organizations that invest in employee training have 24% higher profit margins than those that don’t, emphasizing the positive impact of training on adoption.

Lack of user adoption and insufficient training can hinder the effective utilization of the data warehouse. Users may struggle to leverage the system’s full potential, limiting its impact on decision-making.

Investment in user training programs is a critical aspect of addressing this challenge. Providing comprehensive training ensures that stakeholders at all levels understand the value of the data warehouse and are proficient in its usage. This enhances user adoption and maximizes the value derived from the data warehouse.

Cost Management

Forbes reports that organizations spend, on average, 7.6% of IT budgets on data warehousing, showcasing the significance of cost management in this domain.

Managing the costs associated with data warehousing, including hardware, software, and maintenance, can be a significant challenge. Organizations must balance the need for performance with budgetary constraints.

Exploring cost-effective cloud-based solutions is a strategic move. Cloud platforms offer flexibility and scalability, allowing organizations to optimize resource usage based on actual needs. Periodic reassessment of infrastructure needs ensures that the costs associated with data warehousing remain aligned with the organization’s budgetary considerations.

How can Brickclay Help?

Brickclay, as a leading data engineering services provider, is poised to assist businesses in overcoming the myriad challenges associated with data warehousing. Leveraging our expertise, we offer tailored solutions that align with each organization’s unique needs. Here’s how Brickclay can help businesses navigate and conquer the top 10 data warehouse challenges:

  • Data Quality Governance: Brickclay specializes in establishing and maintaining robust data quality governance practices, ensuring that the warehouse’s data meets the highest accuracy and reliability standards.
  • Continuous Monitoring and Improvement: We provide continuous monitoring mechanisms to promptly identify and rectify data quality issues, ensuring a sustained focus on maintaining high-quality data.
  • Cloud-Based Solutions: Brickclay recommends and implements cloud-based data warehousing solutions that offer scalability on demand. This ensures seamless expansion in response to evolving data requirements, preventing performance bottlenecks.
  • Integration Complexities: Our team possesses extensive expertise in integrating diverse data sources. We utilize advanced integration tools and middleware to streamline the process, ensuring a harmonized data flow from various formats and structures.
  • Robust Security Protocols: Brickclay prioritizes data security and privacy. We implement robust security protocols, encryption, and access controls to safeguard sensitive data within the warehouse, ensuring compliance with data protection regulations.
  • Comprehensive Data Governance Framework: Brickclay assists organizations in developing and implementing a comprehensive data governance framework. Focusing on data quality governance, we ensure that policies, processes, and accountability structures are well-defined.
  • Performance Optimization: Brickclay specializes in performance tuning for data warehouses. We employ strategies such as indexing, partitioning, and caching to enhance query performance and overall warehouse efficiency (a toy indexing example closes this section).
  • Data Warehouse Strategy Alignment: Brickclay offers strategic guidance to align data warehouse strategies with broader organizational goals. We emphasize the importance of fostering a data-driven organizational culture for optimal strategic alignment.
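To make the monitoring item above concrete, below is a minimal sketch of the kind of automated data-quality checks a continuous-monitoring job might run against a warehouse extract. The table layout, column names, and thresholds are hypothetical, not a description of Brickclay's actual tooling.

```python
# Toy data-quality checks: completeness, uniqueness, freshness.
# Table layout, column names, and thresholds are hypothetical.
from datetime import datetime, timedelta, timezone

import pandas as pd

def run_quality_checks(orders: pd.DataFrame) -> list[str]:
    """Return human-readable descriptions of any quality violations."""
    failures = []

    # Completeness: key business columns should rarely be null.
    null_rate = orders["customer_id"].isna().mean()
    if null_rate > 0.01:
        failures.append(f"customer_id null rate {null_rate:.1%} exceeds 1%")

    # Uniqueness: the primary key must not contain duplicates.
    dupes = int(orders["order_id"].duplicated().sum())
    if dupes:
        failures.append(f"{dupes} duplicate order_id value(s)")

    # Freshness: the newest record should have landed recently.
    lag = datetime.now(timezone.utc) - orders["loaded_at"].max()
    if lag > timedelta(hours=24):
        failures.append(f"latest load is {lag} old (limit: 24h)")

    return failures

# Usage with a toy frame standing in for a warehouse extract.
sample = pd.DataFrame({
    "order_id": [1, 2, 2],
    "customer_id": [10, None, 12],
    "loaded_at": pd.to_datetime(["2024-01-01"] * 3, utc=True),
})
for problem in run_quality_checks(sample):
    print("FAIL:", problem)
```

In practice such checks would be scheduled, and their thresholds governed, by whatever orchestration the warehouse already uses; the point is that each rule is small, explicit, and cheap to run on every load.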

By choosing Brickclay as your data engineering services partner, businesses can confidently navigate the complexities of data warehousing, leveraging our expertise to turn challenges into opportunities for growth and innovation.
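As a simplified illustration of the indexing principle from the performance bullet above, the toy example below (SQLite is used only for portability; production warehouses have their own engines, syntax, and tuning options) shows how adding an index turns a full table scan into an index search.

```python
# Toy demonstration: how an index changes a query plan.
# SQLite stands in for portability; the principle carries over to
# warehouse engines, which add partitioning and caching on top.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north" if i % 2 else "south", i * 1.5) for i in range(10_000)],
)

def plan(sql: str) -> str:
    """Return SQLite's query-plan description for a statement."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT SUM(amount) FROM sales WHERE region = 'north'"
print("before:", plan(query))   # full table scan of sales
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
print("after: ", plan(query))   # search using idx_sales_region
```

The same reasoning, applied with a real engine's EXPLAIN output and workload statistics, is what indexing, partitioning, and caching decisions rest on.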

Ready to transform your data warehousing strategy and overcome challenges with Brickclay's tailored data engineering solutions? Contact us today and start the journey toward optimized data management and business success.
