What is Data Management?
Data management is the process of collecting, storing, securing, and using an organization’s data. Organizations today draw on many different data sources, and they must integrate and analyze that data to derive business intelligence for strategic planning. Data management includes all the policies, tools, and procedures that improve data usability within the bounds of laws and regulations.
Why is data management important?
Data is a valuable resource for modern organizations. With access to large volumes and many types of data, organizations invest significantly in data storage and management infrastructure. They use data management systems to run business intelligence and data analytics operations more efficiently. Some key benefits of data management follow.
Increase revenue and profit
Data analysis gives deeper insights into all aspects of a business. You can act on these insights to optimize business operations and reduce costs. Data analysis can also predict the future impact of decisions, improving decision making and business planning. Hence, organizations can experience significant revenue and profit growth by improving their data management techniques.
Reduce data inconsistency
A data silo is a collection of raw data within an organization that only one department or group can access. Data silos create inconsistencies that reduce the reliability of data analysis results. Data management solutions integrate data and create a centralized data view for improved collaboration between departments.
Meet regulatory compliance
Laws like the General Data Protection Regulation (GDPR) and California Consumer Privacy Act (CCPA) give consumers control over their data. Individuals can seek legal recourse if they perceive that organizations:
- Capture data without consent
- Exercise poor control over data location and use
- Store data in spite of erasure requests
Hence, organizations require a data management system that is fair, transparent, and confidential while still maintaining accuracy.
What are the areas of focus for data management?
The practice of data management spans collecting and distributing high-quality data, as well as data governance to control access to that data.
Data quality management
Users of data expect the data to be sufficiently reliable and consistent for each use case.
Data quality managers measure and improve an organization's data quality. They review both existing and new data and verify that it meets standards. They might also set up data management processes that block low-quality data from entering the system. Data quality standards typically measure the following (a brief validation sketch follows the list):
- Is the data complete, or is key information missing? (for example, a customer leaves out key contact information)
- Does the data meet basic check rules? (for example, a phone number should be 10 digits)
- How often does the same data appear in the system? (for example, duplicate entries for the same customer)
- Is the data accurate? (for example, a customer enters the wrong email address)
- Is data quality consistent across the system? (for example, the date of birth is in dd/mm/yyyy format in one dataset but mm/dd/yyyy format in another)
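To make these checks concrete, here is a minimal sketch in Python. The field names and rules (a 10-digit phone number, a single yyyy-mm-dd date format) are assumptions invented for this example, not part of any particular data management product.

```python
import re

# Hypothetical required fields for a customer record.
REQUIRED_FIELDS = {"name", "email", "phone", "date_of_birth"}

def check_record(record: dict) -> list[str]:
    """Return the data quality issues found in a single record."""
    issues = []

    # Completeness: is key information missing?
    missing = REQUIRED_FIELDS - {k for k, v in record.items() if v}
    if missing:
        issues.append(f"missing fields: {sorted(missing)}")

    # Basic check rule: a phone number should be 10 digits.
    digits = re.sub(r"\D", "", record.get("phone", "") or "")
    if digits and len(digits) != 10:
        issues.append("phone number is not 10 digits")

    # Consistency: enforce one date format across the system.
    dob = record.get("date_of_birth", "") or ""
    if dob and not re.fullmatch(r"\d{4}-\d{2}-\d{2}", dob):
        issues.append("date of birth is not in yyyy-mm-dd format")

    return issues

def find_duplicates(records: list[dict]) -> set[str]:
    """Flag duplicate entries of the same customer by email address."""
    seen, dupes = set(), set()
    for r in records:
        email = (r.get("email") or "").lower()
        if email in seen:
            dupes.add(email)
        seen.add(email)
    return dupes
```

In practice, checks like these run as a gate in the ingestion pipeline, so low-quality records are quarantined before they enter the system.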
Data distribution and consistency
Endpoints for data distribution
For most organizations, data has to be distributed to (or near) the various endpoints where it is needed. These include operational systems, data lakes, and data warehouses. Data distribution is necessary because of network latency: when data is needed for operational use, the latency to a remote source might be too high to deliver it in a timely manner. Storing a copy of the data in a local database resolves this issue.
Data distribution is also necessary for data consolidation. Data warehouses and data lakes bring together data from various sources to present a unified view of information. Data warehouses are used for analytics and decision making, whereas data lakes serve as a consolidated hub from which data can be extracted for various use cases.
Data replication mechanisms and impact on consistency
Data distribution mechanisms have a potential impact on data consistency, and this is an important consideration in data management.
Strong consistency results from synchronous replication of data. In this approach, when a data value is changed, all applications and users see the changed value. If the new value has not yet been replicated to every copy, access to the data is blocked until all the copies are updated. Synchronous replication prioritizes consistency over performance and availability, and it is most often used for financial data.
Eventual consistency results from asynchronous replication of data. When data is changed, the copies are eventually updated (usually within seconds), but access to outdated copies is not blocked. For many use cases, this is not an issue. For example, social media posts, likes, and comments do not require strong consistency. As another example, if a customer changes their phone number in one application, this change can be cascaded asynchronously.
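The difference between the two replication modes can be sketched in a few lines of Python. This is a toy in-memory model assumed for illustration, not a real replication protocol.

```python
import queue
import threading

replicas = [{}, {}, {}]  # toy in-memory copies of the same dataset

def write_synchronous(key, value):
    """Strong consistency: the write returns only after every copy
    is updated, so every subsequent read sees the new value."""
    for replica in replicas:
        replica[key] = value  # a real system blocks on each acknowledgment

_pending = queue.Queue()

def write_asynchronous(key, value):
    """Eventual consistency: update one copy immediately and let the
    others catch up in the background; stale reads are possible."""
    replicas[0][key] = value
    _pending.put((key, value))

def _replicate_in_background():
    while True:
        key, value = _pending.get()
        for replica in replicas[1:]:
            replica[key] = value  # copies converge, usually within seconds
        _pending.task_done()

threading.Thread(target=_replicate_in_background, daemon=True).start()
```

The trade-off is visible in the write path: write_synchronous pays the full replication cost on every call, which is why it suits financial data, while write_asynchronous returns immediately, which is acceptable for likes and comments.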
Comparing streaming with batch updates
Data streams cascade data changes as they occur. This is the preferred approach if access to near real-time data is required. Data is extracted, transformed, and delivered to its destination as soon as it is changed.
Batch updates are more appropriate when data has to be processed in batches before delivery. Summarizing the data or performing statistical analysis and delivering only the result is an example of this. Batch updates can also preserve the point-in-time internal consistency of the data if all of it is extracted at a specific moment. Batch updates through an extract, transform, load (ETL or ELT) process are typically used for data lakes, data warehousing, and analytics.
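As a rough illustration of the two delivery styles, the Python sketch below contrasts a per-change streaming handler with a point-in-time batch job; the source fields and the summary logic are invented for this example.

```python
from datetime import datetime, timezone

def transform(record: dict) -> dict:
    """Stand-in transformation step (cleaning, enrichment, and so on)."""
    return {**record, "processed_at": datetime.now(timezone.utc).isoformat()}

# Streaming: extract, transform, and deliver each change as it occurs.
def on_change(record: dict, destination: list) -> None:
    destination.append(transform(record))

# Batch: extract everything at one point in time, transform it together,
# and deliver only the result. The snapshot preserves the internal
# consistency of the data as of the moment of extraction.
def run_batch(source: list[dict], destination: list) -> None:
    snapshot = [transform(r) for r in source]  # point-in-time extract
    totals: dict[str, float] = {}
    for record in snapshot:                    # transform: summarize
        totals[record["region"]] = totals.get(record["region"], 0.0) + record["amount"]
    destination.append({"totals": totals})     # load only the result
```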
Big data management
Big data refers to the large volumes of data that an organization collects at high speed over short periods of time. Video news feeds on social media and data streams from smart sensors are examples of big data. Both the scale and complexity of operations create challenges in big data management. For instance, a big data system stores data such as:
- Structured data that is well represented in tabular format
- Unstructured data like documents, images, and videos
- Semistructured data that combines the preceding two types
Big data management tools have to process and prepare the data for analytics. The tools and techniques required for big data typically perform the following functions: data integration, data storage, and data analysis.
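The three data shapes are easy to see side by side in Python; the values below are generic examples, not tied to any particular big data system.

```python
# Structured: fixed columns, fits naturally in a tabular format.
structured_row = ("C-1001", "Ana Silva", "2024-05-01", 129.90)

# Semistructured: self-describing keys, but the shape can vary per record.
semistructured_record = {
    "customer_id": "C-1001",
    "orders": [{"sku": "A12", "qty": 2}],  # nested and optional fields
}

# Unstructured: no predefined schema; meaning must be extracted later.
unstructured_text = "Support ticket: the app crashes when I upload a video."
```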
Data architecture and data modeling
Data architecture
Data architecture describes an organization’s data assets and provides a blueprint for creating and managing data flow. The data management plan includes technical details, such as the operational databases, data lakes, data warehouses, and servers best suited to implementing the data management strategy.
Data modeling
Data modeling is the process of creating conceptual and logical data models that visualize the workflows and relationships between different types of data. Data modeling typically begins by representing the data conceptually and then representing it again in the context of the chosen technologies. Data managers create several different types of data models during the data design stage.
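As a minimal illustration of a logical data model, the sketch below captures two hypothetical entities and the relationship between them; in practice, models like this are then refined for the chosen database technology.

```python
from dataclasses import dataclass, field

# A tiny logical model: two entities and a one-to-many relationship.
# Entity names and attributes are hypothetical.
@dataclass
class Customer:
    customer_id: str
    name: str
    email: str

@dataclass
class Order:
    order_id: str
    customer_id: str  # references exactly one Customer
    total: float
    items: list[str] = field(default_factory=list)
```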
Data governance
Data governance includes the policies and procedures that an organization implements to manage data security, integrity, and responsible data use. It defines the data management strategy and determines who can access which data. Data governance policies also establish accountability in the way teams and individuals access and use data. Data governance functions typically include the following:
Regulatory compliance
Data governance policies reduce the risk of regulatory fines or actions. They focus on employee training so that adherence to laws happens at all levels. For example, an organization collaborates with an external development team to improve its data systems. Data governance managers verify that all personal data is removed before passing it to the external team to use for testing purposes.
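A minimal sketch of that kind of safeguard follows, assuming the personal fields are known by name; real de-identification also has to handle free text, quasi-identifiers, and re-identification risk.

```python
# Hypothetical fields that count as personal data in this dataset.
PERSONAL_FIELDS = {"name", "email", "phone", "address", "date_of_birth"}

def scrub(record: dict) -> dict:
    """Drop personal fields before handing data to an external team."""
    return {k: v for k, v in record.items() if k not in PERSONAL_FIELDS}

production_records = [
    {"customer_id": "C-1001", "name": "Ana Silva",
     "email": "ana@example.com", "plan": "pro"},
]
test_dataset = [scrub(r) for r in production_records]
# -> [{'customer_id': 'C-1001', 'plan': 'pro'}]
```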
Data security and access control
Data governance prevents unauthorized access to data and protects it from corruption. It covers all aspects of protection, such as the following (a minimal access-control sketch follows the list):
- Preventing accidental data movement or deletion
- Securing network access to reduce the risk of network attacks
- Verifying that the physical data centers that store data meet security requirements
- Keeping data secure even when employees access it from personal devices
- User authentication, authorization, and the setting and enforcement of access permissions for data
- Ensuring that the stored data complies with the laws in the country where the data is stored
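The authentication and authorization items above boil down to an access check on every request. Here is a minimal role-based sketch in Python; the roles, datasets, and permissions are assumptions for illustration.

```python
# Hypothetical role-to-permission grants for a few datasets.
PERMISSIONS = {
    "analyst":  {"sales_db": {"read"}},
    "engineer": {"sales_db": {"read", "write"}},
}

def authorize(role: str, dataset: str, action: str) -> bool:
    """Allow an action only if the role was explicitly granted it."""
    return action in PERMISSIONS.get(role, {}).get(dataset, set())

assert authorize("engineer", "sales_db", "write")
assert not authorize("analyst", "sales_db", "write")  # read-only role
assert not authorize("intern", "sales_db", "read")    # unknown role: deny by default
```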
What are some data management challenges?
The following are common data management challenges.
Scale and performance
Organizations require data management software that performs efficiently even at scale. They have to continually monitor and reconfigure data management infrastructure to maintain peak response times even as data grows exponentially.
Changing requirements
Compliance regulations are complex and change over time. Similarly, customer requirements and business needs also change rapidly. Although organizations have more choice than ever in data management platforms, they have to constantly evaluate infrastructure decisions to maintain IT agility and legal compliance while keeping costs down.
Employee training
Getting the data management process started in any organization can be challenging. The sheer volume of data can be overwhelming and interdepartmental silos might also exist. Planning a new data management strategy and getting employees to accept new systems and processes takes time and effort.
What are some data management best practices?
Data management best practices form the basis of a successful data strategy. The following are common best practices.
Team collaboration
Business users and technical teams must collaborate to ensure that an organization's data requirements are met. All data processing and analysis should prioritize business intelligence requirements. Otherwise, collected data will remain unused, with resources wasted in poorly planned data management projects.
Automation
A successful data management strategy incorporates automation into most data processing and preparation tasks. Performing data transformation tasks manually is tedious and introduces errors into the system. Even a limited number of manual tasks, such as running weekly batch jobs, can cause system bottlenecks. Data management software can support faster and more efficient scaling.
Cloud computing
Businesses require modern data management solutions that provide them with a broad set of capabilities. A cloud solution can manage all aspects of data management at scale without compromising on performance. For example, AWS offers a wide range of functionalities, such as databases, data lakes, analytics, data accessibility, data governance, and security, from within a single account.
How can AWS help with data management?
AWS is a global data management platform that you can use to build a modern data strategy. With AWS, you can choose the right purpose-built database, achieve performance at scale, run fully managed databases, and rely on high availability and security.
Get started with data management on AWS by creating an AWS account today.
A data silo is a collection of raw data within an organization that only one department or group can access. Data silos create inconsistencies that reduce the reliability of data analysis results. Data management solutions integrate data and create a centralized data view for improved collaboration between departments. \n
Meet regulatory compliance \nLaws like the General Data Protection Regulation (GDPR) and California Consumer Privacy Act (CCPA) give consumers control over their data. Individuals can seek legal recourse if they perceive that organizations: \n
\n - Capture data without consent \n
- Exercise poor control over data location and use \n
- Store data in spite of erasure requests \n \n
Hence, organizations require a data management system that is fair, transparent, and confidential while still maintaining accuracy.","id":"seo-faq-pairs#why-is-dm-important","customSort":"2"},"metadata":{"tags":[{"id":"seo-faq-pairs#faq-collections#data-management","name":"data-management","namespaceId":"seo-faq-pairs#faq-collections","description":"
data-management","metadata":{}}]}},{"fields":{"faqQuestion":"What are the areas of focus for data management?","faqAnswer":"
The practice of data management spans the collection and distribution of high-quality data, in addition to data governance, to control access to the data. \n
Data quality management \nUsers of data expect the data to be sufficiently reliable and consistent for each use case. \n
Data quality managers measure and improve an organization's data quality. They review both existing and new data and verify that it meets standards. They might also set up data management processes that block low-quality data from entering the system. Data quality standards typically measure the following: \n
\n - Is key information missing or is the data complete? (for example, customer leaves out key contact information) \n
- Does the data meet basic data check rules? (for example, a phone number should be 10 digits) \n
- How often does the same data appear in the system? (for example, duplicate data entries of the same customer) \n
- Is the data accurate? (for example, customer enters the wrong email address) \n
- Is data quality consistent across the system? (for example, date of birth is dd/mm/yyyy format in one dataset but mm/dd/yyyy format in another dataset) \n \n
Data distribution and consistency \nEndpoints for data distribution \nFor most organizations, data has to be distributed to (or near) the various endpoints where the data is needed. These include operational systems, data lakes, and data warehouses. Data distribution is necessary because of network latencies. When data is needed for operational use, the network latency might not be sufficient to deliver it in a timely manner. Storing a copy of the data in a local database resolves the network latency issue. \nData distribution is also necessary for data consolidation. Data warehouses and data lakes consolidate data from various sources to present a consolidated view of information. Data warehouses are used for analytics and decision making, whereas data lakes are a consolidated hub from which data can be extracted for various use cases. \n
Data replication mechanisms and impact on consistency \nData distribution mechanisms have a potential impact on data consistency, and this is an important consideration in data management. \n
Strong consistency results from synchronous replication of data. In this approach, when a data value is changed, all applications and users will see the changed value of the data. If the new value of data has not been replicated as yet, access to the data is blocked until all the copies are updated. Synchronous replication prioritizes consistency over performance and access to data. Synchronous replication is most often used for financial data. \n
Eventual consistency results from asynchronous replication of data. When data is changed, the copies are eventually updated (usually within seconds), but access to outdated copies is not blocked. For many use cases, this is not an issue. For example, social media posts, likes, and comments do not require strong consistency. As another example, if a customer changes their phone number in one application, this change can be cascaded asynchronously. \n
Comparing streaming with batch updates \nData streams cascade data changes as they occur. This is the preferred approach if access to near real-time data is required. Data is extracted, transformed, and delivered to its destination as soon as it is changed. \n
Batch updates are more appropriate when data has to be processed in batches before delivery. Summarizing or performing statistical analysis of the data and delivering only the result is an example of this. Batch updates can also preserve the point-in-time internal consistency of data if all the data is extracted at a specific point in time. Batch updates through an extract, transform, load (ETL or ELT) process is typically used for data lakes, data warehousing, and analytics. \n
Big data management \nBig data is the large volumes of data that an organization collects at a high speed over a short period of time. Video news feeds on social media and data streams from smart sensors are examples of big data. Both the scale and complexity of operations create challenges in big data management. For instance, a big data system stores data such as: \n \n - Structured data that represents well in tabular format \n
- Unstructured data like documents, images, and videos \n
- Semistructured data that combines the preceding two types \n \n
Big data management tools have to process and prepare the data for analytics. The tools and techniques required for big data typically perform the following functions: data integration, data storage, and data analysis. \n
Data architecture and data modeling \nData architecture \nData architecture describes an organization’s data assets, and provides a blueprint for creating and managing data flow. The data management plan includes technical details, such as operational databases, data lakes, data warehouses, and servers, that are best suited to implementing the data management strategy. \n
Data modeling \nData modeling is the process of creating conceptual and logical data models that visualize the workflows and relationships between different types of data. Data modeling typically begins by representing the data conceptually and then representing it again in the context of the chosen technologies. Data managers create several different types of data models during the data design stage. \n
Data governance \nData governance includes the policies and procedures that an organization implements to manage data security, integrity, and responsible data utility. It defines data management strategy and determines who can access what data. Data governance policies also establish accountability in the way teams and individuals access and use data. Data governance functions typically include: \nRegulatory compliance \nData governance policies reduce the risk of regulatory fines or actions. They focus on employee training so that adherence to laws happens at all levels. For example, an organization collaborates with an external development team to improve its data systems. Data governance managers verify that all personal data is removed before passing it to the external team to use for testing purposes. \n
Data security and access control \nData governance prevents unauthorized access to data and protects it from corruption. It includes all aspects of protection, such as the following: \n
\n - Preventing accidental data movement or deletion \n
- Securing network access to reduce the risk of network attacks \n
- Verifying that the physical data centers that store data meet security requirements \n
- Keeping data secure even when employees access it from personal devices \n
- User authentication, authorization, and the setting and enforcement of access permissions for data \n
- Ensuring that the stored data complies with the laws in the country where the data is stored
\n \n","id":"seo-faq-pairs#what-are-types-of-dm","customSort":"3"},"metadata":{"tags":[{"id":"seo-faq-pairs#faq-collections#data-management","name":"data-management","namespaceId":"seo-faq-pairs#faq-collections","description":"
data-management","metadata":{}}]}},{"fields":{"faqQuestion":"What are some data management challenges?","faqAnswer":"
The following are common data management challenges. \n
Scale and performance \nOrganizations require data management software that performs efficiently even at scale. They have to continually monitor and reconfigure data management infrastructure to maintain peak response times even as data grows exponentially. \n
Changing requirements \nCompliance regulations are complex and change over time. Similarly, customer requirements and business needs also change rapidly. Although organizations have more choice in the data management platforms they can use, they have to constantly evaluate infrastructure decisions to maintain maximum IT agility, legal compliance, and lower costs. \n
Employee training \nGetting the data management process started in any organization can be challenging. The sheer volume of data can be overwhelming and interdepartmental silos might also exist. Planning a new data management strategy and getting employees to accept new systems and processes takes time and effort.","id":"seo-faq-pairs#what-is-data-governance","customSort":"4"},"metadata":{"tags":[{"id":"seo-faq-pairs#faq-collections#data-management","name":"data-management","namespaceId":"seo-faq-pairs#faq-collections","description":"
data-management","metadata":{}}]}},{"fields":{"faqQuestion":"What are some data management best practices?","faqAnswer":"
Data management best practices form the basis of a successful data strategy. The following are common best practices. \n
Team collaboration \nBusiness users and technical teams must collaborate to ensure that an organization's data requirements are met. All data processing and analysis should prioritize business intelligence requirements. Otherwise, collected data will remain unused, with resources wasted in poorly planned data management projects. \n
Automation \nA successful data management strategy incorporates automation in most of the data processing and preparation tasks. Performing data transformation tasks manually is tedious and also introduce errors in the system. Even a limited number of manual tasks, such as running weekly batch jobs, can cause system bottlenecks. Data management software can support faster and more efficient scaling. \nCloud computing \nBusinesses require modern data management solutions that provide them with a broad set of capabilities. A cloud solution can manage all aspects of data management at scale without compromising on performance. For example, AWS offers a wide range of functionalities, such as databases, data lakes, analytics, data accessibility, data governance, and security, from within a single account.","id":"seo-faq-pairs#how-does-dm-work","customSort":"5"},"metadata":{"tags":[{"id":"seo-faq-pairs#faq-collections#data-management","name":"data-management","namespaceId":"seo-faq-pairs#faq-collections","description":"data-management","metadata":{}}]}},{"fields":{"faqQuestion":"How can AWS help with data management?","faqAnswer":"
AWS is a global data management platform that you can use to build a modern data strategy. With AWS, you can choose the right purpose-built database, achieve performance at scale, run fully managed databases, and rely on high-availability and security. \nGet started with data management on AWS by creating an AWS account today.","id":"seo-faq-pairs#how-can-aws-support-dm-needs","customSort":"6"},"metadata":{"tags":[{"id":"seo-faq-pairs#faq-collections#data-management","name":"data-management","namespaceId":"seo-faq-pairs#faq-collections","description":"data-management","metadata":{}}]}}]},"metadata":{"auth":{},"pagination":{"empty":false,"present":true},"testAttributes":{}},"context":{"page":{"locale":null,"site":null,"pageUrl":"https://aws.amazon.com/what-is/data-management/","targetName":null,"pageSlotId":null,"organizationId":null,"availableLocales":null},"environment":{"stage":"prod","region":"us-east-1"},"sdkVersion":"1.0.115"},"refMap":{"manifest.js":"3dea65b485","rt-faq.rtl.css":"75bc12ff4b","rt-faq.css":"b00bda11a1","rt-faq.css.js":"0af1d62724","rt-faq.js":"da177bdd5f","rt-faq.rtl.css.js":"a89cd83194"},"settings":{"templateMappings":{"question":"faqQuestion","answer":"faqAnswer"}}}
What is Data Management?
Data management is the process of collecting, storing, securing, and using an organization’s data. While organizations have several different data sources today, they have to analyze and integrate the data to derive business intelligence for strategic planning. Data management includes all the policies, tools, and procedures that improve data usability within the bounds of laws and regulations.
Why is data management important?
Data is considered to be a valuable resource for modern organizations. With access to large volumes and different data types, organizations invest significantly in data storage and management infrastructure. They use data management systems to run business intelligence and data analytics operations more efficiently. We give some benefits of data management below.
Increase revenue and profit
Data analysis gives deeper insights into all aspects of a business. You can action these insights to optimize business operations and reduce costs. Data analysis can also predict the future impact of decisions, improving decision making and business planning. Hence, organizations experience significant revenue growth and profits by improving their data management techniques.
Reduce data inconsistency
A data silo is a collection of raw data within an organization that only one department or group can access. Data silos create inconsistencies that reduce the reliability of data analysis results. Data management solutions integrate data and create a centralized data view for improved collaboration between departments.
Meet regulatory compliance
Laws like the General Data Protection Regulation (GDPR) and California Consumer Privacy Act (CCPA) give consumers control over their data. Individuals can seek legal recourse if they perceive that organizations:
- Capture data without consent
- Exercise poor control over data location and use
- Store data in spite of erasure requests
Hence, organizations require a data management system that is fair, transparent, and confidential while still maintaining accuracy.
What are the areas of focus for data management?
The practice of data management spans the collection and distribution of high-quality data, in addition to data governance, to control access to the data.
Data quality management
Users of data expect the data to be sufficiently reliable and consistent for each use case.
Data quality managers measure and improve an organization's data quality. They review both existing and new data and verify that it meets standards. They might also set up data management processes that block low-quality data from entering the system. Data quality standards typically measure the following:
- Is key information missing or is the data complete? (for example, customer leaves out key contact information)
- Does the data meet basic data check rules? (for example, a phone number should be 10 digits)
- How often does the same data appear in the system? (for example, duplicate data entries of the same customer)
- Is the data accurate? (for example, customer enters the wrong email address)
- Is data quality consistent across the system? (for example, date of birth is dd/mm/yyyy format in one dataset but mm/dd/yyyy format in another dataset)
Data distribution and consistency
Endpoints for data distribution
For most organizations, data has to be distributed to (or near) the various endpoints where the data is needed. These include operational systems, data lakes, and data warehouses. Data distribution is necessary because of network latencies. When data is needed for operational use, the network latency might not be sufficient to deliver it in a timely manner. Storing a copy of the data in a local database resolves the network latency issue.
Data distribution is also necessary for data consolidation. Data warehouses and data lakes consolidate data from various sources to present a consolidated view of information. Data warehouses are used for analytics and decision making, whereas data lakes are a consolidated hub from which data can be extracted for various use cases.
Data replication mechanisms and impact on consistency
Data distribution mechanisms have a potential impact on data consistency, and this is an important consideration in data management.
Strong consistency results from synchronous replication of data. In this approach, when a data value is changed, all applications and users will see the changed value of the data. If the new value of data has not been replicated as yet, access to the data is blocked until all the copies are updated. Synchronous replication prioritizes consistency over performance and access to data. Synchronous replication is most often used for financial data.
Eventual consistency results from asynchronous replication of data. When data is changed, the copies are eventually updated (usually within seconds), but access to outdated copies is not blocked. For many use cases, this is not an issue. For example, social media posts, likes, and comments do not require strong consistency. As another example, if a customer changes their phone number in one application, this change can be cascaded asynchronously.
Comparing streaming with batch updates
Data streams cascade data changes as they occur. This is the preferred approach if access to near real-time data is required. Data is extracted, transformed, and delivered to its destination as soon as it is changed.
Batch updates are more appropriate when data has to be processed in batches before delivery. Summarizing or performing statistical analysis of the data and delivering only the result is an example of this. Batch updates can also preserve the point-in-time internal consistency of data if all the data is extracted at a specific point in time. Batch updates through an extract, transform, load (ETL or ELT) process is typically used for data lakes, data warehousing, and analytics.
Big data management
Big data is the large volumes of data that an organization collects at a high speed over a short period of time. Video news feeds on social media and data streams from smart sensors are examples of big data. Both the scale and complexity of operations create challenges in big data management. For instance, a big data system stores data such as:
- Structured data that represents well in tabular format
- Unstructured data like documents, images, and videos
- Semistructured data that combines the preceding two types
Big data management tools have to process and prepare the data for analytics. The tools and techniques required for big data typically perform the following functions: data integration, data storage, and data analysis.
Data architecture and data modeling
Data architecture
Data architecture describes an organization’s data assets, and provides a blueprint for creating and managing data flow. The data management plan includes technical details, such as operational databases, data lakes, data warehouses, and servers, that are best suited to implementing the data management strategy.
Data modeling
Data modeling is the process of creating conceptual and logical data models that visualize the workflows and relationships between different types of data. Data modeling typically begins by representing the data conceptually and then representing it again in the context of the chosen technologies. Data managers create several different types of data models during the data design stage.
Data governance
Data governance includes the policies and procedures that an organization implements to manage data security, integrity, and responsible data utility. It defines data management strategy and determines who can access what data. Data governance policies also establish accountability in the way teams and individuals access and use data. Data governance functions typically include:
Regulatory compliance
Data governance policies reduce the risk of regulatory fines or actions. They focus on employee training so that adherence to laws happens at all levels. For example, an organization collaborates with an external development team to improve its data systems. Data governance managers verify that all personal data is removed before passing it to the external team to use for testing purposes.
Data security and access control
Data governance prevents unauthorized access to data and protects it from corruption. It includes all aspects of protection, such as the following:
- Preventing accidental data movement or deletion
- Securing network access to reduce the risk of network attacks
- Verifying that the physical data centers that store data meet security requirements
- Keeping data secure even when employees access it from personal devices
- User authentication, authorization, and the setting and enforcement of access permissions for data
- Ensuring that the stored data complies with the laws in the country where the data is stored
What are some data management challenges?
The following are common data management challenges.
Scale and performance
Organizations require data management software that performs efficiently even at scale. They have to continually monitor and reconfigure data management infrastructure to maintain peak response times even as data grows exponentially.
Changing requirements
Compliance regulations are complex and change over time. Similarly, customer requirements and business needs also change rapidly. Although organizations have more choice in the data management platforms they can use, they have to constantly evaluate infrastructure decisions to maintain maximum IT agility, legal compliance, and lower costs.
Employee training
Getting the data management process started in any organization can be challenging. The sheer volume of data can be overwhelming and interdepartmental silos might also exist. Planning a new data management strategy and getting employees to accept new systems and processes takes time and effort.
What are some data management best practices?
Data management best practices form the basis of a successful data strategy. The following are common best practices.
Team collaboration
Business users and technical teams must collaborate to ensure that an organization's data requirements are met. All data processing and analysis should prioritize business intelligence requirements. Otherwise, collected data will remain unused, with resources wasted in poorly planned data management projects.
Automation
A successful data management strategy incorporates automation in most of the data processing and preparation tasks. Performing data transformation tasks manually is tedious and also introduce errors in the system. Even a limited number of manual tasks, such as running weekly batch jobs, can cause system bottlenecks. Data management software can support faster and more efficient scaling.
Cloud computing
Businesses require modern data management solutions that provide them with a broad set of capabilities. A cloud solution can manage all aspects of data management at scale without compromising on performance. For example, AWS offers a wide range of functionalities, such as databases, data lakes, analytics, data accessibility, data governance, and security, from within a single account.
How can AWS help with data management?
AWS is a global data management platform that you can use to build a modern data strategy. With AWS, you can choose the right purpose-built database, achieve performance at scale, run fully managed databases, and rely on high-availability and security.
Get started with data management on AWS by creating an AWS account today.
Hence, organizations require a data management system that is fair, transparent, and confidential while still maintaining accuracy.","id":"seo-faq-pairs#why-is-dm-important","customSort":"2"},"metadata":{"tags":[{"id":"seo-faq-pairs#faq-collections#data-management","name":"data-management","namespaceId":"seo-faq-pairs#faq-collections","description":"
data-management","metadata":{}}]}},{"fields":{"faqQuestion":"What are the areas of focus for data management?","faqAnswer":"
The practice of data management spans the collection and distribution of high-quality data, in addition to data governance, to control access to the data. \n
Data quality management \nUsers of data expect the data to be sufficiently reliable and consistent for each use case. \n
Data quality managers measure and improve an organization's data quality. They review both existing and new data and verify that it meets standards. They might also set up data management processes that block low-quality data from entering the system. Data quality standards typically measure the following: \n
\n - Is key information missing or is the data complete? (for example, customer leaves out key contact information) \n
- Does the data meet basic data check rules? (for example, a phone number should be 10 digits) \n
- How often does the same data appear in the system? (for example, duplicate data entries of the same customer) \n
- Is the data accurate? (for example, customer enters the wrong email address) \n
- Is data quality consistent across the system? (for example, date of birth is dd/mm/yyyy format in one dataset but mm/dd/yyyy format in another dataset) \n \n
Data distribution and consistency \nEndpoints for data distribution \nFor most organizations, data has to be distributed to (or near) the various endpoints where the data is needed. These include operational systems, data lakes, and data warehouses. Data distribution is necessary because of network latencies. When data is needed for operational use, the network latency might not be sufficient to deliver it in a timely manner. Storing a copy of the data in a local database resolves the network latency issue. \nData distribution is also necessary for data consolidation. Data warehouses and data lakes consolidate data from various sources to present a consolidated view of information. Data warehouses are used for analytics and decision making, whereas data lakes are a consolidated hub from which data can be extracted for various use cases. \n
Data replication mechanisms and impact on consistency \nData distribution mechanisms have a potential impact on data consistency, and this is an important consideration in data management. \n
Strong consistency results from synchronous replication of data. In this approach, when a data value is changed, all applications and users will see the changed value of the data. If the new value of data has not been replicated as yet, access to the data is blocked until all the copies are updated. Synchronous replication prioritizes consistency over performance and access to data. Synchronous replication is most often used for financial data. \n
Eventual consistency results from asynchronous replication of data. When data is changed, the copies are eventually updated (usually within seconds), but access to outdated copies is not blocked. For many use cases, this is not an issue. For example, social media posts, likes, and comments do not require strong consistency. As another example, if a customer changes their phone number in one application, this change can be cascaded asynchronously. \n
Comparing streaming with batch updates \nData streams cascade data changes as they occur. This is the preferred approach if access to near real-time data is required. Data is extracted, transformed, and delivered to its destination as soon as it is changed. \n
Batch updates are more appropriate when data has to be processed in batches before delivery. Summarizing or performing statistical analysis of the data and delivering only the result is an example of this. Batch updates can also preserve the point-in-time internal consistency of data if all the data is extracted at a specific point in time. Batch updates through an extract, transform, load (ETL or ELT) process is typically used for data lakes, data warehousing, and analytics. \n
Big data management \nBig data is the large volumes of data that an organization collects at a high speed over a short period of time. Video news feeds on social media and data streams from smart sensors are examples of big data. Both the scale and complexity of operations create challenges in big data management. For instance, a big data system stores data such as: \n \n - Structured data that represents well in tabular format \n
- Unstructured data like documents, images, and videos \n
- Semistructured data that combines the preceding two types \n \n
Big data management tools have to process and prepare the data for analytics. The tools and techniques required for big data typically perform the following functions: data integration, data storage, and data analysis. \n
Data architecture and data modeling \nData architecture \nData architecture describes an organization’s data assets, and provides a blueprint for creating and managing data flow. The data management plan includes technical details, such as operational databases, data lakes, data warehouses, and servers, that are best suited to implementing the data management strategy. \n
Data modeling \nData modeling is the process of creating conceptual and logical data models that visualize the workflows and relationships between different types of data. Data modeling typically begins by representing the data conceptually and then representing it again in the context of the chosen technologies. Data managers create several different types of data models during the data design stage. \n
Data governance \nData governance includes the policies and procedures that an organization implements to manage data security, integrity, and responsible data utility. It defines data management strategy and determines who can access what data. Data governance policies also establish accountability in the way teams and individuals access and use data. Data governance functions typically include: \nRegulatory compliance \nData governance policies reduce the risk of regulatory fines or actions. They focus on employee training so that adherence to laws happens at all levels. For example, an organization collaborates with an external development team to improve its data systems. Data governance managers verify that all personal data is removed before passing it to the external team to use for testing purposes. \n
Data security and access control
Data governance prevents unauthorized access to data and protects it from corruption. It includes all aspects of protection, such as the following (a minimal access-control sketch appears after this list):
- Preventing accidental data movement or deletion
- Securing network access to reduce the risk of network attacks
- Verifying that the physical data centers that store data meet security requirements
- Keeping data secure even when employees access it from personal devices
- Authenticating and authorizing users, and setting and enforcing access permissions for data
- Ensuring that the stored data complies with the laws in the country where the data is stored
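To make the authentication-and-authorization bullet concrete, here is a purely illustrative Python sketch; the users, roles, and permission table are invented for this example and do not represent any specific access-control product.

```python
# Role -> dataset -> allowed actions (an invented permission table).
PERMISSIONS = {
    "analyst": {"customer_data": {"read"}},
    "admin":   {"customer_data": {"read", "write", "delete"}},
}

USERS = {"alice": "analyst", "bob": "admin"}  # authenticated user -> role

def is_authorized(user, dataset, action):
    role = USERS.get(user)   # authentication is assumed to have happened
    if role is None:
        return False         # unknown user: deny by default
    allowed = PERMISSIONS.get(role, {}).get(dataset, set())
    return action in allowed # enforce the permission set

print(is_authorized("alice", "customer_data", "read"))    # True
print(is_authorized("alice", "customer_data", "delete"))  # False
print(is_authorized("mallory", "customer_data", "read"))  # False
```

The deny-by-default check for unknown users reflects the broader governance goal: access is granted only when a policy explicitly allows it.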
\n \n","id":"seo-faq-pairs#what-are-types-of-dm","customSort":"3"},"metadata":{"tags":[{"id":"seo-faq-pairs#faq-collections#data-management","name":"data-management","namespaceId":"seo-faq-pairs#faq-collections","description":"
data-management","metadata":{}}]}},{"fields":{"faqQuestion":"What are some data management challenges?","faqAnswer":"
The following are common data management challenges. \n
Scale and performance \nOrganizations require data management software that performs efficiently even at scale. They have to continually monitor and reconfigure data management infrastructure to maintain peak response times even as data grows exponentially. \n
Changing requirements \nCompliance regulations are complex and change over time. Similarly, customer requirements and business needs also change rapidly. Although organizations have more choice in the data management platforms they can use, they have to constantly evaluate infrastructure decisions to maintain maximum IT agility, legal compliance, and lower costs. \n
Employee training \nGetting the data management process started in any organization can be challenging. The sheer volume of data can be overwhelming and interdepartmental silos might also exist. Planning a new data management strategy and getting employees to accept new systems and processes takes time and effort.","id":"seo-faq-pairs#what-is-data-governance","customSort":"4"},"metadata":{"tags":[{"id":"seo-faq-pairs#faq-collections#data-management","name":"data-management","namespaceId":"seo-faq-pairs#faq-collections","description":"
data-management","metadata":{}}]}},{"fields":{"faqQuestion":"What are some data management best practices?","faqAnswer":"
Data management best practices form the basis of a successful data strategy. The following are common best practices. \n
Team collaboration \nBusiness users and technical teams must collaborate to ensure that an organization's data requirements are met. All data processing and analysis should prioritize business intelligence requirements. Otherwise, collected data will remain unused, with resources wasted in poorly planned data management projects. \n
Automation \nA successful data management strategy incorporates automation in most of the data processing and preparation tasks. Performing data transformation tasks manually is tedious and also introduce errors in the system. Even a limited number of manual tasks, such as running weekly batch jobs, can cause system bottlenecks. Data management software can support faster and more efficient scaling. \nCloud computing \nBusinesses require modern data management solutions that provide them with a broad set of capabilities. A cloud solution can manage all aspects of data management at scale without compromising on performance. For example, AWS offers a wide range of functionalities, such as databases, data lakes, analytics, data accessibility, data governance, and security, from within a single account.","id":"seo-faq-pairs#how-does-dm-work","customSort":"5"},"metadata":{"tags":[{"id":"seo-faq-pairs#faq-collections#data-management","name":"data-management","namespaceId":"seo-faq-pairs#faq-collections","description":"data-management","metadata":{}}]}},{"fields":{"faqQuestion":"How can AWS help with data management?","faqAnswer":"
AWS is a global data management platform that you can use to build a modern data strategy. With AWS, you can choose the right purpose-built database, achieve performance at scale, run fully managed databases, and rely on high-availability and security. \nGet started with data management on AWS by creating an AWS account today.","id":"seo-faq-pairs#how-can-aws-support-dm-needs","customSort":"6"},"metadata":{"tags":[{"id":"seo-faq-pairs#faq-collections#data-management","name":"data-management","namespaceId":"seo-faq-pairs#faq-collections","description":"data-management","metadata":{}}]}}]},"metadata":{"auth":{},"pagination":{"empty":false,"present":true},"testAttributes":{}},"context":{"page":{"locale":null,"site":null,"pageUrl":"https://aws.amazon.com/what-is/data-management/","targetName":null,"pageSlotId":null,"organizationId":null,"availableLocales":null},"environment":{"stage":"prod","region":"us-east-1"},"sdkVersion":"1.0.115"},"refMap":{"manifest.js":"3dea65b485","rt-faq.rtl.css":"75bc12ff4b","rt-faq.css":"b00bda11a1","rt-faq.css.js":"0af1d62724","rt-faq.js":"da177bdd5f","rt-faq.rtl.css.js":"a89cd83194"},"settings":{"templateMappings":{"question":"faqQuestion","answer":"faqAnswer"}}}