-
Type: structured, semi-structured, or unstructured.
-
Data stores: file stores, databases
-
File formats:
- Delimited text files
- JavaScript Object Notation (JSON)
- Extensible Markup Language (XML)
-
Optimized File Format
- Avro: a row-based format, created by Apache
- ORC (Optimized Row Columnar): organizes data into columns; developed by Hortonworks for Apache Hive
- Parquet: another columnar data format, created by Cloudera and Twitter; very efficient compression and encoding schemes
-
Databases
- Relational databases
- tables that represent entities
- entity is assigned a primary key that uniquely identifies it
- keys enable normalization, which removes duplication
- use SQL
- Non-relational databases
- Key-value databases
- Document databases (JSON)
- Column family databases (key, column families)
- Graph databases
-
Transactional data processing
- records specific events the organization wants to track
- used for exchanges of money, goods, or services
- The work performed by transactional systems is Online Transaction Processing (OLTP).
- databases are optimized for both read and write operations
- CRUD operations (create, read, update, delete)
- OLTP systems should enforce ACID: Atomicity (the entire transaction happens at once, or nothing happens), Consistency, Isolation, Durability (committed changes are persisted); see the sketch below
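A minimal T-SQL sketch of atomicity (the Account table and values are hypothetical):

    BEGIN TRANSACTION;
    UPDATE Account SET Balance = Balance - 100 WHERE AccountID = 1;
    UPDATE Account SET Balance = Balance + 100 WHERE AccountID = 2;
    COMMIT;  -- both updates persist together; on error, ROLLBACK undoes both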
-
analytical data processing
- read-mostly
- flow: files / OLTP systems > data lake > ETL > warehouse (OLAP; facts and dimensions) > analytics
-
Data lakes
- used where file-based data must be collected and analyzed
-
Data warehouses
- relational schema
- read operations
- uses denormalization for performance (star schema)
-
Key job roles
- Database administrators - manage databases, permissions, backups, policies
- Data engineers - data integration, data cleaning routines, governance rules, and pipelines
- Data analysts - create visualizations
- Data scientists - build models
-
Relational databases model collections of real-world entities as tables
-
entity = a record of information, i.e., objects and events
-
Relational tables = structured data, i.e., each row in a table has the same columns
-
Normalization
- Separate each entity into its own TABLE.
- Separate each discrete attribute into its own COLUMN.
- Uniquely identify each entity row using a PRIMARY KEY.
- Use FOREIGN KEY columns to link related entities.
- Types:
- 1st Normal Form (1NF): each column holds a single atomic value
- 2nd Normal Form (2NF): remove duplicated data; create foreign keys
- 3rd Normal Form (3NF): move columns that don't depend on the primary key out of the table (see the sketch below)
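A minimal sketch of a normalized design in SQL (all table and column names are hypothetical):

    CREATE TABLE Customer (
        CustomerID INT PRIMARY KEY,          -- uniquely identifies each entity row
        FirstName  VARCHAR(50) NOT NULL,
        LastName   VARCHAR(50) NOT NULL
    );

    CREATE TABLE [Order] (
        OrderID    INT PRIMARY KEY,
        OrderDate  DATE NOT NULL,
        CustomerID INT NOT NULL
            REFERENCES Customer(CustomerID)  -- foreign key links related entities
    );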
-
SQL (Structured Query Language)
- used to communicate with a relational database
-
Types of SQL dialects
- Transact-SQL (T-SQL): Microsoft SQL Server and Azure SQL services
- pgSQL: PostgreSQL
- PL/SQL: Oracle
-
Statement types
- DDL (Data Definition Language), e.g., CREATE, ALTER, RENAME, DROP (dangerous)
- DML (Data Manipulation Language), e.g., SELECT, INSERT, UPDATE, DELETE
- DCL (Data Control Language), e.g., GRANT, DENY, REVOKE
- TCL (Transaction Control Language), e.g., ROLLBACK (see the examples below)
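A hedged T-SQL example of each statement type (the Product table and SomeUser are hypothetical):

    -- DDL: define structure
    CREATE TABLE Product (ProductID INT PRIMARY KEY, Name VARCHAR(50) NOT NULL);
    -- DML: work with rows
    INSERT INTO Product (ProductID, Name) VALUES (680, 'HL Road Frame');
    SELECT Name FROM Product WHERE ProductID = 680;
    -- DCL: control access
    GRANT SELECT ON Product TO SomeUser;
    -- TCL: control transactions
    BEGIN TRANSACTION;
    DELETE FROM Product WHERE ProductID = 680;
    ROLLBACK;  -- undo the delete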
-
Columns marked as NOT NULL are referred to as mandatory columns
-
Views
- a virtual table based on the results of a query
- used to save a complex query (e.g., over joined tables) so it can be reused like a table
- Example:
    CREATE VIEW Deliveries AS
    SELECT o.OrderDate, c.FirstName
    FROM [Order] AS o
    JOIN Customer AS c ON o.CustomerID = c.ID;

    SELECT OrderDate, FirstName FROM Deliveries;
-
-
Stored Procedure
- saved SQL code that can be reused, optionally with parameters
- Example:
    CREATE PROCEDURE procedure_name2 @ProductID INT
    AS
    SELECT * FROM SalesLT.Product WHERE ProductID = @ProductID;
    GO
    EXEC procedure_name2 680;
-
-
Index
- an index is like the index at the back of a book
- the database management system uses indexes to fetch data quickly
- the primary key has an index by default, hence lookups on it are fast
- a table can have more than one index
- an index covers one or more columns
- when you insert, update, or delete data, the table's indexes must also be updated
- 2 types of indexes:
-
Clustered - usually the primary key of a table
- a table can have only 1 clustered index
- data is stored in the order of the index
-
Nonclustered - created separately
- the index is stored separately, with pointers back to the rows
- Syntax: CREATE NONCLUSTERED INDEX index_name ON table (col_name);
- Eg: CREATE INDEX idx_ProductName ON Product(Name);
-
-
Unnormalized
- Data duplication exists
- duplication makes it hard to change data consistently
- not Recommended
- Azure SQL VM (IaaS) - fully compatible with on-premises SQL Server
- Azure managed services (PaaS)
- Azure SQL Database
- Azure SQL Managed Instance
- Azure Database for open-source engines, i.e., MySQL, MariaDB, PostgreSQL
- Azure SQL Edge (IoT)
- virtual machine with installation of SQL Server
- self-administered (OS, updates, etc.)
- "lift and shift" migration of existing on-premises systems
- full control - memory, CPU, etc.
- Search for "Virtual Machine" in the portal
- New > select an image (SQL)
- You can SSH to the virtual machine from Cloud Shell, or:
- search box: "Azure SQL"
- under "SQL virtual machines" > select "Free SQL Server 2019 Ubuntu"
- Subscription = free, Region = free, Image = free, Size = smallest
- name = my-sql-server-vm
- Create
- save the certificate to a folder when asked
- check that validation passed
- To connect to the server from a local machine: on the resource page > Settings > click Connect
- PaaS
- all features of SQL Server + extra features
- recommended for migration from on-prem
- cross-database queries
- can send emails
- SQL Server Agent, e.g., to run jobs
- virtual network support (only for vCore)
- supports analysis and reporting
- access to SQL Server Agent, Database Mail
- linked servers, Service Broker
- platform as a service (PaaS)
- High availability
- Automatic Updates
- Autobackups
- Serverless
- Flexible
- Auto Encryption( TDE)
- Authentication
- read replica DBs can be created (for running reports against the replica instead of the primary)
- low Cost
- isn't fully compatible with on-premises SQL Server
- Types:
- Single Database
- Default
- charged per hour for the resources
- Serverless
- compute is shared with databases belonging to other Azure subscribers
- automatically scales
- Elastic Pool
- multiple databases can share the same resources
- multi-tenancy
- for databases with resource requirements that vary over time; can help you reduce costs
- Single Database
- testdeepak/deepak/test-123 , SELECT * FROM SalesLT.Product;
- enable organizations to move to Azure
- MySQL (owned by Oracle): LAMP apps, i.e., Linux, Apache, MySQL, and PHP
- MariaDB: a fork of MySQL
- PostgreSQL (uses pgSQL; supports custom data types and manipulating geometric data)
- Types
- Single Server: Basic, General Purpose, and Memory Optimized
- Flexible Server: more control and server customizations; better cost optimization
- Hyperscale: the database is split across nodes; data is split into chunks based on the value of a partition key or sharding key
- Ports:
- SQL Server = 1433
- MySQL = 3306
- PostgreSQL = 5432
- Security
- Transparent data encryption (TDE)- encrypting data at rest.
- Transport Layer Security (TLS) to encrypt data that is transmitted across a network
- Dynamic data masking (DDM) limits sensitive data exposure by masking it for non-privileged users (see the sketch below)
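A hedged T-SQL sketch of dynamic data masking (the Customer table and Email column are hypothetical):

    ALTER TABLE Customer
    ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');
    -- non-privileged users now see masked values like aXXX@XXXX.com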
- Azure Blob (Binary large objects Storage)
- store massive amounts of unstructured data as binary large objects, or blobs
- Uses container
- can read and write blobs inside a container
- virtual folders similar to a file system
- types: Block, Page, Append
- Block (blocks up to 100 MB) = max of 50k blocks per blob = ~4.7 TB
- Page = organized into pages; fast random reads and writes; max 8 TB
- Append = optimized for append operations, i.e., logs; 4 MB blocks, up to ~195 GB
- 3 access tiers
- Hot = high performance, frequent access
- Cool = lower performance, lower charges
- Archive = lowest storage cost, long time to access data
- Azure Data Lake Storage Gen2
- newer version of Gen1
- advantages = blob storage features, cost control via storage tiers, hierarchical namespace, analytics support
- for analytical data lakes
- integrates with Azure HDInsight, Azure Databricks, and Azure Synapse Analytics
- Data Lake Storage Gen2 = Azure Storage + the "hierarchical namespace" checkbox (under Advanced)
- hierarchical namespace = folder operations (rename, delete, copy, etc.)
- enable it when creating the storage account, or upgrade an existing Azure Storage account to support Data Lake Gen2
- after upgrading, you can't revert to a flat namespace
- Azure Files (File Share)
- 100 TB of data in a single storage account
- maximum size of a single file is 1 TB
- 2,000 concurrent operations
- upload files using the Azure portal or the AzCopy utility
- Azure File Sync service synchronizes local copies
- 2 tiers:
- Standard: hard disk-based hardware in a datacenter
- Premium: uses solid-state disks; greater throughput
- 2 file sharing protocols:
- Server Message Block (SMB): used in all OSes
- Network File System (NFS): only in some macOS and Linux versions (requires Premium)
- Azure Table (NoSQL; prefer Cosmos DB instead)
- row = key + values
- each row holds the entire data for a logical entity
- each row must have a unique key
- no concept of foreign keys, relationships, stored procedures, or views
- denormalized
- the number of fields in each row can be different
- Partition Key
- used to group related data
- rows with the same partition key are stored together
- partitions are independent in size
- items in the same partition are stored in row key order
- used for faster searches; range queries fetch a contiguous block of rows
-
Ex
+------+---------+------------+-----------+------------------+ | Pkey | RowKey | TimeStamp | Property1 | Property2 | +------+---------+------------+-----------+------------------+ | 1 | unique1 | 2022-01-01 | Deepak | Bangalore, India | +------+---------+------------+-----------+------------------+ | 2 | unique2 | 2022-01-01 | John | | +------+---------+------------+-----------+------------------+ | 2 | unique3 | 2022-01-01 | | Newyork,India | +------+---------+------------+-----------+------------------+
-
- NoSQL DB
- horizontal scaling, high scalability
- use APIs to query data
- no administration
- auto scale (no limit)
- features comparable to SQL Server
- can use the SQL language to query NoSQL data
- multi-region writes give globally distributed users a local replica
- creation: Account > DB > Container > Item. Note: a separate account is needed for each API
- Documents:
- Types:
- Core SQL: JSON, SQL syntax
- Mongo (Binary JSON - BSON)
- Ex: Person(name, Address(...)). In an RDBMS, Address would be a separate table
- Core SQL: DB > Container > Item
- Mongo: DB > Collection > Document
- Types:
- Table API
- greater scalability and performance than Azure Table Storage.
- structure similar to Azure Table storage
- GET - https://endpoint/Customers(PartitionKey='1',RowKey='124')
- Column (Cassandra)
- rows and columns
- has 1 primary key column
- other columns are grouped into column families
- Eg: (Pk=1, col2=(name="deepak", ph="123")), (Pk=2, col2=(name="ABC", email="abc@gmail"))
- compatible with Apache Cassandra
- not mandatory for every row to have the same columns
- Keyspace > Table > Row
- SQL-like queries (SELECT * FROM Employees WHERE ID = 2)
- Key-value: like a map or dictionary
- Gremlin (graph)
- used to store complex relationships
- uses nodes and edges
- Eg: organization charts
- Azure Cosmos DB, Gremlin API
- DB > Graph > Node, Edge
- Data ingestion pipelines
- Data ingestion and processing - ETL, ELT
- Analytical data store - data warehouses, file-system based data lakes, and hybrid architectures
- Analytical data model - cubes, facts and dimensions, star schema
- Data visualization - comparisons, dashboards, key performance indicators (KPIs), reports
- Data ingestion pipelines
- Azure Data Factory or Azure Synapse Analytics
- one or more activities
- activities = data flows that incrementally manipulate the data until an output dataset is produced
- pipelines consist of one or more activities (built-in activities, linked services)
- Eg: an Azure Blob Storage linked service to ingest the input dataset, Azure SQL Database to run a stored procedure, tasks on Azure Databricks or Azure HDInsight, or custom logic in an Azure Function
- Data warehouse
- relational database, i.e., data stored in a schema
- optimized for data analytics rather than transactional workloads
- denormalized schema
- facts and dimensions
- Data lake
- tabular schemas on semi-structured data files
- structured, semi-structured, and even unstructured data
- Hybrid approaches
- used with Spark
- combine data lakes and data warehouses in a lake database or data lakehouse
- data is stored as files in a data lake > exposed as tables (templates, i.e., structure) > queried using SQL pools (Azure Synapse Analytics); see the sketch below
- PolyBase: query external tables
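A hedged sketch of querying data lake files with SQL from a Synapse serverless SQL pool (the storage URL and path are assumptions):

    SELECT TOP 10 *
    FROM OPENROWSET(
        BULK 'https://mydatalake.dfs.core.windows.net/data/sales/*.parquet',
        FORMAT = 'PARQUET'
    ) AS sales;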
- Azure services for analytical stores
- Azure Synapse Analytics
- unified data analytics solution
- single service interface for multiple analytical capabilities
- Pipelines similar to Azure Data Factory.
- use SQL; data warehousing
- Apache Spark
- Azure Synapse Data Explorer - data analytics using Kusto Query Language (KQL).
- does not support data extraction from multiple sources (use Data Factory)
- MPP = massively parallel processing across compute nodes
- Azure Databricks
- Apache Spark data processing platform with SQL database semantics
- notebooks for querying
- visualize data in a web-based interface
- supports multiple data sources
- Azure HDInsight
- Apache Spark
- Apache Hadoop - a distributed system that uses MapReduce jobs
- Apache HBase - an open-source, large-scale NoSQL database
- Apache Kafka - a message broker for data stream processing
- Apache Storm - open-source real-time data processing
-
2 types of data processing:
- Batch processing
- multiple data records are collected and stored, then processed in a single operation
- Eg: postpaid bill
- Adv: handles large volumes of data; can be scheduled; suits complex operations
- Disadv: latency; input data must be prepared
- Stream processing
- source data is constantly monitored and processed in real time
- Adv: minimal latency; real time; no data preparation
- Disadv: not for high volumes; works on recent data; simple operations only
-
General architecture for stream processing
- An event generates some data. Eg: a social media post, a log entry
- The data is captured. Eg: in a folder in a cloud data store or a table in a database; the source may be a "queue"
- The event data is processed by a perpetual query running over a time window. Eg: count the number of sensor emissions per minute
- The results are written to an output sink. Eg: a file, a database table, etc.
-
Azure Stream Analytics (PaaS)
- ingest data from an input, e.g., an Azure event hub, Azure IoT Hub, or an Azure Storage blob container
- process the data by using a SQL query (see the sketch below)
- write the results to Data Lake Gen2, Azure SQL Database, Synapse, Azure Functions, an event hub, Power BI, or others
- created as a Stream Analytics job
- Stream Analytics cluster = a dedicated tenant for Stream Analytics jobs
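A hedged sketch of a Stream Analytics SQL query (the [sensor-input] and [sensor-output] aliases and the EventTime field are assumptions):

    SELECT
        SensorId,
        COUNT(*) AS EventCount,
        System.Timestamp() AS WindowEnd
    INTO [sensor-output]
    FROM [sensor-input] TIMESTAMP BY EventTime
    GROUP BY SensorId, TumblingWindow(minute, 1);  -- count events per minute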
-
Spark Structured Streaming :
- develop streaming solutions on Apache Spark-based services: Synapse, Databricks, HDInsight
-
Delta Lake
- data lake + transactional consistency + schema enforcement
- supports streaming and batch
- can be used as a source or a sink (see the sketch below)
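A minimal Spark SQL sketch of a Delta Lake table (the table name and columns are hypothetical):

    CREATE TABLE sales_delta (id INT, amount DOUBLE) USING DELTA;
    INSERT INTO sales_delta VALUES (1, 9.99);   -- schema is enforced on write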
-
Azure Data Explorer:
- database + analytics service
- ingesting and querying batch or streaming data with a time-series element
- a standalone Azure service
- also available as the Azure Synapse Data Explorer runtime in an Azure Synapse Analytics workspace
- uses Kusto Query Language (KQL)
- query telemetry data that includes a timestamp attribute
-
Example :
    LogEvents
    | where StartTime > datetime(2021-12-31)
    | where EventType == 'Error'
    | project StartTime, EventType, Message
-
-
Microsoft Power BI
- a tool data analysts can use to build interactive data visualizations for business users
- Dashboard - a single-page collection of visuals pinned from reports
- Reports - a collection of visuals, with one or more pages
- Power BI Report Builder - author and publish paginated reports
- Power BI Desktop - create interactive reports for dashboards
- Power BI service = Desktop features + dashboards
- Power BI Desktop
- import data from a wide range of data sources
- combine and organize into analytics data model
- create reports that contain interactive visualizations of the data.
- Power BI service
- reports can be published and interacted with by business users
- basic data modeling using a web browser (limited functionality)
- Power BI phone app.
- Users can consume reports, dashboards, and apps in the Power BI service
-
Concepts of data modeling
- Facts and dimensions
- numeric values are called measures; entities are called dimensions
- Eg: a table containing numeric measures for sales (such as revenue or quantity) and dimensions for products, customers, and time
- the model forms a multidimensional structure called a cube
- denormalized star schema (best practice)
- a star schema is a fact table related to one or more dimension tables (see the sketch after this list)
- Dim = person details; Fact = transactions done by the person
- hierarchies = drill up or drill down to find aggregated values at different levels
- use the Model tab of Power BI Desktop to define your analytical model
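A minimal star schema sketch in SQL (all table and column names are hypothetical):

    CREATE TABLE DimProduct  (ProductKey  INT PRIMARY KEY, ProductName  VARCHAR(50));
    CREATE TABLE DimCustomer (CustomerKey INT PRIMARY KEY, CustomerName VARCHAR(50));
    CREATE TABLE DimDate     (DateKey     INT PRIMARY KEY, CalendarDate DATE);

    CREATE TABLE FactSales (                  -- fact table: one row per sale
        ProductKey  INT REFERENCES DimProduct(ProductKey),
        CustomerKey INT REFERENCES DimCustomer(CustomerKey),
        DateKey     INT REFERENCES DimDate(DateKey),
        Revenue     DECIMAL(10,2),            -- numeric measures
        Quantity    INT
    );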
-
Visualizations
- Tables and text: simplest way to communicate data
- Bar and column: compare numeric values for discrete categories.
- Line: examine trends, often over time.
- Pie: visually compare categorized values as proportions of a total.
- Scatter: compare two numeric measures, identify relationship or correlation
- Maps: compare values for different geographic areas or locations.
-
Create Reports
- Power BI Desktop > Get Data > Web > add URL > OK
- Model tab to create the model, i.e., formatting, hierarchies
- enable map visualizations: File > Options and Settings > Security section > Use Map and Filled Map visuals > OK
- the Fields pane contains the model you created
-
Visualizations that can be pinned (give no additional data) = text boxes, images, videos, streaming data, web content
-
Cannot be pinned in Power BI = interactive reports, datasets, dashboards, .xlsx files, SSRS reports
================================== REFERENCES
- Descriptive - average revenue
- Diagnostic - why is average revenue low?
- Predictive - expected average revenue during COVID
- Prescriptive - YouTube recommended videos
- Cognitive - AI, self-driving cars
RDBMS options:
- SQL Server in VM - user installs SQL Server in the VM (apply an image to the VM)
- SQL Managed Instance - fully managed by Azure
- Azure SQL Database (PaaS)
- Azure Database (for other DBs - MariaDB, PostgreSQL)
DP-900 | Azure Data Fundamentals Certification (https://www.youtube.com/watch?v=0f9JLKgfFXM)
- RDBMS
- NoSQL
- Analytical DB
- Object/Blob/File
- Buying infra is Expensive
- can the entire infra be utilized efficiently all the time?
- a team is needed to maintain it
- On Demand provisioning ( pay per use)
- Scaling easier
- High availability
- Low latency by adding more data centers (for users in other parts of the world)
- Maintainability
- Crash backup
- Going global with CDNs
- Security
-
IaaS (Infrastructure as a Service)
- SQL Server (image) installed on the VM by the user
- User is responsible for everything
-
PaaS (Platform as a Service) - recommended
- the cloud provider manages the platform (OS, updates, runtime)
- Eg: Azure SQL DB, Cosmos DB, etc.
- Create SQL DB
- search for "SQL Database" in the portal
- create a new resource group
- select the free option everywhere
- Configure database > Basic or (General Purpose > Serverless)
- select locally redundant storage
- (username, admin) my-paas/Password123
- Networking > Public (firewall settings)
- Networking > add the current client IP address, and set "Allow Azure services and resources to access this server" = Yes
- search for "SQL databases" in the portal, or "Azure SQL" > SQL databases
- Azure provides a Query Editor
-
SaaS (Software as a Service)
- Online Excel, Outlook, CRM, Box, etc.
- the service provider is responsible for everything, including the runtime
Structured (RDBMS)
- tables, rows, and columns
- schema
- relationships to other tables
- index on the primary key for efficient querying
- constraints, e.g., NOT NULL
- used in OLTP (online transactions)
- Eg: online payments
- heavy writes
- Azure SQL DB, DB for MySQL, DB for PostgreSQL
- OLAP (Online Analytical Processing)
- Eg: ETL, warehouses
- aggregates data from other DBs
- Azure Synapse Analytics
- OLTP vs OLAP
- OLTP uses row-wise storage; OLAP uses column-wise storage (a column compresses well because its values share one type)
- OLTP data cannot easily be distributed; OLAP (column) data can be distributed across multiple nodes
Semi-structured (JSON, key-value)
- flexible schema (schema not verified, no constraints)
- horizontal scaling possible
- Cosmos DB is used for NoSQL / semi-structured data
- Create SQL server
- free options
- server name: my-sql-database-server-new, credentials azuser/Password123
- Configure DB server
- General Purpose > Serverless
- locally redundant
- Networking (public, all access, add client IP)
- download Azure Data Studio
- click on New Connection
- server = the server name from the resource dashboard
- enter username/password
- connect
- data is stored as JSON, but SQL is used to fetch it
- Create
- free tier and locally redundant
- account name = any unique name (cosmos-sql-udemy)
- East Asia
- select the "limit the total amount" option
- go to the resource > Add Container
- mandatory fields:
- DB id
- throughput = manual
- Container id
- partition key column = "/pk"
- to run queries, open Data Explorer
- cosmos-sql-udemy-db > core-sql-container > Items > New Item
- save
- Query: SELECT * FROM c WHERE c.name = 'deepak'
- use Mongo filters to pull data
- a container is called a collection in Cosmos Mongo
- uses sharding (similar to partitioning)
- sharding splits data across multiple shards
- shard key = partition key
- filter = {"key": value}
- provisioned RU is billed per hour; serverless is billed per usage
- provisioned RU has no autoscale; serverless autoscales
- provisioned RU supports multi-region; serverless does not
- provisioned RU = unlimited data; serverless = 50 GB
- similar to hard disks and file shares
- Azure Disks: additional storage that acts like a hard disk
- Azure Files (file storage): file shares
- Azure Blob (object storage): objects via REST API, upload via REST, etc.
- Azure Data Factory
- define and schedule data pipelines
- integrate your pipelines with other Azure services
- persist the results in another data store
- ETL
- Azure Stream Analytics
- streaming ETL engine
- Azure Data Explorer (also available as a runtime in Azure Synapse Analytics)
- querying of log and telemetry data
- Microsoft Purview
- map your data and track data lineage across multiple data sources and systems
- Power BI
- analytical data modeling and reporting
- Power BI reports can be created by using the Power BI Desktop application
- published and delivered through the Power BI service and Power BI mobile app
- a region is a geographic location that hosts services
- Region > Zone > Datacenter
- a zone is part of a region
- a region can have 1 or many zones, at the same or different locations, each with 1 or more separate datacenters
-
- vCore
- Serverless
- auto scale
- charged only on usage and data stored
- Provisioned
- can be configured
- charged based on memory and compute
-
- DTU:
- a bundle of compute and memory
- compute and memory cannot be configured independently
- LRS (Locally Redundant Storage) - 3 synced copies maintained in the same datacenter
- ZRS (Zone) - 3 synced copies in different availability zones
- GRS (Geo) - LRS + 1 async copy to a secondary region
- RA-GRS - GRS + read access on both primary and secondary
- GZRS (Geo-Zone) - ZRS + 1 async copy to a secondary region (most expensive)
- hard disks for VMs (SSD, HDD, premium SSD)
- use Azure Disks
- Add Resource > Virtual Machines > Add > Disks
- (alternate) Resources > Storage > select any disk name created > Data storage > Containers (different from Docker/kubectl containers)
- types: managed (by Azure), unmanaged (by the user)
Streaming sources:
- Azure Event Hubs: manages queues of event data
- Azure IoT Hub: Internet-of-Things (IoT) devices
- Azure Data Lake Store Gen 2: a highly scalable storage service for batch/streaming data
- Apache Kafka: use with Apache Spark; you can also use Azure HDInsight to create a Kafka cluster
Streaming sinks:
- Azure Event Hubs
- Azure Data Lake Store Gen 2 or Azure Blob storage
- Azure SQL Database, Azure Synapse Analytics, or Azure Databricks
- Microsoft Power BI
Spark-based services:
- Azure Synapse Analytics
- Azure Databricks
- Azure HDInsight
- Descriptive - what happened; SQL; e.g., average revenue
- Diagnostic - why; e.g., why is average revenue low?
- Predictive (ML) - what will happen; e.g., expected average revenue during COVID
- Prescriptive - use predictive analysis to take the next step; e.g., YouTube recommended videos, autocomplete
- Cognitive (AI + ML) - e.g., self-driving cars
- 3V = Volume, Variety, Velocity
- Storage
- DataWarehouse
- processed data
- SQL
- DataLake
- raw data (Google Cloud, Azure Data Lake Storage Gen2)
- Blob Storage, Gen 2
- Parquet format
- Synapse can read directly from the data lake
- DataWarehouse
- Synapse (end-to-end analytics)
- data integration + loading the DW + data analytics
- SQL
- Spark jobs
- Data Factory
- managed service
- serverless
- ETL
- data pipelines
- Power BI (visualization)
- Hadoop
- supports a variety of data
- HDFS
- MapReduce / parallelization across nodes
- Java, Python
- Hive (query)
- Spark (in-memory)
- Databricks
- web-based managed service for Spark
- storage format (Parquet)
- open source
- column storage
- high compression
- Azure Data Factory , Azure Data lake ,Blob Storage
- Azure HDInsight (old)
- Azure Databricks - new (managed service)
- reads data from Azure SQL Database, Cosmos DB, Event Hubs
- Synapse
- Data Factory
- Create
- search "Databricks" > give a resource group and name > Create > go to the resource > Launch Workspace
- name, terminate after 10 mins, workers = 1
- create a notebook > name it
- sample dataset - open "https://docs.microsoft.com/en-us/azure/open-datasets/dataset-catalog" in a browser
- createOrReplaceTempView()
- use SQL queries for analysis
- compute across multiple nodes
- Spark, Synapse
Batch pipelines
- huge volumes
- processed in groups
- triggered after hours or a number of records
- high latency
Streaming pipelines
- small numbers of records
- use Event Hubs
- real time
- low latency
- the target data store itself can be the transformation engine
- Synapse Analytics
- linked services, control flow, data movement
- End to End Analytics
- SQL
- SPARK (spark jobs)
- Pipelines
- Power BI, Cosmos DB, Azure ML, Azure Data Lake, GCP
- Parquet, CSV, JSON
- consumption models: dedicated, serverless
- PolyBase: run SQL on external data sources
- Data Explorer pools can be used to run near real-time analytics on large volumes of logs
-
ARM template (JSON) - copy/paste the configuration of environments
-
Synapse - does not support data extraction from multiple sources
-
SQL Managed Instance DB = point-to-site VPN connections and private endpoint
-
Databricks = no streaming; the user cannot start/stop the cluster
-
Object store - files/audio/video, e.g., Blob
-
Control node - responsible for interacting with the app
-
Azure Cosmos DB Data Migration Tool - used for migration