### Data Candidates for Cloud

### Conversation

- cloud db options
  - Azure VM
    - more expensive, requires managing much more
  - Azure SQL Database
    - pull multiple DB instances together to make a server
  - Azure SQL Managed Instance
    - create a server that we
    - has tools for server migration
    - can do linked servers, SSIS agent jobs
  - pricing
    - existing SQL contracts discount
    - longer term contracts discount
- (josh) in cloud we'd need to rethink data manipulation, warehousing, history, replication
  - no need to adopt new ETL tools
- benefits to cloud
  - low maintenance. (josh) willing to pay a higher price to not have to do server migrations and think about updates
  - we miss out on some of the value proposition of scalability
- (adriana) redefine roles we'll need
  - infrastructure still
  - more focus on security as we are not protected by the firewall
  - training
  - (josh) I'm expecting infrastructure to find someone with cloud experience to fill in for Theresa and Bill. This will limit the need for infra to touch our servers and set up new ones.
- no sense for cloud storing archival data
  - computational only data load
  - types of data: transactional, archival, computational
- Josh prefers full cloud or full on-prem
- for machine learning sandbox
  - draw from common cloud source; will cache data to run learning
  - (arash) guessing exploratory data analysis would work better on the cloud
  - (arash) analytical machine learning on-prem
  - (adriana) high performance computing: you need the data in RAM
  - (arash) our scale -> on-prem analysis will be more cost friendly
- how to try this out with minimal risk

### My Notes

- benefits
  - don't manage infrastructure (easier): Database as a Service aspects: updates, security patches, monitoring, and backups
  - reduced dependence on infrastructure?
  - scalability: dynamically growing capacity to meet demand
  - flexibility:
    - not limited by physical hardware, licenses.
    - more easily deploy new services.
    - new toys like NoSQL, serverless, cloud-based IDE
  - integration with other cloud resources: AI/ML, monitoring, analytics, cloud backups and restore, Power BI, data sources (CRM, SharePoint)
- costs
  - makes me nervous, I don't know how to confidently answer the question of how much this will cost
  - how much are we paying now?
  - what are the expectations of decision makers around cloud costs?
  - concern it's going to be harder to do my job
    - 66% of engineering teams report a lack of visibility into cloud costs causing some level of disruption to their work
  - on-premise: hardware (servers, networks, firewalls), licensing, additional software (Redgate, backup, recovery), maintenance.

  ![The State Of Cloud Cost In 2024](https://www.cloudzero.com/wp-content/uploads/2024/03/summarize-state-of-your-could-costs.svg)

- common cases where cloud doesn't make sense
  - extremely low latency needs
  - need to comply with strict regulations: data residency, regulated data
  - companies with substantial initial on-prem investments
  - orgs with stable and predictable workloads
- initial work
  - configure db: deployment scripts
    - instance type: choose compute capacity (CPU, RAM) and storage options.
    - autoscaling options
    - ~~high availability: configure multi-az deployment or replication for fault tolerance.~~
    - backups: set up automated backups and retention policies.
  - Azure Database Migration Service: move data from on-premise to cloud
  - enable monitoring for key metrics and range-based notifications
    - CPU usage
    - memory usage
    - disk I/O
    - latency + execution time
    - network traffic
- ongoing effort
  - security tasks: maintain secure access controls
  - regularly verify backup and restore
  - periodic performance tuning tasks
  - configuration tasks
- ETL transition to cloud
  - Azure Data Factory
  - redefine ETL processes to leverage cloud-native services (e.g., serverless functions, data lakes)
  - use cloud-native scheduling tools to manage ETL jobs.
- what cloud resources are you most interested in utilizing at work?
  - compute: VMs, containers, serverless (functions as a service)
  - storage: blob, persistent disks, files
  - databases: relational, NoSQL, warehouse, ETL
  - networking: environment (VPCs, subnets), load balancers, content delivery networks
  - security: identity management, encryption, secrets, firewalls
  - analytics: data lakes, machine learning, visualization, reporting
  - monitoring: notifications, logging
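
A rough sketch of what "range-based notifications" from the initial-work list could look like, to make the idea concrete. Everything here is an assumption for illustration: the metric names, the `MetricRange` and `out_of_range` helpers, and the threshold values are placeholders, not recommendations, and this is not tied to any Azure monitoring API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRange:
    """Acceptable range for one metric; values outside it trigger a notification."""
    low: float
    high: float


# Placeholder thresholds (assumptions, not recommendations); tune per workload.
RANGES = {
    "cpu_percent": MetricRange(0.0, 80.0),
    "memory_percent": MetricRange(0.0, 85.0),
    "disk_io_percent": MetricRange(0.0, 90.0),
    "latency_ms": MetricRange(0.0, 200.0),
    "network_mbps": MetricRange(0.0, 500.0),
}


def out_of_range(sample: dict[str, float]) -> list[str]:
    """Return one message per metric in `sample` that falls outside its range."""
    alerts = []
    for name, value in sample.items():
        allowed = RANGES.get(name)
        if allowed is None:
            continue  # metrics without a configured range are ignored
        if not (allowed.low <= value <= allowed.high):
            alerts.append(f"{name}={value} outside [{allowed.low}, {allowed.high}]")
    return alerts


if __name__ == "__main__":
    # Hypothetical sample reading, made up for the example.
    sample = {"cpu_percent": 92.5, "memory_percent": 60.0, "latency_ms": 250.0}
    for message in out_of_range(sample):
        print(message)
```

In practice the managed service's own alerting would likely replace this; the sketch is only meant to show the shape of the configuration (one range per metric) that someone would need to decide on up front.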