diff --git a/README.md b/README.md index 06a97fcf..76c776cd 100644 --- a/README.md +++ b/README.md @@ -38,6 +38,9 @@ In this sample you will use Microsoft Graph Data Connect to analyze emails from In this sample you will use Microsoft Graph Data Connect, which brings Microsoft 365 data and Azure resources to independent software vendors (ISVs). It enables ISVs to build intelligent applications with Microsoft's most valuable data and best development tools, while Microsoft 365 customers gain innovative, industry-specific applications that enhance their productivity and keep full control over their data. +## [Sales Analytics using M365 and Salesforce data](solutions/graph-data-sales-analytics/README.md) +This repository houses the Microsoft Graph Data Connect Solution Accelerator, which provides insights such as email analytics and account sentiment analysis by combining Salesforce Sales data (Opportunity) with Microsoft 365 data (Email). +
# Give us your feedback diff --git a/solutions/graph-data-sales-analytics/README.md b/solutions/graph-data-sales-analytics/README.md new file mode 100644 index 00000000..6db07edb --- /dev/null +++ b/solutions/graph-data-sales-analytics/README.md @@ -0,0 +1,86 @@ +# Microsoft Graph Data Connect Solution Accelerator + +## Contents + +1. [Solution Overview](#solution-overview) +2. [Solution Architecture](#solution-architecture) + - [Azure Services Used](#azure-services-used) +3. [Directory Structure](#directory-structure) +4. [Getting Started](#getting-started) + - [Pre-requisites](#pre-requisites) + - [Infrastructure Deployment](#infrastructure-deployment) + - [Synapse Pipeline](#synapse-pipeline) + - [Power BI Report](#power-bi-report) +5. [Feedback & Considerations](#feedback--considerations) + +## Solution Overview +This repository houses the Microsoft Graph Data Connect Solution Accelerator, which provides insights such as email analytics and account sentiment analysis by combining Salesforce Sales data (Opportunity) with Microsoft 365 data (Email). + +The solution comprises three main components: Azure resource deployment, Synapse data pipelines, and a Power BI dashboard. Together, these components let organizations integrate the data and build within their own tenant environments. + +## Solution Architecture + +Upon completion of all steps, you will have a comprehensive end-to-end solution with the following architecture: + +![Architecture](docs/media/Architecture.PNG) + +### Azure Services Used + +The solution leverages the following core Azure components (a minimal Bicep sketch of how they are wired together follows the directory listing below): + +- **Azure Synapse**: An analytics service for data warehouses and big data systems that centralizes data in the cloud for easy access. It offers a range of pipelines and activities, such as Data Flow and Custom activities, to connect to source data and copy it into Data Lake Storage. +- **Azure Data Lake Storage**: A scalable data lake designed for high-performance analytics workloads. In this solution it stores the input data and the contextualized data in Delta tables. +- **Azure SQL Server and Database**: Stores the metadata used to extract data from Microsoft 365 into ADLS (Azure Data Lake Storage) for additional processing. +- **Managed Identity**: Enables Azure resources to authenticate with cloud services. +- **Service Principal**: An Azure Active Directory application serving as the security principal for executing the data extraction process. It is responsible for creating, running, and approving data pipelines in Synapse for data extraction from Microsoft 365 to ADLS. +- **Azure DevOps**: Houses the source code, infrastructure templates, and deployment pipeline files. Azure resources are deployed to an Azure resource group using the deployment pipeline files. +- **Azure Key Vault**: A secure repository for storing secrets and keys crucial to the solution's operation. + +## Directory Structure + +The solution is organized into the following directories: + +- **[iac](iac)**: Contains the Bicep files required for deploying the infrastructure. + +- **[powerbi](powerbi)**: Contains the Power BI report template and related scripts. + +- **[synapse](synapse)**: Contains the Synapse pipeline artifacts for the solution.
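To make the relationship between these services concrete, here is a minimal Bicep sketch of the wiring pattern the accelerator's `main.bicep` uses: the managed identity is created first, its name flows into the data lake module, and the data lake name flows into the Synapse module. This is an illustrative sketch only; the module paths and parameter lists are simplified for illustration and will not compile against the repository's modules as-is (the actual `main.bicep` appears later in this diff).

```bicep
// Illustrative wiring sketch (simplified; not the repository's main.bicep).
param location string = resourceGroup().location

// 1. The user-assigned managed identity is deployed first.
module identity 'modules/managedidentity/managedidentity.bicep' = {
  name: 'identity-deployment'
  params: { location: location }
}

// 2. The data lake receives the identity name, which creates an
//    implicit dependency on the identity module.
module dataLake 'modules/datalake/datalake.bicep' = {
  name: 'datalake-deployment'
  params: {
    location: location
    managed_identity_name: identity.outputs.managed_identity_name
  }
}

// 3. Synapse receives both the identity name and the data lake name,
//    so it deploys only after the storage account exists.
module synapse 'modules/synapse/synapse.bicep' = {
  name: 'synapse-deployment'
  params: {
    location: location
    managed_identity_name: identity.outputs.managed_identity_name
    adls_name: dataLake.outputs.adls_name
  }
}
```

Because module-output references create implicit dependencies, this ordering needs no explicit `dependsOn`.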
+ +## Getting Started + +To begin, clone or download this repository onto your local machine and follow the instructions in each of the README files. + +### Pre-requisites + +Before starting the setup process, ensure the following pre-requisites are met: + +- Microsoft 365 Data Connection: Establish a connection to your Microsoft 365 data. For detailed instructions, refer to [Microsoft 365 Data Connection Setup](https://learn.microsoft.com/en-us/viva/solutions/data-lakes/microsoft-graph-data-connect). + +- Salesforce Data Connection: Establish a connection to your Salesforce Sales data (Opportunity). Refer to [Salesforce Data Connection Setup](https://learn.microsoft.com/en-us/azure/data-factory/connector-salesforce?tabs=data-factory). + +- Azure Subscription: Ensure you have an active Azure subscription in the same Azure AD tenant as your Microsoft 365 subscription. + +- Service Principal: Create an Azure Active Directory (Azure AD) Service Principal to securely access Microsoft 365 and Salesforce data. For guidance, refer to the [Service Principal Setup Guide](https://learn.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal#register-an-application-with-azure-ad-and-create-a-service-principal). A hedged sketch of granting this principal access to the data lake appears at the end of this README. + +### Infrastructure Deployment + +For comprehensive instructions on infrastructure setup and usage, consult the documentation [here](iac/README.md). + +### Synapse Pipeline + +For setup instructions and usage guidelines, refer to the documentation [here](synapse/README.md). + +### Power BI Report + +Setup instructions and usage guidance are in the documentation [here](powerbi/README.md). Also, download the pre-built Power BI report, designed to generate insights from the data produced by the Synapse pipeline in Azure storage locations. + +Link to download the Power BI template: [SalesSentimentDashboard.pbit](powerbi/SalesSentimentDashboard.pbit) + +## Feedback & Considerations + +We welcome your feedback; it contributes to the refinement of the solution. + +Please note the following considerations: + +- Regular updates may be performed to accommodate adjustments and fixes. +- Network graph visualizations in the Power BI template are limited to 1500 nodes.
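As referenced in the Pre-requisites above, the service principal needs data-plane access to the data lake before the extraction pipelines can write to it. The following is a hypothetical Bicep sketch of granting the built-in Storage Blob Data Contributor role on an existing storage account; the parameter names are illustrative assumptions, and this file is not part of the repository.

```bicep
// Hypothetical sketch: grant a service principal blob read/write access
// to the solution's data lake. Parameter names are illustrative.

@description('Object ID of the service principal created in the pre-requisites')
param servicePrincipalObjectId string

@description('Name of the existing Data Lake Storage account')
param storageAccountName string

// Built-in role definition ID for "Storage Blob Data Contributor".
var storageBlobDataContributor = subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'ba92f5b4-2d11-453d-a403-e96b0029c9fe')

// Reference the storage account deployed by the iac templates.
resource storage 'Microsoft.Storage/storageAccounts@2022-09-01' existing = {
  name: storageAccountName
}

// Role assignment names must be GUIDs; deriving one deterministically
// keeps the deployment idempotent.
resource grantBlobAccess 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(storage.id, servicePrincipalObjectId, storageBlobDataContributor)
  scope: storage
  properties: {
    principalId: servicePrincipalObjectId
    roleDefinitionId: storageBlobDataContributor
    principalType: 'ServicePrincipal'
  }
}
```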
diff --git a/solutions/graph-data-sales-analytics/docs/media/Architecture.PNG b/solutions/graph-data-sales-analytics/docs/media/Architecture.PNG new file mode 100644 index 00000000..23f7893f Binary files /dev/null and b/solutions/graph-data-sales-analytics/docs/media/Architecture.PNG differ diff --git a/solutions/graph-data-sales-analytics/docs/media/CommunicationAnalysis.png b/solutions/graph-data-sales-analytics/docs/media/CommunicationAnalysis.png new file mode 100644 index 00000000..e5fc08a6 Binary files /dev/null and b/solutions/graph-data-sales-analytics/docs/media/CommunicationAnalysis.png differ diff --git a/solutions/graph-data-sales-analytics/docs/media/DataModel.PNG b/solutions/graph-data-sales-analytics/docs/media/DataModel.PNG new file mode 100644 index 00000000..591476b1 Binary files /dev/null and b/solutions/graph-data-sales-analytics/docs/media/DataModel.PNG differ diff --git a/solutions/graph-data-sales-analytics/docs/media/DataOps.png b/solutions/graph-data-sales-analytics/docs/media/DataOps.png new file mode 100644 index 00000000..89cd66f2 Binary files /dev/null and b/solutions/graph-data-sales-analytics/docs/media/DataOps.png differ diff --git a/solutions/graph-data-sales-analytics/docs/media/DataTransformation.PNG b/solutions/graph-data-sales-analytics/docs/media/DataTransformation.PNG new file mode 100644 index 00000000..dc263ea6 Binary files /dev/null and b/solutions/graph-data-sales-analytics/docs/media/DataTransformation.PNG differ diff --git a/solutions/graph-data-sales-analytics/docs/media/Dataflow.PNG b/solutions/graph-data-sales-analytics/docs/media/Dataflow.PNG new file mode 100644 index 00000000..ec6f1ba3 Binary files /dev/null and b/solutions/graph-data-sales-analytics/docs/media/Dataflow.PNG differ diff --git a/solutions/graph-data-sales-analytics/docs/media/HelpInformation.png b/solutions/graph-data-sales-analytics/docs/media/HelpInformation.png new file mode 100644 index 00000000..c6ac7b0c Binary files /dev/null and b/solutions/graph-data-sales-analytics/docs/media/HelpInformation.png differ diff --git a/solutions/graph-data-sales-analytics/docs/media/LogicalArchitecture.png b/solutions/graph-data-sales-analytics/docs/media/LogicalArchitecture.png new file mode 100644 index 00000000..b7f2273b Binary files /dev/null and b/solutions/graph-data-sales-analytics/docs/media/LogicalArchitecture.png differ diff --git a/solutions/graph-data-sales-analytics/docs/media/OpportunitySummary.PNG b/solutions/graph-data-sales-analytics/docs/media/OpportunitySummary.PNG new file mode 100644 index 00000000..d99bed00 Binary files /dev/null and b/solutions/graph-data-sales-analytics/docs/media/OpportunitySummary.PNG differ diff --git a/solutions/graph-data-sales-analytics/docs/media/Parameters.PNG b/solutions/graph-data-sales-analytics/docs/media/Parameters.PNG new file mode 100644 index 00000000..80eb55c0 Binary files /dev/null and b/solutions/graph-data-sales-analytics/docs/media/Parameters.PNG differ diff --git a/solutions/graph-data-sales-analytics/iac/README.md b/solutions/graph-data-sales-analytics/iac/README.md new file mode 100644 index 00000000..18396459 --- /dev/null +++ b/solutions/graph-data-sales-analytics/iac/README.md @@ -0,0 +1,89 @@ +# IaC Directory Structure + +## Directories and Files + +- **[arm](arm)**: Contains ARM (JSON) templates for the Azure resources. + +- **[iac](../iac/)**: Holds deployment templates and pipeline files for the resources.
+ + - **[bicep](../iac/bicep/)**: Contains templates for each resource, including the main deployment file "main.bicep". + + - **[modules](../iac/bicep/modules/)**: Contains individual Bicep files for each resource, organized in separate subfolders. + + - **[main.bicep](../iac/bicep/main.bicep)**: Contains the deployment code for all resources present in the "modules" directory. + + - **[main.parameters.json](../iac/bicep/main.parameters.json)**: Holds the parameters needed to run "main.bicep". These parameters are retrieved from variable groups to avoid hardcoding. + + - **[pipelines](../iac/pipelines/)**: Inside the "pipelines" directory, you'll find YAML files for the deployment pipelines. + + - **[azure_deploy.yml](../iac/pipelines/azure_deploy.yml)**: Outlines the steps to execute "main.bicep". It specifies branch triggers, the variable group supplying values for "main.parameters.json", and deployment stages for different environments. + +# Azure Resources Deployment + +The resources in this folder can be used to deploy the required cloud services into your Azure subscription. You have two deployment options: + +## Option 1: Deploy from Azure Portal + +To deploy directly to Azure, click the following button: + + Deploy Environment in Azure + +## Option 2: Use Azure DevOps Pipeline + +### Prerequisites +To deploy Azure resources using Azure DevOps, ensure the following prerequisites are met: + +1. **Azure DevOps Project Setup**: Set up an Azure DevOps project and grant Basic user access to relevant team members. Detailed instructions can be found in the [Create a project in Azure DevOps](https://learn.microsoft.com/en-us/azure/devops/organizations/projects/create-project?view=azure-devops&tabs=browser) documentation. + +2. **Azure Service Principal Configuration**: Create an Azure Service Principal with Contributor permissions on the target Azure resource group. Follow the steps outlined in the [Creating a service principal](https://learn.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal) guide. + +3. **Azure DevOps Service Connection**: Establish an Azure DevOps service connection so the pipeline can interact with your subscription. Learn how to set up service connections in [Service Connections in Azure Pipelines](https://learn.microsoft.com/en-us/azure/devops/pipelines/library/service-endpoints?view=azure-devops&tabs=yaml). + +4. **Bicep Files**: Ensure your Bicep files are ready in your repository. The templates can be found in the provided repository ([link](../iac/bicep/)). + +5. **Pipeline Configuration**: Configure Azure DevOps pipelines using the provided YAML files. + +6. **Azure Subscription**: Verify that you have an active Azure subscription where you intend to deploy your resources. If not, you can create a new [Azure subscription](https://azure.microsoft.com/free/). + +7. **Azure Resource Group**: Create an Azure resource group in your subscription. + +8. **Azure DevOps Variables**: Define Azure DevOps variables for your deployment parameters. More information can be found in the [Azure DevOps Variables](https://learn.microsoft.com/en-us/azure/devops/pipelines/process/variables?view=azure-devops&tabs=yaml%2Cbatch) documentation. + +### Deployment Process + +To deploy resources to your Azure resource group or subscription using Azure DevOps, follow these steps: + +#### Step 1: Clone the Repository + +Clone the repository to your Azure DevOps workspace. + +#### Step 2: Review Bicep File + +1. Navigate to the `iac/bicep` directory in your cloned repository. +2. 
Open the `main.bicep` file using a text editor or an IDE. +3. Review the [Bicep](../iac/bicep/) code to understand the Azure resources you are going to deploy. Make any necessary modifications or customizations. + +#### Step 3: Create an Azure DevOps Pipeline + +1. Log in to your Azure DevOps account. +2. Create a new project or use an existing one. +3. Go to the "Pipelines" section and click on "New Pipeline." +4. Choose the source repository (the one you cloned earlier) and configure your pipeline settings. + +#### Step 4: Configure Pipeline Variables + +1. Access the Azure Pipelines Library. +2. Modify the variable values in the variable group based on your deployment requirements. + +#### Step 5: Save and Run the Pipeline + +1. Save your pipeline configuration. +2. Trigger the pipeline execution to deploy the Azure resources. + +#### Step 6: Verify Deployment + +1. Once the pipeline completes successfully, log in to the Azure portal. +2. Navigate to the appropriate resource group and verify that the resources defined in the Bicep file have been deployed. + +By following these steps, you can efficiently deploy Azure resources using Azure DevOps, ensuring a smooth provisioning process and clear monitoring of deployments. + diff --git a/solutions/graph-data-sales-analytics/iac/arm/azure_deploy.json b/solutions/graph-data-sales-analytics/iac/arm/azure_deploy.json new file mode 100644 index 00000000..15d559c7 --- /dev/null +++ b/solutions/graph-data-sales-analytics/iac/arm/azure_deploy.json @@ -0,0 +1,417 @@ +{ + "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#", + "contentVersion": "1.0.0.0", + "parameters": { + "vaults_kv_gdcs_cin_name": { + "defaultValue": "[concat('kv-gdcs-cin', substring(uniquestring(resourceGroup().id),0,2))]", + "type": "String" + }, + "workspaces_syngdcscin_name": { + "defaultValue": "[concat('syngdcscin', substring(uniquestring(resourceGroup().id),0,2))]", + "type": "String" + }, + "storageAccounts_adlgdcscin_name": { + "defaultValue": "[concat('adlgdcscin', substring(uniquestring(resourceGroup().id),0,2))]", + "type": "String" + }, + "userAssignedIdentities_id_gdcs_cin_name": { + "defaultValue": "[concat('id-gdcs-cin', substring(uniquestring(resourceGroup().id), 0, 2))]", + "type": "String" + } + }, + "variables": {}, + "resources": [ + { + "type": "Microsoft.KeyVault/vaults", + "apiVersion": "2023-02-01", + "name": "[parameters('vaults_kv_gdcs_cin_name')]", + "location": "[resourceGroup().location]", + "tags": { + "environment": "default", + "location": "default" + }, + "properties": { + "sku": { + "family": "A", + "name": "standard" + }, + "tenantId": "[tenant().tenantId]", + "networkAcls": { + "bypass": "AzureServices", + "defaultAction": "Deny", + "ipRules": [], + "virtualNetworkRules": [] + }, + "accessPolicies": [], + "enabledForDeployment": false, + "enabledForDiskEncryption": false, + "enabledForTemplateDeployment": true, + "enableSoftDelete": true, + "softDeleteRetentionInDays": 7, + "enableRbacAuthorization": false, + "enablePurgeProtection": true, + "vaultUri": "[concat('https://', parameters('vaults_kv_gdcs_cin_name'), '.vault.azure.net/')]", + "provisioningState": "Succeeded", + "publicNetworkAccess": "Enabled" + } + }, + { + "type": "Microsoft.ManagedIdentity/userAssignedIdentities", + "apiVersion": "2023-01-31", + "name": "[parameters('userAssignedIdentities_id_gdcs_cin_name')]", + "location": "[resourceGroup().location]", + "tags": { + "environment": "default", + "location": "default" + } + }, + { + "type": 
"Microsoft.Storage/storageAccounts", + "apiVersion": "2022-09-01", + "name": "[parameters('storageAccounts_adlgdcscin_name')]", + "location": "[resourceGroup().location]", + "dependsOn": [ + "[resourceId('Microsoft.ManagedIdentity/userAssignedIdentities', parameters('userAssignedIdentities_id_gdcs_cin_name'))]" + ], + "tags": { + "environment": "default", + "location": "default" + }, + "sku": { + "name": "Standard_LRS", + "tier": "Standard" + }, + "kind": "StorageV2", + "identity": { + "type": "UserAssigned", + "userAssignedIdentities": { + "[resourceID('Microsoft.ManagedIdentity/userAssignedIdentities/',parameters('userAssignedIdentities_id_gdcs_cin_name'))]": {} + } + }, + "properties": { + "dnsEndpointType": "Standard", + "defaultToOAuthAuthentication": false, + "publicNetworkAccess": "Disabled", + "allowCrossTenantReplication": false, + "isSftpEnabled": false, + "minimumTlsVersion": "TLS1_2", + "allowBlobPublicAccess": true, + "allowSharedKeyAccess": true, + "isHnsEnabled": true, + "networkAcls": { + "bypass": "AzureServices", + "virtualNetworkRules": [], + "ipRules": [], + "defaultAction": "Deny" + }, + "supportsHttpsTrafficOnly": true, + "encryption": { + "requireInfrastructureEncryption": false, + "services": { + "file": { + "keyType": "Account", + "enabled": true + }, + "blob": { + "keyType": "Account", + "enabled": true + } + }, + "keySource": "Microsoft.Storage" + }, + "accessTier": "Hot" + } + }, + { + "type": "Microsoft.Storage/storageAccounts/blobServices", + "apiVersion": "2022-09-01", + "name": "[concat(parameters('storageAccounts_adlgdcscin_name'), '/default')]", + "dependsOn": [ + "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccounts_adlgdcscin_name'))]" + ], + "sku": { + "name": "Standard_LRS", + "tier": "Standard" + }, + "properties": { + "containerDeleteRetentionPolicy": { + "enabled": true, + "days": 7 + }, + "cors": { + "corsRules": [] + }, + "deleteRetentionPolicy": { + "allowPermanentDelete": false, + "enabled": true, + "days": 7 + } + } + }, + { + "type": "Microsoft.Storage/storageAccounts/fileServices", + "apiVersion": "2022-09-01", + "name": "[concat(parameters('storageAccounts_adlgdcscin_name'), '/default')]", + "dependsOn": [ + "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccounts_adlgdcscin_name'))]" + ], + "sku": { + "name": "Standard_LRS", + "tier": "Standard" + }, + "properties": { + "protocolSettings": { + "smb": {} + }, + "cors": { + "corsRules": [] + }, + "shareDeleteRetentionPolicy": { + "enabled": true, + "days": 7 + } + } + }, + { + "type": "Microsoft.Storage/storageAccounts/queueServices", + "apiVersion": "2022-09-01", + "name": "[concat(parameters('storageAccounts_adlgdcscin_name'), '/default')]", + "dependsOn": [ + "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccounts_adlgdcscin_name'))]" + ], + "properties": { + "cors": { + "corsRules": [] + } + } + }, + { + "type": "Microsoft.Storage/storageAccounts/tableServices", + "apiVersion": "2022-09-01", + "name": "[concat(parameters('storageAccounts_adlgdcscin_name'), '/default')]", + "dependsOn": [ + "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccounts_adlgdcscin_name'))]" + ], + "properties": { + "cors": { + "corsRules": [] + } + } + }, + { + "type": "Microsoft.Synapse/workspaces", + "apiVersion": "2021-06-01", + "name": "[parameters('workspaces_syngdcscin_name')]", + "location": "[resourceGroup().location]", + "dependsOn": [ + "[resourceId('Microsoft.Storage/storageAccounts', 
parameters('storageAccounts_adlgdcscin_name'))]" + ], + "tags": { + "environment": "default", + "location": "default" + }, + "identity": { + "type": "SystemAssigned, UserAssigned", + "userAssignedIdentities": { + "[resourceId('Microsoft.ManagedIdentity/userAssignedIdentities', parameters('userAssignedIdentities_id_gdcs_cin_name'))]": {} + } + }, + "properties": { + "defaultDataLakeStorage": { + "resourceId": "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccounts_adlgdcscin_name'))]", + "createManagedPrivateEndpoint": true, + "accountUrl": "[concat('https://', parameters('storageAccounts_adlgdcscin_name'), '.dfs.core.windows.net/')]", + "filesystem": "synapse" + }, + "encryption": {}, + "managedVirtualNetwork": "default", + "managedResourceGroupName": "synapseworkspace-managedrg", + "sqlAdministratorLogin": "sqladminuser", + "privateEndpointConnections": [], + "managedVirtualNetworkSettings": { + "preventDataExfiltration": false, + "allowedAadTenantIdsForLinking": [] + }, + "publicNetworkAccess": "Enabled", + "azureADOnlyAuthentication": false, + "trustedServiceBypassEnabled": false + } + }, + { + "type": "Microsoft.Synapse/workspaces/auditingSettings", + "apiVersion": "2021-06-01", + "name": "[concat(parameters('workspaces_syngdcscin_name'), '/Default')]", + "dependsOn": [ + "[resourceId('Microsoft.Synapse/workspaces', parameters('workspaces_syngdcscin_name'))]" + ], + "properties": { + "retentionDays": 0, + "auditActionsAndGroups": [], + "isStorageSecondaryKeyInUse": false, + "isAzureMonitorTargetEnabled": false, + "state": "Disabled", + "storageAccountSubscriptionId": "00000000-0000-0000-0000-000000000000" + } + }, + { + "type": "Microsoft.Synapse/workspaces/azureADOnlyAuthentications", + "apiVersion": "2021-06-01", + "name": "[concat(parameters('workspaces_syngdcscin_name'), '/default')]", + "dependsOn": [ + "[resourceId('Microsoft.Synapse/workspaces', parameters('workspaces_syngdcscin_name'))]" + ], + "properties": { + "azureADOnlyAuthentication": false + } + }, + { + "type": "Microsoft.Synapse/workspaces/bigDataPools", + "apiVersion": "2021-06-01", + "name": "[concat(parameters('workspaces_syngdcscin_name'), '/synspgdcscin')]", + "location": "[resourceGroup().location]", + "dependsOn": [ + "[resourceId('Microsoft.Synapse/workspaces', parameters('workspaces_syngdcscin_name'))]" + ], + "tags": { + "environment": "default", + "location": "default" + }, + "properties": { + "sparkVersion": "3.3", + "nodeCount": 0, + "nodeSize": "Medium", + "nodeSizeFamily": "MemoryOptimized", + "autoScale": { + "enabled": true, + "minNodeCount": 3, + "maxNodeCount": 6 + }, + "autoPause": { + "enabled": true, + "delayInMinutes": 15 + }, + "isComputeIsolationEnabled": false, + "sessionLevelPackagesEnabled": true, + "dynamicExecutorAllocation": { + "enabled": false + }, + "isAutotuneEnabled": false, + "provisioningState": "Succeeded" + } + }, + { + "type": "Microsoft.Synapse/workspaces/dedicatedSQLminimalTlsSettings", + "apiVersion": "2021-06-01", + "name": "[concat(parameters('workspaces_syngdcscin_name'), '/default')]", + "location": "[resourceGroup().location]", + "dependsOn": [ + "[resourceId('Microsoft.Synapse/workspaces', parameters('workspaces_syngdcscin_name'))]" + ], + "properties": { + "minimalTlsVersion": "1.2" + } + }, + { + "type": "Microsoft.Synapse/workspaces/extendedAuditingSettings", + "apiVersion": "2021-06-01", + "name": "[concat(parameters('workspaces_syngdcscin_name'), '/Default')]", + "dependsOn": [ + "[resourceId('Microsoft.Synapse/workspaces', 
parameters('workspaces_syngdcscin_name'))]" + ], + "properties": { + "retentionDays": 0, + "auditActionsAndGroups": [], + "isStorageSecondaryKeyInUse": false, + "isAzureMonitorTargetEnabled": false, + "state": "Disabled", + "storageAccountSubscriptionId": "00000000-0000-0000-0000-000000000000" + } + }, + { + "type": "Microsoft.Synapse/workspaces/firewallRules", + "apiVersion": "2021-06-01", + "name": "[concat(parameters('workspaces_syngdcscin_name'), '/allowAll')]", + "dependsOn": [ + "[resourceId('Microsoft.Synapse/workspaces', parameters('workspaces_syngdcscin_name'))]" + ], + "properties": { + "startIpAddress": "0.0.0.0", + "endIpAddress": "255.255.255.255" + } + }, + { + "type": "Microsoft.Synapse/workspaces/integrationruntimes", + "apiVersion": "2021-06-01", + "name": "[concat(parameters('workspaces_syngdcscin_name'), '/AutoResolveIntegrationRuntime')]", + "dependsOn": [ + "[resourceId('Microsoft.Synapse/workspaces', parameters('workspaces_syngdcscin_name'))]" + ], + "properties": { + "type": "Managed", + "typeProperties": { + "computeProperties": { + "location": "AutoResolve" + } + }, + "managedVirtualNetwork": { + "referenceName": "default", + "type": "ManagedVirtualNetworkReference", + "id": "ed04dd60-ffa6-4e43-86a9-930279a6dad7" + } + } + }, + { + "type": "Microsoft.Synapse/workspaces/securityAlertPolicies", + "apiVersion": "2021-06-01", + "name": "[concat(parameters('workspaces_syngdcscin_name'), '/Default')]", + "dependsOn": [ + "[resourceId('Microsoft.Synapse/workspaces', parameters('workspaces_syngdcscin_name'))]" + ], + "properties": { + "state": "Disabled", + "disabledAlerts": [ + "" + ], + "emailAddresses": [ + "" + ], + "emailAccountAdmins": false, + "retentionDays": 0 + } + }, + { + "type": "Microsoft.Storage/storageAccounts/blobServices/containers", + "apiVersion": "2022-09-01", + "name": "[concat(parameters('storageAccounts_adlgdcscin_name'), '/default/raw')]", + "dependsOn": [ + "[resourceId('Microsoft.Storage/storageAccounts/blobServices', parameters('storageAccounts_adlgdcscin_name'), 'default')]", + "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccounts_adlgdcscin_name'))]" + ], + "properties": { + "immutableStorageWithVersioning": { + "enabled": false + }, + "defaultEncryptionScope": "$account-encryption-key", + "denyEncryptionScopeOverride": false, + "publicAccess": "None" + } + }, + { + "type": "Microsoft.Storage/storageAccounts/blobServices/containers", + "apiVersion": "2022-09-01", + "name": "[concat(parameters('storageAccounts_adlgdcscin_name'), '/default/synapse')]", + "dependsOn": [ + "[resourceId('Microsoft.Storage/storageAccounts/blobServices', parameters('storageAccounts_adlgdcscin_name'), 'default')]", + "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccounts_adlgdcscin_name'))]" + ], + "properties": { + "immutableStorageWithVersioning": { + "enabled": false + }, + "defaultEncryptionScope": "$account-encryption-key", + "denyEncryptionScopeOverride": false, + "publicAccess": "None" + } + } + ] +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/iac/bicep/main.bicep b/solutions/graph-data-sales-analytics/iac/bicep/main.bicep new file mode 100644 index 00000000..4f18a93c --- /dev/null +++ b/solutions/graph-data-sales-analytics/iac/bicep/main.bicep @@ -0,0 +1,227 @@ +/* +------------------------------------------------------------------------------ +PARAMETERS FOR DATAOPS RESOURCES +------------------------------------------------------------------------------ +*/ + + +param 
isSQLResourceExists bool + + +@description('Name of the Resource Group') +param resourceGroupName string = resourceGroup().name +//param resourceGroupName string + +@description('Location for all resources.') +param location string = resourceGroup().location + +@description('region for all resources.') +param region string + +@description('The administrator username of the SQL logical server') +param sqlAdministratorLogin string + +@description('The administrator password of the SQL logical server.') +@secure() +param sqlAdministratorLoginPassword string + +@description('Tags to add to the resources') +param tag1 string + +@description('Tags to add to the resources') +param tag2 string + +@description('project name') +param project string + +@description('deployment environment for the resources') +param env string + +@description('Value of the Subscription Id') +param subscriptionId string = subscription().subscriptionId + +// @description('Name of the Virtual Network') +// param vnetName string + +// @description('Name of the Subnet') +// param subnetName string + +@description('Name of the sql login administrator') +param administratorLogin string + +@description('Object Id of the service principal for sql login') +param administratorSid string + +@description('Tenant Id') +param tenantId string = tenant().tenantId + + +/* +------------------------------------------------------------------------------ +VARIABLES FOR DATA LANDING ZONE RESOURCES +------------------------------------------------------------------------------ +*/ +var name = concat('${project}-${region}-${env}') + +/* +------------------------------------------------------------------------------ +MODULE FOR CREATING AZURE KEY VAULT +------------------------------------------------------------------------------ +*/ + +module keyvault './modules/keyvault/keyvault.bicep' = { + name: 'kv-${name}-deployment' + params: { + location: location + project: project + region: region + //env: env + tag1: tag1 + tag2: tag2 + } +} + + +/* +------------------------------------------------------------------------------ +MODULE FOR CREATING DATA LAKE STORAGE +------------------------------------------------------------------------------ +*/ + +module DataLakeStorageModule 'modules/datalake/datalake.bicep' = { + name: 'adl-${name}-deployment' + scope: resourceGroup(resourceGroupName) + // dependsOn: [ + // SynapseModule + // ] + params: { + location: location + tag1: tag1 + tag2: tag2 + region: region + project: project + env: env + subscriptionId: subscriptionId + resourceGroupName: resourceGroupName + managed_identity_name: UserIdentityDeploy.outputs.managed_identity_name + // vnetName: vnetName + // subnetName: subnetName + + } +} + + +/* +------------------------------------------------------------------------------ +MODULE FOR CREATING INITIAL SQL SERVER AND DATABASE +------------------------------------------------------------------------------ +*/ + +module sqlServerModule 'modules/sqlserver/sqlserver.bicep' = if(!isSQLResourceExists) { + name: 'sql-${name}-deployment' + params: { + sqlAdministratorLogin: sqlAdministratorLogin + sqlAdministratorLoginPassword: sqlAdministratorLoginPassword + location: location + region: region + tag1: tag1 + tag2: tag2 + project: project + env: env + subscriptionId: subscriptionId + resourceGroupName: resourceGroupName + managed_identity_name: UserIdentityDeploy.outputs.managed_identity_name + // administratorLogin: administratorLogin + // administratorSid: administratorSid + // tenantId: 
tenantId + // vnetName: vnetName + // subnetName: subnetName + } +} + +/* +------------------------------------------------------------------------------ +MODULE FOR CREATING INCREMENTAL SQL SERVER AND DATABASE +------------------------------------------------------------------------------ +*/ + +module sqlServerModuleinc 'modules/sqlserver/sqlserverinc.bicep' = if(isSQLResourceExists) { + name: 'sql-${name}-deployment' + params: { + sqlAdministratorLogin: sqlAdministratorLogin + sqlAdministratorLoginPassword: sqlAdministratorLoginPassword + location: location + tag1: tag1 + tag2: tag2 + region: region + project: project + env: env + subscriptionId: subscriptionId + resourceGroupName: resourceGroupName + managed_identity_name: UserIdentityDeploy.outputs.managed_identity_name + administratorLogin: administratorLogin + administratorSid: administratorSid + tenantId: tenantId + // vnetName: vnetName + // subnetName: subnetName + } +} + +/* +------------------------------------------------------------------------------ +MODULE FOR CREATING SYNAPSE WORKSPACE +------------------------------------------------------------------------------ +*/ + +module synapseModule 'modules/synapse/synapse.bicep' = { + name: 'syn-${name}-deployment' + params: { + sqlAdministratorLogin: sqlAdministratorLogin + sqlAdministratorLoginPassword: sqlAdministratorLoginPassword + location: location + tag1: tag1 + tag2: tag2 + region: region + project: project + env: env + subscriptionId: subscriptionId + resourceGroupName: resourceGroupName + managed_identity_name: UserIdentityDeploy.outputs.managed_identity_name + adls_name: DataLakeStorageModule.outputs.adls_name + // vnetName: vnetName + // subnetName: subnetName + // resourceId: subscriptionResourceId('Microsoft.Storage/storageAccounts@2022-05-01', DataLakeStorageModule.outputs.adls_resource_id) + + } + dependsOn: [ + DataLakeStorageModule + ] +} + + +/* +-------------------------------------------------------------------------------------------------------- +CREATION OF USER ASSIGNED MANAGED IDENTITY +-------------------------------------------------------------------------------------------------------- +*/ + +module UserIdentityDeploy 'modules/managedidentity/managedidentity.bicep' = { + name: 'id-${name}-deployment' + scope: resourceGroup(resourceGroupName) + params:{ + project : project + env : env + location:location + tag1: tag1 + tag2: tag2 + region: region + } +} + +/* +------------------------------------------------------------------------------ +END OF MODULES +------------------------------------------------------------------------------ +*/ diff --git a/solutions/graph-data-sales-analytics/iac/bicep/main.parameters.json b/solutions/graph-data-sales-analytics/iac/bicep/main.parameters.json new file mode 100644 index 00000000..224a1923 --- /dev/null +++ b/solutions/graph-data-sales-analytics/iac/bicep/main.parameters.json @@ -0,0 +1,39 @@ +{ + "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#", + "contentVersion": "1.0.0.0", + "parameters": { + "project": { + "value": "gdcs" + }, + "region": { + "value": "eus2" + }, + "env": { + "value": "dev" + }, + "sqlAdministratorLogin": { + "value": "sqladminuser" + }, + "sqlAdministratorLoginPassword": { + "value": "Test@dmin@12345" + }, + "tag1": { + "value": "default" + }, + "tag2": { + "value": "default" + }, + "administratorLogin": { + "value": "default" + }, + "administratorSid": { + "value": "default" + }, + "isSQLResourceExists": { + "value": true + } + } +} + + + 
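The `isSQLResourceExists` value in `main.parameters.json` above drives a conditional-deployment pattern in `main.bicep`: when it is false, the initial SQL module creates the server and database; when it is true, the incremental module redeploys them and additionally configures the Azure AD administrator. The following is a stripped-down sketch of that Bicep pattern; the module paths and parameter lists are illustrative placeholders, not the repository's actual code.

```bicep
// Illustrative sketch of the conditional-module pattern used by main.bicep:
// exactly one of the two modules deploys, selected by a boolean parameter.

@description('Set to true when the SQL resources already exist')
param isSQLResourceExists bool

param location string = resourceGroup().location

// First-time deployment: create the SQL server and database from scratch.
module sqlInitial 'modules/sql_initial.bicep' = if (!isSQLResourceExists) {
  name: 'sql-initial-deployment'
  params: { location: location } // the real module takes many more parameters
}

// Incremental deployment: also sets the ActiveDirectory administrator.
module sqlIncremental 'modules/sql_incremental.bicep' = if (isSQLResourceExists) {
  name: 'sql-incremental-deployment'
  params: { location: location } // the real module takes many more parameters
}
```

Note that in the repository both modules share the deployment name `sql-${name}-deployment`; that is safe only because the two conditions are mutually exclusive, so at most one of the deployments is ever submitted.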
diff --git a/solutions/graph-data-sales-analytics/iac/bicep/modules/datalake/datalake.bicep b/solutions/graph-data-sales-analytics/iac/bicep/modules/datalake/datalake.bicep new file mode 100644 index 00000000..d7ea1183 --- /dev/null +++ b/solutions/graph-data-sales-analytics/iac/bicep/modules/datalake/datalake.bicep @@ -0,0 +1,213 @@ +/* +------------------------------------------------------------------------------ +PARAMETERS FOR AZURE DATA LAKE STORAGE +------------------------------------------------------------------------------ +*/ +targetScope = 'resourceGroup' +@description('Region for deployment of resource') +param location string = resourceGroup().location + +@description('region for all resources.') +param region string + +@description('Tags to add to the resources') +param tag1 string + +@description('Tags to add to the resources') +param tag2 string + +@description('project name') +param project string + +@description('deployment environment for the resources') +param env string + +@description('Value of the Subscription Id') +param subscriptionId string = subscription().subscriptionId + +@description('Name of the resource group') +param resourceGroupName string = resourceGroup().name + +@description('Name of the Managed Identity') +param managed_identity_name string + +/* +------------------------------------------------------------------------------ +VARIABLES FOR AZURE DATA LAKE STORAGE +------------------------------------------------------------------------------ +*/ + +// Create a short, unique suffix, that will be unique to each resource group +var uniqueSuffix = substring(uniqueString(resourceGroup().id), 0, 2) + +@description('Name of the data lake storage resource') +var storageAccounts_datalake_name = concat('adls-${project}-${region}-${env}-${uniqueSuffix}') + +@description('Cleaned Name of the data lake storage resource') +var storageNameCleaned = replace(storageAccounts_datalake_name, '-', '') + + + +/* +------------------------------------------------------------------------------ +CREATION OF AZURE DATA LAKE STORAGE +------------------------------------------------------------------------------ +*/ + +resource storage 'Microsoft.Storage/storageAccounts@2022-05-01' = { + name: storageNameCleaned + location: location + tags: { + environment: tag1 + location: tag2 + } + identity: { + type: 'UserAssigned' + userAssignedIdentities: { + '/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/${managed_identity_name}': {} + } + } + sku: { + name: 'Standard_LRS' + } + kind: 'StorageV2' + properties: { + dnsEndpointType: 'Standard' + defaultToOAuthAuthentication: false + allowCrossTenantReplication: false + isSftpEnabled: false + minimumTlsVersion: 'TLS1_2' + allowBlobPublicAccess: true + allowSharedKeyAccess: true + isHnsEnabled: true + networkAcls: { + bypass: 'AzureServices' + virtualNetworkRules: [] + ipRules: [] + defaultAction: 'Deny' + } + supportsHttpsTrafficOnly: true + encryption: { + requireInfrastructureEncryption: false + services: { + file: { + keyType: 'Account' + enabled: true + } + blob: { + keyType: 'Account' + enabled: true + } + } + keySource: 'Microsoft.Storage' + } + accessTier: 'Hot' + publicNetworkAccess:'Disabled' + } +} + +/* +------------------------------------------------------------------------------ +CREATION OF AZURE DATA LAKE STORAGE BLOB SERVICES +------------------------------------------------------------------------------ +*/ + +resource 
storageAccounts_datalake_name_default 'Microsoft.Storage/storageAccounts/blobServices@2021-08-01' = { + parent: storage + name: 'default' + sku: { + name: 'Standard_LRS' + tier: 'Standard' + } + properties: { + containerDeleteRetentionPolicy: { + enabled: true + days: 7 + } + cors: { + corsRules: [] + } + deleteRetentionPolicy: { + enabled: true + days: 7 + } + } +resource storageAccounts_container_raw_name 'containers@2021-08-01' = { + name:'raw' + properties:{ + publicAccess:'None' + } +} +resource storageAccounts_container_synapse_name 'containers@2021-08-01' = { + name:'synapse' + properties:{ + publicAccess:'None' + } +} +} + +/* +------------------------------------------------------------------------------ +CREATION OF AZURE DATA LAKE STORAGE FILE SERVICES +------------------------------------------------------------------------------ +*/ + +resource Microsoft_Storage_storageAccounts_fileServices_storageAccounts_datalake_name_default 'Microsoft.Storage/storageAccounts/fileServices@2021-08-01' = { + parent: storage + name: 'default' + sku: { + name: 'Standard_LRS' + tier: 'Standard' + } + properties: { + cors: { + corsRules: [] + } + shareDeleteRetentionPolicy: { + enabled: true + days: 7 + } + } +} + +/* +------------------------------------------------------------------------------ +CREATION OF AZURE DATA LAKE STORAGE QUEUE SERVICES +------------------------------------------------------------------------------ +*/ + +resource Microsoft_Storage_storageAccounts_queueServices_storageAccounts_datalake_name_default 'Microsoft.Storage/storageAccounts/queueServices@2021-08-01' = { + parent: storage + name: 'default' + properties: { + cors: { + corsRules: [] + } + } +} + +/* +------------------------------------------------------------------------------ +CREATION OF AZURE DATA LAKE STORAGE TABLE SERVICES +------------------------------------------------------------------------------ +*/ + +resource Microsoft_Storage_storageAccounts_tableServices_storageAccounts_datalake_name_default 'Microsoft.Storage/storageAccounts/tableServices@2021-08-01' = { + parent: storage + name: 'default' + properties: { + cors: { + corsRules: [] + } + } +} + + +/* +------------------------------------------------------------------------------ +OUTPUTS +------------------------------------------------------------------------------ +*/ +output adls_resource_id string = storage.id +output adls_name string = storage.name + diff --git a/solutions/graph-data-sales-analytics/iac/bicep/modules/keyvault/keyvault.bicep b/solutions/graph-data-sales-analytics/iac/bicep/modules/keyvault/keyvault.bicep new file mode 100644 index 00000000..29e6bdd6 --- /dev/null +++ b/solutions/graph-data-sales-analytics/iac/bicep/modules/keyvault/keyvault.bicep @@ -0,0 +1,75 @@ +/* +------------------------------------------------------------------------------ +PARAMETERS FOR AZURE KEY VAULT RESOURCE +------------------------------------------------------------------------------ +*/ +@description('The Azure Region to deploy the resources into') +param location string = resourceGroup().location + +@description('region for all resources.') +param region string + +@description('project name') +param project string + +// @description('deployment environment for the resources') +// param env string + +@description('Tags to add to the resources') +param tag1 string + +@description('Tags to add to the resources') +param tag2 string + +// Create a short, unique suffix, that will be unique to each resource group +var uniqueSuffix = 
substring(uniqueString(resourceGroup().id), 0, 2) + +@description('The name of the Key Vault') +var keyvaultName = concat('kv-${project}-${region}-${uniqueSuffix}') + +/* +------------------------------------------------------------------------------ +CREATION OF AZURE KEY VAULT +------------------------------------------------------------------------------ +*/ +resource keyVault 'Microsoft.KeyVault/vaults@2021-10-01' = { + name: keyvaultName + location: location + tags: { + environment: tag1 + location: tag2 + } + properties: { + enabledForDeployment: false + enabledForDiskEncryption: false + enabledForTemplateDeployment: true + enableRbacAuthorization: false + vaultUri: 'https://${keyvaultName}.vault.azure.net/' + provisioningState: 'Succeeded' + publicNetworkAccess: 'Enabled' + softDeleteRetentionInDays:7 + enableSoftDelete: true // purge protection requires soft delete to be enabled + enablePurgeProtection: true + networkAcls: { + bypass: 'AzureServices' + defaultAction: 'Deny' + ipRules: [] + virtualNetworkRules: [ + ] + } + accessPolicies: [] + sku: { + family: 'A' + name: 'standard' + } + tenantId: subscription().tenantId + } +} + +/* +------------------------------------------------------------------------------ +OUTPUTS +------------------------------------------------------------------------------ +*/ +output keyvaultId string = keyVault.id diff --git a/solutions/graph-data-sales-analytics/iac/bicep/modules/keyvault/keyvault_back.bicep b/solutions/graph-data-sales-analytics/iac/bicep/modules/keyvault/keyvault_back.bicep new file mode 100644 index 00000000..9300661d --- /dev/null +++ b/solutions/graph-data-sales-analytics/iac/bicep/modules/keyvault/keyvault_back.bicep @@ -0,0 +1,323 @@ +//param resource_prefix string +//param sequence_no string +param region_short string + +param tenantId string = subscription().tenantId +param location string = resourceGroup().location +param project_name string +//param vaults_BIKeyVault_name string = 'kv-${project_name}${env}${region_short}${sequence_no}-${resource_prefix}' + +//param servers_admin_name string +@description('Deployment environment') +param env string + +@description('Name of the resource') +param sqldb_metadata_name string = 'sqldb${project_name}${env}${region_short}' +param servers_metadata_name string = 'sqldbserver${project_name}${env}${region_short}' + +param sql_admin_user string +@secure() +param sql_admin_password string +//param servers_admin_sid string +param adls_resource_id string +// param sqldw_admin_user string +// @secure() +// param sqldw_admin_password string +// param sqldw_server_name string +// param sqldw_name string +@secure() +param sf_Sales_Cloud_Password string +@secure() +param sf_Sales_Cloud_SecurityToken string +// @secure() +// param storage_Account_Key string +param client_ID string +@secure() +param client_Secret_Key string +@secure() +param db_Security_Token string +param db_URL string +//param dataFactories_adf_principalId string + +var uniqueSuffix = substring(uniqueString(resourceGroup().id), 0, 2) +var vaults_BIKeyVault_name = 'kv${project_name}${env}${region_short}${uniqueSuffix}' + +// resource servers_metadata_name_resource 'Microsoft.Sql/servers@2019-06-01-preview' = { +// name: servers_metadata_name +// location: location +// tags: {} +// identity: { +// type: 'SystemAssigned' +// } +// properties: { +// administratorLogin: sql_admin_user +// administratorLoginPassword: sql_admin_password +// version: '12.0' +// minimalTlsVersion: '1.2' +// publicNetworkAccess: 'Enabled' +// } +// } + +// resource 
servers_metadata_name_ActiveDirectory 'Microsoft.Sql/servers/administrators@2019-06-01-preview' = { +// parent: servers_metadata_name_resource +// name: 'ActiveDirectory' +// properties: { +// administratorType: 'ActiveDirectory' +// login: servers_admin_name +// sid: servers_admin_sid +// tenantId: tenantId +// } +// } + + + +// resource servers_metadata_name_sqldb_metadata_name 'Microsoft.Sql/servers/databases@2020-08-01-preview' = { +// parent: servers_metadata_name_resource +// name: sqldb_metadata_name +// location: location +// tags: {} +// sku: { +// name: 'GP_S_Gen5' +// tier: 'GeneralPurpose' +// capacity: 4 +// family:'Gen5' +// } +// properties: { +// collation: 'SQL_Latin1_General_CP1_CI_AS' +// maxSizeBytes: 2147483648 +// catalogCollation: 'SQL_Latin1_General_CP1_CI_AS' +// zoneRedundant: false +// readScale: 'Disabled' +// storageAccountType: 'LRS' +// } +// } + + +resource vaults_kvsynmetadatadev_name_resource 'Microsoft.KeyVault/vaults@2021-11-01-preview' = { + name: vaults_BIKeyVault_name + location: location + properties: { + sku: { + family: 'A' + name: 'standard' + } + tenantId: tenantId + networkAcls: { + bypass: 'AzureServices' + defaultAction: 'Deny' + ipRules: [] + virtualNetworkRules: [ + ] + } + accessPolicies: [ + + // { + // objectId: servers_admin_sid + // permissions: { + // certificates: [ + // 'Get' + // 'List' + // 'Update' + // 'Create' + // 'Import' + // 'Delete' + // 'Recover' + // 'Backup' + // 'Restore' + // 'ManageContacts' + // 'ManageIssuers' + // 'GetIssuers' + // 'ListIssuers' + // 'SetIssuers' + // 'DeleteIssuers' + // ] + // keys: [ + // 'Get' + // 'List' + // 'Update' + // 'Create' + // 'Import' + // 'Delete' + // 'Recover' + // 'Backup' + // 'Restore' + // 'GetRotationPolicy' + // 'SetRotationPolicy' + // 'Rotate' + // ] + // secrets: [ + // 'Get' + // 'List' + // 'Set' + // 'Delete' + // 'Recover' + // 'Backup' + // 'Restore' + // ] + // } + // tenantId: tenantId + // } + // { + // objectId:dataFactories_adf_principalId + // tenantId:tenantId + // permissions:{ + // secrets:[ + // 'Get' + // 'List' + // ] + // } + // } + ] + enabledForDeployment: false + enabledForDiskEncryption: false + enabledForTemplateDeployment: true + // enableSoftDelete: false + enableRbacAuthorization: false + vaultUri: 'https://${vaults_BIKeyVault_name}.vault.azure.net/' + provisioningState: 'Succeeded' + publicNetworkAccess: 'Enabled' + softDeleteRetentionInDays:7 + } +} + +@description('Secret - sfsalescloudpassword') +resource sf_sales_cloud_password_resource 'Microsoft.KeyVault/vaults/secrets@2021-11-01-preview' ={ + parent:vaults_kvsynmetadatadev_name_resource + name: 'sfsalescloudpassword' + properties:{ + contentType:'string' + value: sf_Sales_Cloud_Password + } +} + +@description('Secret - sfsalescloudsecuritytoken') +resource sfsalescloudsecuritytoken_resource 'Microsoft.KeyVault/vaults/secrets@2021-11-01-preview' ={ + parent:vaults_kvsynmetadatadev_name_resource + name: 'sfsalescloudsecuritytoken' + properties:{ + contentType:'string' + value: sf_Sales_Cloud_SecurityToken + } +} + +@description('Secret - storageaccountkey') +resource storageaccountkey_resource 'Microsoft.KeyVault/vaults/secrets@2021-11-01-preview' ={ + parent:vaults_kvsynmetadatadev_name_resource + name: 'storageaccountkey' + properties:{ + contentType:'string' + value: listKeys(adls_resource_id, '2019-04-01').keys[0].value + } +} + +@description('Secret - clientID') +resource clientID_resource 'Microsoft.KeyVault/vaults/secrets@2021-11-01-preview' ={ + 
parent:vaults_kvsynmetadatadev_name_resource + name: 'clientID' + properties:{ + contentType:'string' + value: client_ID + } +} + + +@description('Secret - ClientSecretKey') +resource ClientSecretKey_resource 'Microsoft.KeyVault/vaults/secrets@2021-11-01-preview' ={ + parent:vaults_kvsynmetadatadev_name_resource + name: 'clientSecretKey' + properties:{ + contentType:'string' + value: client_Secret_Key + } +} + +@description('Secret - dbsecuritytoken') +resource dbsecuritytoken_resource 'Microsoft.KeyVault/vaults/secrets@2021-11-01-preview' ={ + parent:vaults_kvsynmetadatadev_name_resource + name: 'dbsecuritytoken' + properties:{ + contentType:'string' + value: db_Security_Token + } +} + +@description('Secret - sqladminusername') +resource sqladminusername_resource 'Microsoft.KeyVault/vaults/secrets@2021-11-01-preview' ={ + parent:vaults_kvsynmetadatadev_name_resource + name: 'sqladminusername' + properties:{ + contentType:'string' + value: sql_admin_user + } +} + +@description('Secret - sqlpassword') +resource sqlpassword_resource 'Microsoft.KeyVault/vaults/secrets@2021-11-01-preview' ={ + parent:vaults_kvsynmetadatadev_name_resource + name: 'sqlpassword' + properties:{ + contentType:'string' + value: sql_admin_password + } +} + +@description('Secret - TenantId') +resource TenantId_resource 'Microsoft.KeyVault/vaults/secrets@2021-11-01-preview' ={ + parent:vaults_kvsynmetadatadev_name_resource + name: 'TenantId' + properties:{ + contentType:'string' + value: tenantId + } +} + +@description('Secret - dbURL') +resource dbURL_resource 'Microsoft.KeyVault/vaults/secrets@2021-11-01-preview' ={ + parent:vaults_kvsynmetadatadev_name_resource + name: 'dbURL' + properties:{ + contentType:'string' + value: db_URL + } +} + + +resource sqldb_connection_secret_resource 'Microsoft.KeyVault/vaults/secrets@2021-11-01-preview' ={ + parent:vaults_kvsynmetadatadev_name_resource + name: 'sqlconnectionstring' + properties:{ + contentType:'string' + value: 'Server=${servers_metadata_name}.database.windows.net;Database=${sqldb_metadata_name};User Id=${sql_admin_user};Password=${sql_admin_password}' + } +} + + +// resource sqldw_connection_secret_resource 'Microsoft.KeyVault/vaults/secrets@2021-11-01-preview' ={ +// parent:vaults_kvadfmetadatadev_name_resource +// name: 'SynDBConnection' +// properties:{ +// contentType:'string' +// value: 'Server=${sqldw_server_name}.database.windows.net;Database=${sqldw_name};User Id=${sqldw_admin_user};Password=${sqldw_admin_password}' +// } + +// } + + + +// resource adlskey_secret_resource 'Microsoft.KeyVault/vaults/secrets@2021-11-01-preview' ={ +// parent:vaults_kvadfmetadatadev_name_resource +// name: 'ADLSKey' +// properties:{ +// contentType:'string' +// value: listKeys(adls_resource_id, '2019-04-01').keys[0].value +// } + +// } + +// output sql_db_name string = servers_metadata_name_resource.name +// output sql_db_resource_id string = servers_metadata_name_resource.id +output sql_server_name string = servers_metadata_name +output vaults_BIKeyVault_name string = vaults_BIKeyVault_name +output sqldb_connection_secret string = sqldb_connection_secret_resource.name +// output sqldw_connection_secret string = sqldw_connection_secret_resource.name +output adlskey_secret string = storageaccountkey_resource.name diff --git a/solutions/graph-data-sales-analytics/iac/bicep/modules/managedidentity/managedidentity.bicep b/solutions/graph-data-sales-analytics/iac/bicep/modules/managedidentity/managedidentity.bicep new file mode 100644 index 00000000..10a4ddd5 --- /dev/null +++ 
b/solutions/graph-data-sales-analytics/iac/bicep/modules/managedidentity/managedidentity.bicep @@ -0,0 +1,50 @@ +/* +------------------------------------------------------------------------------ +PARAMETERS FOR AZURE MANAGED IDENTITY +------------------------------------------------------------------------------ +*/ + +@description('Location for all resources.') +param location string = resourceGroup().location + +@description('region for all resources.') +param region string + +@description('deployment environment for the resources') +param env string + +@description('project name') +param project string + +@description('Tags to add to the resources') +param tag1 string + +@description('Tags to add to the resources') +param tag2 string + +@description('Name of the Managed Identity') +var managedIdentityName = concat('id-${project}-${region}-${env}01') + +/* +------------------------------------------------------------------------------ +CREATION OF AZURE MANAGED IDENTITY +------------------------------------------------------------------------------ +*/ + +resource managedidentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = { + name: managedIdentityName + location: location + tags: { + environment: tag1 + location: tag2 + } +} + +/* +------------------------------------------------------------------------------ +OUTPUTS +------------------------------------------------------------------------------ +*/ + +output managed_identity_resource_id string = managedidentity.id +output managed_identity_name string = managedidentity.name diff --git a/solutions/graph-data-sales-analytics/iac/bicep/modules/sqlserver/sqlserver.bicep b/solutions/graph-data-sales-analytics/iac/bicep/modules/sqlserver/sqlserver.bicep new file mode 100644 index 00000000..ccd9de4a --- /dev/null +++ b/solutions/graph-data-sales-analytics/iac/bicep/modules/sqlserver/sqlserver.bicep @@ -0,0 +1,125 @@ +/* +------------------------------------------------------------------------------ +PARAMETERS FOR AZURE SQL SERVER +------------------------------------------------------------------------------ +*/ + +@description('region for all resources.') +param region string + +@description('The administrator username of the SQL logical server') +param sqlAdministratorLogin string + +@description('The administrator password of the SQL logical server.') +@secure() +param sqlAdministratorLoginPassword string + +@description('Location for all resources.') +param location string = resourceGroup().location + +@description('project name') +param project string + +@description('deployment environment for the resources') +param env string + +@description('Tags to add to the resources') +param tag1 string + +@description('Tags to add to the resources') +param tag2 string + +@description('Value of the Subscription Id') +param subscriptionId string = subscription().subscriptionId + +@description('Name of the resource group') +param resourceGroupName string = resourceGroup().name + +@description('Name of the Managed Identity') +param managed_identity_name string + +@description('Resource ID of the managed identity') +param userAssignedIdentityId string = '/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/${managed_identity_name}' + +/* +------------------------------------------------------------------------------ +VARIABLES FOR AZURE SQL SERVER +------------------------------------------------------------------------------ +*/ + +var uniqueSuffix = 
substring(uniqueString(resourceGroup().id), 0, 2) + +@description('Name of the sql server') +var sqlServerName = concat('sqlserver-${project}-${region}-${env}-${uniqueSuffix}') + +@description('Name of the sql database') +var sqldb = concat('sqldb-${project}-${region}-${env}-${uniqueSuffix}') + +@description('Name of the sql server database') +var databaseName = '${sqlServerName}/${sqldb}' + +/* +------------------------------------------------------------------------------ +CREATION OF AZURE SQL SERVER +------------------------------------------------------------------------------ +*/ + +resource sqlServer 'Microsoft.Sql/servers@2021-11-01-preview' = { + name: sqlServerName + location: location + tags: { + environment: tag1 + location: tag2 + } + identity: { + type: 'UserAssigned' + userAssignedIdentities: { + '/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/${managed_identity_name}': {} + } + } + properties: { + administratorLogin: sqlAdministratorLogin + administratorLoginPassword: sqlAdministratorLoginPassword + version: '12.0' + publicNetworkAccess: 'Enabled' + primaryUserAssignedIdentityId: userAssignedIdentityId + } + +} + + +/* +------------------------------------------------------------------------------ +CREATION OF AZURE SQL DATABASE +------------------------------------------------------------------------------ +*/ + +resource database 'Microsoft.Sql/servers/databases@2021-11-01-preview' = { + name: databaseName + location: location + sku: { + name: 'Basic' + tier: 'Basic' + capacity: 5 + } + tags: { + displayName: databaseName + } + properties: { + collation: 'SQL_Latin1_General_CP1_CI_AS' + maxSizeBytes: 104857600 + // sampleName: 'ctsqldb' + } + dependsOn: [ + sqlServer + ] +} + + +/* +------------------------------------------------------------------------------ +OUTPUTS +------------------------------------------------------------------------------ +*/ +output sqlserver_resource_id string = sqlServer.id +output sqlserver_name string = sqlServer.name diff --git a/solutions/graph-data-sales-analytics/iac/bicep/modules/sqlserver/sqlserverinc.bicep b/solutions/graph-data-sales-analytics/iac/bicep/modules/sqlserver/sqlserverinc.bicep new file mode 100644 index 00000000..2e936db9 --- /dev/null +++ b/solutions/graph-data-sales-analytics/iac/bicep/modules/sqlserver/sqlserverinc.bicep @@ -0,0 +1,144 @@ +/* +------------------------------------------------------------------------------ +PARAMETERS FOR AZURE SQL SERVER INCREMENTAL +------------------------------------------------------------------------------ +*/ +@description('region for all resources.') +param region string + +@description('The administrator username of the SQL logical server') +param sqlAdministratorLogin string + +@description('The administrator password of the SQL logical server.') +@secure() +param sqlAdministratorLoginPassword string + +@description('Location for all resources.') +param location string = resourceGroup().location + +@description('project name') +param project string + +@description('deployment environment for the resources') +param env string + +@description('Tags to add to the resources') +param tag1 string + +@description('Tags to add to the resources') +param tag2 string + +@description('Value of the Subscription Id') +param subscriptionId string = subscription().subscriptionId + +@description('Name of the resource group') +param resourceGroupName string = resourceGroup().name + +@description('Name of the sql login 
+param administratorLogin string
+
+@description('Object ID of the service principal for SQL login.')
+param administratorSid string
+
+@description('Tenant ID.')
+param tenantId string = tenant().tenantId
+
+@description('Name of the Managed Identity.')
+param managed_identity_name string
+
+@description('Resource ID of the managed identity.')
+param userAssignedIdentityId string = '/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/${managed_identity_name}'
+
+/*
+------------------------------------------------------------------------------
+VARIABLES FOR AZURE SQL SERVER INCREMENTAL
+------------------------------------------------------------------------------
+*/
+var uniqueSuffix = substring(uniqueString(resourceGroup().id), 0, 2)
+
+@description('Name of the SQL server.')
+var sqlServerName = 'sqlserver-${project}-${region}-${env}-${uniqueSuffix}'
+
+@description('Name of the SQL database.')
+var sqldb = 'sqldb-${project}-${region}-${env}-${uniqueSuffix}'
+
+@description('Full name of the SQL database resource (server/database).')
+var databaseName = '${sqlServerName}/${sqldb}'
+
+
+/*
+------------------------------------------------------------------------------
+CREATION OF AZURE SQL SERVER INCREMENTAL
+------------------------------------------------------------------------------
+*/
+
+resource sqlServerInc 'Microsoft.Sql/servers@2021-11-01-preview' = {
+  name: sqlServerName
+  location: location
+  tags: {
+    environment: tag1
+    location: tag2
+  }
+  identity: {
+    type: 'UserAssigned'
+    userAssignedIdentities: {
+      '/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/${managed_identity_name}': {}
+    }
+  }
+  properties: {
+    administratorLogin: sqlAdministratorLogin
+    administratorLoginPassword: sqlAdministratorLoginPassword
+    version: '12.0'
+    publicNetworkAccess: 'Enabled'
+    primaryUserAssignedIdentityId: userAssignedIdentityId
+  }
+}
+
+// SQL server Azure AD administrator
+resource aad_admin 'Microsoft.Sql/servers/administrators@2022-05-01-preview' = {
+  name: 'ActiveDirectory'
+  parent: sqlServerInc
+  properties: {
+    administratorType: 'ActiveDirectory'
+    login: administratorLogin
+    sid: administratorSid
+    tenantId: tenantId
+  }
+}
+
+
+/*
+------------------------------------------------------------------------------
+CREATION OF AZURE SQL DATABASE
+------------------------------------------------------------------------------
+*/
+
+resource database 'Microsoft.Sql/servers/databases@2021-11-01-preview' = {
+  name: databaseName
+  location: location
+  sku: {
+    name: 'Basic'
+    tier: 'Basic'
+    capacity: 5
+  }
+  tags: {
+    displayName: databaseName
+  }
+  properties: {
+    collation: 'SQL_Latin1_General_CP1_CI_AS'
+    maxSizeBytes: 104857600
+  }
+  dependsOn: [
+    sqlServerInc
+  ]
+}
+
+/*
+------------------------------------------------------------------------------
+OUTPUTS
+------------------------------------------------------------------------------
+*/
+
+output sqlserver_resource_id string = sqlServerInc.id
+output sqlserver_name string = sqlServerInc.name
diff --git a/solutions/graph-data-sales-analytics/iac/bicep/modules/synapse/synapse.bicep b/solutions/graph-data-sales-analytics/iac/bicep/modules/synapse/synapse.bicep
new file mode 100644
index 00000000..b1299719
--- /dev/null
+++ b/solutions/graph-data-sales-analytics/iac/bicep/modules/synapse/synapse.bicep
@@ -0,0 +1,202 @@
+/*
+------------------------------------------------------------------------------
+PARAMETERS FOR AZURE SYNAPSE WORKSPACE
+------------------------------------------------------------------------------
+*/
+
+@description('Location for all resources.')
+param location string = resourceGroup().location
+
+@description('Region for all resources.')
+param region string
+
+@description('Project name.')
+param project string
+
+@description('Deployment environment for the resources.')
+param env string
+
+@description('Value for the environment tag added to the resources.')
+param tag1 string
+
+@description('Value for the location tag added to the resources.')
+param tag2 string
+
+@description('The administrator username of the SQL logical server.')
+param sqlAdministratorLogin string
+
+@description('The administrator password of the SQL logical server.')
+@secure()
+param sqlAdministratorLoginPassword string
+
+@description('Value of the subscription ID.')
+param subscriptionId string = subscription().subscriptionId
+
+@description('Name of the resource group.')
+param resourceGroupName string = resourceGroup().name
+
+@description('Name of the Managed Identity.')
+param managed_identity_name string
+
+
+@description('Name of the Azure Data Lake Storage account (from data_lake.bicep).')
+param adls_name string
+
+/*
+------------------------------------------------------------------------------
+VARIABLES FOR AZURE SYNAPSE WORKSPACE
+------------------------------------------------------------------------------
+*/
+var uniqueSuffix = substring(uniqueString(resourceGroup().id), 0, 2)
+
+@description('Name of the Synapse workspace.')
+var SynapseWorkspace = 'syn-${project}-${region}-${env}-${uniqueSuffix}'
+
+@description('Cleaned name of the Synapse workspace.')
+var SynapseWorkspaceCleaned = replace(SynapseWorkspace, '-', '')
+
+@description('Name of the Apache Spark pool.')
+var BigDataPoolName = '${SynapseWorkspaceCleaned}/synsp-${project}-${region}-${uniqueSuffix}'
+
+@description('Cleaned name of the Apache Spark pool.')
+var BigDataPoolNameCleaned = replace(BigDataPoolName, '-', '')
+
+@description('Name of the dedicated SQL pool.')
+var sqlPoolName = '${SynapseWorkspaceCleaned}/syndp-${project}-${region}-${uniqueSuffix}'
+
+@description('Cleaned name of the dedicated SQL pool.')
+var sqlPoolNameCleaned = replace(sqlPoolName, '-', '')
+
+@description('Name of the firewall rule.')
+var firewallRulesName = '${SynapseWorkspaceCleaned}/allowAll'
+
+@description('Resource ID of the Azure Data Lake Storage.')
+var adls_resource_id = '/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.Storage/storageAccounts/${adls_name}'
+
+@description('URL of the Azure Data Lake Storage.')
+var adlsURL = 'https://${adls_name}.dfs.core.windows.net/'
+
+/*
+------------------------------------------------------------------------------
+CREATION OF AZURE SYNAPSE WORKSPACE
+------------------------------------------------------------------------------
+*/
+
+resource synapseWorkspace 'Microsoft.Synapse/workspaces@2021-06-01' = {
+  name: SynapseWorkspaceCleaned
+  location: location
+  tags: {
+    environment: tag1
+    location: tag2
+  }
+  identity: {
+    type: 'SystemAssigned,UserAssigned'
+    userAssignedIdentities: {
+      '/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/${managed_identity_name}': {}
+    }
+  }
+  properties: {
+    azureADOnlyAuthentication: false
+    defaultDataLakeStorage: {
+      accountUrl: adlsURL
+      createManagedPrivateEndpoint: true
+      filesystem: 'synapse'
+      resourceId: adls_resource_id
+    }
+    managedVirtualNetwork: 'default'
+    managedVirtualNetworkSettings: {
+      allowedAadTenantIdsForLinking: []
+      preventDataExfiltration: false
+    }
+    publicNetworkAccess: 'Enabled'
+    // Workspace admin: object ID of the admin user group in Azure Active Directory (replace with your own group's object ID)
+    cspWorkspaceAdminProperties: {
+      initialWorkspaceAdminObjectId: 'cc5f61c9-5e15-4de0-b2ff-30b21f762e17'
+    }
+    sqlAdministratorLogin: sqlAdministratorLogin
+    sqlAdministratorLoginPassword: sqlAdministratorLoginPassword
+    trustedServiceBypassEnabled: false
+  }
+}
+
+
+resource firewallRules 'Microsoft.Synapse/workspaces/firewallRules@2021-06-01' = {
+  name: firewallRulesName
+  dependsOn: [
+    synapseWorkspace
+  ]
+  properties: {
+    // The rule is named allowAll; the IP bounds are required for the deployment to validate.
+    startIpAddress: '0.0.0.0'
+    endIpAddress: '255.255.255.255'
+  }
+}
+
+resource bigDataPool 'Microsoft.Synapse/workspaces/bigDataPools@2021-06-01-preview' = {
+  name: BigDataPoolNameCleaned
+  location: location
+  properties: {
+    sparkVersion: '3.3'
+    nodeCount: 0
+    nodeSize: 'Medium'
+    nodeSizeFamily: 'MemoryOptimized'
+    autoScale: {
+      enabled: true
+      minNodeCount: 3
+      maxNodeCount: 6
+    }
+    autoPause: {
+      delayInMinutes: 15
+      enabled: true
+    }
+    isComputeIsolationEnabled: false
+    sessionLevelPackagesEnabled: true
+    cacheSize: 50
+    dynamicExecutorAllocation: {
+      enabled: false
+    }
+    isAutotuneEnabled: false
+  }
+  tags: {
+    environment: tag1
+    location: tag2
+  }
+  dependsOn: [
+    synapseWorkspace
+  ]
+}
+
+
+
+resource sqlPools 'Microsoft.Synapse/workspaces/sqlPools@2021-06-01' = {
+  parent: synapseWorkspace
+  name: sqlPoolNameCleaned
+  location: location // Replace this with the desired region for your SQL pool
+  tags: {}
+  sku: {
+    name: 'DW100c'
+    capacity: 0
+  }
+  properties: {
+    maxSizeBytes: 263882790666240
+    collation: 'SQL_Latin1_General_CP1_CI_AS'
+    storageAccountType: 'GRS'
+  }
+}
+
+
+
+
+/*
+------------------------------------------------------------------------------
+OUTPUTS
+------------------------------------------------------------------------------
+*/
+
+output sqlserver_resource_id string = synapseWorkspace.id
+output sqlserver_name string = synapseWorkspace.name
diff --git a/solutions/graph-data-sales-analytics/iac/pipeline/azure-pipelines.yml b/solutions/graph-data-sales-analytics/iac/pipeline/azure-pipelines.yml
new file mode 100644
index 00000000..19ca0ef9
--- /dev/null
+++ b/solutions/graph-data-sales-analytics/iac/pipeline/azure-pipelines.yml
@@ -0,0 +1,147 @@
+######################################################
+########### Defining trigger and stages ##############
+######################################################
+
+trigger:
+- none
+
+######################################################
+################# TEMPLATE VALIDATION ################
+######################################################
+stages:
+- stage: Template_Validation
+  jobs:
+  - job: Bicep_Template_Validation
+    displayName: 'Bicep_Template_Validation'
+    variables:
+    - group: devenv_variable_group
+    pool:
+      vmImage: 'windows-latest'
+    steps:
+    - checkout: self
+      path: s/
+
+    - task: AzureCLI@2
+      displayName: 'Bicep_Template_Validation'
+      inputs:
+        azureSubscription: 'sc-microsoft-graph-data'
+        scriptType: 'inlineScript'
+        scriptLocation: 'inlineScript'
+        inlineScript: |
+          az deployment group what-if --resource-group $(resource_group_name) --template-file $(template_filepath) --parameters sqlAdministratorLogin=$(sql_administrator_login) sqlAdministratorLoginPassword=$(sql_administrator_login_password) administratorLogin=$(administrator_login) administratorSid=$(administrator_sid) env=$(environment) tag1=$(tag1) tag2=$(tag2) project=$(project) region=$(region) isSQLResourceExists=$(is_sql_resource_exists)
+
+
+
+#####################################################
+######## DEVELOPMENT ENVIRONMENT DEPLOYMENT ########
+#####################################################
+
+- stage: Development_Deployment
+  jobs:
+  - deployment: Development_Deployment
+    displayName: Development_Environment_Setup
+    variables:
+    - group: devenv_variable_group
+    environment: development
+    strategy:
+      runOnce:
+        deploy:
+          steps:
+          - checkout: self
+            path: s/
+
+          - task: AzureResourceManagerTemplateDeployment@3
+            displayName: Deploy Resources
+            inputs:
+              deploymentScope: 'Resource Group'
+              azureResourceManagerConnection: 'sc-microsoft-graph-data'
+              action: 'Create Or Update Resource Group'
+              resourceGroupName: $(resource_group_name)
+              # subscriptionId: $(SubscriptionId)
+              # location: $(Location)
+              templateLocation: 'Linked artifact'
+              csmFile: $(template_filepath)
+              csmParametersFile: $(parameters_filepath)
+              overrideParameters: -sqlAdministratorLogin $(sql_administrator_login)
+                -sqlAdministratorLoginPassword $(sql_administrator_login_password)
+                -project $(project)
+                -env $(environment)
+                -tag1 $(tag1)
+                -tag2 $(tag2)
+                -administratorLogin $(administrator_login)
+                -administratorSid $(administrator_sid)
+                -region $(region)
+                -isSQLResourceExists $(is_sql_resource_exists)
+              deploymentMode: 'Incremental'
+
+######################################################
+################# STAGING DEPLOYMENT #################
+######################################################
+
+# - stage: stg_Deployment
+#   jobs:
+#   - deployment: stg_Deployment
+#     displayName: stg_Deployment_Setup
+#     variables:
+#     - group: ct_stg_datamgmt_variables
+#     environment: staging
+#     strategy:
+#       runOnce:
+#         deploy:
+#           steps:
+#           - checkout: self
+#             path: s/
+
+#           - task: AzureResourceManagerTemplateDeployment@3
+#             displayName: Deploy Resources
+#             inputs:
+#               deploymentScope: 'Resource Group'
+#               azureResourceManagerConnection: 'Datamgmt-SC'
+#               action: 'Create Or Update Resource Group'
+#               resourceGroupName: $(ResourceGroupName)
+#               subscriptionId: $(SubscriptionId)
+#               location: $(Location)
+#               templateLocation: 'Linked artifact'
+#               csmFile: $(TemplateFilePath)
+#               csmParametersFile: $(ParametersFilePath)
+            # overrideParameters: -sqlAdministratorLogin $(sqlAdministratorLogin)
+            #                     -sqlAdministratorLoginPassword $(sqlAdministratorLoginPassword)
+#               deploymentMode: 'Incremental'
+
+######################################################
+################# PRODUCTION DEPLOYMENT ##############
+######################################################
+
+# - stage: prod_Deployment
+#   jobs:
+#   - deployment: prod_Deployment
+#     displayName: prod_Deployment_Setup
+#     variables:
+#     - group: ct_prod_datamgmt_variables
+#     environment: production
+#     strategy:
+#       runOnce:
+#         deploy:
+#           steps:
+#           - checkout: self
+#             path: s/
+
+#           - task: AzureResourceManagerTemplateDeployment@3
+#             displayName: Deploy Resources
+#             inputs:
+#               deploymentScope: 'Resource Group'
+#               azureResourceManagerConnection: 'Datamgmt-SC'
+#               action: 'Create Or Update Resource Group'
+#               resourceGroupName: $(ResourceGroupName)
+#               subscriptionId: $(SubscriptionId)
+#               location: $(Location)
+#               templateLocation: 'Linked artifact'
+#               csmFile: $(TemplateFilePath)
+#               csmParametersFile: $(ParametersFilePath)
+            # overrideParameters: -sqlAdministratorLogin $(sqlAdministratorLogin)
+            #                     -sqlAdministratorLoginPassword $(sqlAdministratorLoginPassword)
+#               deploymentMode: 'Incremental'
+
+###############################################################################
+###############################################################################
+###############################################################################
\ No newline at end of file
diff --git a/solutions/graph-data-sales-analytics/powerbi/README.md b/solutions/graph-data-sales-analytics/powerbi/README.md
new file mode 100644
index 00000000..49b92808
--- /dev/null
+++ b/solutions/graph-data-sales-analytics/powerbi/README.md
@@ -0,0 +1,79 @@
+
+# Power BI Sales Sentiment Dashboard
+
+## Data Sources:
+
+1. Salesforce
+   - Opportunity: Tracks potential sales deals with details like amount, stage, and close date.
+   - Contact: Stores individuals' contact information for effective customer communication.
+   - Account: Manages company or organization details to monitor customer relationships.
+   - Employee: Maintains user information and permissions within the Salesforce platform.
+
+2. Microsoft 365
+   - Messages: Stores email communication data, including sender, recipient, and content.
+
+
+### **Data Connection**
+
+ - **Azure Synapse Analytics SQL**: Import Mode
+
+ - **Access Power BI Report Using Parameters**:
+    - The Server Name and Database Name are parameterized, and dynamic M queries are integrated for each table.
+    - After downloading the Power BI template file, open it and enter the parameter values for your data source, as shown below.
+
+![Parameter Configuration Dialog](../docs/media/Parameters.PNG)
+
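+To sanity-check the Server Name and Database Name values before entering them in the dialog above, you can run a quick connectivity test outside Power BI. The snippet below is only an illustrative sketch, not part of the solution: it assumes the `pyodbc` package, the Microsoft ODBC Driver 18 for SQL Server, and placeholder endpoint names that you must replace with your own values.
+
+```python
+# Hypothetical connectivity check for the report parameters (illustrative only).
+import pyodbc
+
+server = "<synapse-sql-endpoint>.sql.azuresynapse.net"  # Server Name parameter (placeholder)
+database = "<database-name>"                            # Database Name parameter (placeholder)
+
+conn = pyodbc.connect(
+    "Driver={ODBC Driver 18 for SQL Server};"
+    f"Server=tcp:{server},1433;Database={database};"
+    "Encrypt=yes;Authentication=ActiveDirectoryInteractive;"
+)
+print(conn.cursor().execute("SELECT 1").fetchone())  # returns (1,) if the parameters are valid
+conn.close()
+```
+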
+
+
+### **Data Model**
+
+ - The data model integrates the Salesforce and M365 data sources in a centralized platform, enabling combined analysis and supporting data-driven decision-making.
+
+![Data model](../docs/media/DataModel.PNG)
+
+
+
+
+## KPIs & Measures:
+
+
+### Opportunity Summary:
+
+![Opportunity Summary](../docs/media/OpportunitySummary.PNG)
+
+- **Analyze revenue trend over time**: Track the revenue generated from opportunities over different time periods to identify growth patterns, seasonality, trends, and lost opportunities.
+- **Sentiment by Account or Opportunity**: Evaluate the sentiment associated with each account or opportunity, providing insight into customer satisfaction and potential areas for improvement.
+- **Opportunities Closed, Open, and Lost**: Get a comprehensive overview of all opportunities, including key metrics and important details, allowing you to assess overall performance at a glance.
+- **Revenue by Account**: Gain visibility into the revenue generated by each account, helping you prioritize and focus on high-value customers for continued growth.
+
+### Communication Analysis:
+![Communication Analysis](../docs/media/CommunicationAnalysis.png)
+
+- **Opportunity Interactions**: Enables sales teams to track account and opportunity data in a structured format. Facilitates analysis of conversation counts, identifying engagement levels with accounts. The last-interaction timestamp helps prioritize follow-ups and ensures timely communication. Provides an overview of the sales pipeline, allowing for better sales forecasting and planning.
+- **Communication Map**: Offers a visual representation of email interactions, making it easier to identify key communication patterns and relationships. Helps sales and customer support teams understand the most frequent interactions between email addresses.
+- **Email Communications**: Empowers teams to analyze email conversations in detail, improving customer communication. Sentiment scores help gauge customer satisfaction so concerns can be addressed promptly.
+
+### Help Information:
+
+![Help Information Overview](../docs/media/HelpInformation.png)
+
+#### **Navigation**
+
+- **Opportunity Summary**:
+    The Opportunity Summary provides insights into revenue trends and sentiment by account and opportunity, enabling better decision-making and identifying growth patterns, customer satisfaction, and areas for improvement.
+    Revenue Trend and Opportunity Status provide real-time insight into sales performance, enabling data-driven decisions to optimize revenue streams and sales strategies.
+    Revenue by Account helps identify top-performing accounts, guiding businesses to focus on high-value clients and improve customer retention.
+
+- **Communication Analysis**:
+    The Opportunity Interactions KPI identifies bottlenecks in the sales process, allowing sales teams to streamline workflows and increase conversion rates.
+    Sentiment Analysis and the Communication Map offer a deeper understanding of customer satisfaction and engagement, facilitating better relationship management and targeted communication efforts.
+    Email communication data aids in tracking interactions with prospects, optimizing outreach efforts, and enhancing overall communication efficiency.
+
+#### **Explainability**
+
+- Overall Sentiment Score: The model combines all the sentence-level scores and calculates their average. This average score represents the overall sentiment of the entire email. It's like a summary of how the whole email feels, without favoring any specific sentence.
+- Word-Level Analysis: The model examines the impact of each word in the emails by calculating a score for each word. It pays attention only to meaningful words and ignores common words like "a," "an," and "the," which don't contribute much to the sentiment.
+- VADER Sentiment Analysis: The model uses a tool called VADER for sentiment analysis. It's like a dictionary that understands the emotional meaning of words, helping to determine how positive, negative, or neutral an email is. For more details about VADER and how the sentiment analysis is done, see [VADER Sentiment Analysis](https://github.com/cjhutto/vaderSentiment).
+- Score Range: Negative 0-40%, Neutral 41-60%, Positive 61-100%
diff --git a/solutions/graph-data-sales-analytics/powerbi/SalesSentimentDashboard.pbit b/solutions/graph-data-sales-analytics/powerbi/SalesSentimentDashboard.pbit
new file mode 100644
index 00000000..fdcfc91d
Binary files /dev/null and b/solutions/graph-data-sales-analytics/powerbi/SalesSentimentDashboard.pbit differ
diff --git a/solutions/graph-data-sales-analytics/synapse/README.md b/solutions/graph-data-sales-analytics/synapse/README.md
new file mode 100644
index 00000000..4a50979c
--- /dev/null
+++ b/solutions/graph-data-sales-analytics/synapse/README.md
@@ -0,0 +1,49 @@
+# **Azure Synapse Directory Structure**
+
+## **Directories and Files**
+
+- **[credential](credential)**
+  - **[WorkspaceSystemIdentity](credential/WorkspaceSystemIdentity.json):** This JSON file contains connection details for the Azure Synapse workspace.
+
+- **[dataset](dataset)**
+  - **[DS_Binary](dataset/DS_Binary.json):** Defines a dataset for loading binary-format data into ADLS within an Azure Synapse workspace.
+  - **[DS_CSV](dataset/DS_CSV.json):** Specifies a dataset for loading CSV-format data into ADLS within an Azure Synapse workspace.
+  - **[DS_Microsoft365](dataset/DS_Microsoft365.json):** Describes a dataset for importing data from Microsoft 365 using a linked M365 service within an Azure Synapse workspace.
+  - **[DS_Parquet](dataset/DS_Parquet.json):** Contains an Azure Synapse workspace dataset for loading data in Parquet format into ADLS.
+  - **[DS_Json](dataset/DS_Json.json):** Provides an Azure Synapse workspace dataset for loading data in JSON format into ADLS.
+
+- **[integrationRuntime](integrationRuntime)**
+  - **[AutoResolveIntegrationRuntime](integrationRuntime/AutoResolveIntegrationRuntime.json):** Configuration for the AutoResolve integration runtime hosted in the managed virtual network.
+
+- **[linkedService](linkedService)**
+  - **[LS_AzureSqlDatabase](linkedService/LS_AzureSqlDatabase.json):** Linked service for connecting to the metadata SQL database.
+  - **[LS_AzureDataLakeStorage](linkedService/LS_AzureDataLakeStorage.json):** Linked service to connect to the ADLS account.
+  - **[LS_AzureKeyVault](linkedService/LS_AzureKeyVault.json):** Linked service providing a connection to Azure Key Vault.
+  - **[LS_Microsoft365](linkedService/LS_Microsoft365.json):** Linked service for establishing a connection with Microsoft 365.
+  - **[syngdcscindevil-WorkspaceDefaultSqlServer](linkedService/syngdcscindevil-WorkspaceDefaultSqlServer.json):** Linked service to connect to the default SQL server.
+  - **[syngdcscindevil-WorkspaceDefaultStorage](linkedService/syngdcscindevil-WorkspaceDefaultStorage.json):** Linked service for connecting to the workspace default Azure Data Lake Storage account.
+
+- **[managedVirtualNetwork](managedVirtualNetwork)**
+  - **[managedVirtualNetwork](managedVirtualNetwork/default/managedPrivateEndpoint/synapse-ws-sql--cdp-foundation-synapse):** Definitions of the managed private endpoints the Synapse workspace uses to reach its data stores.
+
+- **[notebook](notebook)**
+  - **[variables](notebook/variables.json):** Notebook containing common variables used across notebooks.
+  - **[Saslesforce_SourceToRaw](notebook/Saslesforce_SourceToRaw.json):** Synapse notebook for copying data from Salesforce to the Raw ADLS layer.
+  - **[RawToBronze](notebook/RawToBronze.json):** Synapse notebook for transforming data from the Raw to the Bronze stage.
+  - **[BronzeToSilver](notebook/BronzeToSilver.json):** Synapse notebook for transforming data from the Bronze to the Silver stage.
+  - **[SilverToGold](notebook/SilverToGold.json):** Synapse notebook for transforming data from the Silver to the Gold stage.
+  - **[M365_Silver_To_Gold](notebook/M365_Silver_To_Gold.json):** Synapse notebook for transforming M365 data from the Silver to the Gold stage.
+  - **[Sentiment_code_nltk](notebook/Sentiment_code_nltk.json):** Synapse notebook for sentiment analysis on M365 data.
+
+- **[pipeline](pipeline)**
+  - **[pl_source_to_raw](pipeline/pl_source_to_raw.json):** JSON script for deploying the source-to-raw notebook in Synapse.
+  - **[pl_raw_to_bronze](pipeline/pl_raw_to_bronze.json):** JSON script for deploying the raw-to-bronze notebook in Synapse.
+  - **[pl_bronze_to_silver](pipeline/pl_bronze_to_silver.json):** JSON script for deploying the bronze-to-silver notebook in Synapse.
+  - **[pl_silver_to_gold](pipeline/pl_silver_to_gold.json):** JSON script for deploying the silver-to-gold notebook in Synapse.
+
+- **[sqlscript](sqlscript)**
+  - **[ExternalTables](sqlscript/ExternalTables.json):** SQL script to generate the external tables that are used within Power BI.
+
+
+> **Note:** The provided code/solution has been tested on sample data to verify its functionality. As a best practice, thoroughly test the code on actual, real-world data before deploying it in a production environment.
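+
+For orientation, the scoring logic that the Sentiment_code_nltk notebook applies to email text can be sketched roughly as shown below. This is a minimal illustration only: it assumes NLTK's VADER analyzer and a linear mapping of the averaged compound score onto the 0-100% bands documented in the Power BI README; the shipped notebook may differ in detail.
+
+```python
+# Minimal sketch of VADER-based email sentiment scoring (illustrative; see the
+# Sentiment_code_nltk notebook for the actual implementation).
+import nltk
+from nltk.sentiment.vader import SentimentIntensityAnalyzer
+
+nltk.download("vader_lexicon")  # one-time download of the VADER lexicon
+analyzer = SentimentIntensityAnalyzer()
+
+def email_sentiment(sentences):
+    """Average sentence-level compound scores, then map to the documented bands."""
+    scores = [analyzer.polarity_scores(s)["compound"] for s in sentences]
+    avg = sum(scores) / len(scores) if scores else 0.0  # overall score for the email
+    pct = (avg + 1) / 2 * 100  # assumption: compound score in [-1, 1] scaled to 0-100%
+    label = "Negative" if pct <= 40 else ("Neutral" if pct <= 60 else "Positive")
+    return pct, label
+
+print(email_sentiment(["Thanks for the quick turnaround!", "The demo went well."]))
+```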
+ diff --git a/solutions/graph-data-sales-analytics/synapse/credential/WorkspaceSystemIdentity.json b/solutions/graph-data-sales-analytics/synapse/credential/WorkspaceSystemIdentity.json new file mode 100644 index 00000000..5dfc3fae --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/credential/WorkspaceSystemIdentity.json @@ -0,0 +1,6 @@ +{ + "name": "WorkspaceSystemIdentity", + "properties": { + "type": "ManagedIdentity" + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/dataflow/DF_M365.json b/solutions/graph-data-sales-analytics/synapse/dataflow/DF_M365.json new file mode 100644 index 00000000..8ec6f0f8 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/dataflow/DF_M365.json @@ -0,0 +1,87 @@ +{ + "name": "DF_M365", + "properties": { + "type": "MappingDataFlow", + "typeProperties": { + "sources": [ + { + "linkedService": { + "referenceName": "LS_Microsoft365", + "type": "LinkedServiceReference" + }, + "name": "M365Source", + "description": "Import data from Microsoft 365" + } + ], + "sinks": [ + { + "dataset": { + "referenceName": "DS_Parquet", + "type": "DatasetReference" + }, + "name": "sinkADLS" + } + ], + "transformations": [], + "scriptLines": [ + "source(output(", + " id as string,", + " createdDateTime as timestamp,", + " lastModifiedDateTime as timestamp,", + " receivedDateTime as timestamp,", + " sentDateTime as timestamp,", + " subject as string,", + " sender as (emailAddress as (address as string, name as string)),", + " from as (emailAddress as (address as string, name as string)),", + " toRecipients as (emailAddress as (address as string, name as string))[],", + " ccRecipients as string[],", + " bccRecipients as string[],", + " replyTo as string[],", + " conversationId as string,", + " uniqueBody as (content as string, contentType as string),", + " body as (content as string, contentType as string),", + " bodyPreview as string,", + " conversationIndex as string", + " ),", + " allowSchemaDrift: true,", + " validateSchema: false,", + " store: 'microsoft365',", + " format: 'json',", + " sourceTableName: 'BasicDataSet_v0.Inbox_v1',", + " autoFlatten: false,", + " sourceAllowedGroups: ['ecd941cc-0ea7-47f6-badb-21b1e14ea6d3'],", + " sourceDateFilterColumn: 'createdDateTime',", + " sourceStartTime: (toTimestamp(1672531200000L)),", + " sourceEndTime: (toTimestamp(1690502400000L)),", + " sourceStructure: [", + " ['name' -> 'id', 'type' -> 'string'],", + " ['name' -> 'createdDateTime', 'type' -> 'string'],", + " ['name' -> 'lastModifiedDateTime', 'type' -> 'string'],", + " ['name' -> 'receivedDateTime', 'type' -> 'string'],", + " ['name' -> 'sentDateTime', 'type' -> 'string'],", + " ['name' -> 'subject', 'type' -> 'string'],", + " ['name' -> 'sender', 'type' -> 'string'],", + " ['name' -> 'from', 'type' -> 'string'],", + " ['name' -> 'toRecipients', 'type' -> 'string'],", + " ['name' -> 'ccRecipients', 'type' -> 'string'],", + " ['name' -> 'bccRecipients', 'type' -> 'string'],", + " ['name' -> 'replyTo', 'type' -> 'string'],", + " ['name' -> 'conversationId', 'type' -> 'string'],", + " ['name' -> 'uniqueBody', 'type' -> 'string'],", + " ['name' -> 'body', 'type' -> 'string'],", + " ['name' -> 'bodyPreview', 'type' -> 'string'],", + " ['name' -> 'conversationIndex', 'type' -> 'string']", + " ]) ~> M365Source", + "M365Source sink(allowSchemaDrift: true,", + " validateSchema: false,", + " format: 'parquet',", + " umask: 0022,", + " preCommands: [],", + " postCommands: [],", + " skipDuplicateMapInputs: true,", + " 
skipDuplicateMapOutputs: true,", + " saveOrder: 1) ~> sinkADLS" + ] + } + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/dataflow/Dataflow.json b/solutions/graph-data-sales-analytics/synapse/dataflow/Dataflow.json new file mode 100644 index 00000000..ee3254b3 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/dataflow/Dataflow.json @@ -0,0 +1,113 @@ +{ + "name": "Dataflow", + "properties": { + "type": "MappingDataFlow", + "typeProperties": { + "sources": [ + { + "linkedService": { + "referenceName": "LS_Microsoft365", + "type": "LinkedServiceReference" + }, + "name": "M365Source" + } + ], + "sinks": [ + { + "dataset": { + "referenceName": "DS_Json", + "type": "DatasetReference" + }, + "name": "sink1" + } + ], + "transformations": [], + "scriptLines": [ + "source(output(", + " id as string,", + " createdDateTime as string,", + " lastModifiedDateTime as string,", + " changeKey as string,", + " categories as string,", + " receivedDateTime as string,", + " sentDateTime as string,", + " hasAttachments as boolean,", + " internetMessageId as string,", + " subject as string,", + " importance as string,", + " parentFolderId as string,", + " sender as string,", + " from as string,", + " toRecipients as string,", + " ccRecipients as string,", + " bccRecipients as string,", + " replyTo as string,", + " conversationId as string,", + " uniqueBody as string,", + " isDeliveryReceiptRequested as boolean,", + " isReadReceiptRequested as boolean,", + " isRead as boolean,", + " isDraft as boolean,", + " webLink as string,", + " attachments as string,", + " inferenceClassification as string,", + " flag as string,", + " body as string,", + " bodyPreview as string,", + " conversationIndex as string", + " ),", + " allowSchemaDrift: true,", + " validateSchema: false,", + " store: 'microsoft365',", + " format: 'json',", + " sourceTableName: 'BasicDataSet_v0.Inbox_v1',", + " autoFlatten: false,", + " sourceAllowedGroups: ['ecd941cc-0ea7-47f6-badb-21b1e14ea6d3'],", + " sourceDateFilterColumn: 'createdDateTime',", + " sourceStartTime: (toTimestamp(1688169600000L)),", + " sourceEndTime: (toTimestamp(1690329600000L)),", + " sourceStructure: [", + " ['name' -> 'id', 'type' -> 'string'],", + " ['name' -> 'createdDateTime', 'type' -> 'string'],", + " ['name' -> 'lastModifiedDateTime', 'type' -> 'string'],", + " ['name' -> 'changeKey', 'type' -> 'string'],", + " ['name' -> 'categories', 'type' -> 'string'],", + " ['name' -> 'receivedDateTime', 'type' -> 'string'],", + " ['name' -> 'sentDateTime', 'type' -> 'string'],", + " ['name' -> 'hasAttachments', 'type' -> 'boolean'],", + " ['name' -> 'internetMessageId', 'type' -> 'string'],", + " ['name' -> 'subject', 'type' -> 'string'],", + " ['name' -> 'importance', 'type' -> 'string'],", + " ['name' -> 'parentFolderId', 'type' -> 'string'],", + " ['name' -> 'sender', 'type' -> 'string'],", + " ['name' -> 'from', 'type' -> 'string'],", + " ['name' -> 'toRecipients', 'type' -> 'string'],", + " ['name' -> 'ccRecipients', 'type' -> 'string'],", + " ['name' -> 'bccRecipients', 'type' -> 'string'],", + " ['name' -> 'replyTo', 'type' -> 'string'],", + " ['name' -> 'conversationId', 'type' -> 'string'],", + " ['name' -> 'uniqueBody', 'type' -> 'string'],", + " ['name' -> 'isDeliveryReceiptRequested', 'type' -> 'boolean'],", + " ['name' -> 'isReadReceiptRequested', 'type' -> 'boolean'],", + " ['name' -> 'isRead', 'type' -> 'boolean'],", + " ['name' -> 'isDraft', 'type' -> 'boolean'],", + " ['name' -> 'webLink', 'type' -> 
'string'],", + " ['name' -> 'attachments', 'type' -> 'string'],", + " ['name' -> 'inferenceClassification', 'type' -> 'string'],", + " ['name' -> 'flag', 'type' -> 'string'],", + " ['name' -> 'body', 'type' -> 'string'],", + " ['name' -> 'bodyPreview', 'type' -> 'string'],", + " ['name' -> 'conversationIndex', 'type' -> 'string']", + " ]) ~> M365Source", + "M365Source sink(allowSchemaDrift: true,", + " validateSchema: false,", + " umask: 0022,", + " preCommands: [],", + " postCommands: [],", + " skipDuplicateMapInputs: true,", + " skipDuplicateMapOutputs: true,", + " saveOrder: 1) ~> sink1" + ] + } + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/dataset/DS_Binary.json b/solutions/graph-data-sales-analytics/synapse/dataset/DS_Binary.json new file mode 100644 index 00000000..581ec7df --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/dataset/DS_Binary.json @@ -0,0 +1,18 @@ +{ + "name": "DS_Binary", + "properties": { + "linkedServiceName": { + "referenceName": "LS_AzureDataLakeStorage", + "type": "LinkedServiceReference" + }, + "annotations": [], + "type": "Binary", + "typeProperties": { + "location": { + "type": "AzureBlobFSLocation", + "folderPath": "M365", + "fileSystem": "raw" + } + } + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/dataset/DS_CSV.json b/solutions/graph-data-sales-analytics/synapse/dataset/DS_CSV.json new file mode 100644 index 00000000..c3ea523e --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/dataset/DS_CSV.json @@ -0,0 +1,45 @@ +{ + "name": "DS_CSV", + "properties": { + "linkedServiceName": { + "referenceName": "LS_AzureDataLakeStorage", + "type": "LinkedServiceReference" + }, + "parameters": { + "container": { + "type": "string" + }, + "directory": { + "type": "string" + }, + "filename": { + "type": "string" + } + }, + "annotations": [], + "type": "DelimitedText", + "typeProperties": { + "location": { + "type": "AzureBlobFSLocation", + "fileName": { + "value": "@dataset().filename", + "type": "Expression" + }, + "folderPath": { + "value": "@dataset().directory", + "type": "Expression" + }, + "fileSystem": { + "value": "@dataset().container", + "type": "Expression" + } + }, + "columnDelimiter": ",", + "escapeChar": "\\", + "firstRowAsHeader": true, + "quoteChar": "\"" + }, + "schema": [] + }, + "type": "Microsoft.Synapse/workspaces/datasets" +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/dataset/DS_Json.json b/solutions/graph-data-sales-analytics/synapse/dataset/DS_Json.json new file mode 100644 index 00000000..6c9b48ae --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/dataset/DS_Json.json @@ -0,0 +1,20 @@ +{ + "name": "DS_Json", + "properties": { + "linkedServiceName": { + "referenceName": "LS_AzureDataLakeStorage", + "type": "LinkedServiceReference" + }, + "annotations": [], + "type": "Json", + "typeProperties": { + "location": { + "type": "AzureBlobFSLocation", + "folderPath": "M365/Inbox", + "fileSystem": "raw" + } + }, + "schema": {} + }, + "type": "Microsoft.Synapse/workspaces/datasets" +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/dataset/DS_Microsoft365.json b/solutions/graph-data-sales-analytics/synapse/dataset/DS_Microsoft365.json new file mode 100644 index 00000000..1aabf5a2 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/dataset/DS_Microsoft365.json @@ -0,0 +1,16 @@ +{ + "name": "DS_Microsoft365", + "properties": { + 
"linkedServiceName": { + "referenceName": "LS_Microsoft365", + "type": "LinkedServiceReference" + }, + "annotations": [], + "type": "Office365Table", + "schema": [], + "typeProperties": { + "tableName": "BasicDataSet_v0.Inbox_v1" + } + }, + "type": "Microsoft.Synapse/workspaces/datasets" +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/dataset/DS_Parquet.json b/solutions/graph-data-sales-analytics/synapse/dataset/DS_Parquet.json new file mode 100644 index 00000000..46c008e4 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/dataset/DS_Parquet.json @@ -0,0 +1,21 @@ +{ + "name": "DS_Parquet", + "properties": { + "linkedServiceName": { + "referenceName": "LS_AzureDataLakeStorage", + "type": "LinkedServiceReference" + }, + "annotations": [], + "type": "Parquet", + "typeProperties": { + "location": { + "type": "AzureBlobFSLocation", + "folderPath": "inbox_parquet_six", + "fileSystem": "raw" + }, + "compressionCodec": "snappy" + }, + "schema": [] + }, + "type": "Microsoft.Synapse/workspaces/datasets" +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/dataset/DS_SalesforceObjects.json b/solutions/graph-data-sales-analytics/synapse/dataset/DS_SalesforceObjects.json new file mode 100644 index 00000000..ce98caab --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/dataset/DS_SalesforceObjects.json @@ -0,0 +1,24 @@ +{ + "name": "DS_SalesforceObjects", + "properties": { + "linkedServiceName": { + "referenceName": "Salesforce1", + "type": "LinkedServiceReference" + }, + "parameters": { + "objectname": { + "type": "string" + } + }, + "annotations": [], + "type": "SalesforceObject", + "schema": [], + "typeProperties": { + "objectApiName": { + "value": "@dataset().objectname", + "type": "Expression" + } + } + }, + "type": "Microsoft.Synapse/workspaces/datasets" +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/integrationRuntime/AutoResolveIntegrationRuntime.json b/solutions/graph-data-sales-analytics/synapse/integrationRuntime/AutoResolveIntegrationRuntime.json new file mode 100644 index 00000000..e07b108c --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/integrationRuntime/AutoResolveIntegrationRuntime.json @@ -0,0 +1,20 @@ +{ + "name": "AutoResolveIntegrationRuntime", + "properties": { + "type": "Managed", + "typeProperties": { + "computeProperties": { + "location": "AutoResolve", + "dataFlowProperties": { + "computeType": "General", + "coreCount": 8, + "timeToLive": 0 + } + } + }, + "managedVirtualNetwork": { + "type": "ManagedVirtualNetworkReference", + "referenceName": "default" + } + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/integrationRuntime/IntegrationRuntime.json b/solutions/graph-data-sales-analytics/synapse/integrationRuntime/IntegrationRuntime.json new file mode 100644 index 00000000..64828e2e --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/integrationRuntime/IntegrationRuntime.json @@ -0,0 +1,25 @@ +{ + "name": "IntegrationRuntime", + "properties": { + "type": "Managed", + "typeProperties": { + "computeProperties": { + "location": "Central India", + "dataFlowProperties": { + "computeType": "General", + "coreCount": 8, + "timeToLive": 10, + "cleanup": false, + "customProperties": [] + }, + "pipelineExternalComputeScaleProperties": { + "timeToLive": 60 + } + } + }, + "managedVirtualNetwork": { + "type": "ManagedVirtualNetworkReference", + "referenceName": "default" + } + } 
+} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/linkedService/LS_AzureDataLakeStorage.json b/solutions/graph-data-sales-analytics/synapse/linkedService/LS_AzureDataLakeStorage.json new file mode 100644 index 00000000..9ad1dd2a --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/linkedService/LS_AzureDataLakeStorage.json @@ -0,0 +1,26 @@ +{ + "name": "LS_AzureDataLakeStorage", + "properties": { + "annotations": [], + "type": "AzureBlobFS", + "typeProperties": { + "url": "https://adlsgdcscindevil.dfs.core.windows.net/", + "tenant": "2d2199a8-cb98-4269-b2ae-c63cf2b7c7f0", + "servicePrincipalId": "f072c8b9-44a5-4908-bbb4-e44c05fb7728", + "servicePrincipalCredentialType": "ServicePrincipalKey", + "servicePrincipalCredential": { + "type": "AzureKeyVaultSecret", + "store": { + "referenceName": "LS_AzureKeyVault", + "type": "LinkedServiceReference" + }, + "secretName": "client-secret" + } + }, + "connectVia": { + "referenceName": "AutoResolveIntegrationRuntime", + "type": "IntegrationRuntimeReference" + } + }, + "type": "Microsoft.Synapse/workspaces/linkedservices" +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/linkedService/LS_AzureKeyVault.json b/solutions/graph-data-sales-analytics/synapse/linkedService/LS_AzureKeyVault.json new file mode 100644 index 00000000..c5ffd5db --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/linkedService/LS_AzureKeyVault.json @@ -0,0 +1,11 @@ +{ + "name": "LS_AzureKeyVault", + "type": "Microsoft.Synapse/workspaces/linkedservices", + "properties": { + "annotations": [], + "type": "AzureKeyVault", + "typeProperties": { + "baseUrl": "https://kv-gdcs-cin-il.vault.azure.net/" + } + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/linkedService/LS_AzureSqlDatabase.json b/solutions/graph-data-sales-analytics/synapse/linkedService/LS_AzureSqlDatabase.json new file mode 100644 index 00000000..7aea0617 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/linkedService/LS_AzureSqlDatabase.json @@ -0,0 +1,15 @@ +{ + "name": "LS_AzureSqlDatabase", + "type": "Microsoft.Synapse/workspaces/linkedservices", + "properties": { + "annotations": [], + "type": "AzureSqlDatabase", + "typeProperties": { + "connectionString": "Integrated Security=False;Encrypt=True;Connection Timeout=30;Data Source=sqlserver-gdcs-cin-dev-il.database.windows.net;Initial Catalog=sqldb-gdcs-cin-dev-i" + }, + "connectVia": { + "referenceName": "AutoResolveIntegrationRuntime", + "type": "IntegrationRuntimeReference" + } + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/linkedService/LS_Microsoft365.json b/solutions/graph-data-sales-analytics/synapse/linkedService/LS_Microsoft365.json new file mode 100644 index 00000000..76b40bc4 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/linkedService/LS_Microsoft365.json @@ -0,0 +1,27 @@ +{ + "name": "LS_Microsoft365", + "type": "Microsoft.Synapse/workspaces/linkedservices", + "properties": { + "annotations": [], + "type": "Office365", + "typeProperties": { + "office365TenantId": "2d2199a8-cb98-4269-b2ae-c63cf2b7c7f0", + "servicePrincipalTenantId": "2d2199a8-cb98-4269-b2ae-c63cf2b7c7f0", + "servicePrincipalId": "f072c8b9-44a5-4908-bbb4-e44c05fb7728", + "servicePrincipalKey": { + "type": "AzureKeyVaultSecret", + "store": { + "referenceName": "LS_AzureKeyVault", + "type": "LinkedServiceReference" + }, + "secretName": "client-secret" + }, + 
"office365DataDiscoveryServiceUrl": "https://ind.odds.si.office.net/DiscoveryService/", + "officeDiscoveryUrl": "https://ind.odds.si.office.net/DiscoveryService/" + }, + "connectVia": { + "referenceName": "AutoResolveIntegrationRuntime", + "type": "IntegrationRuntimeReference" + } + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/linkedService/syngdcscindevil-WorkspaceDefaultSqlServer.json b/solutions/graph-data-sales-analytics/synapse/linkedService/syngdcscindevil-WorkspaceDefaultSqlServer.json new file mode 100644 index 00000000..b09744dc --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/linkedService/syngdcscindevil-WorkspaceDefaultSqlServer.json @@ -0,0 +1,20 @@ +{ + "name": "syngdcscindevil-WorkspaceDefaultSqlServer", + "type": "Microsoft.Synapse/workspaces/linkedservices", + "properties": { + "typeProperties": { + "connectionString": "Data Source=tcp:syngdcscindevil.sql.azuresynapse.net,1433;Initial Catalog=@{linkedService().DBName}" + }, + "parameters": { + "DBName": { + "type": "String" + } + }, + "type": "AzureSqlDW", + "connectVia": { + "referenceName": "AutoResolveIntegrationRuntime", + "type": "IntegrationRuntimeReference" + }, + "annotations": [] + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/linkedService/syngdcscindevil-WorkspaceDefaultStorage.json b/solutions/graph-data-sales-analytics/synapse/linkedService/syngdcscindevil-WorkspaceDefaultStorage.json new file mode 100644 index 00000000..97ebdf1f --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/linkedService/syngdcscindevil-WorkspaceDefaultStorage.json @@ -0,0 +1,15 @@ +{ + "name": "syngdcscindevil-WorkspaceDefaultStorage", + "type": "Microsoft.Synapse/workspaces/linkedservices", + "properties": { + "typeProperties": { + "url": "https://adlsgdcseus2devil.dfs.core.windows.net" + }, + "type": "AzureBlobFS", + "connectVia": { + "referenceName": "AutoResolveIntegrationRuntime", + "type": "IntegrationRuntimeReference" + }, + "annotations": [] + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/managedVirtualNetwork/default.json b/solutions/graph-data-sales-analytics/synapse/managedVirtualNetwork/default.json new file mode 100644 index 00000000..caa77f73 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/managedVirtualNetwork/default.json @@ -0,0 +1,4 @@ +{ + "name": "default", + "type": "Microsoft.Synapse/workspaces/managedVirtualNetworks" +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/managedVirtualNetwork/default/managedPrivateEndpoint/mpep_AzureDataLakeStorage.json b/solutions/graph-data-sales-analytics/synapse/managedVirtualNetwork/default/managedPrivateEndpoint/mpep_AzureDataLakeStorage.json new file mode 100644 index 00000000..a83b8105 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/managedVirtualNetwork/default/managedPrivateEndpoint/mpep_AzureDataLakeStorage.json @@ -0,0 +1,7 @@ +{ + "name": "mpep_AzureDataLakeStorage", + "properties": { + "privateLinkResourceId": "/subscriptions/2058f82f-b8ac-423e-8c83-227732887c3a/resourceGroups/fractal-neal-coe-dev-rg/providers/Microsoft.Storage/storageAccounts/adlsgdcscindevil", + "groupId": "dfs" + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/managedVirtualNetwork/default/managedPrivateEndpoint/mpep_AzureSqlDatabase.json 
b/solutions/graph-data-sales-analytics/synapse/managedVirtualNetwork/default/managedPrivateEndpoint/mpep_AzureSqlDatabase.json new file mode 100644 index 00000000..ba17dd33 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/managedVirtualNetwork/default/managedPrivateEndpoint/mpep_AzureSqlDatabase.json @@ -0,0 +1,10 @@ +{ + "name": "mpep_AzureSqlDatabase", + "properties": { + "privateLinkResourceId": "/subscriptions/2058f82f-b8ac-423e-8c83-227732887c3a/resourceGroups/fractal-neal-coe-dev-rg/providers/Microsoft.Sql/servers/sqlserver-gdcs-cin-dev-il", + "groupId": "sqlServer", + "fqdns": [ + "sqlserver-gdcs-cin-dev-il.database.windows.net" + ] + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/managedVirtualNetwork/default/managedPrivateEndpoint/synapse-ws-sql--syngdcscindevil.json b/solutions/graph-data-sales-analytics/synapse/managedVirtualNetwork/default/managedPrivateEndpoint/synapse-ws-sql--syngdcscindevil.json new file mode 100644 index 00000000..a18994e2 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/managedVirtualNetwork/default/managedPrivateEndpoint/synapse-ws-sql--syngdcscindevil.json @@ -0,0 +1,10 @@ +{ + "name": "synapse-ws-sql--syngdcscindevil", + "properties": { + "privateLinkResourceId": "/subscriptions/2058f82f-b8ac-423e-8c83-227732887c3a/resourceGroups/fractal-neal-coe-dev-rg/providers/Microsoft.Synapse/workspaces/syngdcscindevil", + "groupId": "sql", + "fqdns": [ + "syngdcscindevil.70962fda-26f0-4a3d-be69-9803a6a4e367.sql.azuresynapse.net" + ] + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/managedVirtualNetwork/default/managedPrivateEndpoint/synapse-ws-sqlOnDemand--syngdcscindevil.json b/solutions/graph-data-sales-analytics/synapse/managedVirtualNetwork/default/managedPrivateEndpoint/synapse-ws-sqlOnDemand--syngdcscindevil.json new file mode 100644 index 00000000..9691f215 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/managedVirtualNetwork/default/managedPrivateEndpoint/synapse-ws-sqlOnDemand--syngdcscindevil.json @@ -0,0 +1,10 @@ +{ + "name": "synapse-ws-sqlOnDemand--syngdcscindevil", + "properties": { + "privateLinkResourceId": "/subscriptions/2058f82f-b8ac-423e-8c83-227732887c3a/resourceGroups/fractal-neal-coe-dev-rg/providers/Microsoft.Synapse/workspaces/syngdcscindevil", + "groupId": "sqlOnDemand", + "fqdns": [ + "syngdcscindevil-ondemand.70962fda-26f0-4a3d-be69-9803a6a4e367.sql.azuresynapse.net" + ] + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/notebook/BronzeToSilver.json b/solutions/graph-data-sales-analytics/synapse/notebook/BronzeToSilver.json new file mode 100644 index 00000000..9857c3b1 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/notebook/BronzeToSilver.json @@ -0,0 +1,533 @@ +{ + "name": "BronzeToSilver", + "properties": { + "folder": { + "name": "Stage3 BronzeToSilver" + }, + "nbformat": 4, + "nbformat_minor": 2, + "bigDataPool": { + "referenceName": "synspgdcscin", + "type": "BigDataPoolReference" + }, + "sessionProperties": { + "driverMemory": "56g", + "driverCores": 8, + "executorMemory": "56g", + "executorCores": 8, + "numExecutors": 2, + "runAsWorkspaceSystemIdentity": false, + "conf": { + "spark.dynamicAllocation.enabled": "false", + "spark.dynamicAllocation.minExecutors": "2", + "spark.dynamicAllocation.maxExecutors": "2", + "spark.autotune.trackingId": "88e5f1c5-0e41-4cc7-8041-23da8cee610c" + } + }, + "metadata": { + "saveOutput": true, + 
"enableDebugMode": false, + "kernelspec": { + "name": "synapse_pyspark", + "display_name": "Synapse PySpark" + }, + "language_info": { + "name": "python" + }, + "a365ComputeOptions": { + "id": "/subscriptions/2058f82f-b8ac-423e-8c83-227732887c3a/resourceGroups/fractal-neal-coe-dev-rg/providers/Microsoft.Synapse/workspaces/syngdcscindevil/bigDataPools/synspgdcscin", + "name": "synspgdcscin", + "type": "Spark", + "endpoint": "https://syngdcscindevil.dev.azuresynapse.net/livyApi/versions/2019-11-01-preview/sparkPools/synspgdcscin", + "auth": { + "type": "AAD", + "authResource": "https://dev.azuresynapse.net" + }, + "sparkVersion": "3.1", + "nodeCount": 10, + "cores": 8, + "memory": 56, + "automaticScaleJobs": false + }, + "sessionKeepAliveTimeout": 30 + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Import all the libraries needed" + ] + }, + { + "cell_type": "code", + "metadata": { + "collapsed": false + }, + "source": [ + "from pyspark.sql.types import *\r\n", + "from pyspark.sql import SparkSession\r\n", + "from pyspark.sql.functions import col, explode, collect_list, concat_ws, udf, expr, regexp_replace,array_contains\r\n", + "from datetime import datetime\r\n", + "from pyspark.sql import functions as F\r\n", + "import json\r\n", + "import adal\r\n", + "import pyodbc\r\n", + "import struct\r\n", + "import os\r\n", + "import pandas as pd" + ], + "execution_count": 28 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## calling variable file to intitialize the variables" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "%run config/variables" + ], + "execution_count": 29 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Set global constants" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + }, + "tags": [ + "parameters" + ] + }, + "source": [ + "input_container = \"bronze\"\r\n", + "output_container = \"silver\"\r\n", + "cloud_alias = \"SFSC\"" + ], + "execution_count": 30 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Generate input and utput container URL's" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "input_url = f'abfss://{input_container}@{adls_name}.dfs.core.windows.net/{cloud_alias}'\r\n", + "output_url = f'abfss://{output_container}@{adls_name}.dfs.core.windows.net/{cloud_alias}'" + ], + "execution_count": 33 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Create connection to database and set the cursor" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "conn = pyodbc.connect(connString, attrs_before = { 
+                "cursor = conn.cursor()"
+            ],
+            "execution_count": 31
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {
+                "nteract": {
+                    "transient": {
+                        "deleting": false
+                    }
+                }
+            },
+            "source": [
+                "## Get all entities"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "metadata": {
+                "jupyter": {
+                    "source_hidden": false,
+                    "outputs_hidden": false
+                },
+                "nteract": {
+                    "transient": {
+                        "deleting": false
+                    }
+                }
+            },
+            "source": [
+                "entities = mssparkutils.fs.ls(input_url)\r\n",
+                "\r\n",
+                "for entity in entities:\r\n",
+                "    entity_name = os.path.splitext(entity.name)[0]\r\n",
+                "    print(entity_name)"
+            ],
+            "execution_count": 35
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {
+                "nteract": {
+                    "transient": {
+                        "deleting": false
+                    }
+                }
+            },
+            "source": [
+                "## Read the Bronze Delta tables and save the dataframes in Delta format to the Silver location"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "metadata": {
+                "jupyter": {
+                    "source_hidden": false,
+                    "outputs_hidden": false
+                },
+                "nteract": {
+                    "transient": {
+                        "deleting": false
+                    }
+                },
+                "collapsed": false
+            },
+            "source": [
+                "for entity in entities:\r\n",
+                "    entity_name = os.path.splitext(entity.name)[0]\r\n",
+                "\r\n",
+                "    watermark = cursor.execute(f\"SELECT [SilverWatermark] FROM [dbo].[MetadataSalesCloud] WHERE [EntityName] = '{entity_name}' AND [CloudAbbreviation] = '{cloud_alias}'\") \\\r\n",
+                "        .fetchall()\r\n",
+                "\r\n",
+                "    print(entity_name)\r\n",
+                "    watermark = watermark[0][0]\r\n",
+                "    print('LastTimestamp:', watermark)\r\n",
+                "\r\n",
+                "\r\n",
+                "    try:\r\n",
+                "        df = spark.read.format('delta')\\\r\n",
+                "            .option(\"linesep\", \"\\n\")\\\r\n",
+                "            .option(\"header\", \"true\")\\\r\n",
+                "            .option(\"sep\", \"`\")\\\r\n",
+                "            .option(\"multiLine\",'true')\\\r\n",
+                "            .load(f'{input_url}/{entity_name}')\r\n",
+                "\r\n",
+                "        \r\n",
+                "        dataframe = df\r\n",
+                "\r\n",
+                "        ## replace missing values with string 'NA'\r\n",
+                "        dataframe = dataframe.na.fill(value='NA')\r\n",
+                "\r\n",
+                "        ## remove any duplicate rows\r\n",
+                "        uniquedf = dataframe.dropDuplicates()\r\n",
+                "\r\n",
+                "        uniquedf.write.format('delta').mode('overwrite').save(f'{output_url}/{entity_name}')\r\n",
+                "\r\n",
+                "        cursor.execute(f\"UPDATE [dbo].[MetadataSalesCloud] SET [SilverWatermark] = ? WHERE [EntityName] = ? AND [CloudAbbreviation] = ?\", datetime.utcnow(), entity_name, cloud_alias)\r\n",
+                "        cursor.commit()\r\n",
+                "\r\n",
+                "        print('Records written in delta table: ',df.count(),' for Entity: ', entity_name)\r\n",
+                "\r\n",
+                "    except Exception as err:\r\n",
+                "        print(err)\r\n",
+                ""
+            ],
+            "execution_count": 36
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {
+                "nteract": {
+                    "transient": {
+                        "deleting": false
+                    }
+                }
+            },
+            "source": [
+                "# Transform M365 and Salesforce data"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "metadata": {
+                "jupyter": {
+                    "source_hidden": false,
+                    "outputs_hidden": false
+                },
+                "nteract": {
+                    "transient": {
+                        "deleting": false
+                    }
+                }
+            },
+            "source": [
+                "container = \"silver\"\r\n",
+                "salesforce_alias = \"SFSC\"\r\n",
+                "email_alias = \"M365\"\r\n",
+                "\r\n",
+                "sfsc_silver_url = f'abfss://{container}@{adls_name}.dfs.core.windows.net/{salesforce_alias}'\r\n",
+                "email_silver_url = f'abfss://{container}@{adls_name}.dfs.core.windows.net/{email_alias}'"
+            ],
+            "execution_count": null
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {
+                "nteract": {
+                    "transient": {
+                        "deleting": false
+                    }
+                }
+            },
+            "source": [
+                "## Create unique list of emails from the sender, to, cc, bcc, and from fields"
+            ]
+        },
+        {
+            "cell_type": "code",
+            "metadata": {
+                "jupyter": {
+                    "source_hidden": false,
+                    "outputs_hidden": false
+                },
+                "nteract": {
+                    "transient": {
+                        "deleting": false
+                    }
+                }
+            },
+            "source": [
+                "# Define a function to extract unique emails from each item in the DataFrame\r\n",
+                "def extract_unique_emails(email_json):\r\n",
+                "    unique_emails = set()\r\n",
+                "    unique_emails_to = set()\r\n",
+                "\r\n",
+                "    # Extract the sender and from addresses\r\n",
+                "    unique_emails.add(email_json['sender']['emailAddress']['address'])\r\n",
+                "    unique_emails.add(email_json['from']['emailAddress']['address'])\r\n",
+                "\r\n",
+                "    # Extract the to, cc, and bcc email addresses\r\n",
+                "    email_fields = ['toRecipients','ccRecipients','bccRecipients']\r\n",
+                "    for field in email_fields:\r\n",
+                "        for recipient in email_json[field]:\r\n",
+                "            if recipient is not None:\r\n",
+                "                unique_emails.add(recipient['emailAddress']['address'])\r\n",
+                "                if field == 'toRecipients':\r\n",
+                "                    unique_emails_to.add(recipient['emailAddress']['address'])\r\n",
+                "\r\n",
+                "    # Return a list of two lists so the result matches the declared array-of-arrays return type\r\n",
+                "    return [list(unique_emails), list(unique_emails_to)]\r\n",
+                "\r\n",
+                "# Register the UDF to be used in DataFrame transformations; it returns [unique_email_list, to_recipient_list]\r\n",
+                "spark.udf.register(\"extract_unique_emails\", extract_unique_emails, ArrayType(ArrayType(StringType())))\r\n",
+                "\r\n",
+                "# Use DataFrame functions to extract and concatenate unique email addresses for each row\r\n",
+                "parquet_df_with_emails = parquet_df.withColumn(\"email_lists\", expr(\"extract_unique_emails(struct(*))\"))\r\n",
+                "parquet_df_with_emails = parquet_df_with_emails.withColumn(\"unique_email_list\", parquet_df_with_emails[\"email_lists\"][0])\r\n",
+                "parquet_df_with_emails = parquet_df_with_emails.withColumn(\"to_recipient_list\", parquet_df_with_emails[\"email_lists\"][1])\r\n",
+                "\r\n",
+                "# Drop the temporary column \"email_lists\" as it is no longer needed\r\n",
+                "parquet_df_with_emails = parquet_df_with_emails.drop(\"email_lists\")\r\n",
+                "\r\n",
+                "parquet_df_with_emails.write.format('delta').mode('overwrite').save(f'{email_silver_url}/Email')"
+            ],
+            "execution_count": null
+        },
+        {
+            "cell_type": "markdown",
+            "metadata": {
+                "nteract": {
+                    "transient": {
+                        "deleting": false
+                    }
+                }
+            },
+            "source": [
+                "## Create unique list of Employee and Client from Salesforce based on Opportunity"
+            ]
+        },
+        {
"cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "# Define the entity name and corresponding column names\r\n", + "entity_columns = {\r\n", + " \"Account\": ['Id', 'IsDeleted', 'Name', 'Phone', 'AccountNumber', 'Website', 'Industry', 'AnnualRevenue', 'Description', 'Rating', 'OwnerId', 'LastModifiedDate'],\r\n", + " \"Opportunity\": ['Id' ,'IsDeleted' ,'AccountId' ,'Name' ,'Description' ,'Amount' ,'CloseDate' ,'Type' ,'IsClosed' ,'IsWon' ,'OwnerId' ,'CreatedDate' ,'LastModifiedDate' ,'ContactId'],\r\n", + " \"Contact\": ['Id' ,'IsDeleted' ,'AccountId' ,'LastName' ,'FirstName' ,'Name' ,'Phone' ,'Email' ,'OwnerId' ,'CreatedDate' ,'LastModifiedDate'],\r\n", + " \"User\": ['Id' ,'AccountId' ,'Address' ,'City' ,'ContactId' ,'CreatedById' ,'CreatedDate' ,'Department' ,'Name' ,'Email' ,'EmployeeNumber' ,'IsActive' ,'LastModifiedById' ,'LastModifiedDate' ,'MobilePhone']\r\n", + "}\r\n", + "\r\n", + "entities = mssparkutils.fs.ls(sfsc_silver_url)\r\n", + "\r\n", + "# Read the entities and store them in a dictionary\r\n", + "entity_dataframes = {}\r\n", + "for entity in entities:\r\n", + " entity_name = os.path.splitext(entity.name)[0]\r\n", + "\r\n", + " df_entity = spark.read.format('delta')\\\r\n", + " .option(\"linesep\", \"\\n\")\\\r\n", + " .option(\"header\", \"true\")\\\r\n", + " .option(\"sep\", \"`\")\\\r\n", + " .option(\"multiLine\", \"true\")\\\r\n", + " .load(f'{sfsc_silver_url}/{entity_name}')\r\n", + "\r\n", + " selected_columns = entity_columns.get(entity_name)\r\n", + " df_final_entity = df_entity.select(*selected_columns)\r\n", + "\r\n", + " entity_dataframes[entity_name] = df_final_entity\r\n", + "\r\n", + "# Cache the DataFrames for reuse\r\n", + "entity_dataframes[\"Opportunity\"].cache()\r\n", + "entity_dataframes[\"User\"].cache()\r\n", + "entity_dataframes[\"Contact\"].cache()\r\n", + "\r\n", + "\r\n", + "# Perform the joins on DataFrames directly\r\n", + "df_EmployeeClientUniqueList = entity_dataframes[\"Opportunity\"].alias(\"opp\")\\\r\n", + " .join(entity_dataframes[\"User\"].alias(\"usr\"), col(\"opp.OwnerId\") == col(\"usr.Id\"))\\\r\n", + " .join(entity_dataframes[\"Contact\"].alias(\"ct\"), col(\"ct.Id\") == col(\"opp.ContactId\"))\\\r\n", + " .select(\r\n", + " col(\"opp.Id\").alias(\"OpportunityId\"),\r\n", + " col(\"usr.Email\").alias(\"EmployeeEmail\"),\r\n", + " col(\"ct.Email\").alias(\"ClientEmail\")\r\n", + " )\\\r\n", + " .distinct()\\\r\n", + " .filter(col(\"ClientEmail\") != \"NA\")\\\r\n", + " .filter(col(\"ClientEmail\").isNotNull()) # Filter records where ClientEmail is not null\r\n", + "\r\n", + "# Replace the '.invalid' in the 'EmployeeEmail' column with an empty string\r\n", + "df_EmployeeClientUniqueList = df_EmployeeClientUniqueList.withColumn('EmployeeEmail', regexp_replace(col('EmployeeEmail'), '\\.invalid$', ''))\r\n", + "\r\n", + "# Collect the data as a list of rows\r\n", + "rows = df_EmployeeClientUniqueList.collect()\r\n", + "\r\n", + "# Create a list comprehension to build the email_pairs_list directly\r\n", + "opp_email_pairs_list = [[row['OpportunityId'], row['EmployeeEmail'], row['ClientEmail']] for row in rows]\r\n", + "\r\n", + "# # Print the list of lists with pairs of email addresses\r\n", + "# print(\"List of EmployeeEmail and ClientEmail Pairs:\")\r\n", + "# print(opp_email_pairs_list)\r\n", + "" + ], + "execution_count": null + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + 
"source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "email_pairs_rdd = spark.sparkContext.parallelize(opp_email_pairs_list)\r\n", + "email_pairs_df = email_pairs_rdd.map(lambda x: Row(EmployeeEmail=x[1], opportunityId=x[0])).toDF()\r\n", + "\r\n", + "email_pairs_df.write.format('delta').mode('overwrite').save(f'{sfsc_silver_url}/OppEmployeeClientEmail')" + ], + "execution_count": null + } + ] + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/notebook/M365_Silver_To_Gold.json b/solutions/graph-data-sales-analytics/synapse/notebook/M365_Silver_To_Gold.json new file mode 100644 index 00000000..1bdb1acf --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/notebook/M365_Silver_To_Gold.json @@ -0,0 +1,511 @@ +{ + "name": "M365_Silver_To_Gold", + "properties": { + "folder": { + "name": "Stage4 SilverToGold" + }, + "nbformat": 4, + "nbformat_minor": 2, + "bigDataPool": { + "referenceName": "synspgdcscin", + "type": "BigDataPoolReference" + }, + "sessionProperties": { + "driverMemory": "56g", + "driverCores": 8, + "executorMemory": "56g", + "executorCores": 8, + "numExecutors": 2, + "runAsWorkspaceSystemIdentity": false, + "conf": { + "spark.dynamicAllocation.enabled": "false", + "spark.dynamicAllocation.minExecutors": "2", + "spark.dynamicAllocation.maxExecutors": "2", + "spark.autotune.trackingId": "d70d390b-ac55-4019-8bd5-dee3ef69f1ea" + } + }, + "metadata": { + "saveOutput": true, + "enableDebugMode": false, + "kernelspec": { + "name": "synapse_pyspark", + "display_name": "Synapse PySpark" + }, + "language_info": { + "name": "python" + }, + "a365ComputeOptions": { + "id": "/subscriptions/2058f82f-b8ac-423e-8c83-227732887c3a/resourceGroups/fractal-neal-coe-dev-rg/providers/Microsoft.Synapse/workspaces/syngdcscindevil/bigDataPools/synspgdcscin", + "name": "synspgdcscin", + "type": "Spark", + "endpoint": "https://syngdcscindevil.dev.azuresynapse.net/livyApi/versions/2019-11-01-preview/sparkPools/synspgdcscin", + "auth": { + "type": "AAD", + "authResource": "https://dev.azuresynapse.net" + }, + "sparkVersion": "3.1", + "nodeCount": 10, + "cores": 8, + "memory": 56, + "automaticScaleJobs": false + }, + "sessionKeepAliveTimeout": 30 + }, + "cells": [ + { + "cell_type": "code", + "source": [ + "from pyspark.sql.types import *\r\n", + "from pyspark.sql import SparkSession\r\n", + "from pyspark.sql.functions import col, explode, collect_list, concat_ws, udf, expr, regexp_replace,array_contains\r\n", + "from datetime import datetime\r\n", + "from sklearn.feature_extraction.text import TfidfVectorizer\r\n", + "from sklearn.metrics.pairwise import cosine_similarity\r\n", + "from pyspark.sql import Row\r\n", + "from bs4 import BeautifulSoup\r\n", + "from pyspark.ml.feature import HashingTF, Tokenizer\r\n", + "from pyspark.ml.feature import MinHashLSH\r\n", + "from pyspark.ml import Pipeline\r\n", + "from pyspark.sql import functions as F\r\n", + "\r\n", + "import json\r\n", + "import adal\r\n", + "import pyodbc\r\n", + "import struct\r\n", + "import os\r\n", + "import pandas as pd" + ], + "execution_count": 2 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Initialize the variables" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + 
"%run config/variables" + ], + "execution_count": 3 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Setup the global variables" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "input_container = \"silver\"\r\n", + "salesforce_alias = \"SFSC\"\r\n", + "\r\n", + "output_container = \"gold\"\r\n", + "email_alias = \"M365\"\r\n", + "\r\n", + "input_url_sfsc = f'abfss://{input_container}@{adls_name}.dfs.core.windows.net/{salesforce_alias}'\r\n", + "input_url_m365 = f'abfss://{input_container}@{adls_name}.dfs.core.windows.net/{email_alias}'\r\n", + "output_url = f'abfss://{output_container}@{adls_name}.dfs.core.windows.net/{email_alias}'" + ], + "execution_count": 4 + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "parquet_df_with_emails = spark.read.format('delta')\\\r\n", + " .option(\"linesep\", \"\\n\")\\\r\n", + " .option(\"header\", \"true\")\\\r\n", + " .option(\"multiLine\",'true')\\\r\n", + " .load(f'{input_url_m365}/Email')" + ], + "execution_count": null + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + }, + "collapsed": false + }, + "source": [ + "# Select the desired columns\r\n", + "selected_columns = [\r\n", + " \"id\",\r\n", + " \"sentDateTime\",\r\n", + " \"lastModifiedDateTime\",\r\n", + " \"subject\",\r\n", + " \"bodyPreview\",\r\n", + " \"conversationId\",\r\n", + " \"conversationIndex\",\r\n", + " \"uniqueBody.content as uniqueBody_content\",\r\n", + " \"unique_email_list\",\r\n", + " \"from.emailAddress.address as from\",\r\n", + " \"to_recipient_list\"\r\n", + "]\r\n", + "\r\n", + "# Select the desired columns and create the final DataFrame\r\n", + "email_df = parquet_df_with_emails.selectExpr(*selected_columns)\r\n", + "\r\n", + "# # Show the resulting DataFrame\r\n", + "# display(email_df)" + ], + "execution_count": 14 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Filter the emails based on the unique list of EmployeeClient" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "email_pairs_df = spark.read.format('delta')\\\r\n", + " .option(\"linesep\", \"\\n\")\\\r\n", + " .option(\"header\", \"true\")\\\r\n", + " .option(\"multiLine\",'true')\\\r\n", + " .load(f'{input_url_sfsc}/OppEmployeeClientEmail')" + ], + "execution_count": null + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "# Step 1: Tokenize and create TF vectors for email_pairs_df\r\n", + "tokenizer = Tokenizer(inputCol=\"EmployeeEmail\", outputCol=\"words\")\r\n", + "email_pairs_df = tokenizer.transform(email_pairs_df)\r\n", + "\r\n", + "# Step 2: Flatten the array of strings into a single string using concat_ws\r\n", + "email_pairs_df = email_pairs_df.withColumn(\"EmployeeEmail\", F.concat_ws(\", \", 
\"words\"))\r\n", + "\r\n", + "# Step 3: Create TF vectors for email_pairs_df\r\n", + "hashingTF = HashingTF(inputCol=\"words\", outputCol=\"tf_features\")\r\n", + "email_pairs_df = hashingTF.transform(email_pairs_df)\r\n", + "\r\n", + "# Step 4: Convert the list of emails into a single string representation in email_df\r\n", + "email_df = email_df.withColumn(\"unique_email_list_str\", F.concat_ws(\", \", \"unique_email_list\"))\r\n", + "\r\n", + "# Step 5: Tokenize and create TF vectors for email_df\r\n", + "tokenizer = Tokenizer(inputCol=\"unique_email_list_str\", outputCol=\"words\")\r\n", + "df_created_from_email_features = tokenizer.transform(email_df)\r\n", + "\r\n", + "# Step 6: Create TF vectors for df_created_from_email_features\r\n", + "hashingTF = HashingTF(inputCol=\"words\", outputCol=\"tf_features\")\r\n", + "df_created_from_email_features = hashingTF.transform(df_created_from_email_features)\r\n", + "\r\n", + "# Step 7: Apply MinHashLSH to find similar email addresses between email_pairs_df and df_created_from_email_features\r\n", + "num_hash_tables = 5\r\n", + "min_hash_lsh = MinHashLSH(inputCol=\"tf_features\", outputCol=\"hashes\", numHashTables=num_hash_tables)\r\n", + "model = min_hash_lsh.fit(email_pairs_df)\r\n", + "df_similar_emails = model.approxSimilarityJoin(email_pairs_df, df_created_from_email_features, 0)\r\n", + "\r\n", + "# Step 8: Extract the relevant information from the joined dataframe\r\n", + "df_comparison_result = df_similar_emails.select(\r\n", + " F.col(\"datasetA.EmployeeEmail\").alias(\"EmployeeClient_Email\"),\r\n", + " F.col(\"datasetA.opportunityId\").alias(\"opportunityId\"),\r\n", + " F.col(\"datasetB.id\").alias(\"id\"),\r\n", + " F.col(\"datasetB.unique_email_list\").alias(\"email from unique_email_list\")\r\n", + ")\r\n", + "\r\n", + "# Step 9: Show the comparison result\r\n", + "df_comparison_result.show(truncate=False)" + ], + "execution_count": 9 + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + }, + "collapsed": false + }, + "source": [ + "entity_name = \"Opportunity\"\r\n", + "df_opportunity = spark.read.format('delta')\\\r\n", + " .option(\"linesep\", \"\\n\")\\\r\n", + " .option(\"header\", \"true\")\\\r\n", + " .option(\"sep\", \"`\")\\\r\n", + " .option(\"multiLine\",'true')\\\r\n", + " .load(f'{input_url_sfsc}/{entity_name}')\r\n", + "\r\n", + "\r\n", + "df_opportunity = df_opportunity.select(\"Id\" ,\"IsDeleted\" ,\"AccountId\" ,\"Name\" ,\"Description\" ,\"StageName\" ,\"Amount\" ,\"ExpectedRevenue\" ,\"Type\" ,\"IsClosed\" ,\"IsWon\" ,\"OwnerId\" ,\"CreatedDate\" ,\"CreatedById\" ,\"LastModifiedDate\" ,\"LastModifiedById\" ,\"ContactId\")\r\n", + "\r\n", + "filtered_email_df = email_df.join(df_comparison_result, email_df.id == df_comparison_result.id, \"inner\")" + ], + "execution_count": 12 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Map the opportunity Id to the Emails for the Filtered data based on Unique EmployeeClient From Opportunity" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "# Create an empty list to store results\r\n", + "results_list = []\r\n", + "\r\n", + "def find_opportunity_id(opportunity_data, email_subject, email_body):\r\n", + " # 
Combine the opportunity name and description into a single string\r\n", + " opportunity_text = [opp['opportunity_name'] + ' ' + opp['opportunity_description'] for opp in opportunity_data]\r\n", + "\r\n", + " # Combine the email subject and body into a single string\r\n", + " email_text = email_subject + ' ' + email_body\r\n", + "\r\n", + " # Create a TfidfVectorizer and fit_transform the opportunity_text and email_text\r\n", + " vectorizer = TfidfVectorizer()\r\n", + " opportunity_vectors = vectorizer.fit_transform(opportunity_text)\r\n", + " email_vector = vectorizer.transform([email_text])\r\n", + "\r\n", + " # Calculate cosine similarity between the email vector and opportunity vectors\r\n", + " similarities = cosine_similarity(email_vector, opportunity_vectors)\r\n", + "\r\n", + " # Find the index of the highest similarity score\r\n", + " max_similarity_index = similarities.argmax()\r\n", + "\r\n", + " # If the similarity score is above a certain threshold (e.g., 0.5), consider it a match\r\n", + " if similarities[0, max_similarity_index] > 0.5:\r\n", + " return opportunity_data[max_similarity_index]['opportunity_id']\r\n", + " else:\r\n", + " return None\r\n", + "\r\n", + "\r\n", + "# Extract the plain text from an HTML email body; this runs on the driver,\r\n", + "# so no per-row Spark DataFrame or UDF is needed\r\n", + "def extract_text_from_html(html_string):\r\n", + " soup = BeautifulSoup(html_string, 'html.parser')\r\n", + " return soup.get_text().strip()\r\n", + "\r\n", + "\r\n", + "# Convert DataFrame to a list of dictionaries\r\n", + "opportunity_data = df_opportunity.rdd.map(lambda row: {\r\n", + " 'opportunity_id': row['Id'],\r\n", + " 'opportunity_name': row['Name'],\r\n", + " 'opportunity_description': row['Description']\r\n", + "}).collect()\r\n", + "\r\n", + "# Convert DataFrame to a list of Rows\r\n", + "email_rows = filtered_email_df.collect()\r\n", + "\r\n", + "# Loop through the Rows and extract 'subject' and 'uniqueBody_content' values\r\n", + "for row in email_rows:\r\n", + " email_id = row['id']\r\n", + " sent_date_time = row['sentDateTime']\r\n", + " last_modified_date_time = row['lastModifiedDateTime']\r\n", + " email_subject = row['subject']\r\n", + " body_preview = row['bodyPreview']\r\n", + " conversation_id = row['conversationId']\r\n", + " conversation_index = row['conversationIndex']\r\n", + " from_email_address = row['from']\r\n", + " to_recipient_list = row['to_recipient_list']\r\n", + " html_data = row['uniqueBody_content']\r\n", + "\r\n", + " # Extract the plain text of the email body directly with BeautifulSoup\r\n", + " email_text = extract_text_from_html(html_data)\r\n", + "\r\n", + " # Find the Opportunity ID using TF-IDF cosine similarity\r\n", + " opportunity_id = find_opportunity_id(opportunity_data, email_subject, email_text)\r\n", + "\r\n", + " # Append the result when a match is found\r\n", + " if opportunity_id is not None:\r\n", + " result = {\r\n", + " 'email_id': email_id,\r\n", + " 'sent_date_time': sent_date_time,\r\n", + " 'last_modified_date_time': last_modified_date_time,\r\n", + " 'body_preview': body_preview,\r\n", + " 'conversation_id': conversation_id,\r\n", + " 'conversation_index': conversation_index,\r\n", + " 'from_email_address': from_email_address,\r\n", + " 'to_recipient_list': to_recipient_list,\r\n", + " 'email_subject': email_subject,\r\n", + " 'email_text': email_text,\r\n", + " 'opportunity_id': opportunity_id\r\n", + " }\r\n", + " results_list.append(result)\r\n", + " # print(f\"Opportunity ID: {opportunity_id}\")\r\n", + " else:\r\n", + " pass\r\n", + " # print(\"No matching Opportunity found.\")" + ], + "execution_count": 16 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Create Delta table" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "# Define the schema for the result DataFrame\r\n", + "result_schema = StructType([\r\n", + " StructField(\"email_id\", StringType(), False),\r\n", + " StructField(\"sent_date_time\", StringType(), True),\r\n", + " StructField(\"last_modified_date_time\", StringType(), True),\r\n", + " StructField(\"body_preview\", StringType(), True),\r\n", + " StructField(\"conversation_id\", StringType(), True),\r\n", + " StructField(\"conversation_index\", StringType(), True),\r\n", + " StructField(\"from_email_address\", StringType(), True),\r\n", + " StructField(\"to_recipient_list\", StringType(), True),\r\n", + " StructField(\"email_subject\", StringType(), True),\r\n", + " StructField(\"email_text\", StringType(), True),\r\n", + " StructField(\"opportunity_id\", StringType(), True)\r\n", + "])\r\n", + "\r\n", + "\r\n", + "# Create a DataFrame from the results_list\r\n", + "result_df = spark.createDataFrame(results_list, schema=result_schema)\r\n", + "\r\n", + "result_df.write.format('delta') \\\r\n", + " .mode('overwrite') \\\r\n", + " .option(\"mergeSchema\", \"true\") \\\r\n", + " .save(f'{output_url}/Email')" + ], + "execution_count": 19 + } + ] + } +} \ No newline at end of file
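
The opportunity matching in the notebook above hinges on scikit-learn's TF-IDF vectors and cosine similarity. Here is a compact, self-contained illustration of that matching step, mirroring the 0.5 threshold from `find_opportunity_id` (the opportunity and email texts are invented):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

opportunities = [
    "Contoso data platform migration proposal",
    "Fabrikam marketing analytics pilot",
]
email_text = "Following up on the Contoso data platform migration timeline"

vectorizer = TfidfVectorizer()
opportunity_vectors = vectorizer.fit_transform(opportunities)  # fit on the opportunities
email_vector = vectorizer.transform([email_text])              # project the email onto them

# One similarity score per opportunity; accept the best match only above 0.5.
similarities = cosine_similarity(email_vector, opportunity_vectors)
best = similarities.argmax()
print(best if similarities[0, best] > 0.5 else None)
```
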
diff --git a/solutions/graph-data-sales-analytics/synapse/notebook/RawToBronze.json b/solutions/graph-data-sales-analytics/synapse/notebook/RawToBronze.json new file mode 100644 index 00000000..7cc5b1c0 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/notebook/RawToBronze.json @@ -0,0 +1,342 @@ +{ + "name": "RawToBronze", + "properties": { + "folder": { + "name": "Stage2 RawToBronze" + }, + "nbformat": 4, + "nbformat_minor": 2, + "bigDataPool": { + "referenceName": "synspgdcscin", + "type": "BigDataPoolReference" + }, + "sessionProperties": { + "driverMemory": "56g", + "driverCores": 8, + "executorMemory": "56g", + "executorCores": 8, + "numExecutors": 2, + "runAsWorkspaceSystemIdentity": false, + "conf": { + "spark.dynamicAllocation.enabled": "false", + "spark.dynamicAllocation.minExecutors": "2", + "spark.dynamicAllocation.maxExecutors": "2", + "spark.autotune.trackingId": "32599d53-d381-4f6b-9b2f-cc0ab258e00b" + } + }, + "metadata": { + "saveOutput": true, + "enableDebugMode": false, + "kernelspec": { + "name": "synapse_pyspark", + "display_name": "Synapse PySpark" + }, + "language_info": { + "name": "python" + }, + "a365ComputeOptions": { + "id": "/subscriptions/2058f82f-b8ac-423e-8c83-227732887c3a/resourceGroups/fractal-neal-coe-dev-rg/providers/Microsoft.Synapse/workspaces/syngdcscindevil/bigDataPools/synspgdcscin", + "name": "synspgdcscin", + "type": "Spark", + "endpoint": "https://syngdcscindevil.dev.azuresynapse.net/livyApi/versions/2019-11-01-preview/sparkPools/synspgdcscin", + "auth": { + "type": "AAD", + "authResource": "https://dev.azuresynapse.net" + }, + "sparkVersion": "3.1", + "nodeCount": 10, + "cores": 8, + "memory": 56, + "automaticScaleJobs": false + }, + "sessionKeepAliveTimeout": 30 + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Import all the libraries needed" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "from notebookutils import mssparkutils\r\n", + "from datetime import datetime\r\n", + "from pandas import date_range\r\n", + "from os.path import isfile\r\n", + "from datetime import timedelta\r\n", + "from pyspark.sql import SparkSession\r\n", + "import adal\r\n", + "import pyodbc\r\n", + "import struct\r\n", + "import os" + ], + "execution_count": 6 + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "%run config/variables" + ], + "execution_count": 7 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Set global constants" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + }, + "tags": [ + "parameters" + ] + }, + "source": [ + "## Global constants for adls\r\n", + "input_container = \"raw\"\r\n", + "output_container = \"bronze\"\r\n", + "cloud_alias = \"SFSC\"" + ], + "execution_count": 8 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Generate the input and output container URLs" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "# create url for input and output container\r\n", + "input_url = f'abfss://{input_container}@{adls_name}.dfs.core.windows.net/{cloud_alias}'\r\n", + "output_url = f'abfss://{output_container}@{adls_name}.dfs.core.windows.net/{cloud_alias}'" + ], + "execution_count": 4 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Create connection to database and set the cursor" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "conn = pyodbc.connect(connString, attrs_before = { SQL_COPT_SS_ACCESS_TOKEN:tokenstruct});\r\n", + "cursor = conn.cursor()" + ], + "execution_count": 5 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Get all entities from input container" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "entities = mssparkutils.fs.ls(input_url)\r\n", + "\r\n", + "# for entity in entities:\r\n", + "# entity_name = os.path.splitext(entity.name)[0]\r\n", + "# print(entity_name)" + ], + "execution_count": 6 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Read CSV and save dataframe in delta format at desired location" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + }, + "collapsed": false + }, + "source": [ + "for entity in entities:\r\n", + " entity_name = os.path.splitext(entity.name)[0]\r\n", + " ## parameterized lookup, consistent with the UPDATE below\r\n", + " watermark = cursor.execute(\"SELECT [BronzeWatermark] FROM [dbo].[MetadataSalesCloud] WHERE [EntityName] = ? AND [CloudAbbreviation] = ?\", entity_name, cloud_alias)\\\r\n", + " .fetchall()\r\n", + " \r\n", + " print('Table Name: ', entity_name)\r\n", + " watermark = watermark[0][0]\r\n", + " print('LastTimestamp:', watermark)\r\n", + "\r\n", + " try:\r\n", + " df = spark.read.csv(f'{input_url}/{entity_name}',sep='`', header=True,multiLine=True, inferSchema=True)\r\n", + " df.write.format('delta').mode('overwrite').save(f'{output_url}/{entity_name}')\r\n", + " \r\n", + " cursor.execute(\"UPDATE [dbo].[MetadataSalesCloud] SET [BronzeWatermark] = ? WHERE [EntityName] = ? AND [CloudAbbreviation] = ?\", datetime.utcnow(), entity_name, cloud_alias)\r\n", + " cursor.commit()\r\n", + "\r\n", + " print('Records written in delta table: ',df.count(),' for Entity: ', entity_name)\r\n", + " except Exception as err:\r\n", + " print(err)\r\n", + "" + ], + "execution_count": 7 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Read M365 data and create delta table" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "input_url_m365 = f'abfss://{input_container}@{adls_name}.dfs.core.windows.net/M365'\r\n", + "output_url_m365 = f'abfss://{output_container}@{adls_name}.dfs.core.windows.net/M365'\r\n", + "\r\n", + "entity_name = \"Inbox\"\r\n", + "\r\n", + "# Read the Parquet file into a DataFrame\r\n", + "parquet_df = spark.read.format('parquet').load(f'{input_url_m365}/{entity_name}')\r\n", + "\r\n", + "# Write the M365 data under its own bronze path\r\n", + "parquet_df.write.format('delta').mode('overwrite').save(f'{output_url_m365}/{entity_name}')" + ], + "execution_count": null + } + ] + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/notebook/SalesforceFunctions.json b/solutions/graph-data-sales-analytics/synapse/notebook/SalesforceFunctions.json new file mode 100644 index 00000000..c5e11f94 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/notebook/SalesforceFunctions.json @@ -0,0 +1,150 @@ +{ + "name": "SalesforceFunctions", + "properties": { + "folder": { + "name": "utility" + }, + "nbformat": 4, + "nbformat_minor": 2, + "bigDataPool": { + "referenceName": "synspgdcscin", + "type": "BigDataPoolReference" + }, + "sessionProperties": { + "driverMemory": "56g", + "driverCores": 8, + "executorMemory": "56g", + "executorCores": 8, + "numExecutors": 2, + "runAsWorkspaceSystemIdentity": false, + "conf": { + "spark.dynamicAllocation.enabled": "false", + "spark.dynamicAllocation.minExecutors": "2", + "spark.dynamicAllocation.maxExecutors": "2", + "spark.autotune.trackingId": "1aceb01e-ce19-4215-86d6-1b8ae63512d1" + } + }, + "metadata": { 
+ "saveOutput": true, + "enableDebugMode": false, + "kernelspec": { + "name": "synapse_pyspark", + "display_name": "Synapse PySpark" + }, + "language_info": { + "name": "python" + }, + "a365ComputeOptions": { + "id": "/subscriptions/2058f82f-b8ac-423e-8c83-227732887c3a/resourceGroups/fractal-neal-coe-dev-rg/providers/Microsoft.Synapse/workspaces/syngdcscindevil/bigDataPools/synspgdcscin", + "name": "synspgdcscin", + "type": "Spark", + "endpoint": "https://syngdcscindevil.dev.azuresynapse.net/livyApi/versions/2019-11-01-preview/sparkPools/synspgdcscin", + "auth": { + "type": "AAD", + "authResource": "https://dev.azuresynapse.net", + "authHeader": null + }, + "sparkVersion": "3.1", + "nodeCount": 10, + "cores": 8, + "memory": 56, + "extraHeader": null + }, + "sessionKeepAliveTimeout": 30 + }, + "cells": [ + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "def get_salesforce_access_token(client_id, client_secret):\r\n", + " # Salesforce OAuth2 token endpoint\r\n", + " token_url = 'https://fractalanalytics--uat.my.salesforce.com/services/oauth2/token'\r\n", + "\r\n", + " # Request payload for token endpoint\r\n", + " payload = {\r\n", + " 'grant_type': 'client_credentials',\r\n", + " 'client_id': client_id,\r\n", + " 'client_secret': client_secret,\r\n", + " }\r\n", + " # Send POST request to token endpoint\r\n", + " response = requests.post(token_url, data=payload)\r\n", + "\r\n", + " # Check if the request was successful\r\n", + " if response.status_code == 200:\r\n", + " # Access token is returned in the response\r\n", + " access_token = response.json()['access_token']\r\n", + " return access_token\r\n", + " else:\r\n", + " print('Error:', response.content)\r\n", + " return None" + ], + "execution_count": 16 + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "\r\n", + "def get_salesforce_records(access_token, query):\r\n", + " api_endpoint = 'https://fractalanalytics--uat.sandbox.my.salesforce.com' \r\n", + " query_api_endpoint = api_endpoint + '/services/data/v58.0/query/'\r\n", + "\r\n", + " # Set up the headers\r\n", + " headers = {\r\n", + " 'Authorization': 'Bearer ' + access_token,\r\n", + " 'Content-Type': 'application/json'\r\n", + " }\r\n", + "\r\n", + " # Make the API request using the requests library\r\n", + " response = requests.get(query_api_endpoint, headers=headers, params={'q': query})\r\n", + "\r\n", + " # Process the response\r\n", + " if response.status_code == 200:\r\n", + " data = response.json()\r\n", + "\r\n", + " # Create an empty list to hold all the records\r\n", + " records = []\r\n", + " records.extend(data['records'])\r\n", + "\r\n", + " # Check if the response contains 'nextRecordsUrl' key\r\n", + " next_records_url = data.get('nextRecordsUrl')\r\n", + "\r\n", + " # Loop through subsequent pages until there are no more records\r\n", + " while next_records_url:\r\n", + " response = requests.get(api_endpoint + next_records_url, headers=headers)\r\n", + " if response.status_code == 200:\r\n", + " data = response.json()\r\n", + " records.extend(data['records'])\r\n", + " next_records_url = data.get('nextRecordsUrl')\r\n", + " else:\r\n", + " return response.status_code, response.text\r\n", + " return records\r\n", + " else:\r\n", + " return response.status_code, 
response.text\r\n", + "" + ] + } + ] + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/notebook/Saslesforce_SourceToRaw.json b/solutions/graph-data-sales-analytics/synapse/notebook/Saslesforce_SourceToRaw.json new file mode 100644 index 00000000..00f06059 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/notebook/Saslesforce_SourceToRaw.json @@ -0,0 +1,314 @@ +{ + "name": "Saslesforce_SourceToRaw", + "properties": { + "folder": { + "name": "Stage1 SourceToRaw" + }, + "nbformat": 4, + "nbformat_minor": 2, + "bigDataPool": { + "referenceName": "synspgdcscin", + "type": "BigDataPoolReference" + }, + "sessionProperties": { + "driverMemory": "56g", + "driverCores": 8, + "executorMemory": "56g", + "executorCores": 8, + "numExecutors": 2, + "runAsWorkspaceSystemIdentity": false, + "conf": { + "spark.dynamicAllocation.enabled": "false", + "spark.dynamicAllocation.minExecutors": "2", + "spark.dynamicAllocation.maxExecutors": "2", + "spark.autotune.trackingId": "6c478333-5f46-4c26-b42e-9806ea465f32" + } + }, + "metadata": { + "saveOutput": true, + "enableDebugMode": false, + "kernelspec": { + "name": "synapse_pyspark", + "display_name": "Synapse PySpark" + }, + "language_info": { + "name": "python" + }, + "a365ComputeOptions": { + "id": "/subscriptions/2058f82f-b8ac-423e-8c83-227732887c3a/resourceGroups/fractal-neal-coe-dev-rg/providers/Microsoft.Synapse/workspaces/syngdcscindevil/bigDataPools/synspgdcscin", + "name": "synspgdcscin", + "type": "Spark", + "endpoint": "https://syngdcscindevil.dev.azuresynapse.net/livyApi/versions/2019-11-01-preview/sparkPools/synspgdcscin", + "auth": { + "type": "AAD", + "authResource": "https://dev.azuresynapse.net", + "authHeader": null + }, + "sparkVersion": "3.1", + "nodeCount": 10, + "cores": 8, + "memory": 56, + "extraHeader": null + }, + "sessionKeepAliveTimeout": 30 + }, + "cells": [ + { + "cell_type": "code", + "source": [ + "from pyspark.sql.types import *\r\n", + "from pyspark.sql import SparkSession\r\n", + "from datetime import datetime\r\n", + "import json\r\n", + "import requests\r\n", + "import adal\r\n", + "import pyodbc\r\n", + "import struct" + ], + "execution_count": 1 + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "%run config/variables" + ], + "execution_count": 2 + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "%run utility/SalesforceFunctions" + ], + "execution_count": 3 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Set global constants" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Global constants for adls\r\n", + "input_container = \"raw\"\r\n", + "cloud_alias = \"SFSC\"" + ], + "execution_count": 4 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Create connection to database and set the cursor" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": 
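
A short usage sketch of the two helpers defined above, with placeholder credentials (in this solution they would come from the `config/variables` notebook or a secret store) and a simple SOQL query:

```python
import requests  # used by both helpers

client_id = "<client-id>"          # placeholder, not a real credential
client_secret = "<client-secret>"  # placeholder, not a real credential

access_token = get_salesforce_access_token(client_id, client_secret)
if access_token:
    result = get_salesforce_records(access_token, "SELECT Id, Name FROM Account")
    if isinstance(result, tuple):  # the helper returns (status_code, text) on failure
        status_code, response_text = result
        print("Request failed:", status_code, response_text)
    else:
        print("Fetched", len(result), "Account records")
```
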
{ + "deleting": false + } + } + }, + "source": [ + "conn = pyodbc.connect(connString, attrs_before = { SQL_COPT_SS_ACCESS_TOKEN:tokenstruct});\r\n", + "cursor = conn.cursor()" + ], + "execution_count": 5 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Get all Salesforce entities from the metadata db" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "metadata_entities = cursor.execute(\"SELECT [EntityName],[Query],[SchemaColumn] FROM [dbo].[MetadataSalesCloud] WHERE [CloudAbbreviation] = ?\", cloud_alias) \\\r\n", + " .fetchall()\r\n", + "" + ], + "execution_count": 6 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Get data from Salesforce entities" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + }, + "collapsed": false + }, + "source": [ + "access_token = get_salesforce_access_token(salesforce_client_id, salesforce_client_secret)\r\n", + "\r\n", + "for row in metadata_entities:\r\n", + " entity_name = row[0]\r\n", + " query = row[1]\r\n", + " schema_string = row[2]\r\n", + "\r\n", + " result = get_salesforce_records(access_token, query)\r\n", + "\r\n", + " if isinstance(result, tuple):\r\n", + " status_code, response_text = result\r\n", + " print(\"Request failed with status code:\", status_code)\r\n", + " print(\"Response text:\", response_text)\r\n", + " else:\r\n", + " records = result\r\n", + " print(\"Total records:\", len(records) , \"for entity:\", entity_name)\r\n", + "\r\n", + " spark = SparkSession.builder.getOrCreate()\r\n", + "\r\n", + " # Convert the schema string to StructType\r\n", + " schema = eval(schema_string)\r\n", + "\r\n", + " # Convert the JSON response to a Spark DataFrame using the specified schema\r\n", + " df = spark.createDataFrame(records, schema=schema)\r\n", + " \r\n", + " # Filter out the rows with corrupt records\r\n", + " df = df.filter(df[\"_corrupt_record\"].isNull())\r\n", + "\r\n", + " # Drop the \"_corrupt_record\" column\r\n", + " df = df.drop(\"_corrupt_record\")\r\n", + "\r\n", + " input_url = f'abfss://{input_container}@{adls_name}.dfs.core.windows.net/{cloud_alias}/{entity_name}'\r\n", + "\r\n", + " # Write DataFrame to CSV in ADLS\r\n", + " df.write.csv(input_url, sep='`', header=True, mode=\"overwrite\")\r\n", + " print('Records written in adls: ',df.count(),' for Entity: ', entity_name)\r\n", + "\r\n", + " try: \r\n", + " cursor.execute(\"UPDATE [dbo].[MetadataSalesCloud] SET [Watermark] = ? WHERE [EntityName] = ? 
AND [CloudAbbreviation] = ?\", datetime.utcnow(), entity_name, cloud_alias)\r\n", + " cursor.commit()\r\n", + " except Exception as err:\r\n", + " print(err)\r\n", + "" + ], + "execution_count": 10 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Write data to CSV in ADLS" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + }, + "collapsed": false + }, + "source": [ + "# read csv file\r\n", + "# df_adls = spark.read.csv(input_url,sep='`', header=True,multiLine=True, inferSchema=True)\r\n", + "# display(df_adls)" + ], + "execution_count": 9 + } + ] + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/notebook/Sentiment_code_nltk.json b/solutions/graph-data-sales-analytics/synapse/notebook/Sentiment_code_nltk.json new file mode 100644 index 00000000..dd947bb5 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/notebook/Sentiment_code_nltk.json @@ -0,0 +1,524 @@ +{ + "name": "Sentiment_code_nltk", + "properties": { + "description": "NLTK based sentiment model", + "folder": { + "name": "sentiment_analysis" + }, + "nbformat": 4, + "nbformat_minor": 2, + "bigDataPool": { + "referenceName": "synspgdcscin", + "type": "BigDataPoolReference" + }, + "sessionProperties": { + "driverMemory": "56g", + "driverCores": 8, + "executorMemory": "56g", + "executorCores": 8, + "numExecutors": 2, + "runAsWorkspaceSystemIdentity": false, + "conf": { + "spark.dynamicAllocation.enabled": "false", + "spark.dynamicAllocation.minExecutors": "2", + "spark.dynamicAllocation.maxExecutors": "2", + "spark.autotune.trackingId": "c8c3c478-e9c1-40e8-ad52-717796fe2eb0" + } + }, + "metadata": { + "saveOutput": true, + "enableDebugMode": false, + "kernelspec": { + "name": "synapse_pyspark", + "display_name": "Synapse PySpark" + }, + "language_info": { + "name": "python" + }, + "a365ComputeOptions": { + "id": "/subscriptions/2058f82f-b8ac-423e-8c83-227732887c3a/resourceGroups/fractal-neal-coe-dev-rg/providers/Microsoft.Synapse/workspaces/syngdcscindevil/bigDataPools/synspgdcscin", + "name": "synspgdcscin", + "type": "Spark", + "endpoint": "https://syngdcscindevil.dev.azuresynapse.net/livyApi/versions/2019-11-01-preview/sparkPools/synspgdcscin", + "auth": { + "type": "AAD", + "authResource": "https://dev.azuresynapse.net" + }, + "sparkVersion": "3.1", + "nodeCount": 10, + "cores": 8, + "memory": 56, + "automaticScaleJobs": false + }, + "sessionKeepAliveTimeout": 30 + }, + "cells": [ + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "import nltk\r\n", + "nltk.download('punkt')\r\n", + "nltk.download('vader_lexicon')\r\n", + "nltk.download('stopwords')\r\n", + "\r\n", + "import pandas as pd\r\n", + "import re\r\n", + "from nltk.sentiment.vader import SentimentIntensityAnalyzer as sia\r\n", + "from nltk.tokenize import sent_tokenize, word_tokenize\r\n", + "from nltk.corpus import stopwords\r\n", + "import gensim\r\n", + "from gensim import corpora\r\n", + "\r\n", + "from pyspark.sql import SparkSession\r\n", + "from pyspark.sql.types import StructType, StructField, IntegerType, StringType" + ], + "execution_count": 35 + }, + { + "cell_type": "code", + "source": [ + "'''\r\n", + "This module contains functions for analyzing 
the sentiment of emails \r\n", + "and extracting themes using LDA topic modeling. It uses the VADER\r\n", + "(Valence Aware Dictionary and sEntiment Reasoner) sentiment analysis\r\n", + "tool to determine the sentiment of individual tokens and text. \r\n", + "The sentiment analysis can be performed using a sliding window \r\n", + "approach on sentences to identify the best and least sentiment \r\n", + "windows within the email text. Additionally, the module applies \r\n", + "LDA topic modeling to extract themes from the emails. \r\n", + "The output provides overall sentiment scores, token-level \r\n", + "sentiment scores, and themes for each email, which can be used\r\n", + "to gain insights and classify emails based on their sentiments \r\n", + "and themes.\r\n", + "'''\r\n", + "\r\n", + "def analyze_token_sentiment(token, additional_words=None):\r\n", + " \"\"\"\r\n", + " Analyzes the sentiment of a single token using the VADER SentimentIntensityAnalyzer.\r\n", + " Args:\r\n", + " token (str): The token to analyze.\r\n", + " additional_words (list, optional): Additional words to include in the analysis. Defaults to None.\r\n", + " Returns:\r\n", + " float: The sentiment score of the token.\r\n", + " \"\"\"\r\n", + " sia_obj = sia()\r\n", + " token_polarity_scores = sia_obj.polarity_scores(token)\r\n", + " token_sentiment = token_polarity_scores['compound']\r\n", + " return token_sentiment\r\n", + "\r\n", + "def analyze_sentiment(text, additional_words=None):\r\n", + " \"\"\"\r\n", + " Analyzes the sentiment of a text using the VADER SentimentIntensityAnalyzer.\r\n", + " Args:\r\n", + " text (str): The text to analyze.\r\n", + " additional_words (list, optional): Additional words to include in the analysis. Defaults to None.\r\n", + " Returns:\r\n", + " tuple: A tuple containing the compound sentiment score of the text and a list of token-level sentiment scores.\r\n", + " \"\"\"\r\n", + "\r\n", + " sia_obj = sia()\r\n", + " polarity_scores = sia_obj.polarity_scores(text)\r\n", + " compound_score = polarity_scores['compound']\r\n", + " tokenized_text = word_tokenize(text)\r\n", + " stop_words = set(stopwords.words('english'))\r\n", + "\r\n", + " if additional_words:\r\n", + " stop_words.update(additional_words)\r\n", + "\r\n", + " tokenized_text = [word.lower() for word in tokenized_text if word.lower() not in stop_words]\r\n", + " token_sentiments = []\r\n", + "\r\n", + " for token in tokenized_text:\r\n", + " token_sentiment = analyze_token_sentiment(token, additional_words)\r\n", + " token_sentiments.append(token_sentiment)\r\n", + "\r\n", + " return compound_score, token_sentiments\r\n", + "\r\n", + "\r\n", + "def analyze_sentence_window(sentence, window_size, additional_words=None):\r\n", + " \"\"\"\r\n", + " Analyzes the sentiment of a sentence using a sliding window approach.\r\n", + " Args:\r\n", + " sentence (str): The sentence to analyze.\r\n", + " window_size (int): The size of the sliding window.\r\n", + " additional_words (list, optional): Additional words to include in the analysis. 
Defaults to None.\r\n", + "\r\n", + " Returns:\r\n", + " tuple: A tuple containing the best and least sentiment windows and their corresponding sentiment scores.\r\n", + " \"\"\"\r\n", + " words = word_tokenize(sentence)\r\n", + " num_words = len(words)\r\n", + " if num_words < window_size:\r\n", + "\r\n", + " return None, None, None, None\r\n", + "\r\n", + " best_sentiment = None\r\n", + " best_window = None\r\n", + " least_sentiment = None\r\n", + " least_window = None\r\n", + "\r\n", + " for i in range(num_words - window_size + 1):\r\n", + " window_words = words[i:i+window_size]\r\n", + " window_text = ' '.join(window_words)\r\n", + " compound_score, _ = analyze_sentiment(window_text, additional_words)\r\n", + "\r\n", + " if best_sentiment is None or compound_score > best_sentiment:\r\n", + " best_sentiment = compound_score\r\n", + " best_window = window_text\r\n", + "\r\n", + " if least_sentiment is None or compound_score < least_sentiment:\r\n", + " least_sentiment = compound_score\r\n", + " least_window = window_text\r\n", + "\r\n", + " return best_window, best_sentiment, least_window, least_sentiment\r\n", + "\r\n", + "\r\n", + "def analyze_email(email, additional_words=None, remove_phrases=None, window_size=5):\r\n", + " \"\"\"\r\n", + " Analyzes the sentiment of an email.\r\n", + " Args:\r\n", + " email (str): The email text to analyze.\r\n", + " additional_words (list, optional): Additional words to include in the analysis. Defaults to None.\r\n", + " remove_phrases (list, optional): Phrases to remove from the email text. Defaults to None.\r\n", + " window_size (int, optional): The size of the sliding window. Defaults to 5.\r\n", + "\r\n", + " Returns:\r\n", + " tuple: A tuple containing lists of the overall sentiment scores and token-level sentiment scores for the email.\r\n", + " \"\"\"\r\n", + " sentences = sent_tokenize(email)\r\n", + " email_sentiments = []\r\n", + " email_token_sentiments = []\r\n", + "\r\n", + " for sentence in sentences:\r\n", + " # Remove specified phrases\r\n", + " # Remove text after \"regard\" or \"regards\"\r\n", + " sentence = re.sub(r'\\bregard(s)?\\b.*', '', sentence, flags=re.IGNORECASE)\r\n", + " if remove_phrases:\r\n", + " for phrase in remove_phrases:\r\n", + " sentence = sentence.replace(phrase, \"\")\r\n", + "\r\n", + " compound_score, token_sentiments = analyze_sentiment(sentence, additional_words)\r\n", + " if compound_score != 0:\r\n", + " email_sentiments.append(compound_score)\r\n", + " email_token_sentiments.extend(token_sentiments)\r\n", + "\r\n", + " best_window, best_sentiment, least_window, least_sentiment = analyze_sentence_window(sentence, window_size, additional_words)\r\n", + " if best_window is not None and least_window is not None:\r\n", + " print(f\"Sentence: {sentence}\")\r\n", + " print(f\"Best {window_size}-Word Window: {best_window} (Sentiment: {best_sentiment})\")\r\n", + " print(f\"Least {window_size}-Word Window: {least_window} (Sentiment: {least_sentiment})\")\r\n", + " print()\r\n", + " return email_sentiments, email_token_sentiments\r\n", + "\r\n", + " \r\n", + "def analyze_emails(emails, additional_words=None, remove_phrases=None, window_size=5):\r\n", + " \"\"\"\r\n", + " Analyzes the sentiment of a list of emails.\r\n", + " Args:\r\n", + " emails (list): A list of email texts to analyze.\r\n", + " additional_words (list, optional): Additional words to include in the analysis. Defaults to None.\r\n", + " remove_phrases (list, optional): Phrases to remove from the email text. 
Defaults to None.\r\n", + " window_size (int, optional): The size of the sliding window. Defaults to 5.\r\n", + " Returns:\r\n", + " tuple: A tuple containing the per-email sentence sentiment scores, the token-level sentiment scores, and the overall sentiment score for each email.\r\n", + " \"\"\"\r\n", + " all_email_sentiments = []\r\n", + " all_email_token_sentiments = []\r\n", + " overall_email_sentiments = []\r\n", + "\r\n", + " for email in emails:\r\n", + " email_sentiments, email_token_sentiments = analyze_email(email, additional_words, remove_phrases, window_size)\r\n", + " all_email_sentiments.append(email_sentiments)\r\n", + " all_email_token_sentiments.extend(email_token_sentiments)\r\n", + " overall_sentiment = sum(email_sentiments) / len(email_sentiments) if email_sentiments else 0\r\n", + " overall_email_sentiments.append(overall_sentiment)\r\n", + " return all_email_sentiments, all_email_token_sentiments, overall_email_sentiments\r\n", + "\r\n", + " \r\n", + "def extract_theme(emails):\r\n", + " \"\"\"\r\n", + " Extracts themes from a list of emails using LDA topic modeling.\r\n", + " Args:\r\n", + " emails (list): A list of email texts.\r\n", + " Returns:\r\n", + " list: A list of integers representing the themes for each email.\r\n", + " \"\"\"\r\n", + " # Tokenize emails\r\n", + " tokenized_emails = [word_tokenize(email.lower()) for email in emails]\r\n", + "\r\n", + " # Remove stop words\r\n", + " stop_words = set(stopwords.words('english'))\r\n", + " tokenized_emails = [[word for word in email if word not in stop_words] for email in tokenized_emails]\r\n", + "\r\n", + " # Create dictionary of words and their frequency\r\n", + " dictionary = corpora.Dictionary(tokenized_emails)\r\n", + "\r\n", + " # Create bag-of-words (BoW) representation of emails\r\n", + " bow_corpus = [dictionary.doc2bow(email) for email in tokenized_emails]\r\n", + "\r\n", + " # Perform LDA topic modeling\r\n", + " num_topics = 5 # You can change the number of topics as per your requirement\r\n", + " lda_model = gensim.models.LdaModel(bow_corpus, num_topics=num_topics, id2word=dictionary, passes=20)\r\n", + "\r\n", + " # Extract themes from each email\r\n", + " themes = []\r\n", + " for bow in bow_corpus:\r\n", + " topic_distribution = lda_model.get_document_topics(bow)\r\n", + " theme = max(topic_distribution, key=lambda x: x[1])[0]\r\n", + " themes.append(theme)\r\n", + " return themes\r\n", + "\r\n", + "# Function to compute the label based on sentiment score\r\n", + "def compute_label(sentiment):\r\n", + " if sentiment < -0.2:\r\n", + " return \"Negative\"\r\n", + " elif sentiment > 0.2:\r\n", + " return \"Positive\"\r\n", + " else:\r\n", + " return \"Neutral\"\r\n", + "\r\n", + "# Function to scale sentiment scores from -1 to 1 to 0 to 100\r\n", + "def scale_sentiment(score):\r\n", + " return (score + 1) * 50" + ], + "execution_count": 39 + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "container = \"gold\"\r\n", + "directory_alias = \"M365\"\r\n", + "\r\n", + "adls_name = \"adlsgdcscindevil\"\r\n", + "\r\n", + "input_url = f'abfss://{container}@{adls_name}.dfs.core.windows.net/{directory_alias}'\r\n", + "output_url = f'abfss://{container}@{adls_name}.dfs.core.windows.net/{directory_alias}'" + ], + "execution_count": 36 + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + }, + "collapsed": false + }, + "source": [ + "# ADLS paths are case sensitive, so match the 'Email' casing written by the previous stage\r\n", + "df_email = spark.read.format('delta')\\\r\n", + " .load(f'{input_url}/Email')\r\n", + "\r\n", + "# Get the 'email_text' column from the DataFrame\r\n", + "email_text_column = df_email.select(\"email_text\")\r\n", + "\r\n", + "# Collect the 'email_text' values as a list of Rows\r\n", + "email_text_rows = email_text_column.collect()\r\n", + "\r\n", + "# Extract the email_text values from the Rows and create a list\r\n", + "email_text_list = [row[\"email_text\"] for row in email_text_rows]\r\n", + "\r\n", + "# # Display the list of email_text\r\n", + "# print(email_text_list)\r\n", + "" + ], + "execution_count": 47 + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "# # Sample emails and additional words\r\n", + "# emails = [\r\n", + "# \"No\", \"I got this but next time be please a bit more careful. Create an action plan to mitigate this. Any suggestion required, please connect with me\",\r\n", + "# \"Hi Team, Request you to please grant access to 172.20.33.124 server to me.\", \"This is bad, please redo\", \"We are not going with this deal\",\r\n", + "# \"As discussed, please find attached my understanding about the Pyomo package in python and my approach for solving inventory problem using it.\",\r\n", + "# \"Hope you are doing well. I’m on bench since 25th Sep and wish to know about this project. I’ve exposure related to Google Analytics but not into HTML, would I be eligible for this role?\",\r\n", + "# \"Unfortunately, the HSBC’s project has been put on hold due to administrative issues and I’ve been put to bench.Wish to know about any opening related to DS. Also, wish to understand if I can upskill myself in cloud technologies meanwhile?\",\r\n", + "# \"Thanks for sharing the proposal, it was great one. But we are not going with it as we found a better alternative\",\r\n", + "# \"Action items:1. Matt will update us on M365 after taking the follow-up with internal team.2. Nitin to work on the feedback provided by Chantrelle and Rajesh on Help Information page 1. KPIs and Metrics 2. Support 1. Put - Microsoft Graph Data Connect overview - Microsoft Graph | Microsoft Learn2. No email aliases.3. Put Fractal contact.3. FAQ - 1. Data description: account from CRM which ones we are using. 2. Pricing (https://azure.microsoft.com/en-us/pricing/details/graph-data-connect/) and other things, Fractal will make those updates and send for review. 3. Repo - put it in the in the GitHub with the collection of the rest of the templates.4. Fractal team to send a note via email with this group as we need help from Microsoft (joespinoza@microsoft.com) to setup GitHub (https://github.com/microsoftgraph/dataconnect-solutions).5. Fractal team to send Dashboard to Microsoft by tomorrow for review. Let me know if I missed anything.\",\r\n", + "# \"I've read through the document and I really liked the plan. May I know the timelines for implementing it and if there would be any hurdles which you forsee. Also, do let us know the pricing details\",\r\n", + "# \"Hope, you are doing well. I've read through the document and I really liked the plan. 
May I know the timelines for implementing it and if there would be any hurdles which you forsee?\"\r\n", + "# ]\r\n", + "\r\n", + "emails = email_text_list \r\n", + "\r\n", + "additional_words = [\"thanks\", \" I hope you're doing well\", \"dear\", \"please\", \"fine\", \"ok\", \"greetings\", \"well\", \"hi\", \"thank\", \"hello\", \"kindly\", \",\", \":\", \"@\", \"!\", \"-\", \".\"]\r\n", + "\r\n", + "\r\n", + "# Analyze emails with a 5-word window\r\n", + "email_sentiments, email_token_sentiments, overall_email_sentiments = analyze_emails(emails, additional_words, window_size=5)\r\n", + "\r\n", + "\r\n", + "# Print the overall sentiment scores for each email\r\n", + "for i, email in enumerate(emails):\r\n", + " print(f\"Email {i+1}: {overall_email_sentiments[i]:.2f}\")\r\n", + "\r\n", + " \r\n", + "# Print the overall sentiment score based on token-level sentiment scores\r\n", + "overall_token_sentiment = sum(email_token_sentiments) / len(email_token_sentiments) if email_token_sentiments else 0\r\n", + "print(f\"Overall Token-Level Sentiment: {overall_token_sentiment:.2f}\")\r\n", + "\r\n", + " \r\n", + "df = pd.DataFrame({\r\n", + " \"email_text\": emails,\r\n", + " \"overall_email_sentiments\": overall_email_sentiments\r\n", + "})\r\n", + "\r\n", + " \r\n", + "\r\n", + "# Add a new column for labels\r\n", + "df[\"Label\"] = df[\"overall_email_sentiments\"].apply(compute_label)\r\n", + "\r\n", + "# Extract themes from each email\r\n", + "email_themes = extract_theme(emails)\r\n", + "\r\n", + "# Add the themes to the DataFrame\r\n", + "df[\"Theme\"] = email_themes\r\n", + "\r\n", + "# Map theme indices to human-readable theme labels\r\n", + "theme_labels = {0: \"Business Request\", 1: \"Technical Inquiry\", 2: \"Project Status\", 3: \"General Inquiry\", 4: \"Feedback\"}\r\n", + "df[\"Theme\"] = df[\"Theme\"].map(theme_labels)\r\n", + " \r\n", + "# Add a new column with scaled sentiment scores\r\n", + "df[\"Scaled_Sentiment\"] = df[\"overall_email_sentiments\"].apply(scale_sentiment)\r\n", + "\r\n", + "# Remove the 'overall_email_sentiments' column from the DataFrame\r\n", + "df.drop(columns=[\"overall_email_sentiments\"], inplace=True)\r\n", + "\r\n", + "# Reorder the columns placing 'Scaled_Sentiment' before 'Label'\r\n", + "#df = df[[\"emails\", \"Scaled_Sentiment\", \"Label\", \"Theme\"]]\r\n", + "df = df[[\"email_text\", \"Scaled_Sentiment\", \"Label\"]]\r\n", + "# Print the updated DataFrame\r\n", + "# print(df)\r\n", + "" + ], + "execution_count": 48 + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + }, + "collapsed": false + }, + "source": [ + "email_m365 = df_email.toPandas()\r\n", + "df_final = pd.merge(email_m365, df, on='email_text', how='left')" + ], + "execution_count": null + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + }, + "collapsed": false + }, + "source": [ + "# Initialize a SparkSession\r\n", + "spark = SparkSession.builder.appName(\"PandasToDelta\").getOrCreate()\r\n", + "\r\n", + "# Define the schema for the PySpark DataFrame; the field order must match\r\n", + "# df_final's column order because the pandas conversion maps columns by position\r\n", + "schema = StructType([\r\n", + " StructField(\"email_id\", StringType(), False),\r\n", + " StructField(\"sent_date_time\", StringType(), True),\r\n", + " StructField(\"last_modified_date_time\", StringType(), True),\r\n", + " StructField(\"body_preview\", StringType(), True),\r\n", + " StructField(\"conversation_id\", StringType(), True),\r\n", + " StructField(\"conversation_index\", StringType(), True),\r\n", + " StructField(\"from_email_address\", StringType(), True),\r\n", + " StructField(\"to_recipient_list\", StringType(), True),\r\n", + " StructField(\"email_subject\", StringType(), True),\r\n", + " StructField(\"email_text\", StringType(), True),\r\n", + " StructField(\"opportunity_id\", StringType(), True),\r\n", + " StructField(\"Scaled_Sentiment\", StringType(), True),\r\n", + " StructField(\"Label\", StringType(), True)\r\n", + "])\r\n", + "\r\n", + "# Convert Pandas DataFrame to PySpark DataFrame with the specified schema\r\n", + "df_email_sentiment = spark.createDataFrame(df_final, schema)\r\n", + "\r\n", + "df_email_sentiment.write.format('delta') \\\r\n", + " .mode('overwrite') \\\r\n", + " .option(\"mergeSchema\", \"true\") \\\r\n", + " .save(f'{output_url}/EmailSentiment')\r\n", + "" + ], + "execution_count": 54 + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "" + ], + "execution_count": null + } + ] + } +} \ No newline at end of file
struct\r\n", + "import os" + ], + "execution_count": 1 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Calling variable file to intitialize the variables" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "%run config/variables" + ], + "execution_count": 2 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Set global constants" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + }, + "tags": [ + "parameters" + ] + }, + "source": [ + "input_container = \"silver\"\r\n", + "output_container = \"gold\"\r\n", + "cloud_alias = \"SFSC\"" + ], + "execution_count": 3 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Generate input and utput container URL's" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "input_url = f'abfss://{input_container}@{adls_name}.dfs.core.windows.net/{cloud_alias}'\r\n", + "output_url = f'abfss://{output_container}@{adls_name}.dfs.core.windows.net/{cloud_alias}'" + ], + "execution_count": 4 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Create connection to database and set the cursor" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "conn = pyodbc.connect(connString, attrs_before = { SQL_COPT_SS_ACCESS_TOKEN:tokenstruct});\r\n", + "cursor = conn.cursor()" + ], + "execution_count": 5 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Get all entities" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "entities = mssparkutils.fs.ls(input_url)" + ], + "execution_count": 6 + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "# Define the entity name and corresponding column names\r\n", + "entity_columns = {\r\n", + " \"Account\": ['Id', 'IsDeleted', 'Name', 'Phone', 'AccountNumber', 'Website', 'Industry', 'AnnualRevenue', 'Description', 'Rating', 'OwnerId', 'LastModifiedDate'],\r\n", + " \"Opportunity\": ['Id' ,'IsDeleted' ,'AccountId' ,'Name','StageName' ,'Description' ,'Amount' ,'CloseDate' ,'Type' ,'IsClosed' ,'IsWon' ,'OwnerId' ,'CreatedDate' ,'LastModifiedDate' ,'ContactId'],\r\n", + " \"Contact\": ['Id' ,'IsDeleted' ,'AccountId' ,'LastName' ,'FirstName' ,'Name' ,'Phone' ,'Email' ,'OwnerId' ,'CreatedDate' ,'LastModifiedDate'],\r\n", + " \"User\": ['Id' ,'AccountId' ,'Address' ,'City' ,'ContactId' ,'CreatedById' ,'CreatedDate' ,'Department' ,'Name' ,'Email' 
, 'EmployeeNumber', 'IsActive', 'LastModifiedById', 'LastModifiedDate', 'MobilePhone']\r\n", + "}" + ], + "execution_count": 7 + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Read Delta tables from silver and save the selected columns in delta format to gold" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + }, + "collapsed": false + }, + "source": [ + "for entity in entities:\r\n", + " entity_name = os.path.splitext(entity.name)[0]\r\n", + "\r\n", + " # Fetch the last gold watermark for logging; each run overwrites the full table\r\n", + " watermark = cursor.execute(f\"SELECT [GoldWatermark] FROM [dbo].[MetadataSalesCloud] WHERE [EntityName] = '{entity_name}' AND [CloudAbbreviation] = '{cloud_alias}'\") \\\r\n", + " .fetchall()\r\n", + "\r\n", + " print(entity_name)\r\n", + " watermark = watermark[0][0]\r\n", + " print('LastTimestamp:', watermark)\r\n", + "\r\n", + "\r\n", + " try:\r\n", + " df = spark.read.format('delta')\\\r\n", + " .option(\"linesep\", \"\\n\")\\\r\n", + " .option(\"header\", \"true\")\\\r\n", + " .option(\"sep\", \"`\")\\\r\n", + " .option(\"multiLine\",'true')\\\r\n", + " .load(f'{input_url}/{entity_name}')\r\n", + "\r\n", + " # Filter the required columns\r\n", + " selected_columns = entity_columns.get(entity_name)\r\n", + " df_final = df.select(*selected_columns)\r\n", + "\r\n", + " df_final.write.format('delta').mode('overwrite').save(f'{output_url}/{entity_name}')\r\n", + "\r\n", + " cursor.execute(f\"UPDATE [dbo].[MetadataSalesCloud] SET [GoldWatermark] = ? WHERE [EntityName] = ? AND [CloudAbbreviation] = ?\", datetime.utcnow(), entity_name, cloud_alias)\r\n", + " cursor.commit()\r\n", + "\r\n", + " print('Records written in delta table: ', df_final.count(), ' for Entity: ', entity_name)\r\n", + "\r\n", + " except Exception as err:\r\n", + " print(err)" + ], + "execution_count": 11 + } + ] + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/notebook/variables.json b/solutions/graph-data-sales-analytics/synapse/notebook/variables.json new file mode 100644 index 00000000..139bdff5 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/notebook/variables.json @@ -0,0 +1,166 @@ +{ + "name": "variables", + "properties": { + "folder": { + "name": "config" + }, + "nbformat": 4, + "nbformat_minor": 2, + "bigDataPool": { + "referenceName": "synspgdcscin", + "type": "BigDataPoolReference" + }, + "sessionProperties": { + "driverMemory": "56g", + "driverCores": 8, + "executorMemory": "56g", + "executorCores": 8, + "numExecutors": 2, + "runAsWorkspaceSystemIdentity": false, + "conf": { + "spark.dynamicAllocation.enabled": "false", + "spark.dynamicAllocation.minExecutors": "2", + "spark.dynamicAllocation.maxExecutors": "2", + "spark.autotune.trackingId": "38a47d2b-fa92-4971-ba9c-a82f2c80bd71" + } + }, + "metadata": { + "saveOutput": true, + "enableDebugMode": false, + "kernelspec": { + "name": "synapse_pyspark", + "display_name": "python" + }, + "language_info": { + "name": "python" + }, + "a365ComputeOptions": { + "id": "/subscriptions/2058f82f-b8ac-423e-8c83-227732887c3a/resourceGroups/fractal-neal-coe-dev-rg/providers/Microsoft.Synapse/workspaces/syngdcscindevil/bigDataPools/synspgdcscin", + "name": "synspgdcscin", + "type": "Spark", + "endpoint": "https://syngdcscindevil.dev.azuresynapse.net/livyApi/versions/2019-11-01-preview/sparkPools/synspgdcscin", + "auth": { + "type": "AAD", + "authResource": 
"https://dev.azuresynapse.net", + "authHeader": null + }, + "sparkVersion": "3.1", + "nodeCount": 10, + "cores": 8, + "memory": 56, + "extraHeader": null + }, + "sessionKeepAliveTimeout": 30 + }, + "cells": [ + { + "cell_type": "markdown", + "source": [ + "## variables\n", + "\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "source": [ + "# Variable\n", + "\n", + "**Purpose**: This notebook is designed to declare and define all the variables, paths and spark configurations used in all other notebooks. This notebook act as a single source container for all the settings and configurations." + ] + }, + { + "cell_type": "markdown", + "source": [ + "### Initializing all the variables used for authentication, jdbc connection and configuration" + ] + }, + { + "cell_type": "code", + "source": [ + "# Create a SparkSession\n", + "sc = SparkSession.builder.getOrCreate()\n", + "\n", + "\n", + "token_library = sc._jvm.com.microsoft.azure.synapse.tokenlibrary.TokenLibrary\n", + "\n", + "\n", + "# adls name\n", + "adls_name = token_library.getSecretWithLS(\"LS_AzureKeyVault\",\"adls-name\")\n", + "\n", + "# server and database name\n", + "metadata_database_name = token_library.getSecretWithLS(\"LS_AzureKeyVault\",\"metadata-db-name\")\n", + "metadata_server_name = token_library.getSecretWithLS(\"LS_AzureKeyVault\",\"metadata-server-name\")\n", + "\n", + "syn_database_name = token_library.getSecretWithLS(\"LS_AzureKeyVault\",\"syn-db-name\")\n", + "syn_server_name = token_library.getSecretWithLS(\"LS_AzureKeyVault\",\"syn-server-name\")\n", + "\n", + "# SalesForce API details:\n", + "salesforce_client_id = token_library.getSecretWithLS(\"LS_AzureKeyVault\",\"salesforce-client-id\")\n", + "salesforce_client_secret = token_library.getSecretWithLS(\"LS_AzureKeyVault\",\"salesforce-client-secret\")\n", + "\n", + "# SPN details\n", + "client_id = token_library.getSecretWithLS(\"LS_AzureKeyVault\",\"client-id\")\n", + "client_secret = token_library.getSecretWithLS(\"LS_AzureKeyVault\",\"client-secret\")\n", + "tenant_id = token_library.getSecretWithLS(\"LS_AzureKeyVault\",\"tenant-id\")\n", + "\n", + "# Setting spark config\n", + "spark.conf.set(\"fs.azure.account.auth.type\", \"OAuth\")\n", + "spark.conf.set(\"fs.azure.account.oauth.provider.type\", \"org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider\")\n", + "spark.conf.set(\"fs.azure.account.oauth2.client.id\", client_id)\n", + "spark.conf.set(\"fs.azure.account.oauth2.client.secret\", client_secret)\n", + "spark.conf.set(\"fs.azure.account.oauth2.client.endpoint\", \"https://login.microsoftonline.com/\"+tenant_id+\"/oauth2/token\")\n", + "\n", + "\n", + "# Oauth connection variables\n", + "authority = \"https://login.microsoftonline.com/\" + tenant_id\n", + "context = adal.AuthenticationContext(authority)\n", + "token = context.acquire_token_with_client_credentials(\"https://database.windows.net/\", client_id,client_secret)\n", + "access_token = token[\"accessToken\"]\n", + "\n", + "#synapse server jdbc_url\n", + "jdbc_url = \"jdbc:sqlserver://\" + syn_server_name + \":1433\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## setup connection with metatdata db" + ] + }, + { + "cell_type": "code", + "metadata": { + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "SQL_COPT_SS_ACCESS_TOKEN = 1256 \r\n", + "connString = \"Driver={ODBC Driver 17 for 
SQL Server};SERVER=\"+ metadata_server_name + \";DATABASE=\" + metadata_database_name\r\n", + "# Expand the token bytes to the length-prefixed UTF-16-LE layout the ODBC driver expects\r\n", + "tokenb = bytes(token[\"accessToken\"], \"UTF-8\")\r\n", + "exptoken = b''\r\n", + "for i in tokenb:\r\n", + " exptoken += bytes([i])\r\n", + " exptoken += bytes(1)  # one zero byte after each token byte\r\n", + "tokenstruct = struct.pack(\"=i\", len(exptoken)) + exptoken  # 4-byte length prefix + token bytes" + ] + } + ] + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/pipeline/MasterPipeline.json b/solutions/graph-data-sales-analytics/synapse/pipeline/MasterPipeline.json new file mode 100644 index 00000000..6fc40d02 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/pipeline/MasterPipeline.json @@ -0,0 +1,85 @@ +{ + "name": "MasterPipeline", + "properties": { + "activities": [ + { + "name": "Source To Raw", + "description": "Run the pipeline which loads data from the sources to raw in ADLS.", + "type": "ExecutePipeline", + "dependsOn": [], + "userProperties": [], + "typeProperties": { + "pipeline": { + "referenceName": "pl_source_to_raw", + "type": "PipelineReference" + }, + "waitOnCompletion": true + } + }, + { + "name": "Raw To Bronze", + "description": "Run the pipeline which executes the Synapse notebook that loads data from raw in ADLS and creates delta tables in bronze.", + "type": "ExecutePipeline", + "dependsOn": [ + { + "activity": "Source To Raw", + "dependencyConditions": [ + "Succeeded" + ] + } + ], + "userProperties": [], + "typeProperties": { + "pipeline": { + "referenceName": "pl_raw_to_bronze", + "type": "PipelineReference" + }, + "waitOnCompletion": true + } + }, + { + "name": "Silver To Gold", + "description": "Run the pipeline which executes the Synapse notebooks that run transformations on delta tables in silver and store the results in the gold layer.", + "type": "ExecutePipeline", + "dependsOn": [ + { + "activity": "Bronze To Silver", + "dependencyConditions": [ + "Succeeded" + ] + } + ], + "userProperties": [], + "typeProperties": { + "pipeline": { + "referenceName": "pl_silver_to_gold", + "type": "PipelineReference" + }, + "waitOnCompletion": true + } + }, + { + "name": "Bronze To Silver", + "description": "Run the pipeline which executes the Synapse notebook that runs transformations on delta tables in bronze and stores the results in the silver layer.", + "type": "ExecutePipeline", + "dependsOn": [ + { + "activity": "Raw To Bronze", + "dependencyConditions": [ + "Succeeded" + ] + } + ], + "userProperties": [], + "typeProperties": { + "pipeline": { + "referenceName": "pl_bronze_to_silver", + "type": "PipelineReference" + }, + "waitOnCompletion": true + } + } + ], + "annotations": [] + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/pipeline/pl_bronze_to_silver.json b/solutions/graph-data-sales-analytics/synapse/pipeline/pl_bronze_to_silver.json new file mode 100644 index 00000000..f8aad1c1 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/pipeline/pl_bronze_to_silver.json @@ -0,0 +1,35 @@ +{ + "name": "pl_bronze_to_silver", + "properties": { + "activities": [ + { + "name": "Bronze To Silver", + "description": "Execute the bronze_to_silver notebook", + "type": "SynapseNotebook", + "dependsOn": [], + "policy": { + "timeout": "0.12:00:00", + "retry": 0, + "retryIntervalInSeconds": 30, + "secureOutput": false, + "secureInput": false + }, + "userProperties": [], + "typeProperties": { + "notebook": { + "referenceName": "BronzeToSilver", + "type": "NotebookReference" + }, + "snapshot": true, + "sparkPool": { + "referenceName": "synspgdcscin", + "type": "BigDataPoolReference" + }, + "executorSize": "Small", + 
"driverSize": "Small" + } + } + ], + "annotations": [] + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/pipeline/pl_m365_source.json b/solutions/graph-data-sales-analytics/synapse/pipeline/pl_m365_source.json new file mode 100644 index 00000000..6cbd77e5 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/pipeline/pl_m365_source.json @@ -0,0 +1,37 @@ +{ + "name": "pl_m365_source", + "properties": { + "activities": [ + { + "name": "M365 Data flow", + "type": "ExecuteDataFlow", + "dependsOn": [], + "policy": { + "timeout": "0.12:00:00", + "retry": 0, + "retryIntervalInSeconds": 30, + "secureOutput": false, + "secureInput": false + }, + "userProperties": [], + "typeProperties": { + "dataflow": { + "referenceName": "DF_M365", + "type": "DataFlowReference" + }, + "compute": { + "coreCount": 16, + "computeType": "General" + }, + "traceLevel": "Fine" + } + } + ], + "folder": { + "name": "DataSources" + }, + "annotations": [], + "lastPublishTime": "2023-07-27T08:00:04Z" + }, + "type": "Microsoft.Synapse/workspaces/pipelines" +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/pipeline/pl_raw_to_bronze.json b/solutions/graph-data-sales-analytics/synapse/pipeline/pl_raw_to_bronze.json new file mode 100644 index 00000000..4f076961 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/pipeline/pl_raw_to_bronze.json @@ -0,0 +1,35 @@ +{ + "name": "pl_raw_to_bronze", + "properties": { + "activities": [ + { + "name": "Raw to Bronze", + "description": "Execute the raw_to_bronze notebook", + "type": "SynapseNotebook", + "dependsOn": [], + "policy": { + "timeout": "0.12:00:00", + "retry": 0, + "retryIntervalInSeconds": 30, + "secureOutput": false, + "secureInput": false + }, + "userProperties": [], + "typeProperties": { + "notebook": { + "referenceName": "RawToBronze", + "type": "NotebookReference" + }, + "snapshot": true, + "sparkPool": { + "referenceName": "synspgdcscin", + "type": "BigDataPoolReference" + }, + "executorSize": "Small", + "driverSize": "Small" + } + } + ], + "annotations": [] + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/pipeline/pl_salesforce_source.json b/solutions/graph-data-sales-analytics/synapse/pipeline/pl_salesforce_source.json new file mode 100644 index 00000000..2d23df05 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/pipeline/pl_salesforce_source.json @@ -0,0 +1,75 @@ +{ + "name": "pl_salesforce_source", + "properties": { + "activities": [ + { + "name": "Copy data Salesforce", + "type": "Copy", + "dependsOn": [], + "policy": { + "timeout": "0.12:00:00", + "retry": 0, + "retryIntervalInSeconds": 30, + "secureOutput": false, + "secureInput": false + }, + "userProperties": [], + "typeProperties": { + "source": { + "type": "SalesforceSource", + "readBehavior": "query" + }, + "sink": { + "type": "DelimitedTextSink", + "storeSettings": { + "type": "AzureBlobFSWriteSettings" + }, + "formatSettings": { + "type": "DelimitedTextWriteSettings", + "quoteAllText": true, + "fileExtension": ".txt" + } + }, + "enableStaging": false, + "translator": { + "type": "TabularTranslator", + "typeConversion": true, + "typeConversionSettings": { + "allowDataTruncation": true, + "treatBooleanAsNumber": false + } + } + }, + "inputs": [ + { + "referenceName": "DS_SalesforceObjects", + "type": "DatasetReference", + "parameters": { + "objectname": "opportunity" + } + } + ], + "outputs": [ + { + "referenceName": "DS_CSV", + "type": 
"DatasetReference", + "parameters": { + "container": "raw", + "directory": "SFSC", + "filename": { + "value": "opportunity.csv", + "type": "Expression" + } + } + } + ] + } + ], + "folder": { + "name": "DataSources" + }, + "annotations": [], + "lastPublishTime": "2023-07-03T07:46:55Z" + }, + "type": "Microsoft.Synapse/workspaces/pipelines" +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/pipeline/pl_silver_to_gold.json b/solutions/graph-data-sales-analytics/synapse/pipeline/pl_silver_to_gold.json new file mode 100644 index 00000000..54a5ee0a --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/pipeline/pl_silver_to_gold.json @@ -0,0 +1,114 @@ +{ + "name": "pl_silver_to_gold", + "properties": { + "activities": [ + { + "name": "Salesforce Silver To Gold", + "description": "Execute the salesforce silver_to_gold notebook", + "type": "SynapseNotebook", + "dependsOn": [], + "policy": { + "timeout": "0.12:00:00", + "retry": 0, + "retryIntervalInSeconds": 30, + "secureOutput": false, + "secureInput": false + }, + "userProperties": [], + "typeProperties": { + "notebook": { + "referenceName": "SilverToGold", + "type": "NotebookReference" + }, + "snapshot": true, + "sparkPool": { + "referenceName": "synspgdcscin", + "type": "BigDataPoolReference" + }, + "executorSize": "Small", + "conf": { + "spark.dynamicAllocation.enabled": null, + "spark.dynamicAllocation.minExecutors": null, + "spark.dynamicAllocation.maxExecutors": null + }, + "driverSize": "Small", + "numExecutors": null + } + }, + { + "name": "M365 Silver To Gold", + "description": "Execute the M365 silver_to_gold notebook", + "type": "SynapseNotebook", + "dependsOn": [], + "policy": { + "timeout": "0.12:00:00", + "retry": 0, + "retryIntervalInSeconds": 30, + "secureOutput": false, + "secureInput": false + }, + "userProperties": [], + "typeProperties": { + "notebook": { + "referenceName": "M365_Silver_To_Gold", + "type": "NotebookReference" + }, + "snapshot": true, + "sparkPool": { + "referenceName": "synspgdcscin", + "type": "BigDataPoolReference" + }, + "executorSize": "Small", + "conf": { + "spark.dynamicAllocation.enabled": null, + "spark.dynamicAllocation.minExecutors": null, + "spark.dynamicAllocation.maxExecutors": null + }, + "driverSize": "Small", + "numExecutors": null + } + }, + { + "name": "Sentiment Analysis", + "description": "Execute the sentiment analysis notebook", + "type": "SynapseNotebook", + "dependsOn": [ + { + "activity": "M365 Silver To Gold", + "dependencyConditions": [ + "Succeeded" + ] + } + ], + "policy": { + "timeout": "0.12:00:00", + "retry": 0, + "retryIntervalInSeconds": 30, + "secureOutput": false, + "secureInput": false + }, + "userProperties": [], + "typeProperties": { + "notebook": { + "referenceName": "Sentiment_code_nltk", + "type": "NotebookReference" + }, + "snapshot": true, + "sparkPool": { + "referenceName": "synspgdcscin", + "type": "BigDataPoolReference" + }, + "executorSize": "Small", + "conf": { + "spark.dynamicAllocation.enabled": null, + "spark.dynamicAllocation.minExecutors": null, + "spark.dynamicAllocation.maxExecutors": null + }, + "driverSize": "Small", + "numExecutors": null + } + } + ], + "annotations": [] + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/pipeline/pl_source_to_raw.json b/solutions/graph-data-sales-analytics/synapse/pipeline/pl_source_to_raw.json new file mode 100644 index 00000000..714b7da9 --- /dev/null +++ 
b/solutions/graph-data-sales-analytics/synapse/pipeline/pl_source_to_raw.json @@ -0,0 +1,55 @@ +{ + "name": "pl_source_to_raw", + "properties": { + "activities": [ + { + "name": "Salesforce Source to Raw", + "description": "This notebook will pull the data from Salesforce using the API.", + "type": "SynapseNotebook", + "dependsOn": [], + "policy": { + "timeout": "0.12:00:00", + "retry": 0, + "retryIntervalInSeconds": 30, + "secureOutput": false, + "secureInput": false + }, + "userProperties": [], + "typeProperties": { + "notebook": { + "referenceName": "Saslesforce_SourceToRaw", + "type": "NotebookReference" + }, + "snapshot": true, + "sparkPool": { + "referenceName": "synspgdcscin", + "type": "BigDataPoolReference" + }, + "executorSize": "Small", + "conf": { + "spark.dynamicAllocation.enabled": null, + "spark.dynamicAllocation.minExecutors": null, + "spark.dynamicAllocation.maxExecutors": null + }, + "driverSize": "Small", + "numExecutors": null + } + }, + { + "name": "M365 Source To Raw", + "description": "Call the pipeline to run the copy activity for loading M365 data.", + "type": "ExecutePipeline", + "dependsOn": [], + "userProperties": [], + "typeProperties": { + "pipeline": { + "referenceName": "pl_m365_source", + "type": "PipelineReference" + }, + "waitOnCompletion": true + } + } + ], + "annotations": [] + } +} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/publish_config.json b/solutions/graph-data-sales-analytics/synapse/publish_config.json new file mode 100644 index 00000000..6417cdf2 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/publish_config.json @@ -0,0 +1 @@ +{"publishBranch":"workspace_publish","enableGitComment":false} \ No newline at end of file diff --git a/solutions/graph-data-sales-analytics/synapse/sqlscript/ExternalTables.json b/solutions/graph-data-sales-analytics/synapse/sqlscript/ExternalTables.json new file mode 100644 index 00000000..5afba3c0 --- /dev/null +++ b/solutions/graph-data-sales-analytics/synapse/sqlscript/ExternalTables.json @@ -0,0 +1,17 @@ +{ + "name": "ExternalTables", + "properties": { + "content": { + "query": "-- Create analytics_db Database\n\n-- Create master key in databases with some password (one-off per database)\nCREATE MASTER KEY ENCRYPTION BY PASSWORD = '*********'\nGO\n\n-- Create database scoped credentials that use Managed Identity, SAS token, or Service Principal. 
\n-- User needs to create only the database-scoped credentials that should be used to access the data source:\nCREATE DATABASE SCOPED CREDENTIAL WorkspaceIdentity\nWITH IDENTITY = 'Managed Identity'\nGO\nCREATE DATABASE SCOPED CREDENTIAL SasCredential\nWITH IDENTITY = 'SHARED ACCESS SIGNATURE', SECRET = '***********'\nGO\nCREATE DATABASE SCOPED CREDENTIAL SPNCredential WITH\nIDENTITY = '*************@https://login.microsoftonline.com/*********/oauth2/token' \n, SECRET = '***************'\nGO\n\n-- Create a data source that uses one of the credentials above, an external file format, and external tables that reference this data source and file format:\nCREATE EXTERNAL FILE FORMAT [SynapseParquetFormat] WITH ( FORMAT_TYPE = PARQUET)\nGO\n\n-- Delta Lake file format\nCREATE EXTERNAL FILE FORMAT DeltaLakeFormat WITH ( FORMAT_TYPE = DELTA );\nGO\n\nCREATE EXTERNAL DATA SOURCE mysample\nWITH ( LOCATION = 'https://.dfs.core.windows.net//'\n-- Uncomment one of these options depending on the authentication method that you want to use to access the data source:\n--,CREDENTIAL = WorkspaceIdentity \n--,CREDENTIAL = SasCredential \n--,CREDENTIAL = SPNCredential\n)\n\n-- Opportunity Table\nCREATE EXTERNAL TABLE [dbo].[Opportunity] (\n\t[Id] [varchar](100) NULL,\n\t[IsDeleted] [bit] NULL,\n\t[AccountId] [varchar](100) NULL,\n\t[Name] [varchar](115) NULL,\n\t[StageName] [varchar](100) NULL,\n\t[Amount] [float] NULL,\n\t[CloseDate] [varchar](50) NULL,\n\t[Type] [varchar](50) NULL,\n\t[IsClosed] [bit] NULL,\n\t[IsWon] [bit] NULL,\n\t[OwnerId] [varchar](50) NULL,\n\t[CreatedDate] [varchar](50) NULL,\n\t[LastModifiedDate] [varchar](50) NULL,\n\t[ContactId] [varchar](50) NULL\n) \nWITH \n(\n LOCATION = 'Opportunity', \n DATA_SOURCE = [sfsc_gold],\n FILE_FORMAT = [DeltaLakeFormat] \n);\n\n-- User Table\nCREATE EXTERNAL TABLE [dbo].[User]\n(\n\t[Id] [varchar](100) NULL,\n\t[AccountId] [varchar](255) NULL,\n\t[ContactId] [varchar](100) NULL,\n\t[CreatedById] [varchar](100) NULL,\n\t[CreatedDate] [varchar](100) NULL,\n\t[Department] [varchar](100) NULL,\n\t[Name] [varchar](255) NULL,\n\t[EmployeeNumber] [varchar](255) NULL,\n\t[IsActive] [bit] NULL,\n\t[LastModifiedById] [varchar](255) NULL,\n\t[LastModifiedDate] [varchar](255) NULL\n)\nWITH\n(\n\tLOCATION = 'User', \n\tDATA_SOURCE = [sfsc_gold],\n\tFILE_FORMAT = [DeltaLakeFormat]\n);\n\n\n-- Account Table\nCREATE EXTERNAL TABLE [dbo].[Account]\n(\n\t[Id] [varchar](100) NULL,\n\t[IsDeleted] [bit] NULL,\n\t[Name] [varchar](100) NULL,\n\t[AccountNumber] [varchar](100) NULL,\n\t[Industry] [varchar](100) NULL,\n\t[AnnualRevenue] [float] NULL,\n\t[Rating] [varchar](100) NULL,\n\t[OwnerId] [varchar](100) NULL,\n\t[LastModifiedDate] [varchar](100) NULL\n)\nWITH\n(\n\tLOCATION = 'Account', \n\tDATA_SOURCE = [sfsc_gold],\n\tFILE_FORMAT = [DeltaLakeFormat]\n);\n\n\n-- Contact Table\nCREATE EXTERNAL TABLE [dbo].[Contact]\n(\n\t[Id] [varchar](50) NULL,\n\t[IsDeleted] [bit] NULL,\n\t[AccountId] [varchar](50) NULL,\n\t[Name] [varchar](100) NULL,\n\t[OwnerId] [varchar](100) NULL,\n\t[CreatedDate] [varchar](100) NULL,\n\t[LastModifiedDate] [varchar](100) NULL\n)\nWITH\n(\n\tLOCATION = 'Contact', \n\tDATA_SOURCE = [sfsc_gold],\n\tFILE_FORMAT = [DeltaLakeFormat]\n);\n\n-- Email Table\nCREATE EXTERNAL TABLE [dbo].[Email] (\n email_id VARCHAR(255) ,\n sent_date_time VARCHAR(255),\n last_modified_date_time VARCHAR(255),\n body_preview VARCHAR(MAX),\n conversation_id VARCHAR(MAX),\n conversation_index VARCHAR(MAX),\n from_email_address VARCHAR(MAX),\n to_recipient_list VARCHAR(MAX),\n email_subject 
VARCHAR(MAX),\n email_text VARCHAR(MAX),\n opportunity_id VARCHAR(MAX),\n Scaled_Sentiment VARCHAR(MAX),\n Label VARCHAR(MAX)\n)\nWITH\n(\n\tLOCATION = 'EmailSentiment', \n\tDATA_SOURCE = [M365],\n\tFILE_FORMAT = [DeltaLakeFormat]\n);\n\n", + "metadata": { + "language": "sql" + }, + "currentConnection": { + "databaseName": "golddatabase", + "poolName": "Built-in" + }, + "resultLimit": 5000 + }, + "type": "SqlQuery" + } +} \ No newline at end of file
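
Reviewer note: the sentiment cell in the notebook above calls `analyze_emails`, `compute_label`, `scale_sentiment`, and `extract_theme`, which are defined in earlier cells of `Sentiment_code_nltk` and are not part of this diff. As a reading aid only, here is a minimal sketch of what the first three could look like, assuming the NLTK VADER analyzer that the notebook's name suggests; the actual implementations, thresholds, and scale may differ, and `extract_theme` is omitted since nothing in the diff pins down its approach.

```python
# Hypothetical stand-ins (NOT the notebook's actual code), assuming NLTK's VADER analyzer.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
_sia = SentimentIntensityAnalyzer()

def analyze_emails(emails, additional_words, window_size=5):
    """Score each email over sliding word windows, skipping filler words."""
    stop = {w.strip().lower() for w in additional_words}
    email_sentiments, email_token_sentiments, overall_email_sentiments = [], [], []
    for email in emails:
        words = [w for w in email.split() if w.lower() not in stop]
        # One window per position; a single (possibly short) window if the email is short.
        n_windows = max(len(words) - window_size + 1, 1)
        scores = [
            _sia.polarity_scores(" ".join(words[i:i + window_size]))["compound"]
            for i in range(n_windows)
        ]
        email_sentiments.append(scores)
        email_token_sentiments.extend(scores)
        overall_email_sentiments.append(sum(scores) / len(scores))
    return email_sentiments, email_token_sentiments, overall_email_sentiments

def compute_label(score):
    # Conventional VADER cutoffs; the notebook's thresholds may differ.
    if score > 0.05:
        return "Positive"
    if score < -0.05:
        return "Negative"
    return "Neutral"

def scale_sentiment(score):
    # Map the compound score from [-1, 1] to a 0-10 scale for reporting.
    return round((score + 1) * 5, 2)
```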
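
Reviewer note: the token-packing cell in `variables.json` follows the usual pyodbc pattern for passing an AAD access token via `attrs_before` (`SQL_COPT_SS_ACCESS_TOKEN`, value 1256 in msodbcsql.h). Because the tokens AAD issues are ASCII, interleaving a zero byte after each UTF-8 byte is equivalent to encoding the token as UTF-16-LE; a behavior-equivalent, more direct sketch:

```python
import struct

def make_token_struct(access_token: str) -> bytes:
    """Pack an AAD access token for pyodbc's attrs_before (SQL_COPT_SS_ACCESS_TOKEN = 1256)."""
    # For ASCII tokens this matches the byte-interleaving loop in the notebook:
    # each character becomes its byte followed by a zero byte.
    exptoken = access_token.encode("utf-16-le")
    # 4-byte little-endian length prefix, then the expanded token bytes.
    return struct.pack("<i", len(exptoken)) + exptoken

# Usage mirroring the notebook:
# tokenstruct = make_token_struct(token["accessToken"])
# conn = pyodbc.connect(connString, attrs_before={1256: tokenstruct})
```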
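
Reviewer note: once `MasterPipeline` has run end to end, the gold Delta output can be queried from Spark as well as through the external tables defined in `ExternalTables.json`. Below is a minimal sketch of joining the two gold outputs. The SFSC path matches `SilverToGold`'s `output_url` convention; the `M365` alias for the EmailSentiment path and the placeholder account name are assumptions, since the sentiment notebook's `output_url` is defined outside this diff.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# In the notebooks these come from Key Vault via config/variables;
# placeholder here so the sketch is self-contained.
adls_name = "<your-adls-account>"
gold_sfsc_url = f"abfss://gold@{adls_name}.dfs.core.windows.net/SFSC"  # as in SilverToGold
gold_m365_url = f"abfss://gold@{adls_name}.dfs.core.windows.net/M365"  # assumed alias

email_sentiment = spark.read.format("delta").load(f"{gold_m365_url}/EmailSentiment")
opportunity = spark.read.format("delta").load(f"{gold_sfsc_url}/Opportunity")

# Average scaled sentiment and email volume per opportunity.
summary = (
    email_sentiment
    .join(opportunity, email_sentiment["opportunity_id"] == opportunity["Id"])
    .groupBy("opportunity_id", "Name", "StageName", "Amount")
    .agg(
        F.avg(F.col("Scaled_Sentiment").cast("double")).alias("avg_sentiment"),
        F.count("email_id").alias("email_count"),
    )
)
summary.show(truncate=False)
```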