
Tag Archives: Framework

New framework can train a robotic arm on 6 grasping tasks in less than an hour

December 17, 2020   Big Data

Advances in machine learning have given rise to a range of robotics capabilities including grasping, pushing, pulling, and other object manipulation skills. However, general-purpose algorithms to date have been extremely sample-inefficient, limiting their applicability to the real world. Spurred on by this, researchers at the University of California, Berkeley developed the Framework for Efficient Robotic Manipulation (FERM), which leverages cutting-edge techniques to achieve what they claim is "extremely" sample-efficient robotic manipulation algorithm training. The coauthors say that, given only 10 demonstrations amounting to 15 to 50 minutes of real-world training time, a single robotic arm can learn to reach, pick, move, and pull large objects or flip a switch and open a drawer using FERM.

McKinsey pegs the robotics automation potential for production occupations at around 80%, and the pandemic is likely to accelerate this shift. A report by the Manufacturing Institute and Deloitte found that 4.6 million manufacturing jobs will need to be filled over the next decade, and challenges brought on by physical distancing measures and a sustained uptick in ecommerce activity have stretched some logistics operations to the limit. The National Association of Manufacturers says 53.1% of manufacturers anticipate a change in operations due to the health crisis, with 35.5% saying they’re already facing supply chain disruptions.

FERM could help accelerate the shift toward automation by making “pixel-based” reinforcement learning — a type of machine learning in which algorithms learn to complete tasks from recorded demonstrations — more data-efficient. As the researchers explain in a paper, FERM first collects a small number of demonstrations and stores them in a “replay buffer.” An encoder machine learning algorithm pretrains on the demonstration data contained within the replay buffer. Then, a reinforcement learning algorithm in FERM trains on images “augmented” with data generated both by the encoder and the initial demonstrations.

According to the researchers, FERM is easy to assemble in that it only requires a robot, a graphics card, two cameras, a handful of demonstrations, and a reward function that guides the reinforcement learning algorithm toward a goal. In experiments, they say that FERM enabled an xArm to learn six tasks within 25 minutes of training time (corresponding to 20 to 80 episodes of training) with an average success rate of 96.7%. The arm could even generalize to objects not seen during training or demonstrations and deal with obstacles blocking its way to goal positions.


“To the best of our knowledge, FERM is the first method to solve a diverse set of sparse-reward robotic manipulation tasks directly from pixels in less than one hour,” the researchers wrote. “Due to the limited amount of supervision required, our work presents exciting avenues for applying reinforcement learning to real robots in a quick and efficient manner.”

Open source frameworks like FERM promise to advance the state of the art in robotic manipulation, but there remain questions about how to measure progress. As my colleague Khari Johnson writes, metrics used to measure progress in robotic grasping can vary based on the task. For example, for robots operating in a mission-critical environment like space, accuracy matters above all.

“Under certain circumstances, if we have nice objects and you have a very fast robot, you can get there [human picking rates],” roboticist Ken Goldberg told VentureBeat in a previous interview. “But they say humans are like 650 per hour; that’s an amazing level. It’s very hard to beat humans. We’re very good. We’ve evolved over millions of years.”


Big Data – VentureBeat


Monitoring the Power Platform: Power Apps – Power Apps Component Framework

July 6, 2020   Microsoft Dynamics CRM

Summary

 

The idea of monitoring Dynamics 365 or Model Driven Applications is not a new concept. Understanding where services are failing, how users are interacting with the platform, and where forms and business processes could be tuned for performance are key drivers for most, if not all, businesses, from small companies to enterprises. Luckily, the Dynamics 365 platform provides many tools to help audit and monitor business and operational events.

This article will cover adding Azure Application Insights functionality to Power Apps Component Framework controls. In this article we will briefly touch on the subjects of Power Apps Component Framework and NPM packages. We will explore building and extending a sample Power Apps Component Framework control to send events to Azure Application Insights. We will conclude with context considerations and reviewing events.

What is Power Apps Component Framework?

 

“Power Apps component framework empowers professional developers and app makers to create code components for model-driven and canvas apps (public preview) to provide enhanced user experience for the users to work with data on forms, views, and dashboards.”

Power Apps Component Framework controls can be added to both Canvas and Model Driven Applications to extend the user experience beyond the standard controls available. Examples include grids behaving like maps, number controls turned into dials or sliders, and even integrations with Azure Cognitive Services.

Unlike web resources, Power Apps Component Framework controls render in the context of the Model Driven Application form and as such behave like other controls. Multiple Power Apps Component Framework controls can be added to a form. Power Apps Component Framework controls are solution aware and can be added and migrated across environments just like other components of a solution.

Application Insights JavaScript GitHub Repo

 

The official Azure Application Insights JavaScript GitHub Repository contains all the information needed to download and install the SDK for use within a Power Apps Component Framework control. The Getting Started section describes creating an Azure Application Insights resource and installing the correct NPM package.

Application Insights NPM Package

 

To install the Azure Application Insights NPM package, use the following command (I use Visual Studio Code‘s terminal):

npm i --save @microsoft/applicationinsights-web

Implementing Application Insights in a PCF Control

 

This section assumes you have already covered the basics for creating a PCF control. If not, please refer to the official documentation. The next two sections can also be found in the Azure Application Insights JavaScript SDK GitHub repo. I’ve adjusted the code slightly to take into account the way PCF controls render.

Add Application Insights Dependencies

 

In the index.ts file, we first need to import the Azure Application Insights module:

import { ApplicationInsights, IEventTelemetry } from '@microsoft/applicationinsights-web'

The next step is to add an instance of the ApplicationInsights object. I added mine to the PCF control class, which allows it to be used across the different methods of the control. That said, you may choose to implement Azure Application Insights in another way, such as in the global namespace; where and how you create this object is up to you.

  pcfControlAppInsights = new ApplicationInsights({ config: {
    instrumentationKey: '<your instrumentation key>',
    enableResponseHeaderTracking: true,
    enableRequestHeaderTracking: true
    /* ...Other Configuration Options... */
   } });

As shown above, various configuration properties can be set here. These include how often messages are sent, how long sessions last, if sampling should be used, etc. I’m including request and response header tracking in this example, but many others exist. For additional properties, review the ApplicationInsights-JS GitHub documentation.

Initialize Application Insights

 

Once the ApplicationInsights object has been created for use, the first step is to initialize the instance.

this.pcfControlAppInsights.loadAppInsights();

The Azure Application Insights module only expects to be loaded once; attempting to do so again can result in an error:

Error: Core should not be initialized more than once
    at AppInsightsCore.BaseCore.initialize (webpack://pcf_tools_652ac3f36e1e4bca82eb3c1dc44e6fad/./node_modules/@microsoft/applicationinsights-core-js/dist-esm/JavaScriptSDK/BaseCore.js?:46:13)
    at AppInsightsCore.initialize (webpack://pcf_tools_652ac3f36e1e4bca82eb3c1dc44e6fad/./node_modules/@microsoft/applicationinsights-core-js/dist-esm/JavaScriptSDK/AppInsightsCore.js?:33:33)
    at Initialization.loadAppInsights (webpack://pcf_tools_652ac3f36e1e4bca82eb3c1dc44e6fad/./node_modules/@microsoft/applicationinsights-web/dist-esm/Initialization.js?:299:16)

To account for this, the core includes an isInitialized method to help determine whether the SDK has been loaded. To extend the previous example, include it within a conditional check:

   if (!this.pcfControlAppInsights.core.isInitialized!()){
    //Use this to load the app insights dependencies for use.
    this.pcfControlAppInsights.loadAppInsights();
    }

Once loaded, the Application Insights object can be used immediately to record page views, exceptions or any event that needs to be documented. To see a quick example, simply add a call to the trackPageView method, which takes no arguments.

//Send Page View Event
this.pcfControlAppInsights.trackPageView();

Usage across PCF Control Methods

 

PCF controls have three main methods used within the lifecycle of a control: initialization of the control (init), changes to the control (updateView) and removal of the control (destroy). The usage of PCF controls varies greatly, and other methods can be used for inputs and outputs, HTML elements within the control, and so on, which will not be covered here. If you'd like to learn more, there are many great tools and open source projects that go into this at great length.

For this article, just be aware that Azure Application Insights can be used to track virtually any callback within a control. Examples include user clicks of a button, responses from API calls, or, as described above, when and how a control is created or initialized.

The original article includes screenshots of tracking a custom event within the init, updateView, getOutputs, and destroy methods; a sketch of those calls follows.
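The following is a minimal sketch of what those lifecycle calls might look like, not the article's exact code. It assumes the pcfControlAppInsights instance created earlier, the standard generated IInputs/IOutputs types, and hypothetical event names; the method bodies show only the tracking calls, a real control does more.

public init(context: ComponentFramework.Context<IInputs>, notifyOutputChanged: () => void,
            state: ComponentFramework.Dictionary, container: HTMLDivElement): void {
    // Track a custom event when the control is initialized.
    this.pcfControlAppInsights.trackEvent({ name: "PCF Control init" });
}

public updateView(context: ComponentFramework.Context<IInputs>): void {
    // Track a custom event each time the platform pushes changes into the control.
    this.pcfControlAppInsights.trackEvent({ name: "PCF Control updateView" });
}

public getOutputs(): IOutputs {
    // Track a custom event when outputs are requested.
    this.pcfControlAppInsights.trackEvent({ name: "PCF Control getOutputs" });
    return {};
}

public destroy(): void {
    // Track a custom event when the control is removed from the DOM.
    this.pcfControlAppInsights.trackEvent({ name: "PCF Control destroy" });
}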

Configure Context

 

Adding context to Azure Application Insights messages is extremely important and vital to leveraging the robust feature set associated with the service. That said, out of the box, the SDK provides automatic context properties and allows developers the ability to overwrite. This provides the potential to select well known values or correlating identifiers such as form or entity Ids in Dynamics 365, system user Ids, Model Driven Application Ids and so on.

The original article includes a captured Fiddler trace (shown as a screenshot) of the default context properties.

To add or modify context for all messages sent to Azure Application Insights, use the addTelemetryInitializer method. This method provides the ability to pass in a custom envelope which can be extended to modify or extend properties of an Azure Application Insights message. Here, developers can set cloud role and instance, operation names and identifiers, and even custom properties that will be included in each record in the Azure Application Insights tables.


The first part of the code below creates an object that contains Azure Application Insights tags and custom properties. These tags will align with the session and operational context for each message sent from the PCF control.

var telemetryInitializer = (envelope) => {
    // Cloud role: set to the Dynamics 365 organization identifier
    envelope.tags["ai.cloud.role"] = orgSettings._customControlExposedOrgSettings.organizationId;
    // Cloud role instance: set to the Model Driven Application identifier
    envelope.tags["ai.cloud.roleInstance"] = response;
    // Session id: taken from window.URL
    envelope.tags["ai.session.id"] = window.URL;
    // Operation name: the class name of the PCF control
    envelope.tags["ai.operation.name"] = QnAMakerControl.name;
    // Example custom property included with the telemetry item
    envelope.data.cloudRole = 'just checking in';
}

In the sample code above the cloud role and instance are set to the Dynamics 365 organization and Model Driven Application identifier. The values used here will impact how the Application Map in Azure Application Insights renders.

The operation name is the class name of the PCF control. Operations include both the core operation and the parent; this could be thought of as the core operation being the event (initialization of the control) and the parent being the workblock (loading of the Dynamics 365 form).

Next, register the telemetry initializer created above using the addTelemetryInitializer method:

//Add the context properties
this.pcfControlAppInsights.addTelemetryInitializer(telemetryInitializer);

To reiterate, setting the context, while desired for many reasons, is completely optional. Following the Implementing Application Insights in a PCF control section above will work just fine.

At this point, session, role and operation data points have been set. One other point I'd recommend reviewing is adding the user identifier. This could be added as a custom property within the custom dimensions property bag, but for this example we will use the Authenticated User property on the core ApplicationInsights object.

//Set user authenticated
this.pcfControlAppInsights.setAuthenticatedUserContext(context.userSettings.userId, undefined, true);

The sample code above uses the userId provided from the context for Model Driven Applications. Consider a different approach if working with Canvas Driven Applications.

Reviewing Application Insights

 

Once the control is implemented within a Model Driven Application, data will begin to flow to Azure Application Insights. An interesting note here is that the Azure Application Insights SDK is also used by the platform. In some cases the events sent from your Model Driven Application may be delivered in the same envelope! Don't worry, this has no impact on the platform and will not break if the platform changes how it delivers messages.

Once delivered, user sessions and events will begin to appear. The screenshots in the original article show the init method, followed by updateView and getOutputs with user interaction in between, as well as an example of a custom event record.

This reference includes all the sample code above and additional usage showing how to send various events to Azure Application Insights.

Next Steps

 

In this article we have discussed how to add Azure Application Insights to Power Apps Component Framework controls. Continue exploring and evaluating how the initialization of the ApplicationInsights object can be extended with different properties, such as enableAutoRouteTracking.
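For example, a hedged sketch reusing the earlier configuration and simply enabling automatic route tracking:

pcfControlAppInsights = new ApplicationInsights({ config: {
    instrumentationKey: '<your instrumentation key>',
    // Automatically track route changes as page views (single-page-style navigation).
    enableAutoRouteTracking: true
    /* ...Other Configuration Options... */
} });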

In an upcoming article, we will explore the tools within Azure Application Insights that rely on the context properties discussed at the end of this current article. Configuring both the actual delivery mechanism and adding context will help expose insights from the data collected.

If you are interested in learning more about specialized guidance and training for monitoring or other areas of the Power Platform, which includes a monitoring workshop, please contact your Technical Account Manager or Microsoft representative for further details.

Your feedback is extremely valuable so please leave a comment below and I’ll be happy to help where I can! Also, if you find any inconsistencies, omissions or have suggestions, please go here to submit a new issue.

Index

 

Monitoring the Power Platform: Introduction and Index


Dynamics 365 Customer Engagement in the Field


Implementing a Data Quality Monitoring Framework

May 9, 2020   BI News and Info

I. Introduction

Data Quality is imperative to the success of any Business Intelligence solution. In today’s era, where the data is being ingested from multiple sources, the quality of the data might differ between sources. The first step to a successful reporting application implementation is to ensure that only the highest quality data makes it to the reporting tables. In order to achieve this, a Data Quality Monitoring framework should be implemented.

Some of the characteristics of the Data Quality Framework are as follows:

  • The Data Quality Monitoring framework should have a predefined set of rules specific to a Data Domain.
  • The Data Quality Monitoring framework should be scalable, i.e., it should support the addition of new rules and should be able to support data quality monitoring across multiple data domains.
  • The Data Quality Monitoring framework should have the ability to enable and disable rules to be executed against the data.
  • All the results from Data Quality monitoring should be stored in a rules results repository.

II. Data Quality Monitoring Framework Process Flow

The process starts with the source data file(s) landing on the SQL Server or any ETL server. Once files are detected, the Pre-Stage Data Quality rules are executed, and Data Stewards receive a notification once the results are available for review. Processing stops if any of the Pre-Stage "Gating" Data Quality rules have failed; data is loaded into the Stage table only if none of the Pre-Stage Gating rules has failed or if the Data Steward has chosen to override the failure. After the load, the Post-Stage Data Quality rules are executed, and Data Stewards are notified when those results are available for review. If there is no Gating rule failure, the process automatically publishes a validated data file for the downstream systems to use. If any of the Post-Stage Gating rules have failed, the Data Steward can decide either to abandon the cycle and request a new file from the source or to override the failure in order to publish data files for downstream systems.


Figure 1: Process Flow Diagram for Data Quality Monitoring Framework

The following section of the article will concentrate on the Implementation of Data Quality Rules (Pre-Stage & Post Stage) using Database tables.

III. Data Quality Monitoring Data Mart:

To implement Data Quality Monitoring Framework, a Data Quality Data Mart is needed.

A Data Quality Data Mart would have tables to provide the following capabilities.

  • A table to store all predefined Data Quality rules. (DATA_QUALITY_RULE table)
  • A table to provide the ability to turn on and turn off rules and store threshold percentages for each rule for its corresponding data domain (DATA_QUALITY_RULE_EXECUTE table).
  • A table to store the results of Data Quality Rules to serve as a Data Quality Rule Monitoring results repository (DATA_QUALITY_RULE_RESULTS).

A possible model would look like Figure 2.


Figure 2: The data quality model

Here are the definitions for the three tables.

A. DATA_QUALITY_RULE Table

This table holds all the predefined Data Quality Rules. These rules could be for one data domain or multiple data domains.

DDL for creating the DATA_QUALITY_RULE table:


CREATE TABLE DATA_QUALITY_RULE
(
    DATA_QUALITY_RULE_ID INTEGER NOT NULL,
    RULE_NAME VARCHAR(60) NOT NULL,
    RULE_DESCRIPTION VARCHAR(256) NOT NULL,
    RULE_TYPE_CODE VARCHAR(20) NOT NULL,
    RULE_LEVEL_CODE VARCHAR(20) NOT NULL,
    SCHEMA_NAME VARCHAR(60),
    TABLE_NAME VARCHAR(60),
    DATA_NAME VARCHAR(60),
    RULE_SCRIPT_TYPE_CODE VARCHAR(20),
    RULE_SCRIPT_TEXT VARCHAR(256),
    CREATE_PROCESS_ID BIGINT NOT NULL,
    CREATE_PROCESS_NAME VARCHAR(35) NOT NULL,
    CREATE_RECORD_USER_NAME VARCHAR(35) NOT NULL,
    CREATE_RECORD_SYSTEM_TIMESTAMP DATETIME NOT NULL,
    CONSTRAINT [PK_DATA_QUALITY_RULE_ID] PRIMARY KEY ([DATA_QUALITY_RULE_ID])
);

Column Name | Column Details
DATA_QUALITY_RULE_ID | Unique rule identifier
RULE_NAME | Name of the rule
RULE_DESCRIPTION | Details about the rule
RULE_TYPE_CODE | Describes whether the rule is "GATING" or "PASSIVE"
RULE_LEVEL_CODE | Level at which the rule is executed: file, table or column
SCHEMA_NAME | Schema of the table or schema of the file
TABLE_NAME | Table that holds the data on which the Data Quality Rules execute
DATA_NAME | Column name on which the rule is executed
RULE_SCRIPT_TYPE_CODE | Code that determines whether the rule passes or fails
RULE_SCRIPT_TEXT | Description of the RULE_SCRIPT_TYPE_CODE
CREATE_PROCESS_ID | Process ID that loaded the record into the DATA_QUALITY_RULE table
CREATE_PROCESS_NAME | Process name that loaded the record into the DATA_QUALITY_RULE table
CREATE_RECORD_USER_NAME | Service account that loaded the record into the DATA_QUALITY_RULE table
CREATE_RECORD_SYSTEM_TIMESTAMP | Timestamp when the rule record was inserted into the DATA_QUALITY_RULE table

B. DATA_QUALITY_RULE_EXECUTE Table

This table holds information related to whether the rule is active or not, and the threshold percentage values against which data quality is measured to pass or fail a rule.

DDL for creating DATA_QUALITY_RULE_EXECUTE table


CREATE TABLE DATA_QUALITY_RULE_EXECUTE
(
    DATA_QUALITY_DATA_SOURCE_CODE VARCHAR(20) NOT NULL,
    DATA_QUALITY_RULE_ID INTEGER NOT NULL,
    RULE_PROCESS_STEP_CODE VARCHAR(256) NOT NULL,
    RULE_EXECUTE_STATUS_CODE VARCHAR(20) NOT NULL,
    THRESHOLD_PASS_PERCENT VARCHAR(20) NOT NULL,
    CREATE_PROCESS_ID BIGINT NOT NULL,
    CREATE_PROCESS_NAME VARCHAR(35) NOT NULL,
    CREATE_RECORD_USER_NAME VARCHAR(35) NOT NULL,
    CREATE_RECORD_SYSTEM_TIMESTAMP DATETIME NOT NULL,
    FOREIGN KEY ([DATA_QUALITY_RULE_ID]) REFERENCES [DATA_QUALITY_RULE] ([DATA_QUALITY_RULE_ID])
        ON DELETE NO ACTION ON UPDATE NO ACTION
);

Column Name | Column Details
DATA_QUALITY_DATA_SOURCE_CODE | Data domain or source of the data
DATA_QUALITY_RULE_ID | Unique rule identifier
RULE_PROCESS_STEP_CODE | Step at which the rule is applied to the data (PRE_STAGE/POST_STAGE)
RULE_EXECUTE_STATUS_CODE | Indicates whether the rule is active; "A" indicates "Active"
THRESHOLD_PASS_PERCENT | Threshold percent that, if met, causes the rule to "Pass"; otherwise it will "Fail"
CREATE_PROCESS_ID | Process ID that loaded the record into the DATA_QUALITY_RULE_EXECUTE table
CREATE_PROCESS_NAME | Process name that loaded the record into the DATA_QUALITY_RULE_EXECUTE table
CREATE_RECORD_USER_NAME | Service account that loaded the record into the DATA_QUALITY_RULE_EXECUTE table
CREATE_RECORD_SYSTEM_TIMESTAMP | Timestamp when the record was inserted into the DATA_QUALITY_RULE_EXECUTE table

C. DATA_QUALITY_RULE_RESULTS Table

This table is a repository to store the results of Data Quality Rule monitoring.

DDL for creating DATA_QUALITY_RULE_RESULTS table:


CREATE TABLE DATA_QUALITY_RULE_RESULTS
(
    DATA_QUALITY_RESULT_ROW_ID BIGINT NOT NULL,
    DATA_QUALITY_DATA_SOURCE_CODE VARCHAR(20) NOT NULL,
    DATA_QUALITY_RULE_ID INTEGER NOT NULL,
    RULE_PROCESS_STEP_CODE VARCHAR(20) NOT NULL,
    TABLE_ROW_IDENTIFIER BIGINT,
    SCHEMA_NAME VARCHAR(60),
    TABLE_NAME VARCHAR(60),
    DATA_NAME VARCHAR(60),
    DATA_VALUE VARCHAR(250),
    RULE_TYPE_CODE VARCHAR(20),
    RULE_LEVEL_CODE VARCHAR(20),
    PASS_FAIL_OVERRIDE_CODE CHAR(1),
    PASS_COUNT INTEGER,
    FAIL_COUNT INTEGER,
    SYSTEM_PARTITION_IDENTIFIER VARCHAR(60),
    CREATE_PROCESS_ID BIGINT NOT NULL,
    CREATE_PROCESS_NAME VARCHAR(35) NOT NULL,
    CREATE_RECORD_USER_NAME VARCHAR(35) NOT NULL,
    CREATE_RECORD_SYSTEM_TIMESTAMP DATETIME NOT NULL,
    CONSTRAINT [PK_DATA_QUALITY_RESULT_ROW_ID] PRIMARY KEY ([DATA_QUALITY_RESULT_ROW_ID]),
    FOREIGN KEY ([DATA_QUALITY_RULE_ID]) REFERENCES [DATA_QUALITY_RULE] ([DATA_QUALITY_RULE_ID])
        ON DELETE NO ACTION ON UPDATE NO ACTION
);

Column Name | Column Details
DATA_QUALITY_RESULT_ROW_ID | Unique identifier for each record in DATA_QUALITY_RULE_RESULTS
DATA_QUALITY_DATA_SOURCE_CODE | Data domain or source of the data
DATA_QUALITY_RULE_ID | Unique rule identifier
RULE_PROCESS_STEP_CODE | Step at which the rule is applied to the data (PRE_STAGE/POST_STAGE)
TABLE_ROW_IDENTIFIER | Unique identifier from the source table
SCHEMA_NAME | Schema of the table or schema of the file
TABLE_NAME | Table that holds the data on which the Data Quality Rules execute
DATA_NAME | Column name on which the rule is executed
DATA_VALUE | Data value
RULE_TYPE_CODE | Describes whether the rule is "GATING" or "PASSIVE"
RULE_LEVEL_CODE | Level at which the rule is executed: file, table or column
PASS_FAIL_OVERRIDE_CODE | Status of the Data Quality Rule (Pass, Fail or Override)
PASS_COUNT | Count of records that passed the rule
FAIL_COUNT | Count of records that failed the rule
SYSTEM_PARTITION_IDENTIFIER | Partitioning key for the DATA_QUALITY_RULE_RESULTS table
CREATE_PROCESS_ID | Process ID that loaded the record into the DATA_QUALITY_RULE_RESULTS table
CREATE_PROCESS_NAME | Process name that loaded the record into the DATA_QUALITY_RULE_RESULTS table
CREATE_RECORD_USER_NAME | Service account that loaded the record into the DATA_QUALITY_RULE_RESULTS table
CREATE_RECORD_SYSTEM_TIMESTAMP | Timestamp when the record was inserted into the DATA_QUALITY_RULE_RESULTS table

IV. Understanding Data Quality Monitoring Implementation with an Example:

In order to understand the above Data Quality Framework, I will walk through an example as follows:

A. Source File

The Source File is a pipe-delimited file with Invoice data and its corresponding Control file.

InvoiceId|CustomerId|InvoiceDate|BillingAddress|BillingCity|BillingState|BillingCountry|BillingPostalCode|Total|StoreID

100|23|2007-01-11|69 Salem Street|Boston|MA|USA|2113|13.86|1

200|16|2007-02-19|1600 Amphitheatre Parkway|Mountain View|CA|USA|94043-1351|0.99|2

300|17|2007-03-04|1 Microsoft Way|Redmond|WA|USA|98052-8300|1.98|3

400|19|2007-03-05|1 Infinite Loop|Cupertino|CA|USA|95014|1.98|4

500|21|2007-03-06|801 W 4th Street|Reno|NV|USA|89503|3.96|5

B. Control File

The Control File has details related to the source file like record counts and other important details.

DataFileName|ControlFileName|DataFileCount|MinInvoiceDate|MaxInvoiceDate|TotalAmount

Invoice_202001.dat|invoice.ctl|5|2007-01-11|2007-02-28|22.77

C. DATA_QUALITY_RULE Table

The Data Quality Rule table is a repository of all rules that need to be executed against the data to gauge the quality of data.

There are 2 types of data quality rules.

  1. Gating Rules
  2. Passive Rules

Gating Rules: Gating rules are critical rules that certify the quality of the data. If any one of these rules fails, the data is not good enough to be loaded into application tables; the source data should either be corrected, or a new version of the file needs to be requested from the source system. In that case, the data quality rules need to be rerun on the new file, and the gating rules should pass before the data is loaded into the underlying reporting application tables. When a "Gating" data quality rule fails, the ETL cycle stops and does not proceed until new data is requested, the gating rule passes, or the data stewards perform a manual override.

Passive Rules: Passive rules are good to have but are not critical, meaning the data is still useful for analytical reporting. If a passive data quality rule fails, the data is still allowed to flow down to downstream applications.

To illustrate, I'll walk you through the following five rules.

Rule 1:

Data Quality Rule to Check the Source Schema/Layout matches the expected layout.

Rule 2:

Data Quality Rule to verify the same file has not been processed earlier (Duplicate File Check)

Rule 3:

Data Quality Rule to verify each record in the data file has an Invoice Date between the Minimum Invoice Date and Maximum Invoice Date, and write failed records to the results table.

Rule 4:

Data Quality Rule to provide a count of passed/failed records for Invoice Date between the Minimum Invoice Date and Maximum Invoice Date.

Rule 5:

Data Quality Rule to provide a count of passed/failed records for StoreID being null.

Please refer to DataQuality_InsertStatements.txt for insert statements.

Note: I am inserting rules in the DATA_QUALITY_RULE table using Insert Statement (for demonstration/explanation purposes for each record). A preferable way would be to load all rules from a pipe separated file using an ETL tool or script.

RULE_SCRIPT_TYPE_CODE: The rule script type code field defines when a rule passes or fails and when records are supposed to be loaded to the DATA_QUALITY_RULE_RESULTS table.

To understand how this works, review this example:

RULE_SCRIPT_TYPE_CODE is set to “COMP_THRE_VAL_NE_F_F”

‘COMP_THRE_VAL’ stands for Compare Threshold values, i.e. compare the threshold value of the actual threshold vs the expected threshold.

NE – Stands for Not Equal, i.e. the actual data threshold percent and the expected threshold percent (from DATA_QUALITY_RULE_EXECUTE) are not equal. Other possible values are:

“EQ” stands for “Equal to”

“GE” stands for “Greater than or equal to”

“GT” stands for “Greater than”

“LE” stands for “Less than or equal to”

“LT” stands for “Less than”

F – Stands for the action to be taken regarding the status of the rule. In this case, if the expected threshold percent is not equal to the actual data threshold percent, then fail the rule. The other possible value is "P".

F – Stands for writing the failed records to the DATA_QUALITY_RULE_RESULTS table. "P" stands for writing only passed records, and "E" stands for loading everything, i.e. both passed and failed records are written to the DATA_QUALITY_RULE_RESULTS table.
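As an illustration only (not the contents of DataQuality_InsertStatements.txt), an insert for Rule 3 might look like the following; the rule name, description and audit values are hypothetical, while the remaining values match the rule data shown later in this article:

INSERT INTO DATA_QUALITY_RULE
    (DATA_QUALITY_RULE_ID, RULE_NAME, RULE_DESCRIPTION, RULE_TYPE_CODE, RULE_LEVEL_CODE,
     SCHEMA_NAME, TABLE_NAME, DATA_NAME, RULE_SCRIPT_TYPE_CODE, RULE_SCRIPT_TEXT,
     CREATE_PROCESS_ID, CREATE_PROCESS_NAME, CREATE_RECORD_USER_NAME, CREATE_RECORD_SYSTEM_TIMESTAMP)
VALUES
    (3, 'Invoice Date Validation', 'InvoiceDate must fall within the control file date range',
     'GATING', 'COLUMN', 'dbo', 'Invoice', 'InvoiceDate', 'COMP_THRE_VAL_NE_F_F', 'Val_InvoiceDate_Chk',
     1, 'DQ_RULE_SETUP', 'svc_dq_load', GETDATE());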

D. DATA_QUALITY_RULE_EXECUTE

This table is used to control the activation and inactivation of a data quality rule. If a rule is marked as active in the DATA_QUALITY_RULE_EXECUTE table, then that rule will be executed against the data, and corresponding data metrics would be captured and loaded in DATA_QUALITY_RULE_RESULTS table. If a rule is marked inactive in the DATA_QUALITY_RULE_EXECUTE table, then that rule will NOT be executed against the source data, and its metrics are not captured in the table.

Data Quality rules would be executed mostly at two places in the process as follows:

  1. Pre-Stage DQ Rules – Before loading into the Stage Table
  2. Post Stage DQ Rules – After loading into the Stage Table

Pre-Stage DQ Rules:

The Pre-Stage DQ rules are executed before the data gets loaded into the Stage Table.

The intent of having Pre-Stage DQ rules is to avoid loading bad data into stage tables and to avoid aborting the ETL load process because of an unexpected file layout.

Some of the examples for Pre-Stage Data Quality rules are:

  1. Schema/layout validation of input file.
  2. Duplicate File Check
  3. Control File and Data File Count Match.

Post-Stage DQ Rules: Post-Stage DQ rules are executed after the data is loaded into the staging table. Post-Stage DQ rules would typically validate data against business-critical fields and produce metrics.

Please refer to DataQuality_InsertStatements.txt for insert statements.

Below is an explanation of each insert statement into the DATA_QUALITY_RULE_EXECUTE table.

Rule 1 (Schema Layout Verification Rule):

The rule belongs to the "Invoice" Data Domain, as apparent from DATA_QUALITY_DATA_SOURCE_CODE set to "Invoice". It is a "PRE_STAGE" rule, meaning it should be executed before the Invoice data file is even loaded into the Invoice Stage table.

Rule 2 (Duplicate File Validation Rule):

The rule belongs to the “Invoice” Data Domain as apparent from DATA_QUALITY_DATA_SOURCE_CODE set to “Invoice”. It is a “PRE_STAGE” rule that means it should be executed before the Invoice data file is even loaded into Invoice Stage table. The rule is “Active” as RULE_EXECUTE_STATUS_CODE is set to “A”. The Threshold Pass Percentage is set to 100, meaning that even if a single record does not meet the criteria, the rule will fail.

Rule 3 (Invoice Date Validation Rule):

The rule belongs to the “Invoice” Data Domain as apparent from DATA_QUALITY_DATA_SOURCE_CODE set to “Invoice”. It is a “POST_STAGE” rule meaning that it should be executed after the Invoice data file is loaded into Invoice Stage table. The rule is “Active” as RULE_EXECUTE_STATUS_CODE is set to “A”. The Threshold Pass Percentage is set to 100, meaning that even if a single record does not meet the criteria, the rule will fail.

Rule 4 (Invoice Date Summary Rule):

The rule belongs to the “Invoice” Data Domain as apparent from DATA_QUALITY_DATA_SOURCE_CODE set to “Invoice”. It is a “POST_STAGE” rule that means it should be executed after the Invoice data file is loaded into the Invoice Stage table. The rule is “Active” as RULE_EXECUTE_STATUS_CODE is set to “A”. The Threshold Pass Percentage is set to 100, meaning that even if a single record does not meet the criteria, the rule will fail.

Rule 5 (Store ID Summary Rule):

The rule belongs to the “Invoice” Data Domain as apparent from DATA_QUALITY_DATA_SOURCE_CODE set to “Invoice”. It is a “POST_STAGE” rule that means it should be executed after the Invoice data file is loaded into the Invoice Stage table. The rule is “Active” as RULE_EXECUTE_STATUS_CODE is set to “A”. The Threshold Pass Percentage is set to 90%, meaning that 90% or more of the records should have StoreID populated for the rule to pass.
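As a hedged illustration of the corresponding DATA_QUALITY_RULE_EXECUTE row for Rule 3 (the audit values are hypothetical; the remaining values follow the rule description above):

INSERT INTO DATA_QUALITY_RULE_EXECUTE
    (DATA_QUALITY_DATA_SOURCE_CODE, DATA_QUALITY_RULE_ID, RULE_PROCESS_STEP_CODE,
     RULE_EXECUTE_STATUS_CODE, THRESHOLD_PASS_PERCENT,
     CREATE_PROCESS_ID, CREATE_PROCESS_NAME, CREATE_RECORD_USER_NAME, CREATE_RECORD_SYSTEM_TIMESTAMP)
VALUES
    ('Invoice', 3, 'POST_STAGE', 'A', '100.00',
     1, 'DQ_RULE_SETUP', 'svc_dq_load', GETDATE());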

E. DATA_QUALITY_RULE_RESULTS

The DATA_QUALITY_RULE_RESULTS table is a repository where all the data quality metrics for the rule executions are stored. The table is partitioned on SYSTEM_PARTITION_IDENTIFIER. A new partition is created for each month for each data source.

Data Quality Rule Results are generated by calculating the expected and the actual results. Both are compared; if the actual results match the expected results, the rule passes, otherwise it is marked as failed.

F. Expected Results:

Multiple steps are involved in generating expected results.


Figure 3: The expected results

In order to generate expected results, three steps are involved.

  1. Join the Data Quality Rule table with Data Quality Rule Execute table.
  2. Pivot the Control File Data
  3. Join the Data Quality Rule tables data with control file data

1. Join Data Quality Rule and Data Quality Rule Execute table as follows:

The SQL query below joins the Data_Quality_Rule table to the Data_Quality_Rule_Execute table on Rule ID to create a flattened record with all Data Quality Rules Information.

Please Note: You could add a where condition on DATA_QUALITY_DATA_SOURCE_CODE to filter to the application for which you would like to run the data quality rules.


select
    dq.DATA_QUALITY_RULE_ID,
    dq.RULE_TYPE_CODE,
    dq.RULE_LEVEL_CODE,
    dq.SCHEMA_NAME,
    dq.TABLE_NAME,
    dq.DATA_NAME,
    dq.RULE_SCRIPT_TYPE_CODE,
    dq.RULE_SCRIPT_TEXT,
    dqr.RULE_PROCESS_STEP_CODE,
    dqr.RULE_EXECUTE_STATUS_CODE,
    dqr.THRESHOLD_PASS_PERCENT
from data_quality_rule dq
inner join DATA_QUALITY_RULE_EXECUTE dqr
    on dq.DATA_QUALITY_RULE_ID = dqr.DATA_QUALITY_RULE_ID;

The joined data would look something like this.

DATA_QUALITY_RULE_ID|RULE_TYPE_CODE|RULE_LEVEL_CODE|SCHEMA_NAME|TABLE_NAME|DATA_NAME|RULE_SCRIPT_TYPE_CODE|RULE_SCRIPT_TEXT|RULE_PROCESS_STEP_CODE|RULE_EXECUTE_STATUS_CODE|THRESHOLD_PASS_PERCENT

1|GATING|FILE|Invoice_schema.txt|Invoice_YYYYMM.dat|SCHEMA_VALIDATION|COMP_THRE_VAL_EQ_P_E|Val_SchemaFile_Chk|PRE_STAGE|A|100.00

2|GATING|FILE|Invoice_schema.txt|Invoice_YYYYMM.dat|DUPLICATE_VALIDATION|COMP_THRE_VAL_EQ_P_E|Val_DuplicateFile_Chk|PRE_STAGE|A|100.00

3|GATING|COLUMN|dbo|Invoice|InvoiceDate|COMP_THRE_VAL_NE_F_F|Val_InvoiceDate_Chk|POST_STAGE|A|100.00

4|PASSIVE|TABLE|dbo|Invoice|InvoiceDate|COMP_THRE_VAL_EQ_P_E|InvoiceDate_Chk_Summary|POST_STAGE|A|100.00

5|PASSIVE|TABLE|dbo|Invoice|StoreID|COMP_THRE_VAL_EQ_P_F|StoreID_CHK|POST_STAGE|A|90.00

2. Pivot the Control File Data.

This step pivots the data in the control file. Pivoting is nothing more than converting columns to rows.

Control File before pivoting (Also shown in above sections)

DataFileName|ControlFileName|DataFileCount|MinInvoiceDate|MaxInvoiceDate|TotalAmount

Invoice_202001.dat|invoice.ctl|5|2007-01-11|2007-02-28|22.77

Control File After pivoting:

DATA_NAME|DATA_VALUE

DataFileName|Invoice_202001.dat

ControlFileName|Invoice.ctl

DataFileCount|5

MinInvoiceDate|2007-01-11

MaxInvoiceDate|2007-02-28

TotalAmount|22.77

Pivoting makes it possible to join the control file details with the corresponding Data Quality Rule information (from step 1); a sketch of the pivot is shown below.
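A minimal T-SQL sketch of this step, assuming the control file has been loaded into a hypothetical staging table named CONTROL_FILE_STAGE (all columns are cast to a common type so they can be unpivoted):

SELECT DATA_NAME, DATA_VALUE
FROM (
    SELECT CAST(DataFileName    AS VARCHAR(250)) AS DataFileName,
           CAST(ControlFileName AS VARCHAR(250)) AS ControlFileName,
           CAST(DataFileCount   AS VARCHAR(250)) AS DataFileCount,
           CAST(MinInvoiceDate  AS VARCHAR(250)) AS MinInvoiceDate,
           CAST(MaxInvoiceDate  AS VARCHAR(250)) AS MaxInvoiceDate,
           CAST(TotalAmount     AS VARCHAR(250)) AS TotalAmount
    FROM CONTROL_FILE_STAGE
) AS src
UNPIVOT (
    DATA_VALUE FOR DATA_NAME IN
    (DataFileName, ControlFileName, DataFileCount, MinInvoiceDate, MaxInvoiceDate, TotalAmount)
) AS unpvt;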

3. Joining the Data Quality Rule Tables Data with Control File Data

This is the final step in creating the expected results: join the pivoted control file data (step 2) with the Data Quality Rule table information (step 1) on DATA_NAME. The expected results are nothing but the control file data associated with the corresponding data quality rule information, as follows.

DATA_QUALITY_RULE_ID|RULE_TYPE_CODE|RULE_LEVEL_CODE|DATA_NAME|DATA_VALUE|RULE_SCRIPT_TYPE_CODE|RULE_SCRIPT_TEXT|RULE_PROCESS_STEP_CODE|RULE_EXECUTE_STATUS_CODE|THRESHOLD_PASS_PERCENT

1|GATING|FILE|Schema_Validation|Y|COMP_THRE_VAL_EQ_P_E|Val_SchemaFile_Chk|PRE_STAGE|A|100.00

2|GATING|FILE|Duplicate_Validation|Y|COMP_THRE_VAL_EQ_P_E|Val_DuplicateFile_Chk|PRE_STAGE|A|100.00

3|GATING|COLUMN|InvoiceDate|2007-01-11:2007-03-06|COMP_THRE_VAL_NE_F_F|Val_InvoiceDate_Chk|POST_STAGE|A|100.00

4|PASSIVE|TABLE|InvoiceDate|2007-01-11:2007-03-06|COMP_THRE_VAL_EQ_P_E|InvoiceDate_Chk_Summary|POST_STAGE|A|100.00

5|PASSIVE|TABLE|StoreID|NOTNULL|COMP_THRE_VAL_EQ_P_F|StoreID_CHK|POST_STAGE|A|90.00

G. Actual Results:

There are two steps involved in creating the actual results as follows:

  1. Pivot the source data file.
  2. Join pivoted source data with Data Quality Rule table and ETL Process Metadata


Figure 4: The actual results

1. Pivot the source data file:

Below is the source file that has been mentioned in an earlier section of this article.

Source File

InvoiceId|CustomerId|InvoiceDate|BillingAddress|BillingCity|BillingState|BillingCountry|BillingPostalCode|Total|StoreID

100|23|2007-01-11|69 Salem Street|Boston|MA|USA|2113|13.86|1

200|16|2007-02-19|1600 Amphitheatre Parkway|Mountain View|CA|USA|94043-1351|0.99|2

300|17|2007-03-04|1 Microsoft Way|Redmond|WA|USA|98052-8300|1.98|3

400|19|2007-03-05|1 Infinite Loop|Cupertino|CA|USA|95014|1.98|4

500|21|2007-03-06|801 W 4th Street|Reno|NV|USA|89503|3.96|5

Pivot the source file data with InvoiceID as the key for each record; pivoting here transforms column-level data into rows. The data would look like the sample below.

Please Note: This is a sample set and not the entire data set.


InvoiceID|DataName|DataValue

100|InvoiceDate|2007-01-11

200|InvoiceDate|2007-02-19

300|InvoiceDate|2007-03-04

400|InvoiceDate|2007-03-05

500|InvoiceDate|2007-03-06

100|StoreID|1

200|StoreID |2

300|StoreID |3

400|StoreID |4

500|StoreID |5

2. Join Source File Pivot Data with Data Quality Rule table and ETL Process Metadata

In this step, take the pivoted source data (step 1) and join with the Data Quality Rule table to attach Rule ID to each pivoted source data record.

Please Note: The join to ETL process metadata is optional. If there are no Data Quality Rules related to ETL metadata, this join is not needed.

The ETL process metadata is standard metadata captured by ETL processes, such as the start and end times of ETL executions, whether the source file met the expected schema, and whether the source file is new or a duplicate based on the file name.

After joining the pivoted source data with the Data Quality Rule table, you would then join it with the ETL process metadata to get the information needed for Rule 1, which relates to the file layout/schema verification of the source file, and for Rule 2, which relates to duplicate source file names.

The result below shows that after validating Rule 1 there is a Data Value of 5 and a Pass Count of 5, indicating that there were 5 rows in the file and all 5 of them passed the validation. This information comes from the ETL process metadata. Similarly, for Rule 2 (Duplicate Validation), the pass count is 1 and the fail count is 0; if this were a duplicate file, the pass count would be 0 and the fail count 1. For all the other rules, Pass Count and Fail Count are not populated yet because they depend on matching this actual data with the expected data. A sketch of the join follows.
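A hedged sketch of the join, assuming the pivoted source rows from step 1 are available in a hypothetical table or view named SOURCE_PIVOT with InvoiceID, DataName and DataValue columns; the file-level rows for Rules 1 and 2 come from the ETL process metadata and are added separately:

SELECT dq.DATA_QUALITY_RULE_ID AS RuleID,
       sp.InvoiceID,
       sp.DataName,
       sp.DataValue,
       CAST(NULL AS INTEGER) AS Pass_Count,  -- populated later for file/table-level rules
       CAST(NULL AS INTEGER) AS Fail_Count
FROM SOURCE_PIVOT sp
INNER JOIN DATA_QUALITY_RULE dq
    ON dq.DATA_NAME = sp.DataName;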

Data After Joining source pivoted data with Data Quality rule and ETL process metadata.


RuleID|InvoiceID|DataName|DataValue|Pass_Count|Fail_Count

1|0|Schema_Validation|5|5|0

2|0|Duplicate_Validation|0|1|0

3|100|InvoiceDate|2007-01-11|NULL|NULL

3|200|InvoiceDate|2007-02-19|NULL|NULL

3|300|InvoiceDate|2007-03-04|NULL|NULL

3|400|InvoiceDate|2007-03-05|NULL|NULL

3|500|InvoiceDate|2007-03-06|NULL|NULL

4|100|InvoiceDate|2007-01-11|NULL|NULL

4|200|InvoiceDate|2007-02-19|NULL|NULL

4|300|InvoiceDate|2007-03-04|NULL|NULL

4|400|InvoiceDate|2007-03-05|NULL|NULL

4|500|InvoiceDate|2007-03-06|NULL|NULL

5|100|StoreID|1|NULL|NULL

5|200|StoreID |2|NULL|NULL

5|300|StoreID |3|NULL|NULL

5|400|StoreID |4|NULL|NULL

5|500|StoreID |5|NULL|NULL

H. Generating Data Quality Rule Results.

Data Quality Rule Results are generated by comparing the Expected Results with Actual Results. The ETL logic compares expected results with actual results to derive Data Quality metrics and then loads it into the Data Quality Rule Results table.

Expected Results (derived in the section above):

DATA_QUALITY_RULE_ID|RULE_TYPE_CODE|RULE_LEVEL_CODE|SCHEMA_NAME|TABLE_NAME|DATA_NAME|DATA_VALUE|RULE_SCRIPT_TYPE_CODE|RULE_SCRIPT_TEXT|RULE_PROCESS_STEP_CODE|RULE_EXECUTE_STATUS_CODE|THRESHOLD_PASS_PERCENT

1|GATING|FILE|Invoice_schema.txt|Invoice_YYYYMM.dat|SCHEMA_VALIDATION|NULL|COMP_THRE_VAL_EQ_P_E|Val_SchemaFile_Chk|PRE_STAGE|A|100.00

2|GATING|FILE|Invoice_schema.txt|Invoice_YYYYMM.dat|DUPLICATE_VALIDATION|NULL|COMP_THRE_VAL_EQ_P_E|Val_DuplicateFile_Chk|PRE_STAGE|A|100.00

3|GATING|COLUMN|dbo|Invoice|InvoiceDate|2007-01-11:2007-03-06|COMP_THRE_VAL_NE_F_F|Val_InvoiceDate_Chk|POST_STAGE|A|100.00

4|PASSIVE|TABLE|dbo|Invoice|InvoiceDate|2007-01-11:2007-03-06|COMP_THRE_VAL_EQ_P_E|InvoiceDate_Chk_Summary|POST_STAGE|A|100.00

5|PASSIVE|TABLE|dbo|Invoice|StoreID|NULL|COMP_THRE_VAL_EQ_P_F|StoreID_CHK|POST_STAGE|A|90.00

Actual Results (derived in the section above):


RuleID|InvoiceID|DataName|DataValue|Pass_Count|Fail_Count

1|0|Schema_Validation|5|5|0

2|0|Duplicate_Validation|0|1|0

3|100|InvoiceDate|2007-01-11|NULL|NULL

3|200|InvoiceDate|2007-02-19|NULL|NULL

3|300|InvoiceDate|2007-03-04|NULL|NULL

3|400|InvoiceDate|2007-03-05|NULL|NULL

3|500|InvoiceDate|2007-03-06|NULL|NULL

4|100|InvoiceDate|2007-01-11|NULL|NULL

4|200|InvoiceDate|2007-02-19|NULL|NULL

4|300|InvoiceDate|2007-03-04|NULL|NULL

4|400|InvoiceDate|2007-03-05|NULL|NULL

4|500|InvoiceDate|2007-03-06|NULL|NULL

5|100|StoreID|1|NULL|NULL

5|200|StoreID |2|NULL|NULL

5|300|StoreID |3|NULL|NULL

5|400|StoreID |4|NULL|NULL

5|500|StoreID |5|NULL|NULL

Comparing Expected Results with Actual Results

The expected results are joined with the actual results on the Data Quality Rule ID column, and the ETL process compares the actual values against the expected values to produce the Data Quality Rule metrics; a simplified sketch of this comparison follows.
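As an illustration of this comparison for a table-level rule such as Rule 5 (StoreID not null), a simplified T-SQL sketch might look like the following; the staging table name (INVOICE_STAGE) and the exact pass/fail derivation are assumptions rather than the article's ETL logic:

;WITH actuals AS (
    SELECT COUNT(*) AS total_count,
           SUM(CASE WHEN StoreID IS NOT NULL THEN 1 ELSE 0 END) AS pass_count,
           SUM(CASE WHEN StoreID IS NULL THEN 1 ELSE 0 END) AS fail_count
    FROM dbo.INVOICE_STAGE
)
SELECT dq.DATA_QUALITY_RULE_ID,
       dqe.THRESHOLD_PASS_PERCENT AS expected_threshold,
       CAST(100.0 * a.pass_count / a.total_count AS DECIMAL(5,2)) AS actual_threshold,
       -- Compare actual vs expected threshold to derive the rule status.
       CASE WHEN 100.0 * a.pass_count / a.total_count >= CAST(dqe.THRESHOLD_PASS_PERCENT AS DECIMAL(5,2))
            THEN 'P' ELSE 'F' END AS pass_fail_code,
       a.pass_count,
       a.fail_count
FROM DATA_QUALITY_RULE dq
INNER JOIN DATA_QUALITY_RULE_EXECUTE dqe
    ON dq.DATA_QUALITY_RULE_ID = dqe.DATA_QUALITY_RULE_ID
CROSS JOIN actuals a
WHERE dq.DATA_QUALITY_RULE_ID = 5;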


Rule 1: For Rule 1, the expected threshold is set to “100”. If the schema validation has passed for the data file, then the Actual Threshold would also be derived as “100” in the ETL logic. RULE_SCRIPT_TYPE_CODE is set to “COMP_THRE_VAL_EQ_P_E”. That means if both thresholds are equal, set the rule status to pass “P”, and “E” represents that rule results are to be loaded into the table irrespective of rule status.


Rule 2: For Rule 2, the expected threshold is set to “100”. If the data file is not a duplicate file, i.e. if the same file name has not been processed earlier, then the Actual Threshold would also be derived as “100” in the ETL logic. RULE_SCRIPT_TYPE_CODE is set to “COMP_THRE_VAL_EQ_P_E”, so if both thresholds are equal, set the rule status to pass “P”. “E” means that rule results are to be loaded into the table irrespective of rule status.


Rule 3: For Rule 3, the expected threshold is set to “100”, and the InvoiceDate of each record from the actual results dataset is compared with the expected InvoiceDate range of 2007-01-11 to 2007-02-28. As each row is checked, any record that has an InvoiceDate outside the range is marked as failed with the “F” status code. Since the RULE_SCRIPT_TYPE_CODE is set to “COMP_THRE_VAL_NE_F_F”, any record that fails would be loaded into the Data Quality Rule Results table.


Please Note: Pass Count and Fail Count are loaded as "0" because this is a column-level rule and not a table-level rule. It is possible to increment the fail count with each row loaded into the Data Quality Rule Results table, but this would require sorting and ranking rows. It is preferable that pass_count and fail_count be used for table-level rules, where the aggregation is done and the total pass and fail counts can be easily identified.

Rule 4: For Rule 4, the expected threshold is set to "100". Records from the actual results whose InvoiceDate does not fall within the expected range of 2007-01-11 to 2007-02-28 are counted, and the actual threshold percentage is derived as 40; if no records had failed, the actual threshold percentage would be 100. As this is a table-level rule, only one record for the entire batch load is written to the Data Quality Rule Results table. RULE_SCRIPT_TYPE_CODE is set to "COMP_THRE_VAL_EQ_P_E", so if both thresholds are equal the rule status is set to pass "P", and "E" (Everything) means the rule results are loaded into the table irrespective of rule status.


Rule 5: Rule 5 checks that the StoreID column is not null. The expected threshold is set to "90" and RULE_SCRIPT_TYPE_CODE is set to "COMP_THRE_VAL_GE_P_F". In the data file, there are no records with a null StoreID, so the ETL calculates the actual threshold as "100", which is greater than the 90% expected threshold, and the rule passes with "P". Since the final character in COMP_THRE_VAL_GE_P_F is "F", only failed records are loaded into the data quality results table; in this case, no results are loaded for Rule 5. The final rule results can be reviewed by querying the results table:

select * from DATA_QUALITY_RULE_RESULTS


In the above example, I have walked through all five data quality rules. Rules 1 and 2 are pre-stage rules and should be executed before data is loaded into the stage table, while Rules 3, 4 and 5 are post-stage data quality rules and should be executed after the data is loaded into the stage table.

A dashboard could be created on the DATA_QUALITY_RULE_RESULTS table to provide data quality metrics to data stewards for review.

V. Conclusion

By implementing the above Data Quality Monitoring Framework, an organization can ensure that only the highest quality data is loaded into the underlying application tables. Detailed metrics related to data quality can be made available to data stewards by building a dashboard on top of the DATA_QUALITY_RULE_RESULTS table.


SQL – Simple Talk


Test Automation and EasyRepro: 03 – Extending the EasyRepro Framework

November 6, 2019   Microsoft Dynamics CRM

EasyRepro is an open source framework built upon Selenium allowing automated UI tests to be performed on a specific Dynamics 365 organization. This article will focus on reviewing how EasyRepro works with the Document Object Model otherwise known as the DOM. This will help us understand how we can extend the EasyRepro code for use with custom objects or areas of the platform not included in the framework at this time. We will wrap with a brief look at XPath and referencing elements from the DOM.

Getting Started

If you haven't already, check out the first article, which covers getting familiar with EasyRepro and working with the source code locally. It covers cloning from GitHub to Azure DevOps, then from the Azure DevOps Git repository to a development machine, and reviews setting up dependencies and test settings to run a simple test. The second article goes into common actions performed by users when interacting with Dynamics 365 and the correlating EasyRepro commands. It wraps up by covering common troubleshooting and debugging scenarios.

The EasyRepro framework contains an abundance of commands that should suit most of your needs. However, you'll find that at times these commands will need to be modified to match your customizations, or you may need to create your own custom commands for reuse among tests.

Some example scenarios for modifying or creating custom commands may include:

  1. Using an ADFS Redirect Login

  2. Working with custom components and resources

  3. Working with strongly typed entity attributes

  4. Including navigation and functionality not present natively in the framework

These scenarios require an understanding of how EasyRepro interacts with Selenium to simulate user actions in a browser. This article will focus on a typical use case for extension: the Login process. We will describe common Selenium functionality and how to reference elements on the Document Object Model in a way to increase maintainability.

Sample Scenarios for Extending

Working with the Browser Driver

As we discussed earlier, your organization’s login process may have a sign on page that a user will get redirected to once they input their username. This is where we begin our journey into extending the framework and utilizing Selenium.

The original article shows screenshots of a typical single sign on page and of a custom method for handling it. The custom method searches through the sign in page DOM for the credential input fields and the submission button. You will notice that it works with the browser driver natively (args.Driver) and uses methods not found in the sample EasyRepro unit tests. These include FindElement, which is part of Selenium, and WaitForPageToLoad, which is part of EasyRepro's extension of Selenium. I'll touch on these at a high level now; a sketch of such a method follows.
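A minimal sketch (not the article's code) of what an ADFS redirect login handler might look like; the element IDs are placeholders, and the exact namespaces and helper methods may vary by EasyRepro version:

using OpenQA.Selenium;
using Microsoft.Dynamics365.UIAutomation.Browser;
using Microsoft.Dynamics365.UIAutomation.Api.UCI; // namespace of LoginRedirectEventArgs may differ by version

public static class AdfsLoginHelper
{
    // Hypothetical handler invoked during the EasyRepro login redirect; args carries the driver and credentials.
    public static void AdfsRedirect(LoginRedirectEventArgs args)
    {
        var driver = args.Driver;

        // Wait for the ADFS sign-in page to finish loading (EasyRepro extension of Selenium).
        driver.WaitForPageToLoad();

        // Locate the credential fields; the IDs below are placeholders for your sign-in page.
        var userName = driver.FindElement(By.Id("userNameInput"));
        var password = driver.FindElement(By.Id("passwordInput"));

        userName.Clear();
        userName.SendKeys(args.Username.ToUnsecureString());

        password.Clear();
        password.SendKeys(args.Password.ToUnsecureString());

        // Submit the form; alternatively, send Keys.Enter from the password field.
        driver.FindElement(By.Id("submitButton")).Click();

        driver.WaitForPageToLoad();
    }
}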

FindElement

FindElement is part of the Selenium.WebDriver assembly and is used to find an element in the DOM. I’ll cover how to search through the DOM in the next section Extending and Working with XPath but I wanted to take this time to show how we will begin extending unit tests using Selenium natively.


The FindElement method returns an IWebElement object that represents the element you want to work with. For instance, if you want to work with the username textbox, you can use FindElement to locate and return it. Once returned, the framework can interact with it and perform actions such as inputting text or clicking.


Reference:

https://www.toolsqa.com/selenium-webdriver/c-sharp/findelement-and-findelements-commands-in-c/

WaitForPageToLoad

WaitForPageToLoad is part of the EasyRepro framework as a Selenium extension. It is key to point out that we are leveraging both EasyRepro and Selenium natively to achieve the desired result of logging in. This method waits a specific amount of time and checks for the document ready state; the time interval used to check can be adjusted if needed.

SendKeys and Clear

SendKeys is used to send keystrokes to an element on the page. This can be a complete string such as an email or a single keypress. This can be used to send the username and password to your sign in page. It also can be used to send the Enter or Tab keypress to move to the next field or submit.

Clear, as it sounds, is used to remove any input that may already exist in an element. This is useful if your sign in page attempts to automatically input credentials.

Both of these methods are part of the IWebElement interface shown above. Other useful members of IWebElement include Text, Selected and GetAttribute.

Understanding how to extend element references

Considerations need to be made when designing unit tests to help reduce the maintenance work needed if something changes. For instance, when referencing custom HTML web resources or even the ADFS redirect from above, think about how a change to an element in the DOM could propagate across some or all of your unit tests. One way to control maintenance is to centralize commonly used references into proxy objects that hide the underlying mechanics from the test designer. This is exactly how the EasyRepro framework handles references, and when extending we can leverage the same approach. In this section we will cover the files used by the framework to reference DOM elements and show how to extend them to include references to our custom login page.

The App Element Reference File

The Microsoft.Dynamics365.UIAutomation.Api.UCI project uses the ElementReference.cs file as well as another file called AppElementReference.cs. The ElementReference file looks to have been brought over from the classic interface. What’s unique about each is how they reference elements in the DOM which I’ll cover in the next section. For now let’s focus on reviewing the AppElementReference.cs file which is located in the DTO folder of the Microsoft.Dynamics365.UIAutomation.Api.UCI project.


Inside of the AppElementReference file are two objects used by EasyRepro to represent and locate elements: The AppReference and AppElements classes.

The AppReference class

The AppReference class includes sub classes that represent the objects used by EasyRepro, specifically the WebClient object, to connect to DOM elements. This allows the framework to standardize how the search for a particular container or element is performed. Centralizing the reference to the DOM elements will allow the test designer to focus on writing commands against common objects and not specifying the precise location of an element in the DOM.

The AppElement class

The AppElement class is, in effect, a collection of key-value pairs, with the reference object property as the key and the XPath query as the value. The key matches the property name in the class object inside AppReference, while the value is the XPath location of the element in the DOM.

I highly suggest reviewing the AppElement class when extending EasyRepro, as it demonstrates the recommended ways to locate and reference elements in the DOM. In the next section we will discuss the different ways you can locate elements, including XPath.
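In sketch form, that collection is simply a dictionary from reference key to XPath. The field name and entries below are assumptions made for illustration, not the framework’s exact contents:

    using System.Collections.Generic;

    public static class AppElements
    {
        // Key   = the string value exposed by a property in AppReference.
        // Value = the XPath query used to locate the element in the DOM.
        public static Dictionary<string, string> Xpath = new Dictionary<string, string>
        {
            { "Entity_Save",         "//button[contains(@data-id,'SavePrimary')]" }, // illustrative
            { "Entity_FormSelector", "//*[@data-id='form-selector']" }               // illustrative
        };
    }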

Referencing elements in the Document Object Model

References to objects generally fall into four patterns:

  1. Resolving via Control Name

  2. Resolving via XPath

  3. Resolving by Element ID

  4. Resolving by CSS Class

In this article we will focus on XPath, which is what the EasyRepro framework primarily uses for the Dynamics 365 platform. However, it’s key to understand each of the approaches, as they can be used for customizations such as web resources, custom controls, etc.

Resolve with Control Name

This will search the DOM for elements with a specific name, which is an attribute on the element node. This is not used by EasyRepro to my knowledge. Elements can be found by their name attribute by using By.Name with the FindElement method.
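A brief hedged example; ‘loginfmt’ is a hypothetical name attribute used only for illustration:

    using OpenQA.Selenium;

    public static class NameLookupSample
    {
        // Finds an element by the value of its name attribute.
        public static IWebElement ByNameAttribute(IWebDriver driver) =>
            driver.FindElement(By.Name("loginfmt"));
    }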

Resolve with Element ID

This will attempt to find an element by its unique identifier. For instance, a textbox on the login form may be identified as ‘txtUserName’. Assuming this element ID is unique, we could search for this particular element by its ID and return an IWebElement representation. An example from the Microsoft.Dynamics365.UIAutomation.Api.UCI project is shown below, demonstrating usage with the timeline control.

Definition:

[Screenshot: element ID definition for the timeline control in AppElementReference.cs]

Usage by WebClient:

[Screenshot: WebClient resolving the timeline element with By.Id]

Elements can be found by their ID by using By.Id with the FindElement method. 
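As a hedged stand-in for the screenshots above, the pattern is a constant element id plus a By.Id lookup; the id value below is illustrative, not necessarily the control’s real id:

    using OpenQA.Selenium;

    public static class TimelineSample
    {
        // Definition: a reference class exposes the element id as a constant.
        public const string TimelineWall = "timeline-wall-section"; // illustrative id

        // Usage: client code resolves the element through By.Id.
        public static IWebElement GetTimelineWall(IWebDriver driver) =>
            driver.FindElement(By.Id(TimelineWall));
    }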

Resolve with CSS Class

This searches by the CSS class defined on the element. Be aware that this can return multiple elements, since CSS classes are rarely unique. There is no usage of this in EasyRepro, but again it could be helpful for customizations. Elements can be found by their CSS class name by using By.ClassName with the FindElement method.
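A brief hedged example; the class name ‘webresource-control’ is a placeholder for illustration:

    using System.Collections.ObjectModel;
    using OpenQA.Selenium;

    public static class CssClassSample
    {
        // CSS classes are rarely unique, so FindElements (plural) may return several matches.
        public static ReadOnlyCollection<IWebElement> ByCssClass(IWebDriver driver) =>
            driver.FindElements(By.ClassName("webresource-control"));
    }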

Resolve with XPath

XPath allows us to work quickly and efficiently when searching for a known element or path within the DOM. Key functions include the contains method, which allows searching node attributes or values. For instance, when you review the DOM of a form you’ll notice attributes such as data-id, or static text housed within a span element. Coupling such an attribute with the HTML tag can result in a surprisingly accurate way to locate an element. I suggest reviewing the existing element reference classes as well as this link from W3Schools that goes into the schema of XPath.
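A brief hedged example pairing an HTML tag with a data-id attribute; the data-id value is illustrative, so inspect the DOM for the real one:

    using OpenQA.Selenium;

    public static class XPathSample
    {
        // Couples the div tag with a data-id attribute match via contains().
        public static IWebElement ByDataId(IWebDriver driver) =>
            driver.FindElement(By.XPath("//div[contains(@data-id,'header_title')]"));
    }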

Going back to an earlier section let’s review how XPath along with the AppElementReference.cs file can help standardize element location.

Using XPath in the ADFS Redirect Login Method

Going back to our original example for the ADFS login method, below you’ll see an example of referencing the DOM elements using XPath directly with the Selenium driver object and By.XPath. Consider the two images below:

Without a static representation of XPath:

[Screenshot: login method passing the XPath string inline to By.XPath]

Using static classes to represent XPath queries:

[Screenshot: login method referencing a static class member that holds the XPath query]

Both of these methods work and perform exactly the same. However, the second method provides increased supportability if and when the login page goes through changes. For instance, consider what happens if the id of the textbox used to input your username changes from txtUserName to txtLoginId. And what if this txtUserName XPath query is propagated across hundreds or more unit tests?
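As a hedged sketch of the contrast (the ids and XPath strings below are placeholders, not the article’s exact code):

    using OpenQA.Selenium;

    public static class AdfsLoginSamples
    {
        // Option 1: the XPath is written inline, so every test repeats the literal string.
        public static void EnterUserNameInline(IWebDriver driver, string user) =>
            driver.FindElement(By.XPath("//input[@id='txtUserName']")).SendKeys(user);

        // Option 2: the XPath lives behind a single static reference, so a change to the
        // login page (txtUserName becoming txtLoginId, say) is corrected in one place.
        public static class Reference
        {
            public const string UserNameXPath = "//input[@id='txtUserName']";
        }

        public static void EnterUserNameByReference(IWebDriver driver, string user) =>
            driver.FindElement(By.XPath(Reference.UserNameXPath)).SendKeys(user);
    }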

Creating custom reference objects

Let’s put what we have learned to use by creating a reference to our custom login page. Start by adding a class to the AppReference class and titling it AdfsLogin. Inside this class, declare string properties that will be used as input for your organization’s login page. Typical inputs include the username, the password, and a submit button. Here is an example:

[Screenshot: AdfsLogin reference class added to AppReference]
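A minimal sketch of what such a class might look like; only the Login_UserId key appears later in this article, so the other property names and key strings are assumptions:

    // Added inside AppReference (or, preferably, in a separate file outside the core framework).
    public static class AdfsLogin
    {
        // The values are lookup keys that will be paired with XPath entries in AppElement.
        public static string UserId   = "Login_UserId";
        public static string Password = "Login_Password"; // assumed key name
        public static string SignIn   = "Login_SignIn";   // assumed key name
    }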

NOTE: While this document demonstrates how to add to the AppElementReference.cs file, I would suggest extending outside of the core files, as customizations will have to be merged with any updates from the framework.

Once the static class we want to use in our unit tests has been created, we need to add the XPath references to the AppElement class. Below is an image showing the key-value pairs discussed in the AppElement section above. The key correlates to the string value in the AdfsLogin class, while the value is the XPath query for our elements.

[Screenshot: key-value pairs for the AdfsLogin elements added to the AppElement class]

As shown in the image for Login_UserId, we are searching the DOM for an input element with the id attribute of ‘txtUserName’. XPath can be used to search for any attribute on a DOM element and can return a single element or multiple elements, depending on what you’re searching for.
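In sketch form, the added entries might look like the following; only the Login_UserId key and the txtUserName id come from this article, while the other keys and ids are placeholders:

    using System.Collections.Generic;

    public static class AdfsLoginXpathSample
    {
        // Key   = string value from the AdfsLogin reference class.
        // Value = XPath that locates the element on the sign-in page.
        public static readonly Dictionary<string, string> Entries = new Dictionary<string, string>
        {
            { "Login_UserId",   "//input[@id='txtUserName']" }, // id taken from the article
            { "Login_Password", "//input[@id='txtPassword']" }, // placeholder id
            { "Login_SignIn",   "//span[@id='submitButton']" }  // placeholder id
        };
    }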

Next Steps

Custom Components and Web Resources

PowerApps Component Framework (PCF) controls and web resources are customizations that represent great options for learning to extend the EasyRepro framework. Try locating the container for your PCF controls and HTML web resources and work toward creating custom objects representing the elements of each, as described above.

Conclusion

In this article we discussed reasons why we may need to extend the EasyRepro framework and some techniques for doing so. We explored working with Selenium objects and creating references to help us when creating unit tests. Finally, we put this into practice with an example of working with an ADFS Redirect page on login.

Thank you again for reading! Please let me know in the comments how your journey is going!

Previous Posts:

Test Automation and EasyRepro: 01 – Overview and Getting Started

Test Automation and EasyRepro: 02 – Designing and Debugging Unit Tests


Dynamics 365 Customer Engagement in the Field

Read More

Apple researchers develop AI framework that leverages users’ intents to automatically label data

September 23, 2019   Big Data

Personal assistants like Apple’s Siri accomplish tasks through natural language commands. However, their underlying components often rely on supervised machine learning algorithms requiring large amounts of hand-annotated training data. In an attempt to reduce the time and effort taken to collect this data, researchers at Apple developed a framework that leverages user engagement signals to automatically create data-augmenting labels. They report that when incorporated using strategies like multi-task learning and validation with an external knowledge base, the annotated data significantly improve accuracy in a production deep learning system.

“We believe this is the first use of user engagement signals to help generate training data for a sequence labeling task on a large scale, and can be applied in practical settings to speed up new feature deployment when little human-annotated data is available,” wrote the researchers in a preprint paper. “Moreover … user engagement signals can help us to identify where the digital assistant needs improvement by learning from its own mistakes.”

The researchers used a range of heuristics to identify behaviors indicating either positive or negative engagement. A few included tapping on content to engage with it further (a positive response), listening to a song for a long duration (another positive response), or interrupting content provided by an intelligent assistant and manually selecting different content (a negative response). Those signals were selectively harvested in a “privacy-preserving manner” to automatically produce ground truth annotations, and they were subsequently combined with coarse-grained labels provided by human annotators.

In order to incorporate the coarse-grained labels and the inferred fine-grained labels into an AI model, the paper’s coauthors devised a multi-task learning framework that treats coarse-grained and fine-grained entity labeling as two tasks. Additionally, they incorporated an external knowledge base validator consisting of entities and their relations. Given the prediction “something” as a music title and “the Beatles” as a music artist for the query “Play something by the Beatles,” the validator would perform a lookup for the top label alternatives and send them to a component that’d re-rank the predictions and return the best alternative.

The researchers built two separate test sets to evaluate the tasks performed by the multi-task model, compiled by randomly sampling from the production system and hand-annotating with ground truth labels. They say that across 21 model runs, adding 260,000 training examples “consistently” reduced the coarse-grained entity error rate on a prediction task compared with the baseline for all amounts of human-annotated data. Moreover, they report that adding weakly supervised fine-grained data had a larger impact when there was a relatively small amount of human-annotated data (5,000 examples). Lastly, they report that for examples where any of the top model hypotheses passed the knowledge base validator, the fine-grained entity error rate dropped by around 50%.

In another experiment, the team sought to determine whether more granular representations of the user’s intent would increase the likelihood of the system selecting the correct action. They sampled roughly 5,000 “play music” commands containing references to multiple bands, artists, and songs and sent them through a system incorporating their framework, after which they asked annotators to grade the response returned by the system as “satisfactory” or “unsatisfactory.” The results produced by the enhanced system achieved a relative task error rate reduction of 24.64%, the researchers report.

They leave to future work the exploration of using individual users’ engagement behaviors to improve personalization.

“We observe that our model improves user-facing results especially for requests that contain difficult or unusual language patterns,” wrote the coauthors. “For example, the enhanced system correctly handles queries such as ‘Can you play Malibu from Miley Cyrus new album’ and ‘Play Humble through my music Kendrick Lamar.’ Also, the enhanced model identifies entities that users are more likely to refer to in cases of genuine linguistic ambiguity. For example, in ‘Play one by Metallica,’ ‘one’ could either be a non-entity token (meaning play any song by Metallica), or it refer specifically to the song called ‘One’ by ‘Metallica.’ Since most users listen to the song ‘One’ by the ‘Metallica’ whenever they say ‘Play one by Metallica,’ our model trained on engagement-annotated data will learn to predict ‘one’ as [the music title], thus better capturing trends and preferences in our user population.”

The work comes on the heels of a paper describing Apple’s Overton, an AI development tool whose models have processed ‘billions’ of queries. Separately, the Cupertino company recently studied whether users preferred conversing with “chattier” AI assistants.


Big Data – VentureBeat

Read More

Researchers craft AI framework that explains your social media feeds

August 10, 2019   Big Data

Feeds are the cornerstones on which modern music recommenders, news aggregators, and social media platforms are built. If you’re like most people, you spend minutes to hours each day scrolling through songs, clips, articles, questions, public service announcements, and advertisements informed by your interests and preferences. But wouldn’t it be great if the algorithms underpinning feeds (and their recommendations) were a little more transparent?

Researchers at the Max Planck Institute for Informatics thought so, which is why they investigated a framework — Framework for Activity-Item Relationship Discovery, or FAIRY — that systematically discovers, ranks, and explains the connection between users’ actions and what appears in their social media feeds. It’s described in a paper (“FAIRY: A Framework for Understanding Relationships between Users’ Actions and their Social Feeds“) published on the preprint server Arxiv.org.

“Transparency and explainability are key concerns in the modern world of cognitive overload, filter bubbles, user tracking, and privacy risks,” wrote the paper’s coauthors. “Here, a feed results from an intricate combination of one’s interests, friendship network, her actions on the platform, and external trends … Over time, a user accumulates several thousands of actions that together constitute [their] profile (posts, upvotes, likes, comments, etc.), making it impossible for the user to remember all these details.”

FAIRY attempts to solve this dilemma by creating user-specific interaction graphs using information visible to users. It learns models for predicting relevance and surprisal from real-world social media platform data, and then it leverages learning-to-rank techniques to uncover and rank relationships derived from the graphs.

The features that guide FAIRY are grouped into five sets — user, category, item, path instance, and path pattern — where “path” refers to explanation paths. Users’ influence (e.g., the ratio of followers to followees) is measured as a complement to their activity, as is their engagement with various feed items.

The researchers note that, due to the sheer volume of feed interactions performed by a typical person, graphed entity relationships span from thousands to millions. The aforementioned learn-to-rank approach made them easier to parse by presenting only the top few connections that are either relevant (generally useful as a satisfactory explanation) or surprising (i.e., forgotten or unknown relationships) to users.

In studies, the scientists tasked 20 volunteers with interacting with two platforms — Quora and Last.fm — using fresh accounts with five followers each. They spent 20 hours total on at least one of the two services over the course of several sessions, performing a minimum of 12 activities while identifying non-obvious items after scrolling through their entire feed. After each session, the team updated the interaction graphs and selected three non-obvious recommendations per user before mining explanation paths for the feed items.

In a series of tests, the researchers found that FAIRY outperformed three relationship mining baselines on the task of predicting what users considered relevant and surprising explanations. They attribute its success to a “powerful” information network representation of the user’s sphere of influence and the learning-to-rank approach, and they say that the work represents the first step toward a goal of improving transparency with respect to social media feeds.

“[I]dentifying explanatory relationships between the users’ online behavior … and the feed items they receive is useful for at least three reasons: (i) they can convince the user of their relevance … (ii) they can point the user toward … a course of action to avoid seeing more of certain kinds of items, and (iii) [they could serve as] a proxy that the users could find plausible,” wrote the coauthors. “For example, if Alice sees a post on making bombs in her feed when she herself is unaware of any explicit connection to such, she might be highly curious as to what she might have done to create such an association.”

In the future, the team plans to implement FAIRY as a browser plugin and to investigate further the effects of users’ activities across multiple platforms.


Big Data – VentureBeat

Read More

Researchers propose AI framework for activity-detecting wearables

March 26, 2019   Big Data

Activity-detecting wearables aren’t exactly novel — the Apple Watch, Fitbit’s lineup of fitness wearables, and countless smartwatches running Google’s WearOS interpret movements to determine whether you’re, say, jogging rather than walking. But many of the algorithmic models underlying their features need lots of human-generated training data, and typically they can’t make use of that data if it isn’t labeled by hand.

Fortunately, researchers at the University of Massachusetts Amherst have developed a labor-saving solution they say could save valuable time. In a paper published on the preprint server Arxiv.org (“Few-Shot Learning-Based Human Activity Recognition“), they describe a few-shot learning technique — a technique to teach an AI model with a small amount of labeled training data by transferring knowledge from relevant tasks — optimized for wearable sensor-based activity recognition.

“Due to the high costs to obtain … activity data and the ubiquitous similarities between activity modes, it can be more efficient to borrow information from existing activity recognition models than to collect more data to train a new model from scratch when only a few data are available for model training,” the paper’s authors wrote. “The proposed few-shot human activity recognition method leverages a deep learning model for feature extraction and classification while knowledge transfer is performed in the manner of model parameter transfer.”

Concretely, the team devised a framework — few-shot human activity recognition (FSHAR) — comprising three steps. First, a deep learning model — specifically a long short-term memory (LSTM) network, a type of recurrent neural network that can capture long-term dependencies — that transforms low-level sensor input into high-level semantic information is trained with samples. Next, data that’s relevant or helpful to learning the target task (or tasks) is mathematically discerned and separated from that which isn’t relevant. Lastly, the parameters for the network — i.e., the variables machine-learned from historical training data — are fine-tuned before they’re transferred to a target network.

To validate their approach, the researchers performed experiments with 331 samples from two benchmark data sets: opportunity activity recognition data set (OPP), which consists of common kitchen activities from four participants with wearable sensors recorded over five different runs, and physical activity monitoring data set (PAMAP2), which comprises 12 household and exercise activities from nine participants with wearables.

Compared with the baseline, they claim that FSHAR methods “almost always” achieved the best performances.

“With the proposed framework, satisfying human activity recognition results can be achieved even when only very few training samples are available for each class,” they wrote. “Experimental results show the advantages of the framework over methods with no knowledge transfer or that only transfer knowledge of feature extractor.”


Big Data – VentureBeat

Read More

Google lays out framework for autonomous errand-running robots

March 1, 2019   Big Data

Robots don’t plan ahead as well as humans, but they’re becoming better at it. That’s the gist of a trio of academic papers Google’s robotics research division highlighted in a blog post this afternoon. Taken together, the authors say, they lay the groundwork for robots capable of navigating long distances by themselves.

“In the United States alone, there are three million people with a mobility impairment that prevents them from ever leaving their homes,” senior research scientist Aleksandra Faust and senior robotics software engineer Anthony Francis wrote. “[Machines could] improve the independence of people with limited mobility, for example, by bringing them groceries, medicine, and packages.”

How? In part by using reinforcement learning (RL), an AI training technique that employs rewards to drive agents toward goals. Faust, Francis, and colleagues combined RL with long-range planning to produce planner agents that can traverse short distances (up to 15 meters) safely, without colliding into moving obstacles. They tapped AutoRL, a tool that automates the search for RL rewards and neural network architectures, to train those agents in a simulated environment. They next used the trained agents to build roadmaps, or graphs comprising nodes (locations) and edges that connect to the nodes only if said agents can traverse between them reliably.


It’s easier said than done; as the researchers point out, training agents with traditional RL approaches poses lots of challenges. It requires spending time iterating and hand-tuning rewards and making poorly informed decisions about AI architectures, not to mention mitigating “catastrophic forgetting,” a phenomenon in which AI systems abruptly forget previously learned information upon learning new information.

AutoRL attempts to solve for this in two phases: reward search and neural network architecture search. During the first stage, it trains agents concurrently over several generations, each with slightly different reward functions. At the end of the phase, the reward that leads the agent to its destination most often is selected. The neural network architecture search phase is a repetition of the first phase, essentially, but using the selected reward to tune the network and optimizing for the cumulative reward.

[Image: Automating reinforcement learning with reward and neural network architecture search. Image Credit: Google]

The process isn’t particularly efficient — AutoRL training over ten generations of 100 agents requires five billion samples, or 32 years’ worth of training. But importantly, it’s automated. The models don’t experience catastrophic forgetting, and the resulting policies are “higher quality” compared to prior art (up to 26 percent better in navigation tasks). They’re even robust enough to guide robots through unstructured environments — i.e., environments they’ve never seen before.

The policies AutoRL produces are great for local navigation, but what about long-range navigation? That’s where probabilistic roadmaps come in. They’re a subcategory of sampling-based planners (which approximate robot motions) that sample robot poses and connect them with “feasible transitions,” creating roadmaps tuned to the unique abilities and geometry of a robot. Combined with RL-based local planners (whether hand-tuned or AutoRL-tuned), they can be used to train robots once and subsequently adapt them to different environments.

“First, for each robot, we train a local planner policy in a generic simulated training environment,” Faust and Francis explained. “Next, we build a PRM with respect to that policy, called a PRM-RL, over a floor plan for the deployment environment. The same floor plan can be used for any robot we wish to deploy in the building in a one time per robot+environment setup.”

The newest iteration of PRM-RL takes things a step further by replacing the hand-tuned models with AutoRL-trained local planners, which improves long-range navigation. Additionally, it adds simultaneous localization and mapping (SLAM) maps as a source for building the aforementioned roadmaps.

To evaluate PRM-RL, researchers at Google built a roadmap using floor maps of offices up to 200 times larger than the training environments, and accepted edges with at least 90 percent success over 20 trials. Compared to other methods over distances of 100 meters, PRM-RL had 2 to 3 times the rate of success over baseline. And in real-world tests with multiple robots and real building sites, the machines were “very robust” — except near cluttered areas off the edge of the map.

“We can achieve this by development of easy-to-adapt robotic autonomy, including methods that can be deployed in new environments using information that is already available,” Faust and Francis wrote. “This is done by automating the learning of basic, short-range navigation behaviors with AutoRL and using these learned policies in conjunction with SLAM maps to build roadmaps … The result is a policy that once trained can be used across different environments and can produce a roadmap custom-tailored to the particular robot.”


Big Data – VentureBeat

Read More

Bringing Machine Learning From Idea To Practice: A Framework

October 7, 2018   BI News and Info

When I first learned to swim as a child, I was forced to face my fears and plunge into the water, back first. Though it was scary, it proved to be a very effective way to learn—back strokes are now one of the things I enjoy most about swimming. An age-old Bateke wisdom ratifies this way of learning:

You learn how to cut down the trees by cutting them down.

In the present day, I see organizations that are overwhelmed with machine learning (and other new-generation tech) mostly due to ubiquitous content alluding to the prerequisites of skill set, infrastructure, data, and so on, to the extent that ML’s value to business looks uncertain. Their experience reminds me of my initial fear in that first swimming lesson, but one recommendation stands out above the rest:

Stop viewing ML from the edge. Jump in the pool and practice.

As an experienced practitioner of ML, I’d like to share my insights and a framework that could enable companies to start reducing ML to practice and derive valuable outcomes. But before I jump in, I’d like to set the context with an example of a valuable learning moment.

Discovery

As autumn settles in on the city of Chicago, I have started taking regular evening runs. On one of these occasions, I snapped this picture of the Chicago skyline before taking a break to binge some online content.

[Photo: the Chicago skyline]

I came across a Medium article detailing a package called ImageAI, developed by Moses Olafenwa. This Python library is built for empowering developers to build systems with self-contained deep learning and computer vision capabilities using simple and few lines of code. My interest piqued, I hit “save” to revisit the article later and sprinted back home in the twilight. 

Consumption

I decided to try out this method myself. I set up the dependencies, such as Tensorflow, OpenCV, and Keras, and started reducing the code to practice using the photo from my evening run. As I hit ‘run’, the output unfolded and I saw accurate bounding boxes detecting objects—a person, bicycle, and boat.

[Image: object detection output with bounding boxes around a person, a bicycle, and a boat]

Ideation

This activity of reducing the ready-to-use package for practice triggered scenarios that could be useful for the industrial machinery and components manufacturers that I work with. This would allow them to optimize or extend their business processes to address new challenges and opportunities. Here are some of those cases:

  • Random bin-picking tasks: Instead of manual programs, robots could learn to detect objects that need to be picked up through deep learning. Here is an example of how Fanuc, the world’s largest industrial robot maker, is using deep reinforcement learning for automating tasks such as bin-picking.

  • Automated inventory management: Manual inventory counting is a tedious and cost-intensive process, replete with counting errors. Deep learning is improving the game here. I tested ImageAI on another image from my runs that featured a lot of people. Here is what I observed:
[Image: object detection output with bounding boxes around people]

Replace people with objects and there you have an inventory counting application. Multiple papers, like this one, are being written about testing applications of computer vision in this area.

  • Visual inspection for quality control: Deep learning-based visual inspections are turning out to be much more accurate than human inspections in cases involving product inspections not suitable for the naked eye. For example, chip manufacturing, or complex assemblies built through additive manufacturing (3D printing), is ripe for computer vision applications. Landing.ai, a company started by the great Andrew Ng, has developed a framework which requires a very small training dataset to get started.

What is the framework to bring AI/ML to a value-oriented practice?

Machine learning/AI has reached a level where companies are genuinely exploring processes to be augmented or transformed through ML. However, not every organization has the ML/AI skillset of the large technology providers, and many are therefore overwhelmed. What should these organizations do?

1. First and foremost, understand the value chain of machine learning and your role

Know your position in the value chain of ML/AI. There are chipmakers (Nvidia, Graphcore), algorithm makers (Baidu, Google), platform and infrastructure providers (AWS, GCP, SCP, Azure), enterprise solution providers (SAP, MS), industry solution providers, and corporate consumers (GSK, GE, Walmart, etc.). As you can imagine, your role depends on your position in the value chain; therefore, you need to approach ML/AI opportunities differently. For example, an employee at Google, which is both a producer and a consumer of AI, could either be creating newer algorithms or testing existing ones for a new application, but an employee at another corporation would focus more on value-driven consumption by utilizing what the upstream players deliver.

As an employee, determine your role in the value chain of ML and set your goals accordingly.

2. Discover, consume, ideate to value

Given the time pressure to bring new ideas to market, there is hardly any sense in getting caught in endless loops of learning these complex technologies. Ayasdi, an AI platform company, points to the huge opportunity in consuming existing AI packages and approaches. The need is to shift to consuming existing packages on GitHub and ML APIs offered by companies like MS, Google, SAP, and AWS, and to test scenarios that could bring significant benefits.

The whole point of Amazon’s Deep Lens and Google’s Cloud AutoML is to let developers find near endless applications of video, image, and text analysis based on deep learning. The discovery, consumption, ideation chain that we saw with the object detection scenario above could very well be applied to your business, in turn reducing ML to practice. This could potentially be the de facto job description of teams tasked with ML and AI tasks in such companies.

3. Think in terms of scale and order of change

In his book “Enlightenment Now,” Steven Pinker refers to scale and order of change as important criteria for any impactful policy decisions. The criteria for applications of ML and AI scenarios should be no different.

Instead of building from scratch with no scale and value in mind, you must be scratching to build for scale and value.

After the discover-consume-ideate phase, the short-listed scenarios could be tested in terms of time and dollars spent per training or inference task versus the value that is generated for the end customer. In simple economic terms, both supply chain and consumer surplus need to be maximized.

I have come across companies that do just this – they have scenarios in mind, they discover packages/papers/APIs/Github for applications, they apply in their context, and they look for ways to improve upon the performance, only to get further inspired with more scenarios.

In other words, they take the plunge.

Read more about the value chain of AI here.


Digitalist Magazine

Read More

The SuiteCloud Development Framework, a Developer's Nirvana

October 31, 2017   NetSuite

Posted by Gerson Rodriguez, Manager, Product Management – Platform

As developers, one of our chief challenges is releasing new, amazing functionality on time and at the pace demanded of our businesses’ increasingly accelerated development cycles. Our desire to meet these expectations has made our roles among the most stressful jobs in technology – with a CareerCast poll listing the positions of software engineers and computer programmers among the top 10 highest stress roles in the industry.

Does it have to be this way? I don’t think so. Combining our passion for development with a set of tools that eases some of the hardest (or simply most tedious) parts of our roles can have us feeling as if we’re sitting cross-legged, floating on a cloud, typing away on our laptops in a state of perpetual Zen. That vision can be achieved when we float our efforts on the SuiteCloud Development Framework.

To ease the process of building and deploying new functionality in the cloud, NetSuite last year introduced the SuiteCloud Development Framework (SDF), a one-stop shop for developers to build, test, deploy and publish applications without ever leaving the code. Developers have the source code of everything in the platform, including objects, configuration and logic, which allows them to take advantage of any revision control system they choose for version control and team development. And because it’s a decoupled environment, developers can build on SDF and deploy across any environment, complete with built-in validation controls and dependency management.

In turn, partners and customers alike can automate business processes via SuiteScript, build and package customizations via the UI, build and distribute SuiteApps, perform configuration tasks for implementing NetSuite environments, perform audit tasks or manage changes in NetSuite environments.

Since its launch, we’ve made some major improvements to make this platform a “developer’s heaven,” as I like to call it. Here’s some of what I think will be the most helpful improvements.

Direct deployment. Developers can directly deploy applications to development, sandbox or production accounts, greatly simplifying development and deployment practices for ISVs and customers. For instance, customers can deploy customizations en masse from the sandbox to the QA sandbox to obtain QA approval, and then use the exact same SDF project to deploy the same project code and objects to production.

Version control. Customers can check an entire project including custom objects like custom records, custom fields, and saved searches into a Code Repository to be under version control. This eases change management.

Collaborative cloud development. Developers can now collaborate in a separate dedicated development environment that makes it easier to manage customizations and application code, providing more familiarity, more control and more autonomy than ever. Development teams can also easily manage cloud source code with the same process and rigor that they use in other projects, increasing quality, compliance and control.

The ability to handle dependencies. Say, for instance, you install a bundle that references a field that already exists in your production instance, and you end up with a new, redundant custom field. With the SuiteCloud Development Framework, you can reference an existing field or object without having it be duplicated.

Similarly, if your project contains an object that requires the existence of another object, SDF can detect this before the project or customization is deployed, which brings a great degree of certainty to the deployment process.

With the right foundation, software development can drive business value, without driving the joy out of our work. Check out more information on the SuiteCloud Developer Network here and consider joining our LinkedIn group to connect with other NetSuite developers.

Posted on Mon, October 30, 2017 by NetSuite


The NetSuite Blog

Read More