
Tag Archives: quality

Quality Match raises $6 million to build better AI datasets

March 2, 2021   Big Data



Quality Match, a Heidelberg, Germany-based quality data annotation provider, today announced that it raised a €5 million ($6 million) seed round from LEA Partners. The company says it’ll use the proceeds to expand its team and accelerate product development.

Training AI and machine learning algorithms requires plenty of annotated data. But data rarely comes with annotations. The bulk of the work often falls to human labelers, whose efforts tend to be expensive, imperfect, and slow. It’s estimated that most enterprises that adopt machine learning spend over 80% of their time on data labeling and management. In fact, in a survey conducted by startup CrowdFlower, data scientists said that they spend 60% of their time just organizing and cleaning data, compared with 4% on refining algorithms.

Quality Match, which was bootstrapped in 2019 by a team of former Pallas Ludens, Apple, Google, and Microsoft engineers, aims to improve the speed and quality of data labeling processes by disambiguating the potential sources of error. The platform explains the sources of errors in datasets, highlighting where edge cases originate and providing strategies on how to improve the data.

There’s no shortage of data labeling startups competing with Quality Match — the market was valued at $1.3 billion in 2020, according to Grand View Research. For instance, Scale AI has raised over $100 million for its suite of data annotation services. There’s also CloudFactory, which says it offers labelers growth opportunities and “metric-driven” bonuses. Hive, Alegion, Appen, SuperAnnotate, Dataloop, Cognizant, and Labelbox are other rivals of note.

But Quality Match uniquely begins building or enhancing datasets by optimizing the representativeness and diversity of the samples, ensuring they’re representative of the real world and contain difficult edge cases sprinkled throughout. Then, the platform exposes and quantifies ambiguity in the datasets before breaking the taxonomies into small, intuitive questions that form a fully automated decision tree. Quality Match runs multiple repeats of this decision tree to provide confidence scores on all of the annotations.

Moreover, Quality Match provides metrics including geometric, label, and definition accuracy that are intended to inform about wrong tags or attributes as well as missed or spurious detections of annotations. It also shows how factors like taxonomy version changes over time and varying criteria for quality scoring might be contributing to imbalances in the datasets.

“During the pandemic, our industrial customers, in particular, have increasingly realized that they will have to rely more on high-tech solutions in the future because, in these times, large groups of people can no longer work together in one room,” said cofounder and managing director Daniel Kondermann, who told VentureBeat that the goal this year is to reach €1 million in revenues. “To be successful, companies must adapt, which leads to an increasing demand for automation and therefore AI across a wide range of industries. Quality Match also started entering the market of medical technology. An industry that got even stronger due to the pandemic and therefore continued to develop new and improve existing AIs. All these industries are asking for our datasets and profiting from our work which is why so far, we have managed this pandemic very well.”

Twenty-employee Quality Match, which counts among its customers Mapillary, Bosch, and other companies engaged in health, 3D maps, autonomous driving, AR/VR, retail, and construction, received the whole of its latest funding from LEA Partners. Kondermann says that the immediate focus will be on hiring talent.


Big Data – VentureBeat


Data Completeness a must for Data Quality – Track completeness of Dynamics 365 CRM records with User Adoption Monitor!

November 9, 2020   Microsoft Dynamics CRM


In our previous post, you were briefly introduced to the latest three new features of User Adoption Monitor – one of our popular Dynamics 365 CRM user action tracking apps. Being one of the Preferred Apps on Microsoft AppSource, User Adoption Monitor has always helped organizations around the world to enhance their Dynamics 365 CRM/Power Apps user productivity by monitoring user actions and providing credible analysis of their performance.

In this post, we will see how User Adoption Monitor can further improve the overall user productivity throughout the organization with its latest feature – Data Completeness.

The Data Completeness feature will help you to track and ensure the completeness of any entity record in Dynamics 365 CRM/Power Apps. With this feature, you can choose the field(s) in any entity record that are essential for your organization to capture and ensure that this data is entered in CRM. If it is not captured, then the status of that record will be shown as ‘Incomplete’.

Now, let’s see how this is done.

Consider a scenario where you have discovered that most of the records in the Contact entity do not have the necessary contact details of customers, such as email ID, mobile number, or company name. Since the details are not readily available, you often spend your time searching for the right contact information. And in the business world, where time is of the essence, every minute counts, as it determines whether you win or lose a potential deal or customer.

In order to overcome this situation, you can make use of the ‘Data Completeness’ feature of User Adoption Monitor. For this, you have to go to Entity Configuration in User Adoption Monitor, configure a new data completeness entity for ‘Contacts’, choose the mandatory fields for which data needs to be captured and save it.


Now, whenever a new record is created in the Contact entity, this feature will ensure that all the mandatory fields selected by you are duly filled. If no data is entered in those fields, then the status of that record will be displayed as ‘Incomplete’.


Once the data is duly filled in those fields, the status of the record will be displayed as ‘Complete’.


In this way, you can ensure that users never skip information while entering data in the CRM and that all the newly created records in the Contact entity have the essential information required.

But what about the already existing records in Contact entity?

No worries! User Adoption Monitor has made provision for this scenario also.

In order to check the completeness of existing records, you just have to create a separate on-demand workflow for the ‘Contact’ entity.


After the workflow is activated, open any existing record in ‘Contact’ entity, go to flow and click on the newly created workflow.


Once the workflow has run, the status of the record will be displayed as per the completeness of the record.


Quite handy, isn’t it?

With this feature in hand, you can rest easy and will no longer have to invest precious time in search of essential information. This will further enhance your productivity and overall output in terms of meeting business goals.

Now, all you have to do is go to our website or Microsoft AppSource and download User Adoption Monitor for a trial period of 15 days and explore this latest ‘Data Completeness’ feature for yourself.

While we have been talking about data quality and completeness, there is always a chance of errors while entering data, and we have another cool app to ensure all is in order – Click2Undo (Preferred App on AppSource) – 1 Click to Undo changes in Dynamics 365 CRM Records in Bulk – Restore History Changes & Deleted Records as well!

Feel free to contact us at crm@inogic.com for a personal demo or for more information related to User Adoption Monitor or any of our Preferred Dynamics 365 / Power Apps apps.

Until then – Be Safe, Be Healthy!


CRM Software Blog | Dynamics 365


To Improve Data Quality, Stop Playing the Data Telephone Game

October 3, 2020   TIBCO Spotfire


Do you remember when you used to play the telephone game with other children? You know, the game where the first person in a chain whispers a phrase to the second, then the second repeats it to the third, and on until the last person repeats it back to the first. 

It was such a joy to laugh about how what began as “The sun is in the sky” somehow transformed into “It is fun to eat pie” as the phrase passed from one friend to the next.  

Further laughs quickly followed when each person in the chain shared their whispered phrase with all, allowing everyone to see what went wrong and where. 

The Data Telephone Game

Interestingly, over the past thirty years, data management has adopted the same telephone game formula, copying data from one database to the next, with many stops on the journey. 

Take the classic enterprise data warehouse process as an example:

  • The data starts as transaction records stored in a transaction system’s database
  • Next it moves from the source system to a staging database
  • From staging it moves to a data warehouse 
  • With subsets of that data further advancing for storage within satellite data marts
  • Many of which soon feed individual Excel files resident on laptops

Or the more recent cloud data lake paradigm:

  • Source data from devices are consolidated on edge databases 
  • This edge data is then copied into a cloud data lake for further analysis
  • Additional data from transaction systems might also be added to the lake
  • And to inject historical context, warehouse data might also be copied into the lake

Conceptually, these data management best-practices provide the opportunity to improve data quality by applying selected value-add transformations at various steps. But with so many rigid links in the chain, this data version of the telephone game can often inadvertently turn “sky” into “pie.”  The business impact of this quality problem produces anything but childhood chuckles. 

How Big is this Repeated Copying of Data Problem?

Just how much data is getting copied? In its Worldwide Global DataSphere Forecast 2019-2023, IDC estimates that for every terabyte of net-new data, over six additional terabytes of copied data are generated via replication and distribution. That is a lot of opportunities for “sky” to become “pie.”

Three Ways to Stop Playing the Telephone Game 

IDC’s numbers, when combined with everyone’s telephone game experience, suggest trying a different approach to improve data quality.  Here are three common-sense things organizations might consider.

  1. Copy Less, Virtualize More – Data Virtualization is a proven method for integrating data without physically copying it. This will substantially reduce the transformation errors and entropy inherent in typical multiple-copy, data warehouse, and data lake deployments. Beyond fewer copies, data virtualization directly improves data quality via metadata-driven syntactic and semantic transformations and enrichments that standardize datasets and encourage reuse. Everyone is on the same page. And when things change, as they inevitably do, it’s a lot easier to modify centrally managed metadata definitions than it is to modify multiple distributed ETLs and database schemas. 
  2. Share Reference Data Everywhere –  Reference Data Management improves data quality by enabling organizations to consistently manage standard classifications and hierarchies across systems and business lines. This lets them achieve needed consistency and compliance without extra copies.  And by adding data virtualization as the distribution method, organizations can easily share and reuse reference data held in one virtual location. 
  3. Think Data Domain, Not Database Technology – Today, there are lots of cool, fit-for-purpose database technologies. But “new and exciting” doesn’t necessarily translate into “high business value.” Instead, think about the most valuable data domains. For example, if customer excellence is your competitive advantage, then focus on improving quality within the customer data domain. Master Data Management is the key to success in this case, allowing organizations to ensure data integrity within selected data domains such as customer, employee, product, and more. 


Stop Playing the Data Telephone Game

Let’s leave the telephone game to the kids. Instead, improve your data quality by executing the three common-sense recommendations above with TIBCO Unify. To learn more, talk to TIBCO and our partners.


The TIBCO Blog


Implementing a Data Quality Monitoring Framework

May 9, 2020   BI News and Info

I. Introduction

Data Quality is imperative to the success of any Business Intelligence solution. In today’s era, where the data is being ingested from multiple sources, the quality of the data might differ between sources. The first step to a successful reporting application implementation is to ensure that only the highest quality data makes it to the reporting tables. In order to achieve this, a Data Quality Monitoring framework should be implemented.

Some of the characteristics of the Data Quality Framework are as follows:

  • The Data Quality Monitoring framework should have a predefined set of rules specific to a Data Domain.
  • The Data Quality Monitoring framework should be scalable, i.e., it should support the addition of new rules and should be able to support data quality monitoring across multiple data domains.
  • The Data Quality Monitoring framework should have the ability to enable and disable rules to be executed against the data.
  • All the results from Data Quality monitoring should be stored in a rules results repository.

II. Data Quality Monitoring Framework Process Flow

The process starts with the source data file(s) landing on the SQL Server or any ETL server. Once files are detected, the Pre-Stage Data Quality rules are executed, and Data Stewards receive a notification that the results are available for review. The processing stops if any of the Pre-Stage “Gating” Data Quality rules have failed. The process continues to load data into the Stage table only if none of the Pre-Stage Gating rules has failed or if the Data Steward has chosen to override the failure. After the data is loaded into the Stage table, the Post-Stage Data Quality rules are executed, and Data Stewards are notified when the results are available for review. If there is no Gating rule failure, the process automatically publishes a validated data file for the downstream systems to use. If any of the Post-Stage Gating rules have failed, the Data Steward can decide either to abandon the cycle and request a new file from the source or to override the failure in order to publish data files for downstream systems.


Figure 1: Process Flow Diagram for the Data Quality Monitoring Framework

The following section of the article will concentrate on the Implementation of Data Quality Rules (Pre-Stage & Post Stage) using Database tables.

III. Data Quality Monitoring Data Mart:

To implement Data Quality Monitoring Framework, a Data Quality Data Mart is needed.

A Data Quality Data Mart would have tables to provide the following capabilities.

  • A table to store all predefined Data Quality rules. (DATA_QUALITY_RULE table)
  • A table to provide the ability to turn on and turn off rules and store threshold percentages for each rule for its corresponding data domain (DATA_QUALITY_RULE_EXECUTE table).
  • A table to store the results of Data Quality Rules to serve as a Data Quality Rule Monitoring results repository (DATA_QUALITY_RULE_RESULTS).

A possible model would look like Figure 2


Figure 2: The data quality model

Here are the definitions for the three tables.

A. DATA_QUALITY_RULE Table

This table holds all the predefined Data Quality Rules. These rules could be for one data domain or multiple data domains.

DDL for creating the DATA_QUALITY_RULE table:

CREATE TABLE DATA_QUALITY_RULE
(
    DATA_QUALITY_RULE_ID INTEGER NOT NULL,
    RULE_NAME VARCHAR(60) NOT NULL,
    RULE_DESCRIPTION VARCHAR(256) NOT NULL,
    RULE_TYPE_CODE VARCHAR(20) NOT NULL,
    RULE_LEVEL_CODE VARCHAR(20) NOT NULL,
    SCHEMA_NAME VARCHAR(60),
    TABLE_NAME VARCHAR(60),
    DATA_NAME VARCHAR(60),
    RULE_SCRIPT_TYPE_CODE VARCHAR(20),
    RULE_SCRIPT_TEXT VARCHAR(256),
    CREATE_PROCESS_ID BIGINT NOT NULL,
    CREATE_PROCESS_NAME VARCHAR(35) NOT NULL,
    CREATE_RECORD_USER_NAME VARCHAR(35) NOT NULL,
    CREATE_RECORD_SYSTEM_TIMESTAMP DATETIME NOT NULL,
    CONSTRAINT [PK_DATA_QUALITY_RULE_ID] PRIMARY KEY ([DATA_QUALITY_RULE_ID])
);

The columns are as follows:

DATA_QUALITY_RULE_ID: Unique rule identifier
RULE_NAME: Name of the rule
RULE_DESCRIPTION: Details about the rule
RULE_TYPE_CODE: Indicates whether the rule is “GATING” or “PASSIVE”
RULE_LEVEL_CODE: Level at which the rule is executed (file, table, or column)
SCHEMA_NAME: Schema of the table, or schema of the file
TABLE_NAME: Table that holds the data on which the Data Quality Rules need to execute
DATA_NAME: Column name on which the rule is executed
RULE_SCRIPT_TYPE_CODE: Code that determines whether the rule passes or fails
RULE_SCRIPT_TEXT: Description of the RULE_SCRIPT_TYPE_CODE
CREATE_PROCESS_ID: Process ID that loaded the record into the DATA_QUALITY_RULE table
CREATE_PROCESS_NAME: Process name that loaded the record into the DATA_QUALITY_RULE table
CREATE_RECORD_USER_NAME: Service account that loaded the record into the DATA_QUALITY_RULE table
CREATE_RECORD_SYSTEM_TIMESTAMP: Timestamp at which the rule record was inserted into the DATA_QUALITY_RULE table

B. DATA_QUALITY_RULE_EXECUTE Table

This table holds information related to whether the rule is active or not, and the threshold percentage values against which data quality is measured to pass or fail a rule.

DDL for creating DATA_QUALITY_RULE_EXECUTE table

CREATE TABLE DATA_QUALITY_RULE_EXECUTE
(
    DATA_QUALITY_DATA_SOURCE_CODE VARCHAR(20) NOT NULL,
    DATA_QUALITY_RULE_ID INTEGER NOT NULL,
    RULE_PROCESS_STEP_CODE VARCHAR(256) NOT NULL,
    RULE_EXECUTE_STATUS_CODE VARCHAR(20) NOT NULL,
    THRESHOLD_PASS_PERCENT VARCHAR(20) NOT NULL,
    CREATE_PROCESS_ID BIGINT NOT NULL,
    CREATE_PROCESS_NAME VARCHAR(35) NOT NULL,
    CREATE_RECORD_USER_NAME VARCHAR(35) NOT NULL,
    CREATE_RECORD_SYSTEM_TIMESTAMP DATETIME NOT NULL,
    FOREIGN KEY ([DATA_QUALITY_RULE_ID]) REFERENCES [DATA_QUALITY_RULE] ([DATA_QUALITY_RULE_ID])
        ON DELETE NO ACTION ON UPDATE NO ACTION
);

The columns are as follows:

DATA_QUALITY_DATA_SOURCE_CODE: Data domain or source of the data
DATA_QUALITY_RULE_ID: Unique rule identifier
RULE_PROCESS_STEP_CODE: Step at which the rule is applied to the data (PRE_STAGE/POST_STAGE)
RULE_EXECUTE_STATUS_CODE: Indicates the status of the rule; “A” indicates “Active”
THRESHOLD_PASS_PERCENT: Threshold percentage that, if met, causes the rule to “Pass”; otherwise the rule will “Fail”
CREATE_PROCESS_ID: Process ID that loaded the record into the DATA_QUALITY_RULE_EXECUTE table
CREATE_PROCESS_NAME: Process name that loaded the record into the DATA_QUALITY_RULE_EXECUTE table
CREATE_RECORD_USER_NAME: Service account that loaded the record into the DATA_QUALITY_RULE_EXECUTE table
CREATE_RECORD_SYSTEM_TIMESTAMP: Timestamp at which the record was inserted into the DATA_QUALITY_RULE_EXECUTE table

C. DATA_QUALITY_RULE_RESULTS Table

This table is a repository to store the results of Data Quality Rule monitoring.

DDL for creating DATA_QUALITY_RULE_RESULTS table:

CREATE TABLE DATA_QUALITY_RULE_RESULTS
(
    DATA_QUALITY_RESULT_ROW_ID BIGINT NOT NULL,
    DATA_QUALITY_DATA_SOURCE_CODE VARCHAR(20) NOT NULL,
    DATA_QUALITY_RULE_ID INTEGER NOT NULL,
    RULE_PROCESS_STEP_CODE VARCHAR(20) NOT NULL,
    TABLE_ROW_IDENTIFIER BIGINT,
    SCHEMA_NAME VARCHAR(60),
    TABLE_NAME VARCHAR(60),
    DATA_NAME VARCHAR(60),
    DATA_VALUE VARCHAR(250),
    RULE_TYPE_CODE VARCHAR(20),
    RULE_LEVEL_CODE VARCHAR(20),
    PASS_FAIL_OVERRIDE_CODE CHAR(1),
    PASS_COUNT INTEGER,
    FAIL_COUNT INTEGER,
    SYSTEM_PARTITION_IDENTIFIER VARCHAR(60),
    CREATE_PROCESS_ID BIGINT NOT NULL,
    CREATE_PROCESS_NAME VARCHAR(35) NOT NULL,
    CREATE_RECORD_USER_NAME VARCHAR(35) NOT NULL,
    CREATE_RECORD_SYSTEM_TIMESTAMP DATETIME NOT NULL,
    CONSTRAINT [PK_DATA_QUALITY_RESULT_ROW_ID] PRIMARY KEY ([DATA_QUALITY_RESULT_ROW_ID]),
    FOREIGN KEY ([DATA_QUALITY_RULE_ID]) REFERENCES [DATA_QUALITY_RULE] ([DATA_QUALITY_RULE_ID])
        ON DELETE NO ACTION ON UPDATE NO ACTION
);

The columns are as follows:

DATA_QUALITY_RESULT_ROW_ID: Unique identifier for each record in the DATA_QUALITY_RULE_RESULTS table
DATA_QUALITY_DATA_SOURCE_CODE: Data domain or source of the data
DATA_QUALITY_RULE_ID: Unique rule identifier
RULE_PROCESS_STEP_CODE: Step at which the rule is applied to the data (PRE_STAGE/POST_STAGE)
TABLE_ROW_IDENTIFIER: Unique identifier from the source table
SCHEMA_NAME: Schema of the table, or schema of the file
TABLE_NAME: Table that holds the data on which the Data Quality Rules need to execute
DATA_NAME: Column name on which the rule is executed
DATA_VALUE: Data value
RULE_TYPE_CODE: Indicates whether the rule is “GATING” or “PASSIVE”
RULE_LEVEL_CODE: Level at which the rule is executed (file, table, or column)
PASS_FAIL_OVERRIDE_CODE: Status of the data quality rule (Pass, Fail, or Override)
PASS_COUNT: Count of records that passed the rule
FAIL_COUNT: Count of records that failed the rule
SYSTEM_PARTITION_IDENTIFIER: Partitioning key for the DATA_QUALITY_RULE_RESULTS table
CREATE_PROCESS_ID: Process ID that loaded the record into the DATA_QUALITY_RULE_RESULTS table
CREATE_PROCESS_NAME: Process name that loaded the record into the DATA_QUALITY_RULE_RESULTS table
CREATE_RECORD_USER_NAME: Service account that loaded the record into the DATA_QUALITY_RULE_RESULTS table
CREATE_RECORD_SYSTEM_TIMESTAMP: Timestamp at which the record was inserted into the DATA_QUALITY_RULE_RESULTS table

IV. Understanding Data Quality Monitoring Implementation with an Example:

In order to understand the above Data Quality Framework, I will walk through an example as follows:

A. Source File

The Source File is a pipe-delimited file with Invoice data and its corresponding Control file.

InvoiceId|CustomerId|InvoiceDate|BillingAddress|BillingCity|BillingState|BillingCountry|BillingPostalCode|Total|StoreID

100|23|2007-01-11|69 Salem Street|Boston|MA|USA|2113|13.86|1

200|16|2007-02-19|1600 Amphitheatre Parkway|Mountain View|CA|USA|94043-1351|0.99|2

300|17|2007-03-04|1 Microsoft Way|Redmond|WA|USA|98052-8300|1.98|3

400|19|2007-03-05|1 Infinite Loop|Cupertino|CA|USA|95014|1.98|4

500|21|2007-03-06|801 W 4th Street|Reno|NV|USA|89503|3.96|5

B. Control File

The Control File has details related to the source file like record counts and other important details.

DataFileName|ControlFileName|DataFileCount|MinInvoiceDate|MaxInvoiceDate|TotalAmount

Invoice_202001.dat|invoice.ctl|5|2007-01-11|2007-02-28|22.77

C. DATA_QUALITY_RULE Table

The Data Quality Rule table is a repository of all rules that need to be executed against the data to gauge the quality of data.

There are 2 types of data quality rules.

  1. Gating Rules
  2. Passive Rules

Gating Rules: Gating rules are critical rules; they certify the quality of the data. If any one of these rules fails, it means the data is not good enough to be loaded into application tables. The source data should either be corrected, or a new version of the file needs to be requested from the source system. In that case, the data quality rules need to be rerun on the new file, and the gating rules should pass before the data is loaded into the underlying reporting application tables. When a “Gating” data quality rule fails, the ETL cycle stops and does not proceed until new data is requested, the gating rule passes, or the data stewards do a manual override.

Passive Rules: Passive rules are rules that are good to have but are not critical; the data is still useful for analytical reporting. If a passive data quality rule fails, the data is still allowed to flow down to downstream applications.

In order to understand this, I’ll walk you through the following five rules.

Rule 1:

Data Quality Rule to Check the Source Schema/Layout matches the expected layout.

Rule 2:

Data Quality Rule to verify the same file has not been processed earlier (Duplicate File Check)

Rule 3:

Data Quality Rule to verify each record in the data file has an Invoice Date between the Minimum Invoice Date and Maximum Invoice Date and write failed records to the results table.

Rule 4:

Data Quality Rule to provide a count of passed/failed records for Invoice Date between the Minimum Invoice Date and Maximum Invoice Date.

Rule 5:

Data Quality Rule to provide a count of passed/failed records for StoreID being null.

Please refer to DataQuality_InsertStatements.txt for insert statements.

Note: I am inserting rules into the DATA_QUALITY_RULE table using insert statements (one per record, for demonstration/explanation purposes). A preferable way would be to load all rules from a pipe-separated file using an ETL tool or script.
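For illustration only, one such insert for Rule 5 (the StoreID check) might look roughly like the statement below. The rule name, description text, and audit values are assumptions for this sketch and are not copied from DataQuality_InsertStatements.txt.

-- Hypothetical example insert for Rule 5; rule name and audit values are placeholders
INSERT INTO DATA_QUALITY_RULE
    (DATA_QUALITY_RULE_ID, RULE_NAME, RULE_DESCRIPTION, RULE_TYPE_CODE, RULE_LEVEL_CODE,
     SCHEMA_NAME, TABLE_NAME, DATA_NAME, RULE_SCRIPT_TYPE_CODE, RULE_SCRIPT_TEXT,
     CREATE_PROCESS_ID, CREATE_PROCESS_NAME, CREATE_RECORD_USER_NAME, CREATE_RECORD_SYSTEM_TIMESTAMP)
VALUES
    (5, 'StoreID_CHK', 'Count of passed/failed records for StoreID being null', 'PASSIVE', 'TABLE',
     'dbo', 'Invoice', 'StoreID', 'COMP_THRE_VAL_GE_P_F', 'StoreID_CHK',
     1, 'DQ_RULE_SEED', 'svc_dq_loader', GETDATE());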

RULE_SCRIPT_TYPE_CODE: The rule script type code field defines when a rule passes or fails and when records are supposed to be loaded to the DATA_QUALITY_RULE_RESULTS table.

To understand how this works, review this example:

RULE_SCRIPT_TYPE_CODE is set to “COMP_THRE_VAL_NE_F_F”

‘COMP_THRE_VAL’ stands for Compare Threshold Values, i.e. compare the actual threshold value with the expected threshold value.

NE – Stands for Not Equal, i.e. the actual data threshold percent and the expected threshold percent (from DATA_QUALITY_RULE_EXECUTE) are not equal. Other possible values are:

“EQ” stands for “Equal to”

“GE” stands for “Greater than or equal to”

“GT” stands for “Greater than”

“LE” stands for “Less than or equal to”

“LT” stands for “Less than”

F – Stands for the action to be taken regarding the status of the rule. In this case, if the expected threshold percent is not equal to the actual data threshold percent, then fail the rule. The other possible value is “P”.

F – Stands for writing the failed records to the DATA_QUALITY_RULE_RESULTS table. “P” stands for writing only passed records, and “E” stands for loading everything, i.e. both passed and failed records are written to the DATA_QUALITY_RULE_RESULTS table.
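To make the mechanics concrete, here is a minimal T-SQL sketch of how a rule’s code could be decoded and applied once the two threshold percentages are known; the variable names and the fixed character positions assumed below are illustrative, not part of the framework definition.

-- Minimal sketch: decode RULE_SCRIPT_TYPE_CODE and compare the two thresholds.
-- Variable names and fixed character positions are assumptions of this sketch.
DECLARE @RuleScriptTypeCode VARCHAR(20) = 'COMP_THRE_VAL_NE_F_F';
DECLARE @ExpectedThresholdPercent DECIMAL(5, 2) = 100.00;  -- from DATA_QUALITY_RULE_EXECUTE
DECLARE @ActualThresholdPercent   DECIMAL(5, 2) = 40.00;   -- computed from the data

DECLARE @Operator      CHAR(2) = SUBSTRING(@RuleScriptTypeCode, 15, 2); -- EQ, NE, GE, GT, LE, LT
DECLARE @StatusOnMatch CHAR(1) = SUBSTRING(@RuleScriptTypeCode, 18, 1); -- P or F when the comparison is true
DECLARE @WriteMode     CHAR(1) = SUBSTRING(@RuleScriptTypeCode, 20, 1); -- P, F, or E (which records to write)

DECLARE @ComparisonTrue BIT =
    CASE @Operator
        WHEN 'EQ' THEN CASE WHEN @ActualThresholdPercent =  @ExpectedThresholdPercent THEN 1 ELSE 0 END
        WHEN 'NE' THEN CASE WHEN @ActualThresholdPercent <> @ExpectedThresholdPercent THEN 1 ELSE 0 END
        WHEN 'GE' THEN CASE WHEN @ActualThresholdPercent >= @ExpectedThresholdPercent THEN 1 ELSE 0 END
        WHEN 'GT' THEN CASE WHEN @ActualThresholdPercent >  @ExpectedThresholdPercent THEN 1 ELSE 0 END
        WHEN 'LE' THEN CASE WHEN @ActualThresholdPercent <= @ExpectedThresholdPercent THEN 1 ELSE 0 END
        WHEN 'LT' THEN CASE WHEN @ActualThresholdPercent <  @ExpectedThresholdPercent THEN 1 ELSE 0 END
    END;

-- If the comparison is true, the rule gets @StatusOnMatch; otherwise it gets the opposite status.
SELECT PASS_FAIL_OVERRIDE_CODE =
           CASE WHEN @ComparisonTrue = 1 THEN @StatusOnMatch
                WHEN @StatusOnMatch = 'P' THEN 'F'
                ELSE 'P'
           END,
       WRITE_MODE = @WriteMode;

For the sample values above, the thresholds differ, so the NE comparison is true, the rule is marked “F” (failed), and the trailing “F” means only failed records are written to DATA_QUALITY_RULE_RESULTS.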

D. DATA_QUALITY_RULE_EXECUTE

This table is used to control the activation and inactivation of a data quality rule. If a rule is marked as active in the DATA_QUALITY_RULE_EXECUTE table, then that rule will be executed against the data, and corresponding data metrics would be captured and loaded in DATA_QUALITY_RULE_RESULTS table. If a rule is marked inactive in the DATA_QUALITY_RULE_EXECUTE table, then that rule will NOT be executed against the source data, and its metrics are not captured in the table.

Data Quality rules would be executed mostly at two places in the process as follows:

  1. Pre-Stage DQ Rules – Before loading into the Stage Table
  2. Post Stage DQ Rules – After loading into the Stage Table

Pre-Stage DQ Rules:

The Pre-Stage DQ rules are executed before the data gets loaded into the Stage Table.

The intent of having Pre-Stage DQ rules is to avoid loading bad data into stage tables and to avoid aborting the ETL load process because of an unexpected file layout.

Some of the examples for Pre-Stage Data Quality rules are:

  1. Schema/layout validation of input file.
  2. Duplicate File Check (see the sketch after this list)
  3. Control File and Data File Count Match.
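As a concrete illustration of item 2, a duplicate file check can be reduced to a single lookup, assuming the ETL framework logs every processed file name in a table; the ETL_FILE_LOG table and its columns below are assumptions of this sketch, not part of the data mart defined above.

-- Hypothetical Pre-Stage duplicate file check; ETL_FILE_LOG is an assumed ETL logging table.
DECLARE @FileName VARCHAR(260) = 'Invoice_202001.dat';

SELECT ACTUAL_THRESHOLD_PERCENT =
       CASE WHEN EXISTS (SELECT 1 FROM ETL_FILE_LOG WHERE FILE_NAME = @FileName)
            THEN 0.00    -- file name seen before: duplicate, fails against the 100% threshold
            ELSE 100.00  -- new file: passes
       END;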

Post-Stage DQ Rules: Post-Stage DQ rules are executed after the data is loaded into the staging table. Post-Stage DQ rules would typically validate data against business-critical fields and produce metrics.

Please refer to DataQuality_InsertStatements.txt for insert statements.

Below is an explanation for each Insert statement into DATA_QUALITY_RULE_EXECUTE table

Rule 1 (Schema Layout Verification Rule):

The rule belongs to the “Invoice” Data Domain as apparent from DATA_QUALITY_DATA_SOURCE_CODE set to “Invoice”. It is a “PRE_STAGE” rule, which means it should be executed before the Invoice data file is even loaded into an Invoice Stage table.

Rule 2 (Duplicate File Validation Rule):

The rule belongs to the “Invoice” Data Domain as apparent from DATA_QUALITY_DATA_SOURCE_CODE set to “Invoice”. It is a “PRE_STAGE” rule that means it should be executed before the Invoice data file is even loaded into Invoice Stage table. The rule is “Active” as RULE_EXECUTE_STATUS_CODE is set to “A”. The Threshold Pass Percentage is set to 100, meaning that even if a single record does not meet the criteria, the rule will fail.

Rule 3 (Invoice Date Validation Rule):

The rule belongs to the “Invoice” Data Domain as apparent from DATA_QUALITY_DATA_SOURCE_CODE set to “Invoice”. It is a “POST_STAGE” rule meaning that it should be executed after the Invoice data file is loaded into Invoice Stage table. The rule is “Active” as RULE_EXECUTE_STATUS_CODE is set to “A”. The Threshold Pass Percentage is set to 100, meaning that even if a single record does not meet the criteria, the rule will fail.

Rule 4 (Invoice Date Summary Rule):

The rule belongs to the “Invoice” Data Domain as apparent from DATA_QUALITY_DATA_SOURCE_CODE set to “Invoice”. It is a “POST_STAGE” rule that means it should be executed after the Invoice data file is loaded into the Invoice Stage table. The rule is “Active” as RULE_EXECUTE_STATUS_CODE is set to “A”. The Threshold Pass Percentage is set to 100, meaning that even if a single record does not meet the criteria, the rule will fail.

Rule 5 (Store ID Summary Rule):

The rule belongs to the “Invoice” Data Domain as apparent from DATA_QUALITY_DATA_SOURCE_CODE set to “Invoice”. It is a “POST_STAGE” rule that means it should be executed after the Invoice data file is loaded into the Invoice Stage table. The rule is “Active” as RULE_EXECUTE_STATUS_CODE is set to “A”. The Threshold Pass Percentage is set to 90%, meaning that 90% or more of the records should have StoreID populated for the rule to pass.
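As with the rule definitions, the corresponding DATA_QUALITY_RULE_EXECUTE insert behind Rule 5 might look roughly like the sketch below; the audit values are placeholders, not taken from DataQuality_InsertStatements.txt.

-- Hypothetical example insert behind Rule 5; audit values are placeholders
INSERT INTO DATA_QUALITY_RULE_EXECUTE
    (DATA_QUALITY_DATA_SOURCE_CODE, DATA_QUALITY_RULE_ID, RULE_PROCESS_STEP_CODE,
     RULE_EXECUTE_STATUS_CODE, THRESHOLD_PASS_PERCENT,
     CREATE_PROCESS_ID, CREATE_PROCESS_NAME, CREATE_RECORD_USER_NAME, CREATE_RECORD_SYSTEM_TIMESTAMP)
VALUES
    ('Invoice', 5, 'POST_STAGE', 'A', '90.00',
     1, 'DQ_RULE_SEED', 'svc_dq_loader', GETDATE());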

E. DATA_QUALITY_RULE_RESULTS

The DATA_QUALITY_RULE_RESULTS table is a repository where all the data quality metrics for the rule executions are stored. The table is partitioned on SYSTEM_PARTITION_IDENTIFIER. A new partition is created for each month for each data source.

Data Quality Rule results are generated by calculating the expected and actual results. The expected and actual results are compared, and if the actual results match the expected results, the rule passes; otherwise, it is marked as failed.

F. Expected Results:

Multiple steps are involved in generating expected results.


Figure 3: The expected results

In order to generate expected results, three steps are involved.

  1. Join the Data Quality Rule table with Data Quality Rule Execute table.
  2. Pivot the Control File Data
  3. Join the Data Quality Rule tables data with control file data

1. Join Data Quality Rule and Data Quality Rule Execute table as follows:

The SQL query below joins the Data_Quality_Rule table to the Data_Quality_Rule_Execute table on Rule ID to create a flattened record with all Data Quality Rules Information.

Please Note: You could add a where condition on DATA_QUALITY_DATA_SOURCE_CODE to filter for the data domain for which you would like to run the data quality rules.

select
    dq.DATA_QUALITY_RULE_ID,
    dq.RULE_TYPE_CODE,
    dq.RULE_LEVEL_CODE,
    dq.SCHEMA_NAME,
    dq.TABLE_NAME,
    dq.DATA_NAME,
    dq.RULE_SCRIPT_TYPE_CODE,
    dq.RULE_SCRIPT_TEXT,
    dqr.RULE_PROCESS_STEP_CODE,
    dqr.RULE_EXECUTE_STATUS_CODE,
    dqr.THRESHOLD_PASS_PERCENT
from data_quality_rule dq
inner join DATA_QUALITY_RULE_EXECUTE dqr
    on dq.DATA_QUALITY_RULE_ID = dqr.DATA_QUALITY_RULE_ID;

The joined data would look something like this.

DATA_QUALITY_RULE_ID|RULE_TYPE_CODE|RULE_LEVEL_CODE|SCHEMA_NAME|TABLE_NAME|DATA_NAME|RULE_SCRIPT_TYPE_CODE|RULE_SCRIPT_TEXT|RULE_PROCESS_STEP_CODE|RULE_EXECUTE_STATUS_CODE|THRESHOLD_PASS_PERCENT

1|GATING|FILE|Invoice_schema.txt|Invoice_YYYYMM.dat|SCHEMA_VALIDATION|COMP_THRE_VAL_EQ_P_E|Val_SchemaFile_Chk|PRE_STAGE|A|100.00

2|GATING|FILE|Invoice_schema.txt|Invoice_YYYYMM.dat|DUPLICATE_VALIDATION|COMP_THRE_VAL_EQ_P_E|Val_DuplicateFile_Chk|PRE_STAGE|A|100.00

3|GATING|COLUMN|dbo|Invoice|InvoiceDate|COMP_THRE_VAL_NE_F_F|Val_InvoiceDate_Chk|POST_STAGE|A|100.00

4|PASSIVE|TABLE|dbo|Invoice|InvoiceDate|COMP_THRE_VAL_EQ_P_E|InvoiceDate_Chk_Summary|POST_STAGE|A|100.00

5|PASSIVE|TABLE|dbo|Invoice|StoreID|COMP_THRE_VAL_GE_P_F|StoreID_CHK|POST_STAGE|A|90.00

2. Pivot the Control File Data.

This step pivots the data in the control file. Pivoting is nothing more than converting columns to rows.

Control File before pivoting (Also shown in above sections)

DataFileName|ControlFileName|DataFileCount|MinInvoiceDate|MaxInvoiceDate|TotalAmount

Invoice_202001.dat|invoice.ctl|5|2007-01-11|2007-02-28|22.77

Control File After pivoting:

DATA_NAME|DATA_VALUE
DataFileName|Invoice_202001.dat
ControlFileName|Invoice.ctl
DataFileCount|5
MinInvoiceDate|2007-01-11
MaxInvoiceDate|2007-02-28
TotalAmount|22.77

Pivoting helps joining control file details with corresponding Data Quality Rule Information (from step 1).
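As a sketch of how this pivot could be done in T-SQL, assume the control file has first been loaded into a hypothetical single-row staging table called INVOICE_CONTROL_FILE_STG (the table and column names are assumptions of this sketch):

-- Hypothetical pivot of the control file; INVOICE_CONTROL_FILE_STG is an assumed staging table.
SELECT p.DATA_NAME, p.DATA_VALUE
FROM INVOICE_CONTROL_FILE_STG c
CROSS APPLY (VALUES
    ('DataFileName',    c.DataFileName),
    ('ControlFileName', c.ControlFileName),
    ('DataFileCount',   CAST(c.DataFileCount AS VARCHAR(250))),
    ('MinInvoiceDate',  CONVERT(VARCHAR(250), c.MinInvoiceDate, 23)),  -- style 23 = yyyy-mm-dd
    ('MaxInvoiceDate',  CONVERT(VARCHAR(250), c.MaxInvoiceDate, 23)),
    ('TotalAmount',     CAST(c.TotalAmount AS VARCHAR(250)))
) AS p(DATA_NAME, DATA_VALUE);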

3. Joining the Data Quality Rule Tables Data with Control File Data

This is the final step to create the expected results. In this step, you join the pivoted control file data (Step 2) with the Data Quality Rule tables information (Step 1) on “DATA_NAME.” This creates the expected results as follows. The expected results are nothing but control file data associated with the corresponding data quality rule information.

DATA_QUALITY_RULE_ID|RULE_TYPE_CODE|RULE_LEVEL_CODE|DATA_NAME|DATA_VALUE|RULE_SCRIPT_TYPE_CODE|RULE_SCRIPT_TEXT|RULE_PROCESS_STEP_CODE|RULE_EXECUTE_STATUS_CODE|THRESHOLD_PASS_PERCENT

1|GATING|FILE|Schema_Validation|Y|COMP_THRE_VAL_EQ_P_E|Val_SchemaFile_Chk|PRE_STAGE|A|100.00

2|GATING|FILE|Duplicate_Validation|Y|COMP_THRE_VAL_EQ_P_E|Val_DuplicateFile_Chk|PRE_STAGE|A|100.00

3|GATING|COLUMN|InvoiceDate|2007-01-11:2007-02-28|COMP_THRE_VAL_NE_F_F|Val_InvoiceDate_Chk|POST_STAGE|A|100.00

4|PASSIVE|TABLE|InvoiceDate|2007-01-11:2007-02-28|COMP_THRE_VAL_EQ_P_E|InvoiceDate_Chk_Summary|POST_STAGE|A|100.00

5|PASSIVE|TABLE|StoreID|NOTNULL|COMP_THRE_VAL_GE_P_F|StoreID_CHK|POST_STAGE|A|90.00

G. Actual Results:

There are two steps involved in creating the actual results as follows:

  1. Pivot the source data file.
  2. Join pivoted source data with Data Quality Rule table and ETL Process Metadata


Figure 4: The actual results

1. Pivot the source data file:

Below is the source file that has been mentioned in an earlier section of this article.

Source File

InvoiceId|CustomerId|InvoiceDate|BillingAddress|BillingCity|BillingState|BillingCountry|BillingPostalCode|Total|StoreID

100|23|2007-01-11|69 Salem Street|Boston|MA|USA|2113|13.86|1

200|16|2007-02-19|1600 Amphitheatre Parkway|Mountain View|CA|USA|94043-1351|0.99|2

300|17|2007-03-04|1 Microsoft Way|Redmond|WA|USA|98052-8300|1.98|3

400|19|2007-03-05|1 Infinite Loop|Cupertino|CA|USA|95014|1.98|4

500|21|2007-03-06|801 W 4th Street|Reno|NV|USA|89503|3.96|5

Pivot the source file data with InvoiceID as the key for each record. Pivoting here transforms each record’s column-level data into rows. The data would look like below.

Please Note: This is a sample set and not the entire data set.

InvoiceID|DataName|DataValue
100|InvoiceDate|2007-01-11
200|InvoiceDate|2007-02-19
300|InvoiceDate|2007-03-04
400|InvoiceDate|2007-03-05
500|InvoiceDate|2007-03-06
100|StoreID|1
200|StoreID|2
300|StoreID|3
400|StoreID|4
500|StoreID|5
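A minimal T-SQL sketch of this pivot, assuming the file has been loaded into a hypothetical staging table dbo.Invoice_Stage, could look like this (only the columns that have rules are listed):

-- Hypothetical pivot of the staged invoice data; dbo.Invoice_Stage is an assumed staging table.
SELECT s.InvoiceId AS InvoiceID, p.DataName, p.DataValue
FROM dbo.Invoice_Stage s
CROSS APPLY (VALUES
    ('InvoiceDate', CONVERT(VARCHAR(250), s.InvoiceDate, 23)),
    ('StoreID',     CAST(s.StoreID AS VARCHAR(250)))
) AS p(DataName, DataValue);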

2. Join Source File Pivot Data with Data Quality Rule table and ETL Process Metadata

In this step, take the pivoted source data (step 1) and join with the Data Quality Rule table to attach Rule ID to each pivoted source data record.

Please Note: The join to ETL process metadata is optional. If there are no Data Quality Rules related to ETL metadata, this join is not needed.

The ETL process metadata is standard metadata information captured by ETL processes like start and end time of ETL executions, if the source file met the expected schema, if the source file is a new file or a duplicate file based on the file name.

After joining the pivoted source data with the Data Quality Rule table, you would then join it with the ETL process metadata to get the information needed for Rule 1, which is related to the file layout/schema verification of the source file, and for Rule 2, which is related to duplicate source file names.

The following shows that, after validating Rule 1, there is a Data Value of 5 and a Pass Count of 5. This indicates that there were 5 rows in the file and all 5 of them passed the validation; this information comes from the ETL process metadata. Similarly, for Rule 2, related to Duplicate Validation, the pass count is 1 and the fail count is 0. If this were a duplicate file, there would be a pass count of 0 and a fail count of 1. For all the other rules, Pass count and Fail count are not yet populated because they depend on matching this actual data with the expected data.

Data After Joining source pivoted data with Data Quality rule and ETL process metadata.

RuleID|InvoiceID|DataName|DataValue|Pass_Count|Fail_Count
1|0|Schema_Validation|5|5|0
2|0|Duplicate_Validation|0|1|0
3|100|InvoiceDate|2007-01-11|NULL|NULL
3|200|InvoiceDate|2007-02-19|NULL|NULL
3|300|InvoiceDate|2007-03-04|NULL|NULL
3|400|InvoiceDate|2007-03-05|NULL|NULL
3|500|InvoiceDate|2007-03-06|NULL|NULL
4|100|InvoiceDate|2007-01-11|NULL|NULL
4|200|InvoiceDate|2007-02-19|NULL|NULL
4|300|InvoiceDate|2007-03-04|NULL|NULL
4|400|InvoiceDate|2007-03-05|NULL|NULL
4|500|InvoiceDate|2007-03-06|NULL|NULL
5|100|StoreID|1|NULL|NULL
5|200|StoreID|2|NULL|NULL
5|300|StoreID|3|NULL|NULL
5|400|StoreID|4|NULL|NULL
5|500|StoreID|5|NULL|NULL

H. Generating Data Quality Rule Results.

Data Quality Rule results are generated by comparing the Expected Results with the Actual Results. The ETL logic compares expected results with actual results to derive the Data Quality metrics and then loads them into the Data Quality Rule Results table.

Expected Results (These were derived in above section of article):

DATA_QUALITY_RULE_ID|RULE_TYPE_CODE|RULE_LEVEL_CODE|SCHEMA_NAME|TABLE_NAME|DATA_NAME|DATA_VALUE|RULE_SCRIPT_TYPE_CODE|RULE_SCRIPT_TEXT|RULE_PROCESS_STEP_CODE|RULE_EXECUTE_STATUS_CODE|THRESHOLD_PASS_PERCENT

1|GATING|FILE|Invoice_schema.txt|Invoice_YYYYMM.dat|SCHEMA_VALIDATION|NULL|COMP_THRE_VAL_EQ_P_E|Val_SchemaFile_Chk|PRE_STAGE|A|100.00

2|GATING|FILE|Invoice_schema.txt|Invoice_YYYYMM.dat|DUPLICATE_VALIDATION|NULL|COMP_THRE_VAL_EQ_P_E|Val_DuplicateFile_Chk|PRE_STAGE|A|100.00

3|GATING|COLUMN|dbo|Invoice|InvoiceDate|2007-01-11:2007-02-28|COMP_THRE_VAL_NE_F_F|Val_InvoiceDate_Chk|POST_STAGE|A|100.00

4|PASSIVE|TABLE|dbo|Invoice|InvoiceDate|2007-01-11:2007-02-28|COMP_THRE_VAL_EQ_P_E|InvoiceDate_Chk_Summary|POST_STAGE|A|100.00

5|PASSIVE|TABLE|dbo|Invoice|StoreID|NULL|COMP_THRE_VAL_GE_P_F|StoreID_CHK|POST_STAGE|A|90.00

Actual Results (These were derived in above section of article):

RuleID|InvoiceID|DataName|DataValue|Pass_Count|Fail_Count
1|0|Schema_Validation|5|5|0
2|0|Duplicate_Validation|0|1|0
3|100|InvoiceDate|2007-01-11|NULL|NULL
3|200|InvoiceDate|2007-02-19|NULL|NULL
3|300|InvoiceDate|2007-03-04|NULL|NULL
3|400|InvoiceDate|2007-03-05|NULL|NULL
3|500|InvoiceDate|2007-03-06|NULL|NULL
4|100|InvoiceDate|2007-01-11|NULL|NULL
4|200|InvoiceDate|2007-02-19|NULL|NULL
4|300|InvoiceDate|2007-03-04|NULL|NULL
4|400|InvoiceDate|2007-03-05|NULL|NULL
4|500|InvoiceDate|2007-03-06|NULL|NULL
5|100|StoreID|1|NULL|NULL
5|200|StoreID|2|NULL|NULL
5|300|StoreID|3|NULL|NULL
5|400|StoreID|4|NULL|NULL
5|500|StoreID|5|NULL|NULL

Comparing Expected Results with Actual Results

The expected results are joined with the actual results on the Data Quality Rule ID column, and the ETL process then compares the actual vs expected values and produces the Data Quality Rule metrics.


Rule 1: For Rule 1, the expected threshold is set to “100”. If the schema validation has passed for the data file, then the Actual Threshold would also be derived as “100” in the ETL logic. RULE_SCRIPT_TYPE_CODE is set to “COMP_THRE_VAL_EQ_P_E”. That means if both thresholds are equal, set the rule status to pass “P”, and “E” represents that rule results are to be loaded into the table irrespective of rule status.


Rule 2: For Rule 2, the expected threshold is set to “100”. If the data file is not a duplicate file, i.e. if the same file name has not been processed earlier, then the Actual Threshold would also be derived as “100” in the ETL logic. RULE_SCRIPT_TYPE_CODE is set to “COMP_THRE_VAL_EQ_P_E”, so if both thresholds are equal, set the rule status to pass “P”. “E” means that rule results are to be loaded into the table irrespective of rule status.


Rule 3: For Rule 3, the expected threshold is set to “100”, and the InvoiceDate of each record from the actual results dataset is compared with the expected InvoiceDate range of 2007-01-11 to 2007-02-28. As each row is checked, any record that has an InvoiceDate outside the range is marked as failed with the “F” status code. Since the RULE_SCRIPT_TYPE_CODE is set to “COMP_THRE_VAL_NE_F_F”, any record that fails would be loaded into the Data Quality Rule Results table.


Please Note: Pass count and Fail count are loaded as “0” because this is a column-level rule and not a table-level rule. It is possible to increment the fail count with each row loaded into the Data Quality Rule Results table, but this would require sorting and ranking rows. It is preferable to use pass_count and fail_count for table-level rules, where the aggregation is done and the total pass and fail counts can be easily identified.

Rule 4: For Rule 4, the expected threshold is set to “100”. The actual results records for which InvoiceDate does not fall within the expected range of 2007-01-11 to 2007-02-28 are counted, and the actual threshold percentage is derived as 40. If no records had failed, the actual threshold percentage would be 100. As this is a table-level rule, only one record for the entire batch load is written to the Data Quality Rule Results table. RULE_SCRIPT_TYPE_CODE is set to “COMP_THRE_VAL_EQ_P_E”, which means that if both thresholds are equal, the rule status is set to pass “P”, and “E” (Everything) means that rule results are loaded into the table irrespective of rule status.
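A minimal sketch of how Rule 4’s actual threshold could be derived, again assuming the staged data sits in the hypothetical dbo.Invoice_Stage table and the expected range comes from the pivoted control file:

-- Hypothetical derivation of Rule 4's actual threshold from the staged data.
SELECT
    PASS_COUNT = SUM(CASE WHEN s.InvoiceDate BETWEEN '2007-01-11' AND '2007-02-28' THEN 1 ELSE 0 END),
    FAIL_COUNT = SUM(CASE WHEN s.InvoiceDate BETWEEN '2007-01-11' AND '2007-02-28' THEN 0 ELSE 1 END),
    ACTUAL_THRESHOLD_PERCENT =
        100.0 * SUM(CASE WHEN s.InvoiceDate BETWEEN '2007-01-11' AND '2007-02-28' THEN 1 ELSE 0 END)
              / COUNT(*)
FROM dbo.Invoice_Stage s;   -- for the five sample rows this returns 2, 3 and 40.0

Rule 5 follows the same pattern, with the CASE expression testing StoreID IS NOT NULL and the resulting percentage compared against the 90% threshold.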


Rule 5: Rule 5 checks that the StoreID column is not null. The expected threshold is set to “90” and RULE_SCRIPT_TYPE_CODE is set to “COMP_THRE_VAL_GE_P_F”. In the data file, there are no records that have StoreID as null, so the ETL calculates the actual threshold as “100”, which is greater than the expected threshold of 90%, and the rule passes “P”. Since the final character in COMP_THRE_VAL_GE_P_F is “F”, only failed records are loaded into the data quality results table; in this case, no results are loaded for Rule 5. The final rule results table can then be queried as shown below.

select * from DATA_QUALITY_RULE_RESULTS


In the above example, I have walked through all five data quality rules. Rules 1 and 2 are pre-stage rules and should be executed before data is loaded into the stage table; Rules 3, 4, and 5 are post-stage data quality rules and should be executed after the data is loaded into the stage table.
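For completeness, the Rule 4 row from the walkthrough could be written to the repository with an insert along these lines; the row identifier, partition value, and audit columns are assumptions of this sketch:

-- Hypothetical insert of the Rule 4 result row; identifier, partition, and audit values are placeholders.
INSERT INTO DATA_QUALITY_RULE_RESULTS
    (DATA_QUALITY_RESULT_ROW_ID, DATA_QUALITY_DATA_SOURCE_CODE, DATA_QUALITY_RULE_ID,
     RULE_PROCESS_STEP_CODE, TABLE_ROW_IDENTIFIER, SCHEMA_NAME, TABLE_NAME, DATA_NAME, DATA_VALUE,
     RULE_TYPE_CODE, RULE_LEVEL_CODE, PASS_FAIL_OVERRIDE_CODE, PASS_COUNT, FAIL_COUNT,
     SYSTEM_PARTITION_IDENTIFIER, CREATE_PROCESS_ID, CREATE_PROCESS_NAME,
     CREATE_RECORD_USER_NAME, CREATE_RECORD_SYSTEM_TIMESTAMP)
VALUES
    (1001, 'Invoice', 4, 'POST_STAGE', NULL, 'dbo', 'Invoice', 'InvoiceDate', '2007-01-11:2007-02-28',
     'PASSIVE', 'TABLE', 'F', 2, 3,
     'Invoice_202001', 1, 'DQ_RULE_ENGINE', 'svc_dq_loader', GETDATE());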

A dashboard could be created on DATA_QUALITY_RULE_RESULTS table to provide data quality metrics to data stewards for review.

V. Conclusion

By implementing the above Data Quality Monitoring Framework, an organization can ensure that only the highest quality data is loaded into the underlying application tables. Detailed metrics related to data quality can be made available to data stewards by building a dashboard on top of the DATA_QUALITY_RULE_RESULTS table.


SQL – Simple Talk


Bolly4U; Choose the Best Online Streaming Website to Spend Your Time in a Quality Way

April 23, 2020   Humor

Bolly4U has been presenting quality entertaining videos for free for many years and has thus become a hot topic among all age groups. In this rapid and modern life, you might have noticed that movies and TV series are loved, awaited, and appreciated among the young and new generations. Bolly4U is going to be the best option for you if you want to spend your leisure time watching quality movies.

This is one of the best web-based platforms, where you can get access to the latest Hindi, Tamil, and English dubbed movies for free. The site owns a large catalog of all sorts of movies, including the best ones of all time. In the upcoming lines, you are going to read all about this website and the way to watch movies within a safe and secure environment.

Types Of Entertaining Stuff On Bolly4u?

You may find an incredibly huge variety of the latest movies on Bolly4U. The site has made it possible to find almost all types of movies on its platform, including Telugu movies, Tamil movies, Tamil dubbed movies, Telugu dubbed movies, new Hindi movies, the latest Hollywood movies, Kannada movies, Telugu dubbed Hollywood movies, and Telugu AVI movies.

Relaunching with New Clone Names

Though the Bolly4U online movie streaming site has been banned many times, its operators have tricks for getting around this problem. They run all of their illegal activities by switching between multiple clone domains. As they get shut down with one domain name, they move to another and resume sharing the latest movies, mega shows, and web series on their web portal. At the current time, they are running all of their unlawful activities from different domains.

Here is a small list of some of the banned, and active domain names of Bolly4U.

https://bolly4u.tube/
https://bolly4u.trade/
https://bolly4u.best/
https://bolly4u.info/
https://bolly4u.org/
All the links are going to redirect you to the official site

The old domains that were shut down

bolly4u.rocks
bolly4u.net
bolly4u.video
boll-y4u.com
boll-y4u.live
boll-y4u.rocks
boll-y4u.site
bolly4u.ninja
bolly4u.work
boll-y4u.online
boll-y4u.video
boll-y4u.asia
boll-y4u.website
boll-y4u.ninja

Incredible Video Quality with Crystal Clear Results 

Bolly4U presents the best quality movies to its visitors without charging them any monthly or annual fee. If you type some queries about the movie you want to watch in the search bar, you will get results instantly based on your search. If you are seeking a movie to download, there is also an option to choose the resolution quality like 720p, 1080p before downloading it. You will get every movie listed with additional information like name, story, duration, genre, cast, rating, and release date, etc. with the movie.

Easy to Navigate Web Portal, and User-Friendly Interface

The site has been designed with a user-friendly interface and an easy-to-navigate layout, which makes its users feel comfortable while browsing. In this web-based free movie store, there are many sections and categories where you can find your favorite music, movies, and shows. Navigation is very simple: you can find all your desired stuff within just a few steps or clicks. You will never feel lost or perplexed while navigating the site.

Huge Variety of Movies and Shows

When you visit the site, you will find the main menu that leads you to different sections. All the sections are categorized into different genres of movies, TV series, and mega shows. If you want to watch romantic movies, you will find plenty of content matching your taste. For kids, there is a kids’ corner containing cartoons and animated movies. Women can find fashion shows on the site. If you are a sports lover, you can watch any minor or major sports event. Videos of international mega shows are also available. To find it all, you just have to log on to the site and browse freely.

Don’t Forget The Site is Presenting Pirated Content

Bolly4U is a well-known platform for presenting pirated movies from Bollywood, Hollywood, and the South Indian film industry. If you consider the site on moral grounds, it is not right to watch pirated movies, because piracy causes significant losses to filmmakers. Watching movies on such a site is therefore not encouraged.

The Care, You Should Take

Though you can browse the Internet without much worry, if you are visiting free streaming sites like Bolly4U, you are advised to be cautious and careful while staying on the site. If you want to be safe and secure when visiting such websites, you are recommended to follow the instructions discussed in the upcoming lines.

Reliable VPN and a Quality Anti-Virus

If you are a frequent user of Bolly4U and love watching pirated content, you must remember that watching illegal and pirated movies is not encouraged by cyber law enforcement authorities and is considered a crime. Therefore you are advised to use a reliable VPN to hide your location, which will help keep you reasonably secure. The site presents free stuff to its viewers, and you must know that free content is never free from the danger of viruses, malware, or other threats to your devices. Most free websites carry the threat of fatal viruses that can harm your Internet-connected device. You are warned to use a quality anti-virus program before staying on such websites. Watching movies on pirated streaming sites could be a straight virus threat to your PC, laptop, or smartphone.

Never Share Any Of Your Personal Details On The Site

The site is operated by unknown people from unknown locations. Therefore it is never wise to share your personal, financial, or business details with anyone on such platforms. First, you should conceal all of your financial details and never share them with anyone. Second, beware of online credit card hackers, who always wander on such sites and keep trying to fool people in order to deprive them of their assets.

Disclaimer

You are therefore notified that watching pirated content is illegal and not a good practice. As the saying goes, ‘Safety First’: you should avoid visiting such sites and instead use Netflix, Amazon, or HBO to watch all the latest movies in a safe and secure atmosphere. On our site, we don’t recommend piracy in any way.


Mefunnysideup


Amazon’s AI predicts product quality from customer feedback

March 10, 2020   Big Data

Amazon scientists are prototyping algorithms that use crowdsourcing to identify product data, the company reports in a blog post. The researchers believe these algorithms could be used to predict human judgments of product quality on Amazon, which might improve people’s shopping experience by matching only high-quality products to search queries.

The work is something of a follow-up to a study Amazon published in early January that examined why Amazon customers buy seemingly irrelevant products while shopping for specific items. In an analysis, a team of Amazon researchers found that customers are partial to products that are broadly popular or cheaper than products relevant to a given search query. Additionally, their results suggested people are more likely to buy or engage with irrelevant products in categories like toys and digital goods than in categories like beauty and groceries.

In this latest study, which is scheduled to be presented next week at the ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR) in Vancouver, the researchers presented crowd workers with images of pairs of related products, along with product information supplied by both sellers and customers. The researchers then asked the crowd workers which products were of higher quality and which terms extracted from the product information best explained their judgments.


Each product pair in the study included one product that was actually purchased and one that was clicked on but not purchased during the same customer search query. Products also shared the most fine-grained classification available in the Amazon.com product classification hierarchy (e.g., Electronics, Home, Kitchen, Beauty, Office Products), and the terms presented to the crowd workers were chosen based on how frequently they appeared in texts associated with these categories.


The team found that while perceived quality wasn’t a good predictor of customers’ purchase decisions, it was highly correlated with price, such that customers generally chose lower-quality products when those items were priced correspondingly lower. Furthermore, the terms that best described the crowd workers’ judgment criteria came from the public customer-supplied information — that is, customer reviews and question-and-answer sequences in which customers answered other shoppers’ product-related questions — as opposed to the seller-supplied information.
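
As an illustration of the kind of correlation check this finding implies, the sketch below computes a rank correlation between hypothetical perceived-quality scores and prices; it is not Amazon’s analysis code, and the numbers are invented.

```python
# A hedged illustration (hypothetical data) of checking how perceived-quality
# scores relate to price using a Spearman rank correlation.
from scipy.stats import spearmanr

perceived_quality = [0.9, 0.8, 0.6, 0.4, 0.3]   # crowd-derived scores
price_usd         = [45.0, 30.0, 38.0, 15.0, 12.0]

rho, p_value = spearmanr(perceived_quality, price_usd)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# A strongly positive rho would mirror the study's finding that perceived
# quality tends to track price.
```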

“Existing research on product recommendation has mainly focused on modeling purchases directly, without attempting to find the reasons behind customer decisions. We believe that understanding the processes that underlie customers’ purchasing decisions will help us make better product recommendations,” wrote study coauthors Jie Yang, Rongting Zhang, and Vanessa Murdock. “This work represents one of several steps we’re taking in that direction.”


Big Data – VentureBeat

Read More

Google Translate improves offline translation quality by up to 20%

December 19, 2019   Big Data

Bound for a foreign country where you don’t speak the language? Good news: Google has you covered. This afternoon, the Mountain View tech giant announced that Translate, its free multilingual machine translation service, is now more robust. Offline translation accuracy has improved substantially, by more than 20% in some cases.

Across the 59 languages supported offline by Translate on Android and iOS, accuracy is now 12% better, with enhanced word choice, grammar, and sentence structure. For some languages, including Japanese, Korean, Thai, Polish, and Hindi, the quality gain is more than 20%. In a related improvement, Translate now offers offline transliteration support for 10 new languages, including Arabic, Bengali, Gujarati, Kannada, Marathi, Tamil, Telugu, and Urdu, so that it can show an equivalent spelling in a familiar alphabet (for example, showing both “नमस्ते” and “namaste” for “hello,” where “namaste” is the transliteration of “नमस्ते”).
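
As a toy illustration of what transliteration means here, the sketch below maps a couple of Hindi words to Latin-script spellings via a simple lookup table. Google’s offline transliteration is model-based and handles arbitrary text, so this is only a conceptual stand-in, with a hypothetical word list.

```python
# A toy, word-level illustration of transliteration (rendering a word in a
# familiar alphabet). Real systems generate romanizations for arbitrary text.
TRANSLITERATIONS = {
    "नमस्ते": "namaste",       # Hindi "hello"
    "धन्यवाद": "dhanyavaad",   # Hindi "thank you"
}

def transliterate(word: str) -> str:
    """Return a Latin-script spelling for a known word, else the word unchanged."""
    return TRANSLITERATIONS.get(word, word)

print(transliterate("नमस्ते"))  # -> namaste
```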

The new and improved Translate is available as of today, Google says. Users who’ve used offline translation before will see a banner on the app’s home screen that will take them to the right place to update their offline files, but those who haven’t will have to head to the offline translation settings screen and tap the arrow next to the language name to download that language. (Each language package takes up about 35-45 MB, roughly equivalent in size to previous offline packages but at a higher quality.)


Translate’s online translations have been powered by neural machine translation (NMT) since 2016, and the technology made its way to offline translation on mobile platforms in 2018. Rather than the previous machine learning approach, which translated by scanning a sentence phrase by phrase, NMT analyzes entire chunks of text at once, allowing for more natural-sounding, grammatically sound, context-aware translations. Like the phrase-based approach it replaced, NMT learns from hundreds of millions of example translations of things like articles, books, documents, and search results. But while the difference in quality from previous offline translations should be noticeable, Google notes that online translations will still be more accurate than offline ones, as the company made concessions to reduce the size of the language packages.

The Translate improvements come after the rollout of Google Assistant’s interpreter mode to Android devices, which enables Assistant to recognize and speak up to two languages at once. (Interestingly, Google says the same technology that underpins Assistant’s bilingual capabilities powers interpreter mode.) Google’s conversational AI more recently became conversant in over 30 languages in 80 countries (up from 8 languages and 14 countries in 2017), and it added nine new AI-generated English voices tailored to English speakers in the U.K., India, France, Germany, Japan, the Netherlands, Norway, Korea, and Italy. Perhaps more significantly, a new and improved version of Assistant launched on the Pixel 4 in early fall. Its English model works offline and processes speech at “nearly zero” latency, delivering answers up to 10 times faster than on previous-generation devices.


Big Data – VentureBeat

Read More

The Post-Digital Age: Spending Quality Time With Customers

October 17, 2019   SAP

The central idea of “post-digital” is that digital technology is being embedded so deeply in everything – work, play, industry, day-to-day life – that it’s becoming almost invisible. We stop seeing it. And we stop thinking of it as something special or interesting. It’s just a baseline we expect every product or service to meet, as Accenture’s Technology Vision for SAP Solutions 2019 explores.

One of the places where I think this is having the biggest impact is in the way brands and customers relate to each other. In the post-digital world, it’s about bringing all the data together to create strong emotional experiences and support deeper, closer, higher-quality, and more valuable relationships.

From the head to the heart

For years, we’ve all been talking about knowing the customer better, developing a 360-degree view of who they are, what they’ve bought in the past, and what they might want to buy in the future. Some have gone further and started looking at external data sources like social media to understand their customers a little better.

These are all important and necessary steps. But as we move into the post-digital world, all they do is buy you a seat at the table. To really differentiate your brand and win over customers, you now need to go much further.

You need to build an individual, personal relationship with each customer that doesn’t just pay lip service to their unique digital identity, but responds to it consistently and meaningfully when it really matters.

To do that, you have to truly understand who that person is and what drives them on an emotional level. And traditional KPI-driven analytics simply can’t deliver that level of sophistication. Instead, you need a much more holistic approach that analyzes the entirety of the customer’s digital footprint and how it evolves over time.

This is where your company can really benefit from combining operational data with experience data from multiple sources and channels using experience management software. This combination allows you to better “sense” your customers and respond in real time.
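
As a rough sketch of what combining operational data (O-data) with experience data (X-data) can look like in practice, the example below joins a hypothetical orders table with hypothetical survey scores. The schemas and thresholds are invented for illustration; real deployments would typically rely on an experience-management platform rather than hand-rolled DataFrames.

```python
# A minimal sketch of joining operational data with experience data to "sense"
# at-risk customers. Column names and the risk threshold are hypothetical.
import pandas as pd

o_data = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "orders_last_90d": [5, 1, 0],
    "avg_order_value": [120.0, 35.0, 0.0],
})
x_data = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "satisfaction_score": [9, 4, 2],   # e.g. post-purchase survey, 0-10
})

combined = o_data.merge(x_data, on="customer_id", how="left")
# Flag customers whose experience signal suggests the relationship is at risk.
combined["at_risk"] = combined["satisfaction_score"] < 5
print(combined)
```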

Building a “living” brand

When you can do that, you can start making real headway in post-digital customer relationships. One of the stories I like to share that illustrates what’s possible is about a young lingerie business based in France.

This business started small, playing up its local manufacturing credentials and using analytics and digital tech together with smart and humorous advertising to quickly build up a sizeable customer base.

So far, so digital. But it’s what the company did next that really highlights its post-digital mentality. It has developed a deep relationship with its customers, bringing them into the heart of the business: inviting them to devise and advise on marketing campaigns and testing new product and service ideas with them through the customer ecosystem.

By building this closely knit community, the brand is redefining customer loyalty for the post-digital age. It recognizes that customers aren’t really loyal to brands anymore; they’re loyal to experiences. It knows customers want to be intimately involved in developing products they really care about. And it sees that experiences are now just as important because they speak to the heart as well as the mind.

Connected, post-digital commerce

So, yes, you need a website that’s easy to navigate and looks great. You need omnichannel sales and distribution. You need to be able to support post-sale services. And you need to use an integrated platform to bring all these capabilities together and tie them into the enterprise core.

But to be truly post-digital, you also need to create experiences around your products that genuinely reflect the emotional needs of your customers and can adapt and respond to changes in those needs in a much more agile and effective way. That’s where experience management solutions are essential.

Other post-digital technologies like artificial intelligence are just as important. AI means companies can automate some aspects of customer service, offering fast and friction-free interactions through chatbots. It can also be used to design more effective customer portals and support smarter post-sales interactions with predictive analytics.

Every moment matters

A key point to remember in all this: Never underestimate the value of any customer interaction, no matter how incidental or even difficult it may be. These are all opportunities to build the relationship further.

Equipped with the right tools, the right data insights, and enough time, customer service agents, support technicians, and any other employees in direct contact with customers can relate to each customer on a more personal level. They can understand what the customer really wants and needs in that situation, deepen the connection, and ultimately increase the value to the business.

So that’s why genuine personalization is so important to customer experience in the post-digital age.

This article originally appeared on the Accenture Technology Innovation blog and is republished with permission. Accenture is an SAP global partner.


Digitalist Magazine

Read More

Apple Digital Masters wants to help you stream studio quality audio

August 21, 2019   Humor

One of the big criticisms of music streaming services is quality, particularly for those who want to listen on quality hi-fi gear at home. Apple is keen to ensure you’re getting the best music you can with the Apple Digital Masters program.

Replacing what was previously known as Mastered for iTunes, Apple Digital Masters aims to remaster the original artist recordings so that listeners get a better experience from their music.

Apple says that its Digital Masters are “virtually indistinguishable” from those original masters, and it is giving producers, artists, and labels a range of mastering tools to help them put the best quality they can into Apple’s ecosystem.

Apple Digital Masters

Apple Digital Masters have gradually been rolling out, replacing tracks in Apple’s digital catalog with new 24-bit AAC files. These new files should have higher fidelity, with less of the noise that may have been introduced through a lower-quality encoding process, so they sound better when they reach your ears.
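
For a back-of-the-envelope sense of why bit depth matters, the sketch below applies the standard signal-to-quantization-noise formula (roughly 6.02 × bits + 1.76 dB) to 16-bit and 24-bit PCM sources. This describes the masters fed into the pipeline, not the AAC encoding step itself.

```python
# A back-of-the-envelope illustration of why higher bit-depth source masters
# carry a lower quantization noise floor. It says nothing about the AAC
# encoder; it only compares the PCM sources fed into it.
def sqnr_db(bits: int) -> float:
    """Theoretical signal-to-quantization-noise ratio for N-bit PCM (full-scale sine)."""
    return 6.02 * bits + 1.76

for bits in (16, 24):
    print(f"{bits}-bit PCM: ~{sqnr_db(bits):.1f} dB SQNR")
# 16-bit PCM: ~98.1 dB SQNR
# 24-bit PCM: ~146.2 dB SQNR
```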

There’s a great deal of information on how the process works here (PDF), so if you’re an audiophile or music lover, it’s worth investigating what’s going on.

Apple says there’s no extra cost for Apple Digital Masters; it’s simply part of the process of improving the quality of the music that Apple offers through Apple Music streaming and iTunes downloads.


“The sound quality is mind-blowing! The piano is the hardest instrument to get right and this sounds stunning,” said Lang Lang. His album Piano Book is an Apple Digital Master, so it’s a great way to experience the quality for yourself.

You’ll also find the Apple Digital Masters logo appearing in various places on Apple devices. We’ve seen it against certain albums in iTunes on the Mac and in the Music app on the Catalina beta, for instance, so it should be easier to identify higher-quality music – and there’s a considerable amount of it out there.



Read More

Data Integrity vs. Data Quality: How Are They Different?

January 19, 2019   Big Data

Syncsort Editors

January 16, 2019

Data is incredibly valuable, but that doesn’t mean it’s always an asset. When companies work with data that is substandard for any reason, it delivers incorrect insights, skewed analysis, and reckless recommendations.

Two terms describe the condition of data: Data integrity and data quality. These two terms are often used interchangeably, but there are important distinctions. Any company working to maximize the utility and value of data needs to understand the difference:

Data Quality

This is the broader of the two terms. Data can have integrity but not have quality. To understand why, consider what defines data quality:

  • Completeness – The data present covers a sufficiently large share of the total data needed.
  • Uniqueness – The dataset is free of redundant or extraneous entries.
  • Validity – Data conforms to the syntax and structure defined by the business requirements.
  • Timeliness – Data is up to date and relevant to the timelines referenced by the business requirements.
  • Accuracy – Data accurately describes the real-world context it refers to.
  • Consistency – Data is consistent throughout the dataset.

Quality data must meet all these criteria. If it is lacking in just one way, it could compromise any data-driven initiative.
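
To make a few of these criteria concrete, here is a minimal sketch of scoring a small dataset for completeness, uniqueness, and validity with pandas. The columns, regex, and scoring choices are illustrative assumptions, not a prescribed standard.

```python
# A minimal sketch of scoring a dataset against a few quality criteria.
# Column names and the crude email regex are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "email": ["a@example.com", None, "b@example", "c@example.com"],
    "signup_date": pd.to_datetime(["2021-01-05", "2021-02-10", None, "2021-03-01"]),
})

completeness = 1 - df.isna().mean().mean()              # share of non-missing cells
uniqueness = 1 - df.duplicated("customer_id").mean()    # share of non-duplicate keys
validity = df["email"].str.contains(r"@.+\..+", na=False).mean()  # crude email check

print(f"completeness={completeness:.2f}, uniqueness={uniqueness:.2f}, validity={validity:.2f}")
```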

Data Integrity

Data quality, overall, really refers to whether data is useful. Data integrity, by contrast, refers to whether data is trustworthy. Obviously, it must be trustworthy to be useful, but just because it is trustworthy does not mean it’s also useful. Data integrity is judged based on these variables:

  • Physical Integrity – Utilizing data is difficult when there are problems storing or retrieving it; such problems raise questions about the data’s completeness, accuracy, and validity. The physical integrity of data hardware can become compromised due to age, malfunction, maintenance problems, natural disasters, or power outages.
  • Logical Integrity – Data becomes illogical when it is incorrect or irrational in some way. Data has to “make sense” for its context; otherwise, it distorts the perspective of anyone basing decisions on it. Logical problems can happen because of design flaws, human errors, or software bugs.



A Simple Example About Quality and Integrity

In 1993, the United States Postal Service discovered that 23% of all mail was incorrectly addressed. That figure is surprising considering that addresses are fairly simple pieces of data that are easy to store, reference, or remember. Widespread address mistakes are a good example of how easily data can suffer from quality or integrity problems.

If the street number in an address is off by just one digit, the mail will not arrive at the right house. This is an example of poor data integrity because the data itself is invalid and untrustworthy. Now think about a person who has moved but not updated their address. The mail would arrive at the right house but not reach the right recipient. This is an example of poor data quality because the data lacks timeliness, accuracy, and consistency.
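
A small sketch of this distinction, under the assumption of a hypothetical record format: the first check is integrity-style (is the ZIP code structurally valid?), while the second is quality-style (is the address still timely?).

```python
# A hedged sketch contrasting an integrity check with a quality (timeliness)
# check on a hypothetical address record.
import re
from datetime import date

record = {"street": "123 Main St", "zip": "9021", "last_verified": date(2015, 6, 1)}

# Integrity: the ZIP code must be structurally valid (5 digits in the US).
zip_valid = bool(re.fullmatch(r"\d{5}", record["zip"]))

# Quality (timeliness): the address should have been verified recently,
# e.g. within the last two years, or the person may have moved.
is_current = (date.today() - record["last_verified"]).days < 2 * 365

print(f"integrity ok: {zip_valid}, quality (timeliness) ok: {is_current}")
```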

Data integrity and quality issues are inevitable, but they are not unresolvable. Companies that make a proactive effort to fix existing data issues and prevent future ones see better outcomes from all their data-driven initiatives. To explore this issue further, read our eBook: 4 Ways to Measure Data Quality.


Syncsort Blog

Read More