Tag Archives: privacy

Data privacy startup Mine raises $9.5 million and expands to the U.S.

October 21, 2020   Big Data

Mine, which is developing a platform that helps discover and manage personal data on the web, today closed a $9.5 million funding round led by Gradient Ventures, Google’s AI-focused venture firm. The cash infusion comes as Mine prepares to launch in the U.S. following debuts in Israel and Europe earlier in the year. According to a spokesperson, 100,000 people have used Mine’s app to send over 1.3 million data-reclaim requests to more than 150,000 companies to date.

A majority of people in the U.S. (60%) believe their online and offline activities are being tracked and monitored by companies and the government with some regularity, according to a Pew Research Center survey. It’s a sentiment reflected in reports like PricewaterhouseCoopers’ Consumer Intelligence Series, which found that only 10% of consumers felt they had complete control over their personal information.

Mine’s technology discovers companies that are holding data by analyzing email subject lines (e.g., “Welcome to Facebook” or “Your Amazon Receipt”) rather than the actual content of emails. Once connected with an inbox, Mine maps all companies a user has interacted with via email, generating visualizations that show the type of data collected and the corresponding risk level. A dashboard provides a detailed overview of this data, along with shortcuts to remove the data from services no longer in use.

Mine’s algorithms identify companies most likely to have collected personal data based on the sender and the number of times they appear in an inbox. (According to Mine, almost 90% of traces can be found through email subject lines.) The platform independently locates and scans these companies’ privacy policies and data structures to determine what kinds of information they’re collecting from users.
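
To make the idea concrete, here is a toy sketch (not Mine's actual pipeline) of flagging likely data holders by counting sender domains across message headers; the message format, field names, and threshold are assumptions for illustration only.

```python
from collections import Counter
from email.utils import parseaddr

def likely_data_holders(messages, min_count=3):
    """Toy illustration (not Mine's pipeline): flag frequent sender domains
    as companies likely to hold personal data about the inbox owner."""
    domains = Counter()
    for msg in messages:
        _, addr = parseaddr(msg.get("from", ""))
        if "@" in addr:
            domains[addr.split("@", 1)[1].lower()] += 1
    # The more often a company appears, the more likely it holds your data.
    return [(domain, count) for domain, count in domains.most_common() if count >= min_count]

# Example with a tiny, made-up inbox (only headers and subjects, never message bodies):
inbox = [
    {"from": "no-reply@amazon.com", "subject": "Your Amazon Receipt"},
    {"from": "notify@facebook.com", "subject": "Welcome to Facebook"},
    {"from": "no-reply@amazon.com", "subject": "Your order has shipped"},
]
print(likely_data_holders(inbox, min_count=2))  # [('amazon.com', 2)]
```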

An initial scan takes about 30 seconds, after which the system continues in the background. Mine sends a notification and an email once the audit is completed, and the app continuously provides updates about new traces of companies in a footprint. Mine claims users typically have a 40% cleaner footprint by the end of a session. The company also claims 15,000 Mine users have narrowly avoided data breaches since 2018 by deleting their data right before the breach occurred.

Mine, which has offices in Tel Aviv and New York, is free to use in 10 languages for Gmail, Yahoo, and Microsoft Outlook accounts, though the company plans to eventually charge a subscription fee. For now, it will focus on products that enable businesses to automatically process “right-to-be-forgotten” requests, helping people demand the deletion of certain data pertaining to them. In May 2018, the European Union’s General Data Protection Regulation (GDPR) codified right-to-be-forgotten protections into law. This was followed in January 2020 by the California Consumer Privacy Act (CCPA), which similarly requires that people be able to access and delete certain data.

“Having been part of the tech world from different angles, I noticed how people, myself included, started to lose trust in how companies are collecting and treating personal data. My cofounders and I created Mine to do what we know best — use technology as a solution to shape a new future of equal choice and control to the internet,” CEO Gal Ringel told VentureBeat via email. “Our smart non-intrusive AI makes it easy for people to take back ownership of their online data and minimize digital risks without changing their online behavior.”

Mine’s Gradient Ventures-led round included participation from E.ventures, MassMutual Ventures, and existing investors Battery Ventures and Saban Ventures. It brings the company’s total raised to over $12.5 million.

Big Data – VentureBeat

The Social Impact of Artificial Intelligence and Data Privacy Issues

September 8, 2020   BI News and Info

The current era of AI and Big Data is already considered by many to be the start of the fourth industrial revolution that will reshape the world in years to come. Google searches, map navigation, voice assistants like Alexa, and personalized recommendations on portals like Facebook, Netflix, Amazon, and YouTube are just a few examples of where artificial intelligence already plays an important role in our day-to-day lives, often without people even realizing it. In fact, one report suggests that the AI market will reach a whopping $169.41 billion by 2025.

But AI has a negative aspect as well, one that poses great privacy and social risks. The risk lies in how some organizations collect and process vast amounts of user data in their AI-based systems without users’ knowledge or consent, which can lead to troubling social consequences.

Is your Data Private?

Every time you search the web, browse websites, or use mobile apps, you give away your data, either explicitly or without realizing it. And most of the time you allow these companies to collect and process your data legally, because you clicked the “I agree” button on the terms and conditions for using their services.

Apart from the information you explicitly submit to websites, such as your name, age, emails, contacts, videos, or photo uploads, you also allow them to collect your browsing behavior, clicks, likes, and dislikes. Reputable companies like Google and Facebook use this data to improve their services and do not sell it to anyone. Still, there have been instances where third-party companies have scraped sensitive user data through loopholes or data breaches. In fact, the sole intention of many companies is to collect user data by luring people into using their online services, then selling that data to third parties for vast amounts of money.

The situation has worsened with the surge in malicious mobile apps whose primary purpose is to collect data from the phone, including data they never sought permission for. These are essentially data collection apps disguised as games or entertainment apps. In today’s world, smartphones contain very sensitive data such as personal images, videos, GPS location, call history, and messages, and we often do not even know that this data is being stolen by these apps. Every now and then, such malware apps are removed from Google Play and Apple’s App Store, but not before they have already been downloaded millions of times.

Why are Artificial Intelligence and Data Privacy a Big Concern?

People are becoming increasingly aware that their data is not safe online. However, most of them still do not realize the gravity of the situation when AI-based systems process their social data in an unethical manner.

Let us go through some well-known incidents to understand what type of risk we are talking about.

The Cambridge Analytica-Facebook Scandal

In 2018, news broke that the data analytics firm Cambridge Analytica had analyzed the psychological and social behavior of users through their Facebook likes and targeted them with ad campaigns for the 2016 US Presidential election.

The issue was that Facebook does not sell its users’ data for such purposes. It was revealed that a developer had created a Facebook quiz app that exploited a loophole in a Facebook API to collect data from users and from their friends as well. He then sold that data to Cambridge Analytica, which was accused of playing an important role in the outcome of the 2016 US Presidential election by unethically obtaining and mining users’ Facebook data. Worst of all, users of the quiz app had blindly granted it every permission without knowing they were exposing their friends’ data too.

Clearview Face Recognition Scandal

Clearview is an artificial intelligence company that created a face recognition system to help police officers identify criminals. The company claims its software has helped law enforcement agencies track down many pedophiles, terrorists, and sex traffickers.

But in January 2020, The New York Times ran a long story about how Clearview made a mockery of data privacy by scraping around three billion photos of users from social media platforms like Facebook, YouTube, Twitter, Instagram, and Venmo to create its AI system. Its CEO, Hoan Ton-That, claimed that the system scraped only public images from these platforms. Shockingly, in an interview with CNN Business, the software fetched images from the Instagram account of the show’s producer.

Google, Facebook, YouTube, and Twitter sent cease-and-desist letters to stop Clearview from scraping photos from their platforms. However, images you have uploaded online may already be included in that AI software without your knowledge. If this software gets into the hands of a rogue police officer, or if the system itself gives false positives, many innocent people could fall prey to police investigations.

DeepFakes

Image: A DeepFake puts the face of one person onto another’s body.

Images and videos created using deep learning that show a real person doing or saying things they never did or said are called deepfakes. Used for entertainment, deepfakes can be fun, but people are also creating them to spread fake news and, worse, deepfake porn.

In 2019, an application called DeepNude was launched that let anyone upload an image of a woman and generate a realistic-looking nude image from it. It is quite disturbing that anyone could exploit images of women available online in this way. After considerable controversy, the app was shut down. Still, it is only a matter of time before someone again misuses deepfake technology on publicly available photos or videos.

Mass Surveillance in China

In recent years, China has received severe criticism for the mass surveillance of its people without their consent. The country uses over 200 million surveillance cameras and facial recognition to keep a constant watch on its citizens, and it also mines the behavioral data captured on those cameras.

To make matters worse, China has implemented a social credit system that rates the trustworthiness of its citizens based on this surveillance. People with high credit scores gain benefits, while those with low scores lose them. Worst of all, these ratings are determined by AI-based surveillance without people’s knowledge or consent.

How to prevent misuse of data with AI

The above case studies should make it clear that the unethical processing of private data with artificial intelligence can lead to very dangerous social consequences. Let us look at how each of us can play a part in stopping this misuse of private data with artificial intelligence.

Government Responsibility

Many countries have now created their own data regulation policies to bring more transparency to the relationship between online platforms and their users. Most of these policies center on giving users more authority over what data they share and the right to be informed about how a platform will process that data. A well-known example is the GDPR, which came into force in the EU a couple of years ago and gives EU residents more control over their personal data and how companies process it.

Company Responsibility

Large companies like Google, Facebook, Amazon, Twitter, YouTube, Instagram, and LinkedIn hold the majority of users’ social data across the world. As reputed giants, they should be extra careful not to leak any data to malicious actors, whether intentionally or unintentionally.

AI Community Responsibility

The people of the AI community, especially its thought leaders, should raise their voices against the unethical use of AI on users’ personal data without their knowledge, and they should educate the world about the disastrous social impact this practice can have. Many institutes already teach AI ethics as a subject and offer it as a course.

User’s Responsibility

Finally, we should remember that government regulations are just policies; the responsibility ultimately lies with us as individuals. We have to be careful about what data we upload to social platforms and mobile apps, always inspect what permissions we grant them to access and process our data, and never blindly “accept” whatever terms and conditions come our way on these online platforms.

Conclusion

There are many concerns about ethics within the artificial intelligence community because of the social biases and prejudices AI can create. But processing personal data with AI without people’s consent, and then misusing it, raises those concerns to the next level. Real-world AI is still in its nascent stage, and we all have to step up to make sure that building AI by misusing personal information does not become a regular occurrence in the future.

SQL – Simple Talk

Facebook open-sources Opacus, a PyTorch library for differential privacy

August 31, 2020   Big Data

Facebook today open-sourced Opacus, a library for training PyTorch models with differential privacy that’s ostensibly more scalable than existing methods. With the release of Opacus, Facebook says it hopes to provide an easier path for engineers to adopt differential privacy in AI and to accelerate in-the-field differential privacy research.

Typically, differential privacy entails injecting a small amount of noise into the raw data before feeding it into a local machine learning model, thus making it difficult for malicious actors to extract the original files from the trained model. An algorithm can be considered differentially private if an observer seeing its output cannot tell if it used a particular individual’s information in the computation.

“Our goal with Opacus is to preserve the privacy of each training sample while limiting the impact on the accuracy of the final model. Opacus does this by modifying a standard PyTorch optimizer in order to enforce (and measure) differential privacy during training. More specifically, our approach is centered on differentially private stochastic gradient descent,” Facebook explained in a blog post. “The core idea behind this algorithm is that we can protect the privacy of a training dataset by intervening on the parameter gradients that the model uses to update its weights, rather than the data directly.”
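
For readers who want to see what this looks like in code, below is a minimal sketch of wrapping an ordinary PyTorch training loop with Opacus. The model, data, and hyperparameters are placeholders, and the call signature follows the PrivacyEngine.make_private API from recent Opacus releases, which may differ from the version available at the time of this announcement.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine  # pip install opacus

# Placeholder model and data; in practice this is your real training setup.
model = nn.Sequential(nn.Linear(20, 2))
data = TensorDataset(torch.randn(256, 20), torch.randint(0, 2, (256,)))
train_loader = DataLoader(data, batch_size=32)
optimizer = optim.SGD(model.parameters(), lr=0.05)
criterion = nn.CrossEntropyLoss()

# Wrap the standard components so per-sample gradients are clipped and noised.
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.1,  # more noise -> stronger privacy, lower accuracy
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)

for epoch in range(3):
    for features, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(features), labels)
        loss.backward()
        optimizer.step()

# Report how much of the privacy budget (epsilon) has been spent so far.
print("epsilon spent:", privacy_engine.get_epsilon(delta=1e-5))
```

Raising noise_multiplier tightens the privacy guarantee at the cost of model accuracy, which is the trade-off the post describes.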

Opacus uniquely leverages hooks in PyTorch to achieve an “order of magnitude” speedup compared with existing libraries, according to Facebook. Moreover, it keeps track of how much of the “privacy budget” — a core mathematical concept in differential privacy — has been spent at any given point in time to enable real-time monitoring.

Opacus also employs a cryptographically safe, pseudo-random, GPU-accelerated number generator for security-critical code, and it ships with tutorials and helper functions that warn about incompatible components. The library works behind the scenes with PyTorch, Facebook says, producing standard AI models that can be deployed as usual without extra steps.

“We hope that by developing PyTorch tools like Opacus, we’re democratizing access to such privacy-preserving resources,” Facebook wrote. “We’re bridging the divide between the security community and general machine learning engineers with a faster, more flexible platform using PyTorch.”

The release of Opacus follows Google’s decision to open-source the differential privacy library used in some of its core products, such as Google Maps, as well as an experimental module for TensorFlow Privacy that enables assessments of the privacy properties of various machine learning classifiers. More recently, Microsoft released WhiteNoise, a platform-agnostic toolkit for differential privacy in Azure and in open source on GitHub.

Big Data – VentureBeat

The end of Privacy Shield: Why it matters and what businesses can do about it

August 28, 2020   Big Data

The rules that facilitate much of the digital commerce between the EU and US have been thrown into a state of flux in recent weeks. Last month, the Court of Justice of the European Union (CJEU) passed a landmark judgement to invalidate the Privacy Shield, a framework governing the flow of EU citizens’ personal data into US companies. Then, just last week, Austrian privacy advocate Max Schrems, who brought the initial case to the CJEU, filed fresh complaints against 101 companies that he alleges are failing to provide adequate protection to the data of EU citizens, in spite of the CJEU’s landmark judgement.

What does all this mean in practice? The Privacy Shield allowed US companies to self-certify that they would adhere to loftier data principles than those required of them at home, allowing for the transfer of personal data from the EU to the US. More than 5,000 organizations relied on the arrangement, and the freedom to move data between markets that it gave them has been critical to businesses’ ability to sell physical and digital goods and services to customers in Europe: activities that make up a large part of the $7 trillion in transatlantic trade conducted annually. The CJEU’s initial decision left businesses in the US and the EU in a precarious position and cast doubt over their ability to trade seamlessly.

A turning point?

The CJEU’s move to invalidate the Privacy Shield has not, yet, meant that businesses are prohibited from moving EU data to the US. For the moment at least, businesses can rely on what are known as the Standard Contractual Clauses (SCCs) as a valid means of transfer (and in some instances, Binding Corporate Rules, although these are less common). These are a special set of terms designed to guarantee data privacy standards. SCCs are common, so many businesses have been able to continue as they had before.

However, the complaints that Schrems filed last week seek to remove this option for businesses. The complaints against 101 companies, including the likes of Airbnb and the Huffington Post, argue that SCCs do not provide adequate protection for EU personal data because US companies fall under US surveillance laws.

The 2013 Snowden leaks illustrated the extent to which US security agencies had been making use of personal data stored by companies. The ECJ determined that the Privacy Shield was an inadequate mechanism to protect data on EU residents from US surveillance programs — and Schrems argues that SCCs are no better.

With significant reform to US surveillance law unlikely in the near future, companies are being left in an awkward predicament. It is suddenly becoming less viable to rely on SCCs to move data, and businesses are supposed to carry out a comprehensive analysis of local laws and, if necessary, use supplementary measures to protect personal information. We await further guidance from the key regulatory and political stakeholders in this regard.

A patchwork agreement for a Privacy Shield replacement could follow, but there is a real possibility that we could reach a point where data can no longer move freely from the EU to the US. This could lead to a requirement that all data on EU citizens is stored within the EU. This could dramatically limit US providers’ ability to access and process this data and the range of digital services available to EU citizens.

A key issue in Brexit negotiations

The ECJ’s decision on the Privacy Shield may also have a big impact on Brexit, with just a few months remaining for the UK and EU to ratify the terms of a post-Brexit trade deal. Sadly, the issues of data rights and privacy frameworks have not been a major talking point in negotiations thus far, with hot button political issues such as fishing rights seemingly taking priority — despite the huge economic impact that a failure to reach an agreement on data flows would bring. Whatever the outcome, the EU will need to make a decision on the UK’s “data adequacy,” meaning the extent to which UK law protects personal data in comparison with the EU’s own General Data Protection Regulation (GDPR).

The ECJ’s decision on the Privacy Shield was an indication of the level of scrutiny the EU will employ in assessing the UK. In the meantime, the UK needs to decide whether to align itself more with the EU or the US. Will it make it more difficult for companies to export data from the UK, as the EU has? Or will it favor a closer relationship with the US and risk facing the same kind of regulatory uncertainty that the US is now experiencing?

This decision will have a huge impact on the way British businesses operate internationally and how international businesses operate in the UK. If a data adequacy agreement is not reached, the system that allows the free flow of personal data between the EU and the UK could be uprooted. And if one is reached, it could have an impact on a possible free trade deal between the UK and US.

Reacting in the face of uncertainty

So, whether you’re a UK business facing the unpredictability of the Brexit negotiations, or a US company worrying about the future of data flows from the EU, what can you do now to prepare for the changes that are coming? As always, it starts by getting the basics in place. Here are four steps any organization can take to ensure they can adapt quickly and effectively to any regulatory outcome:

  • Understand how you use data: If they are to react quickly, businesses have to know exactly what data they are using, where it came from, and how it is moving through their organization. This should be a continual undertaking, but right now too many companies don’t have a clear understanding of these issues.
  • Think long-term: With so much uncertainty, businesses must factor potential data compliance requirements into their growth strategies. The privacy regime operating in each region must be a key consideration for any business planning to expand into new markets. Carefully evaluate data regulations when considering where to invest for growth and budget accordingly so you know that you’ll be able to comply with all local regulations.
  • Stay agile: Wherever they are headquartered, it is critical that startups and digital businesses are monitoring developments in the EU-US and the EU-UK negotiations. Progress won’t be steady: nothing could change for a while, and then it will all move very quickly. Make sure someone in the organization is responsible for keeping a close eye on the latest news and flagging anything important.
  • Communicate! Consumers are increasingly aware of how their data is being handled by businesses. Transparency is therefore crucial to building and maintaining trusted relationships. Be proactive about keeping customers informed about your policies and day-to-day operations. You should consider publishing your law enforcement guidelines and transparency reports to make it clear how your organization interacts with data requests from government agencies.

Mark Kahn is General Counsel at customer data platform Segment.

Big Data – VentureBeat

Storage 101: Data Security and Privacy

July 23, 2020   BI News and Info

The series so far:

  1. Storage 101: Welcome to the Wonderful World of Storage
  2. Storage 101: The Language of Storage
  3. Storage 101: Understanding the Hard-Disk Drive 
  4. Storage 101: Understanding the NAND Flash Solid State Drive
  5. Storage 101: Data Center Storage Configurations
  6. Storage 101: Modern Storage Technologies
  7. Storage 101: Convergence and Composability 
  8. Storage 101: Cloud Storage
  9. Storage 101: Data Security and Privacy 

Most discussions around storage inevitably lead to the topics of data security and privacy. You cannot operate in today’s climate without careful attention to both. If data protection is not built into your storage infrastructure, you’re doing something wrong.

Data protection is an ongoing, organization-wide effort in which storage plays a key role. A secure storage infrastructure is essential to safeguarding sensitive information. Even so, it takes more than secure storage to guarantee the data’s safekeeping throughout its lifespan. For that, an organization needs a comprehensive data protection strategy that takes into account all aspects of data management, including how data is stored.

Securing Data and Protecting Privacy

For many organizations, their most important asset is their data, the bulk of which must be protected against unauthorized access. The data might include intellectual property, legal documents, passwords, encryption keys, personally identifiable information (PII), or a variety of other sensitive material.

An organization that handles sensitive data should have a comprehensive data protection strategy in place to contend with potential threats. Unfortunately, the exact meaning of data protection is not always clearcut and can vary depending on usage and circumstances. It might refer to securing data, safeguarding privacy, protecting storage systems, implementing disaster recovery (DR), or any combination of these.

According to the SNIA (formerly the Storage Networking Industry Association), data protection is the “assurance that data is not corrupted, is accessible for authorized purposes only, and is in compliance with applicable requirements.” In other words, data protection goes beyond just encrypting data or guaranteeing its availability. Data protection ensures that the data remains viable, is safeguarded against all unauthorized access at all times, and is controlled in a way that adheres to applicable compliance laws and regulations, e.g., local, provincial, and federal.

In this view of data protection, storage security is only part of a larger effort to keep sensitive data out of the wrong hands, while ensuring its accuracy and availability to authorized users. To this end, you’ll sometimes see storage security described in terms of confidentiality, integrity, and availability—or CIA—which go hand-in-hand with the larger goal of data protection.

A comprehensive data protection strategy ensures both data security and data privacy. Although the two are related, they’re not the same. Data security protects sensitive information from unauthorized access and from loss and corruption, whether intentional or accidental. Data privacy refers to the appropriate handling of PII and the rights of individuals to control and access their personal information.

With the increasing number of regulations that govern PII, organizations are under greater pressure than ever to protect confidential information and provide a full accounting of how it’s managed. Regulations can vary from region to region and differ significantly. Many organizations operate across multiple regions, making them subject to a mix of governing laws. The General Data Protection Regulation (GDPR), Health Insurance Portability and Accountability Act (HIPAA), Children’s Online Privacy Protection Act (COPPA), and California Consumer Privacy Act (CCPA) are only some of the regulations that organizations now face. Even if an organization operates in only one region, they can still be subject to a confusing mix of laws.

Despite the differences between regulations, however, they all have one thing in common: they require strict security controls to ensure that personal information cannot be compromised while under the organization’s control. Most also define other obligations, such as retention or auditing requirements, but data protection lies at the heart of each one, which is why data security and privacy are intrinsically linked.

The Cybersecurity Threat Landscape

Data is not only an asset. It’s a burden. A data breach can lead to lost revenue, stiff penalties, downtime, legal liabilities, loss of intellectual property, unexpected expenses, and a tarnished reputation from which a company might never recover. No organization is immune to the potential threats that await them, from both inside and outside their domains.

External threats can come from governments, organized crime, terrorists, cybercriminals, competitors, or everyday hackers looking for a bit of sport or profit. And threats can arrive in many forms, often implemented through social engineering strategies that attempt to introduce malware or ransomware or steal user credentials.

Figure 1. The challenges of data security (image by madartzgraphics)

Attackers might also go after an organization’s networks or systems directly, leveraging vulnerabilities to carry out SQL injections, denial-of-service attacks, or other nefarious acts in an attempt to steal data or bring down operations. Their reasons for attacking an organization can vary. They might want to damage the organization’s credibility, acquire sensitive information for competitive advantages, access government secrets, or make money by selling stolen data or locking up systems until a ransom is paid.

Victims of cybercrime range from individuals, households, and small municipalities to multinational corporations, including Twitter, Facebook, Yahoo, Equifax, eBay, LinkedIn, and Marriott International. And the threats don’t come only from external players. Organizations must also guard against internal threats, whether from disgruntled or greedy employees, malicious insiders, or careless staff falling victim to weak IT security policies. Data compromised as a result of internal behavior can be just as devastating as an external attack.

Many organizations are also turning to the cloud to store data and support their workloads. Although cloud platforms can often be more secure than an organization’s own environment, they also add storage and data complexity, while increasing data exposure. An organization must rely completely on the provider to ensure that data is being protected from internal and external threats. At the same time, the cloud raises compliance concerns, especially when spanning multiple regions.

Organizations must also contend with the increased risks that come with a remote workforce, whose numbers have grown dramatically with COVID-19. The more people working offsite, the more difficult it becomes to ensure that sensitive data is not being exposed when it is transferred or stored. A home worker might use an unsanctioned cloud service, mislay a peripheral device that contains business data, collaborate on a project through an unsecure platform, or engage in other risky behavior. Even under the best circumstances, few home offices can achieve the same level of physical security you get in a data center.

Implementing a Data Protection Strategy

To ensure data security and privacy, you need a comprehensive plan that specifies how data will be protected both at rest and in motion. As part of this process, you should develop policies that define where data can be stored, who can access it, and what levels of protection the data requires. The policies should also address such issues as when data is deleted, what happens when an employee is terminated, how to handle a data breach and any other issues related to data protection.

Another important part of the planning process is to conduct a thorough assessment of your current data environment to identify potential risks and the steps that must be taken to mitigate those risks. You need to know where sensitive data is located, how it’s being used, and who can access it. You should also look for issues such as whether sensitive data is being transmitted as cleartext, credentials are being sent in an unencrypted format, or users are accessing internal web services via insecure HTTP.

From this assessment, you’ll have a good sense of what data you have and where it’s located. You can then classify the data based on security and compliance requirements. This will help you determine what levels of access to assign to each data type, as reflected in your security policies. Public data, for example, requires far less security than data covered by HIPAA or the GDPR or data governed by national security laws.

If your organization is subject to multiple regulations, you might consider a protection-by-default approach for personal data, rather than trying to create too many data classifications. For example, one regulation might require that you protect user IP addresses, while another does not. It might be better to create a single category that covers both. Too many data categories can complicate data management and increase the risk of regulatory violations.

A simpler category structure can also make it easier to address other compliance-related issues, such as providing users with visibility into their PII or supporting their deletion requests. At the same time, you must still take into account issues such as data retention and disposal requirements, which might force you to create additional categories.

Another part of the planning process is to ensure that you have the tools you need to safeguard your systems and their data. For example, you might implement a data loss prevention (DLP) solution to help automatically discover, monitor, and protect sensitive data. You might benefit from an intrusion detection system (IDS) that identifies traffic anomalies and warns you if something doesn’t look right.

Essential tools for protecting your data include anti-malware, anti-ransomware, and anti-spyware, as well as protections such as firewalls and proxy servers. And, of course, you want to be sure you deploy the proper storage protections. For example, you might implement RAID and other redundancies to provide storage fault tolerance, which can help protect against intentional or unintentional data destruction.

There are plenty of other tools as well. Just remember that no one solution can address all your data protection requirements, and you’ll have to come up with just the right mix to meet your specific needs.

Protecting Data and Privacy

Data protection must take into account both physical and operational security. Physical security ensures that unauthorized individuals cannot access the physical structures where the data is housed or the equipment within those structures. It also protects against circumstances that could lead to data loss, such as power failures or natural disasters. To implement physical security, an organization might employ backup and restore protocols, CCTV monitoring, biometric readers, geofencing, backup generators, and numerous other protections.

Organizations must also protect the individual systems within their secure structures, such as servers or workstations. No one on the inside should be able to walk off with equipment or get at their internal workings unless they’re authorized to do so. IT teams must also take steps to protect portable devices that leave the premises, such as laptops, tablets, or cell phones. This typically means implementing a mobile device management strategy that supports such features as remote lock or remote wipe.

In addition to ensuring the physical security, organizations must implement operational protections, which provide the technical safeguards necessary to protect the data itself. This starts with using advanced algorithms to encrypt sensitive data both at rest and in motion. In addition, IT teams might consider such tools as tokenization or data masking for further protection. They should also have in place a system for securely storing and managing encryption keys.
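
As a simplified illustration of encryption at rest, the snippet below uses the Fernet recipe from the widely used Python cryptography package; in a real deployment the key would come from a key-management service rather than being generated inline, and the record contents are made up for the example.

```python
from cryptography.fernet import Fernet  # pip install cryptography

# In production the key comes from a key-management service (KMS/HSM),
# is never hard-coded, and is rotated on a schedule.
key = Fernet.generate_key()
cipher = Fernet(key)

record = b'{"name": "Jane Doe", "account": "0000-0000"}'  # made-up sensitive record
encrypted = cipher.encrypt(record)    # this ciphertext is what lands on disk
restored = cipher.decrypt(encrypted)  # only possible for holders of the key

assert restored == record
```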

Another important component of operational security is role-based access control, which determines who can and who cannot view or modify specific sets of data. Access should be based on the principle of least privilege, that is, individuals should be granted only the access they need to do their jobs—and no more. In conjunction with access control, IT should also implement such safeguards as multi-factor authentication or virtual private networks (VPNs), as appropriate, to further protect data access.
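
The sketch below shows one simple way role-based, least-privilege access checks might look in application code; the roles and permission names are invented for illustration.

```python
# Hypothetical role-to-permission mapping: each role gets only what it needs.
ROLE_PERMISSIONS = {
    "analyst": {"read:reports"},
    "dba":     {"read:reports", "read:pii", "write:schema"},
    "auditor": {"read:audit_log"},
}

def is_allowed(role: str, permission: str) -> bool:
    """Deny by default; grant only permissions explicitly assigned to the role."""
    return permission in ROLE_PERMISSIONS.get(role, set())

assert is_allowed("dba", "read:pii")
assert not is_allowed("analyst", "read:pii")  # least privilege in action
```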

An effective data protection strategy also requires a comprehensive infrastructure for continuously monitoring sensitive data, issuing real-time alerts, and generating comprehensive reports on-demand. All data access and modifications should be logged, with an auditing system in place to determine who accessed what data and when that access took place.

Operational protections also include DR systems that ensure data can be made available in the event of data loss or corruption, no matter what the cause. At the same time, IT must be careful that their DR mechanisms, such as backups or replication, don’t violate applicable regulations. Also, they must ensure that PII can be accessed in a timely manner, if required by law, and that it adheres to retention and disposition rules.

The Ongoing Challenges of Data Protection

To implement effective data protections, an organization must take into account the entire data lifecycle, regardless of how the data is being used or where it resides—whether on a remote workstation, on a mobile device, in a data center, on a cloud platform, at a remote facility, or on a server in an office corner. Data protection must be a unified effort that moves beyond infrastructure boundaries to ensure that data is secure, and privacy is protected at all times and under all circumstances.

One of the most important tools that organizations have for protecting data is an effective training and education program that helps employees understand the risks involved with handling data and the steps they can take to minimize those risks. Everyone in an organization should have a clear understanding of that organization’s data usage policies and how best to protect sensitive data. All it takes is one careless act to create a data nightmare.

Data protection is an ongoing, all-encompassing process that extends from the backend storage systems to the smartphones that employees carry in their pockets. Storage security is an integral part of this process and can serve as your last line of defense against intrusion. That said, storage security cannot protect against all risks, just like a firewall alone can’t eliminate all network threats. Any place where data might reside or that provides a means for its access should be considered a potential risk and dealt with accordingly.

SQL – Simple Talk

Google releases experimental TensorFlow module that tests the privacy of AI models

June 24, 2020   Big Data

Google today released an experimental module for TensorFlow Privacy, its privacy-preserving TensorFlow toolkit, that enables assessments of the privacy properties of various machine learning classifiers. The company says it’s intended to be the foundation of a privacy testing suite that can be used by any AI developer regardless of skill level.

The merits of various AI privacy techniques remain a topic of debate within the community. There are no canonical guidelines for producing a private model, but a growing body of research suggests AI models can leak sensitive information from their training data sets, creating a privacy risk. The mitigation approach favored by TensorFlow Privacy is differential privacy, which adds noise to hide individual examples in the training data. But this noise is designed for academic worst-case scenarios and can significantly affect model accuracy.

This motivated researchers at Google to pursue an alternative: membership inference attacks. The membership inference attack method, which the new TensorFlow Privacy module supports, builds classifiers that infer whether a particular sample was present in the training data set. The more accurate the classifier is, the more memorization is present and thus the less privacy-preserving the model is. The intuition is that an attacker who can make such predictions with high accuracy will succeed in figuring out which data was used in the training set.

The tests provided by the new module are black-box, meaning they only use the outputs of models rather than the internals (weights) or input samples. They produce a vulnerability score that determines whether the model leaks information from the training set, and they don’t require any retraining, making them relatively easy to perform.
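
To make the intuition concrete, here is a toy black-box membership test built only from per-example losses (it does not use the actual TensorFlow Privacy API): it measures how well loss values separate training examples from held-out ones and reports an AUC-style vulnerability score, where 0.5 indicates no measurable leakage.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def membership_vulnerability(train_losses, test_losses):
    """Toy black-box membership test (not the TensorFlow Privacy API).

    A model that memorized its training data has noticeably lower loss on
    members than on non-members, so we score how well per-example loss
    separates the two groups. ~0.5 means little leakage; ~1.0 means heavy
    memorization.
    """
    losses = np.concatenate([train_losses, test_losses])
    is_member = np.concatenate([np.ones(len(train_losses)), np.zeros(len(test_losses))])
    # Negate losses so that lower loss ranks as "more likely a member."
    return roc_auc_score(is_member, -losses)

# Example: an overfit model has much lower loss on the data it trained on.
rng = np.random.default_rng(0)
train_losses = rng.normal(0.2, 0.1, 1000)  # losses on training examples
test_losses = rng.normal(0.9, 0.3, 1000)   # losses on unseen examples
print(f"vulnerability score: {membership_vulnerability(train_losses, test_losses):.2f}")
```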

“After using membership inference tests internally, we’re sharing them with developers to help them build more private models, explore better architecture choices, use regularization techniques such as early stopping, dropout, weight decay, and input augmentation, or collect more data,” Google Brain’s Shuang Song and Google software engineer David Marn wrote in a post on the TensorFlow blog. “Ultimately, these tests can help the developer community identify more architectures that incorporate privacy design principles and data processing choices.”

Google says that moving forward, it’ll explore the feasibility of extending membership inference attacks beyond classifiers and develop new tests. It also plans to explore adding the new test to the TensorFlow ecosystem by integrating it with TensorFlow Extended (TFX), an end-to-end platform for deploying production machine learning pipelines.

In related news, Google today added support for Go and Java to the foundational differential privacy library it open-sourced last summer. It also made available Privacy on Beam, an end-to-end differential privacy solution built on Apache Beam (a model and set of language-specific SDKs) that relies on the lower-level building blocks from the differential privacy library and combines them into an “out-of-the-box” solution that takes care of the steps essential to differential privacy. In addition, Google launched a new Privacy Loss Distribution tool for tracking privacy budgets that allows developers to maintain an estimate of the total cost to user privacy for collections of differentially private queries and to better evaluate the overall impact of their pipelines.

Big Data – VentureBeat

ACLU sues facial recognition startup Clearview AI for privacy and safety violations

May 29, 2020   Big Data

The American Civil Liberties Union (ACLU) and a group of organizations filed a lawsuit against Clearview AI today in an Illinois court alleging privacy and safety violations and asserting that Clearview “will end privacy as we know it if it isn’t stopped.” According to ACLU vs Clearview AI court filings, the startup violated the Illinois Biometric Information Privacy Act, or BIPA, to create its facial recognition system.

“If allowed, Clearview will destroy our rights to anonymity and privacy — and the safety and security that both bring. People can change their names and addresses to shield their whereabouts and identities from individuals who seek to harm them, but they can’t change their faces,” the ACLU said in a statement accompanying the lawsuit.

The court filing goes on to call facial recognition in general technology that poses a severe risk to privacy and safety. The ACLU has been a direct supporter of a string of facial recognition bans put into effect last year in cities like San Francisco and Somerville, Massachusetts near Boston.

“The capture and storage of faceprints leaves people vulnerable to data breaches and identity theft,” the filing reads. “It can also lead to unwanted tracking and invasive surveillance by making it possible to instantaneously identify everyone at a protest or political rally, a house of worship, a domestic violence shelter, an Alcoholics Anonymous meeting, and more. And, because the common link is an individual’s face, a faceprint can also be used to aggregate countless additional facts about them, gathered from social media and professional profiles, photos posted by others, and government IDs.”

Clearview AI has been a target of privacy advocates since January, when the New York Times published a story with the headline “The Secretive Company That Might End Privacy As We Know It.” The story alleges that Clearview AI scraped more than 3 billion images from social media websites and the web without people’s consent to train its facial recognition system. Clearview has previously claimed its data gathering methods are protected by the First Amendment of the U.S. Constitution. A data breach in February also called Clearview AI’s data security practices into question, and inadequate security practices can themselves result in a BIPA violation.

Clearview AI has attracted controversy for reasons beyond data privacy. Huffington Post reporting in April revealed connections between CEO Hoan Ton-That and a number of white supremacists, far-right groups, and Trump-affiliated elites. Earlier this month, Clearview AI pledged to sell its tech exclusively to government and law enforcement agencies following BuzzFeed News reporting in February that found Clearview works with thousands of organizations around the world, including companies like Macy’s and Best Buy.

In other biometric data collection news, BIPA was also used to file a lawsuit last October against Amazon’s Alexa for recording voice data. Illinois lawmakers nearly passed a bill last year requiring consent before recording people who speak with devices like Echo smart speakers, but that effort seemed to fall apart following interference by the Internet Association, a group associated with tech giants like Amazon and Google.

Big Data – VentureBeat

Google’s federated analytics method could analyze end user data without invading privacy

May 28, 2020   Big Data

In a blog post today, Google laid out the concept of federated analytics, a practice of applying data science methods to the analysis of raw data that’s stored locally on edge devices. As the tech giant explains, it works by running local computations over a device’s data and making only the aggregated results — not the data from the particular device — available to authorized engineers.

While federated analytics is closely related to federated learning, an AI technique that trains an algorithm across multiple devices holding local samples, it only supports basic data science needs. It’s “federated learning lite” — federated analytics enables companies to analyze user behaviors in a privacy-preserving and secure way, which could lead to better products. Google for its part uses federated techniques to power Gboard’s word suggestions and Android Messages’ Smart Reply feature.

“The first exploration into federated analytics was in support of federated learning: how can engineers measure the quality of federated learning models against real-world data when that data is not available in a data center? The answer was to re-use the federated learning infrastructure but without the learning part,” Google research scientist Daniel Ramage and software engineer Stefano Mazzocchi said in a statement. “In federated learning, the model definition can include not only the loss function that is to be optimized, but also code to compute metrics that indicate the quality of the model’s predictions. We could use this code to directly evaluate model quality on phones’ data.”

As an example, in a user study, Gboard engineers measured the overall quality of word prediction models against raw typing data held on phones. Participating phones downloaded a candidate model, locally computed a metric of how well the model’s predictions matched words that were actually typed, and then uploaded the metric without any adjustment to the model itself or any change to the Gboard typing experience. By averaging the metrics uploaded by many phones, engineers learned a population-level summary of model performance.

In a separate study, Gboard engineers wanted to discover words commonly typed by users and add them to dictionaries for spell-checking and typing suggestions. They trained a character-level recurrent neural network on phones, using only the words typed on these phones that weren’t already in the global dictionary. No typed words ever left the phones, but the resulting model could then be used in the datacenter to generate samples of frequently typed character sequences — i.e., the new words.

Beyond model evaluation, Google uses federated analytics to support the Now Playing feature on its Pixel phones, which shows what song might be playing nearby. Under the hood, Now Playing taps an on-device database of song fingerprints to identify music near a phone without the need for an active network connection.

When it recognizes a song, Now Playing records the track name into the on-device history, and when the phone is idle and charging while connected to Wi-Fi, Google’s federated learning and analytics server sometimes invites it to join a “round” of computation with hundreds of phones. Each phone in the round computes the recognition rate for the songs in its Now Playing history and uses a secure aggregation protocol to encrypt the results. The encrypted rates are sent to the federated analytics server, which doesn’t have the keys to decrypt them individually; when combined with the encrypted counts from the other phones in the round, the final tally of all song counts can be decrypted by the server.
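
The snippet below sketches the core trick behind such a secure aggregation scheme using additive pairwise masks: each device's upload is individually meaningless, but the masks cancel when every contribution is summed, so the server learns only the total. Real protocols add cryptographic key agreement and dropout handling; this is a simplification for illustration.

```python
import random

def masked_uploads(values, modulus=2**32):
    """Simplified additive-mask secure aggregation.

    Every pair of devices (i, j) shares a random mask: device i adds it and
    device j subtracts it, so all masks cancel in the sum. The server sees
    only masked values and can recover the total, never an individual count.
    """
    masked = [v % modulus for v in values]
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            mask = random.randrange(modulus)
            masked[i] = (masked[i] + mask) % modulus
            masked[j] = (masked[j] - mask) % modulus
    return masked

song_counts = [3, 0, 7, 1]                 # per-device "Now Playing" recognition counts
uploads = masked_uploads(song_counts)
print("server sees:", uploads)             # individually meaningless values
print("aggregate:", sum(uploads) % 2**32)  # equals sum(song_counts) == 11
```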

The result enables Google’s engineers to improve the song database without any phone revealing which songs were heard, for example, by making sure the database contains truly popular songs. Google claims that in its first improvement iteration, federated analytics resulted in a 5% increase in overall song recognition across all Pixel phones globally.

“We are also developing techniques for answering even more ambiguous questions on decentralized datasets like ‘what patterns in the data are difficult for my model to recognize?’ by training federated generative models. And we’re exploring ways to apply user-level differentially private model training to further ensure that these models do not encode information unique to any one user,” wrote Ramage and Mazzocchi. “It’s still early days for the federated analytics approach and more progress is needed to answer many common data science questions with good accuracy … [B]ut federated analytics enables us to think about data science differently, with decentralized data and privacy-preserving aggregation in a central role.”

Big Data – VentureBeat

Evervault raises $16 million for ‘privacy cages’ that simplify data protection

May 21, 2020   Big Data

Evervault has raised $16 million in venture capital to develop technology the Dublin-based company claims will recalibrate the traditional tradeoffs between privacy and regulation. Index Ventures led the round, which includes previous investors Sequoia Capital, Kleiner Perkins, and Frontline Ventures. In addition, the fundraising attracted money from a string of notable angel investors, such as Eventbrite cofounder Kevin Hartz and former Facebook chief security officer Alex Stamos.

The funding is notable because Evervault has not yet developed a product that’s ready for customers. Instead, investors are betting on a concept that promises to effectively eliminate the need for privacy regulations, a shift that would save businesses considerable expenses on compliance while delivering greater data protection for consumers.

“Data privacy is very complicated for people who don’t want to read thousands of pages of regulations,” said Evervault founder and CEO Shane Curran. “I started Evervault with the belief that data privacy is a technology problem rather than a regulatory problem.”

In Europe, the advent of the General Data Protection Regulation (GDPR) in May 2018 sparked a wave of startups promising to help businesses comply with the rules. Curran said he first began working on the problem in high school and officially started the company in 2018.

He began by establishing a privacy manifesto that lays out eight rules any such system should follow. Primary among these is the idea that “data should always be caged.” That means any sensitive data leaving your device should be encrypted so that it can only be treated and processed in a “privacy cage.”

Curran described these cages as containers that allow for fully encrypted data to be processed so that only the owner of the original data can access the resulting information. The company processing the data would be excluded from accessing it directly and would really only have momentary possession of a user’s data to perform a promised service. Once that service is completed, the data would essentially vanish from the company’s system.

“Containers let you run computing over really sensitive data,” Curran said. “Companies can manage without them having to sacrifice privacy or security.”

Evervault plans to offer this service as an API, with the aim of allowing developers to seamlessly build it into the software stack of whatever they are creating. Curran likes to say that developers will be able to do more with one line of code than the entire set of GDPR regulations.

This would theoretically speed product development while also lifting the burden on companies who might otherwise spend huge resources ensuring they are complying with privacy regulations. Likewise, it would help regulators whose enforcement resources are stretched thin, Curran said.

Still, the investors are taking a leap of faith here. But Evervault’s eight employees are hard at work developing the technology and are currently working alongside partners in sectors with sensitive data issues, like finance.

Curran said he doesn’t have a firm timeline for an initial public release yet. “The roadmap is what I think they really bought into,” Curran said. “That, and the team.”

Big Data – VentureBeat

AI Weekly: CDPA bill shows progress on coronavirus-tracking data privacy, but there’s still a ways to go

May 9, 2020   Big Data

Contact tracing has quickly emerged as the go-to method of tracking the spread of the coronavirus among the general population, but there have been crucial questions around the most effective, ethical, and legal ways of doing so. New legislation introduced this week, the COVID-19 Consumer Data Protection Act (CDPA), seeks to enact legal guardrails around the collection and use of people’s data.

It’s a sign of progress that legislation is emerging around this issue, but it also highlights that there’s still a way to go. The CDPA has some issues that privacy experts are concerned about, and the lack of any Democratic co-sponsors indicates a lack of bipartisan support. Democrats have their own version of this type of legislation, the Consumer Online Privacy Rights Act (COPRA), which was introduced in December. Both bills emerged from the same committee — the Senate Committee on Commerce, Science, and Transportation — so the lack of bipartisanship is especially notable.

The CDPA was introduced by Senators Roger Wicker (R-MS), John Thune (R-SD), Deb Fischer (R-NE), Jerry Moran (R-KS), and Marsha Blackburn (R-TN). COPRA is sponsored by Senator Maria Cantwell (D-WA), along with Senators Brian Schatz (D-HI), Amy Klobuchar (D-MN), and Ed Markey (D-MA).

Despite the partisanship, the CDPA includes much that all sides can agree on. And in an announcement about the bill, the Republican Senators said all the right things. For example, Senator Wicker’s statement reads, “As the coronavirus continues to take a heavy toll on our economy and American life, government officials and health-care professionals have rightly turned to data to help fight this global pandemic. This data has great potential to help us contain the virus and limit future outbreaks, but we need to ensure that individuals’ personal information is safe from misuse.”


Per the announcement, the CDPA would:

  • Require companies under the jurisdiction of the Federal Trade Commission to obtain affirmative express consent from individuals to collect, process, or transfer their personal health, device, geolocation, or proximity information for the purposes of tracking the spread of COVID-19.
  • Direct companies to disclose to consumers at the point of collection how their data will be handled, to whom it will be transferred, and how long it will be retained.
  • Establish clear definitions about what constitutes aggregate and de-identified data to ensure companies adopt certain technical and legal safeguards to protect consumer data from being re-identified.
  • Require companies to allow individuals to opt out of the collection, processing, or transfer of their personal health, geolocation, or proximity information.
  • Direct companies to provide transparency reports to the public describing their data collection activities related to COVID-19.
  • Establish data minimization and data security requirements for any personally identifiable information collected by a covered entity.
  • Require companies to delete or de-identify all personally identifiable information when it is no longer being used for the COVID-19 public health emergency.
  • Authorize state attorneys general to enforce the Act.

In a statement to VentureBeat, Liz O’Sullivan, cofounder of ArthurAI and technology director of STOP (Surveillance Technology Oversight Project), said that the CDPA is a step in the right direction, but she’s concerned that it doesn’t go far enough. “There’s nothing stopping companies from using this data to profit after the crisis, and it won’t protect people in the event that ICE or other law enforcement agencies subpoena identifiable information while the crisis is ongoing,” she said.

In a way, the issues here are business as usual for data privacy. “All the usual concerns apply: This data is a great source of power in any hands, to be politicized or used for personal gain. If companies are left with a choice to ‘delete or de-identify,’ it’s pretty clear which one they will choose,” she said, adding that “It’s telling, in fact, that Palantir, a company typically associated with national security, has already won contracts to handle this data.”
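
O’Sullivan’s worry about the delete-or-de-identify choice is easier to see with a simplified example. The sketch below is purely illustrative, assuming an invented record format rather than any company’s actual pipeline: it hashes the direct identifier and coarsens location, which satisfies a naive reading of “de-identified” while still producing a stable pseudonym that lets records be linked over time. That linkability is exactly the re-identification risk the bill asks companies to address with technical and legal safeguards.

```python
# Illustrative only: a naive "de-identification" pass of the kind a company might
# prefer over outright deletion. The record format and field names are invented.
import hashlib

def deidentify(record: dict, salt: bytes = b"rotate-this-salt") -> dict:
    """Replace the direct identifier with a salted hash and coarsen geolocation."""
    pseudonym = hashlib.sha256(salt + record["phone"].encode()).hexdigest()[:16]
    return {
        "id": pseudonym,                 # a stable pseudonym, not true anonymity
        "lat": round(record["lat"], 2),  # roughly a 1 km grid instead of an exact point
        "lon": round(record["lon"], 2),
        "proximity_events": record["proximity_events"],
    }

raw = {"phone": "+1-555-0100", "lat": 40.74182, "lon": -73.98964, "proximity_events": 3}
print(deidentify(raw))
# Because the pseudonym is stable, records from different days can still be joined
# into a movement history, which is the kind of linkage that can re-identify people.
```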

She emphasized that the danger with any bill that fails to keep a divide between public and private data is the creation of the illusion of privacy while handing governments, and “state-adjacent corporate entities,” expanded surveillance capabilities.

Andrew Burt, chief legal officer at Immuta and managing partner at bnh.ai, said in a statement to VentureBeat that the CDPA does serve to reinforce how important data and data analytics are to combatting the pandemic. “There’s a reason, for example, that the most thorough plans to get Americans back to work pre-vaccine start with contact tracing and monitoring — knowing who might be a carrier of the virus, and where they’ve gone and who they’ve been in close proximity to, is the first step to getting us to a state of reasonable safety,” he said. “Data collection and data analytics will form the backbone of those efforts. So I see the CDPA as a very clear acknowledgement of that fact.”

But Burt also noted that there is much more that needs to be discussed around data protection laws, such as what a bill like this says about the broader state of data protection laws, the current and future role of the FTC around privacy, what counts as “health data” in a world of ubiquitous data generation and collection, applying time limits to “new surveillance mechanisms” for COVID-19, and more.

The fact that legislators are moving forward with data privacy laws is a welcome sign of progress. But Republicans and Democrats will need to do more to reach consensus, lest the U.S. end up with data laws that fail to strike the right balance between protecting people from the coronavirus and protecting them from future abuses.


Big Data – VentureBeat
