Category Archives: Big Data

Notes from 2017 MIT CDOIQ Symposium: CDOs, and Data, go Mainstream

At the conclusion of this year’s annual MIT Chief Data Officer and Information Quality Symposium (MITCDOIQ), held July 12-14 in Cambridge, MA, the organizers noted that the focus of the event was at last shifting from the rise and role of the CDO to the key strategic business initiatives undertaken by CDOs and their organizations. But this shift was evident right from the start of the event – one featured session called it the process of “Shifting from Survival to Success.”

Front and center were key topics around the business value of data: building out the data infrastructure and democratizing data, the data privacy requirements of GDPR, the value and use of machine learning and advanced analytics to drive business initiatives and achieve better business outcomes, and adaptation to the accelerating pace of change. One panel of CDOs – including Christina Clark of General Electric (GE), Mark Ramsey of GlaxoSmithKline, and Venkat Varadachary of American Express – addressed a number of these points in detail.

Managing Data for Business Value and Growth

Though the CDO role emerged out of the financial crisis and the need for broader controls around regulatory requirements and data governance, that role has changed, and most new CDOs are addressing the need for business value and revenue growth. This doesn’t mean that the foundation and fundamentals of data governance and compliance are forgotten or dropped. In fact, those aspects are critical to this shift, as business growth depends on an organization’s ability to structure and manage data for speed and flexibility. The approaches taken to succeed may vary considerably, though.

One participant noted the example of Google absorbing YouTube: Google took two years to build the infrastructure and information needed to effectively monetize the volumes of data it acquired. For financial services firms, that approach is rare, particularly with mergers. In those cases, the customer support and client relationship aspects are critical, and it is more important to leave the systems in place and develop an approach that spans those systems, even where they hold similar data.

The participants particularly noted that the benefits of this data-driven business approach are large, but that there is a need for a clear vision of what the organization wants to do and address.  This includes establishing a scorecard with quantifiable metrics for business value.

Some of these may be as straightforward as identifying how many different revenue-generating use cases have been deployed, or determining how much faster deployment is with new approaches to data delivery. Often, it’s the critical data elements – perhaps no more than 50 per subject area – that provide the key metrics around operational value (and the impact on business processes that will break if the data is incorrect), tied in with the costs to acquire, manage, and consume that data.

I was on hand with colleagues at MITCDOIQ to demonstrate how our Trillium data quality software can help organizations’ data governance initiatives.

The CDO of GlaxoSmithKline noted a goal to cut the time needed to discover new drugs from eight years to one, and to transform the pharmaceutical industry by leveraging sensor-based and genomics data. Many steps are needed to reach that business goal, including standardizing internal data and being able to connect that internal data to new external sources.

Democratization of Data

For individuals in an organization to be effective, they need trusted data at hand to move forward with speed and efficiency. Data scientists are just a part of that equation. Sales, marketing, operations, and others in the lines of business all need data. Getting data into the hands of employees, even if it is imperfect, is valuable – it creates incentives sooner, as people can see the data issues and work to solve them. Helping these people solve the problems of accessing and using data not only democratizes the data, but also gives them the ability to act in a more agile way, with faster time-to-value.

At the same time, it is important to remember that data must be served in a manner that is consumable by these varied users. Some will want visualizations and dashboards, some need alerts and notifications for faster action, some need data in Excel, and some couldn’t care less about visualization and want access via tools such as Python.

To achieve this democratization, it’s important then to understand what data people want access to, how it may be delivered and consumed, and how individuals can accelerate this process.  Shifting the cultural mindset to a process of collecting and accessing data rather than modeling and structuring the data first helps to more readily identify where the business challenges are and how data may be applied to solve the issues and drive value.

Barbara Latulippe, CDO of Dell, reiterated many of the panel’s themes in her MITCDOIQ presentation on “Governance and Stewardship in the Big Data Era.” She noted that the data scientists in her organization were struggling to find data. In one case, it took a data scientist 35 phone calls to determine all the context around the data! Democratizing data means making it easy to find, easy to understand, and easy to assess for trust and quality.

One metric for Dell is simply reducing the time needed to find and consume data for prescriptive value, with a goal to move from 70% of a data scientist’s available time spent finding data (a statistic regularly reported) to 30%. Achieving this requires data governance, echoing the earlier panel’s comments that governance is foundational to success in this area. Dell’s approach follows a Lean Data Governance model, a practice that Trillium Software has noted in the past, including: starting small, showing success, visualizing results, and breaking down silos by showing others “what’s in it for them.”

Finding Data Skills, Building Data Literacy

On the final day of MITCDOIQ, Natalie Evans Harris, VP of Ecosystem Development at The Impact Lab, discussed the perceived issue in finding individuals with the data skills needed to help organizations achieve business value and growth.  She noted that this is often a “signaling” problem.

Organizations’ focus on finding the one “data scientist” who can understand and communicate with the business while finding and accessing data, testing hypotheses, building algorithms and models, and ramping these up into ongoing executable frameworks is misguided. What organizations need to focus on instead is bringing together teams with the mix of skills that can empower everyone involved to move the organization forward. This is the approach noted by Booz Allen Hamilton in their Field Guide to Data Science.

It’s important to remember that the range of skills needed to work effectively with data exists across many individuals. Rather than searching only for specialists, organizations should consider taking advantage of existing competencies (biologists and linguists, for example, can do data science) and blending them with the subject matter experts who understand the business, recognize business opportunities, and can present ideas in a manner that makes sense to the organization.

My own topic at MITCDOIQ, Finding Relevance in the Big Data World, touched on an aspect of data literacy, specifically how to approach the challenge of considering what data is important, i.e. relevant, for a given business initiative. As Wolf Ruzicka, Chairman of the Board at EastBanc Technologies, noted in his June 1, 2017 blog post, “Grow A Data Tree Out Of The ‘Big Data’ Swamp”: “If you don’t know what you want to get out of the data, how can you know what data you need – and what insight you’re looking for?”

A fundamental step, then, in bringing data into the mainstream is ensuring that the individuals working with the data establish a goal (whether generating new revenue, meeting compliance requirements such as GDPR, or reducing operational costs). Only with a business goal in mind can you test hypotheses, evaluate and measure data, and determine whether the data is fit for purpose. The results must be documented in a way that can be communicated out through a repeatable data governance process. Such a process should start small, but it provides an approach to build a practice, show success, and build business value while democratizing and measuring the data used and highlighting which data has value for which business purpose.

As Harris noted, it’s important to address change management services and processes, particularly to understand how people can use, interpret, and understand their data and their dashboards.  This means not only thinking about data literacy, but building data literacy!

From MITCDOIQ: Lessons Learned in Dell’s CDO/Data Governance Journey

Data-driven Success

As the CDO panel noted, having both data governance and data science teams together in the organization helps ensure that regulatory obligations are met while building for growth. It’s the underlying foundation needed to achieve success in data-driven initiatives. Getting people fully bought in is hard and requires culture change, but that is part of the CDO’s work. This shift is evident even in MITCDOIQ’s topics – no longer is the focus on creating a CDO office, but on sharing the stories of organizational change and the adoption of fundamental data-driven processes and data literacy.

Discover the new rules for how data is moved, manipulated, and cleansed – download our new eBook The New Rules for Your Data Landscape today!


5 Big Data Experts Who Caught Your Attention This Year

Not all expert interviews are created equal. Here are the five most popular Big Data experts we’ve spoken to this year. Did your favorite make the list?

#1  Dakshinamurthy V. Kolluru

As the founder and president of the International School of Engineering, aka INSOFE (@INSOFEedu), Kolluru could teach his own master class in communicating complex ideas with clarity and excitement.

Kolluru has done extensive work in data science, particularly mathematical algorithms and pattern extraction. He has helped establish several data science centers of excellence, and proactively steered INSOFE into the globally acclaimed School of Applied Engineering that it is today.

We asked him about the use of data science skills in the workplace, and where the field of data analytics was headed.

See what he had to say about Data Science and Big Data Analytics

#2  Wayne W. Eckerson

Eckerson (@weckerson), the founder and principal consultant of Eckerson Group, is an internationally recognized thought leader in business intelligence. He is a sought-after consultant, noted speaker, and bestselling author.

In our interview, Eckerson offers insight on the evolution of business analytics, with additional thoughts on what business intelligence professionals will need to do to survive the future of self-service technology.

Read his interview for more on the Evolution of Business Intelligence

#3  Reynold Xin, Databricks

Xin (@rxin) is the chief architect for Spark core at Databricks, and also one of Spark’s founding fathers. At this year’s Strata + Hadoop World in San Jose, he gave a presentation on the full history of Spark, from taking inspiration from mainframe databases to the cutting edge features of Spark 2.x.

Syncsort’s Paige Roberts sat down with Xin to get the details on the driving factors behind Spark 2.x and its newest features, such as structured streaming.

Follow along in his popular two-part series:

#4  Neha Narkhede, Confluent

Narkhede (@nehanarkhede) is the co-founder and chief technology officer at Confluent. Prior to Confluent, she built Apache Kafka with two of her colleagues at LinkedIn.

At a recent Confluent partner event, Syncsort’s Paige Roberts spoke with Narkhede about Apache Kafka, Confluent, the future of streaming data processing, and what it’s like to be a “Girl Boss” in Big Data.

You won’t want to miss any of this three-part conversation:

#5  Robert Corace, SoftServe

Corace is the executive vice president of digital disruption at SoftServe, a digital solutions company. As a seasoned industry professional with more than 25 years of experience leading sales and implementation teams, technology groups and global delivery centers, Corace is an expert in digital and technology enablement, digital transformation, digital disruption and digital futurism.

We checked in with Robert to get his insight on Big Data and current trends and challenges in data and the Internet of Things (IoT).

Check out his two-part interview:

More Big Data Experts

For additional words of wisdom from Big Data experts, download our free eBook: Bringing Big Data to Life: What the Experts Say


Data Governance Review: From Basics to the Latest News & Trends

Similar to last week’s Data Quality Study Guide, we wanted to continue to take advantage of the slower summer season to review the latest in data governance. Take a break from the heat and spend a few moments to get yourself caught up.

New to Data Governance?

If you’re just getting started, we’ve got the perfect data governance primer for you. This article explains the basics of developing an effective data governance process for taming unruly data.

Data Governance + Data Quality = Trust

Data governance requires data quality because ensuring data quality is the only way to be certain that your data governance policies are consistently followed and enforced. That is likely why both data governance and data quality were top of mind at this year’s Collibra Data Citizens event.

At this year’s Data Governance and Information Quality Conference (DGIQ), our own Keith Kohl led a session on how data governance and data quality are intrinsically linked – and how, as the strategic importance of data grows in an organization, the intersection of these practices grows in importance, too.

During her Enterprise Data World presentation, Laura Sebastian-Coleman, Data Quality Center of Excellence Lead for Cigna, noted specifically that data quality depends on fitness for purpose, representational effectiveness, and data knowledge. Without this knowledge, which depends on data context, our data lakes and even our data warehouses are doomed to become “data graveyards.”

As our new eBook “The New Rules for Your Data Landscape” points out, data is shifting from IT to the business. The result is a new data supply chain which impacts data movement, manipulation and cleansing.

Today’s business leaders rely on Big Data analytics to make informed decisions. But according to figures presented at the recent Gartner Data and Analytics Summit, C-Level executives believe that 33% of their data is inaccurate.

It appears there is an abundance of data, but a scarcity of trust, and the need for data literacy. It’s important to understand what your data MEANS to your organization. Defining data’s value wedge may be key to developing confidence in your enterprise data.

For more information about the data value wedge, watch this educational webcast hosted by ASG and Trillium Software. The recorded discussion explores the importance – and challenge – of determining what data MEANS to your organization, as well as solutions to empower both your technical (IS) and business users (DOES) to collaborate in an efficient, zero-gap-lineage user interface.

Data Governance for Hadoop

Keeping track of data, data security, data access, and regulatory compliance is more critical and more challenging than ever before. Data governance in Hadoop — including auditing, lineage, and metadata management — requires a scalable approach that interoperates easily across multiple platforms.

In 2015, Syncsort joined Cloudera to provide a unified foundation for open metadata and end-to-end visibility for governance, effectively bridging the gap between mainframe and Hadoop.

Just last year, Hortonworks CTO Scott Gnau recognized that data governance in Hadoop was still in early development, but definitely a priority at his organization.

At this year’s DataWorks Summit, Gnau made a joint appearance on theCUBE with Syncsort CTO Tendü Yoğurtçu. Gnau was bullish on Hortonworks’ partnership with Syncsort, pointing out that it is built on a foundation of accelerating joint customers’ time to value and leveraging the companies’ mutual strengths.

Syncsort’s Focus on Data Governance

Also during her DataWorks theCUBE appearance, Yoğurtçu explained how the Trillium Software acquisition has been transformative for Syncsort, allowing the organization to deliver joint solutions from data integration and data quality & profiling portfolios. She shared that recent first steps have been focused on data governance use cases leveraging Trillium’s solutions.

Yoğurtçu also touched on the recent announcement of Syncsort’s partnership with Collibra, noting the importance of making business rules and technical metadata available through dashboards for data scientists.

For more information on how data governance is changing to match the new flow of data delivery, download our new eBook: The New Rules for Your Data Landscape


ProBeat: Wearables are gimmicks

It’s been a tough month for wearables. Two weeks ago, The Information revealed that Jawbone is being liquidated. This week, CNBC reported that Intel had axed its wearables division.

As my colleagues and friends know, I’m one of the biggest skeptics of wearables in the world. As such, neither of these stories shocked me in the slightest.

At the same time, though, I’m also incredibly bullish on what wearables will one day accomplish. The technology just isn’t here yet.

None of today’s wearables excite me (many concepts and prototypes do, but that’s the case for almost any space). I’ve thought about this for a long time, and the reality is that wearables simply don’t do anything that I wish they could.

I want a device that can truly accomplish what my phone can’t. I don’t care for a wearable that can tell the time, make phone calls, send messages, run apps, and count my steps.

I don’t want a shitty phone on my wrist. Nor on my face.

Google Glass Explorer Edition relaunched this week as Enterprise Glass Edition. I’m happy to see that Google has found a niche for the product, but it’s depressing the company has put the dream of prescription glasses and contact lenses with AR functionality on the back burner.

I want a device that can monitor exactly what I’ve consumed and measure what I have gained (or lost) from it. I want a device that can measure how long I’ve rested and whether it is enough for the life I live. I want a device that can determine what my body really needs based on the information it gathers. That means anything from a recommendation to go for a run today because I’ve been immobile for too long or to eat a specific vegetable because I’m missing a given nutrient.

I strongly believe this is coming. But until the technology arrives, I’m not surprised that startups are folding and tech giants are looking elsewhere.

The good news is that many people do find wearables in their current iteration to be useful. Companies are clearly interested in augmented glasses, while consumers are still buying smartwatches and fitness trackers.

Indeed, IDC estimated last month that wearables grew 17.9 percent in Q1 2017. The top five companies (Xiaomi, Apple, Fitbit, Samsung, and Garmin) aren’t throwing in the towel.

As long as there is at least some demand, money will be invested in the space. And hopefully, those investments will one day pay off with a device that stands on its own.

Attached to your body, of course.

ProBeat is a column in which Emil rants about whatever crosses him that week.


Your WordPress plugins might be silently losing business data

If your WordPress site uses third-party plugins, you may be experiencing data loss and other problematic behavior without even knowing it.

Like many of you, I’ve become quite attached to WordPress over the past 15 years. It is by far the most popular content management system, powering 28 percent of the Internet, and still the fastest growing, with over 500 sites created on the platform each day. Considering myself well versed in the software, I was surprised to discover — while working on a digital design project for a client — what could be the Y2K of WordPress. Many WordPress plugins are suffering data loss, and it looks like this problem will soon explode if not properly addressed.

The issue boils down to this: WordPress discards an entire dataset when even one data element within the set contains too many characters for its destination field. Because WordPress doesn’t log the data loss or any errors related to it, few developers are aware of the issue. And one particular scenario – storing a visitor’s data when they’re connecting from an IPv6 address – makes the situation dramatically worse.

Example: Say a WordPress site owner has a plugin installed that lets users add comments. Plugins like that typically store the user’s IP address along with comments they submit, for analytics purposes. For years, plugin developers have assumed that IP addresses were always in the standard IPv4, 15-character format that looks like this: 216.123.123.123. Thus, plugin developers typically set the maximum allowed characters for the IP address database field their plugin uses to about 15-20 characters. However, IPv6 has a much longer 39-character format that looks like this: 2001:0db8:85a3:0000:0000:8a2e:0370:7334.

Unbeknownst to many users, site owners, and developers alike, these longer IPv6 addresses are becoming increasingly widespread. Those new addresses won’t fit into the database fields developers have been using for years. Furthermore, for security purposes, WordPress specifically validates that each part of a data set about to be stored will fit. In the example above, if the IP address is too long, WordPress discards the entire data set (not just the oversized IP address string). Worse, WordPress doesn’t log an error when this happens. The data is simply lost to the ether, without leaving a trace. This two-year-old WordPress bug thread shows how long the WP core devs have known that the community didn’t like this, but they still haven’t addressed it.
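To make the failure mode concrete, here is a minimal sketch using assumed names: a hypothetical plugin table (wp_myplugin_votes) with the kind of undersized IP column described above. The table, columns, and values are illustrative rather than taken from any particular plugin, and the commented results follow the silent-failure behavior described in this article.

    <?php
    // Hypothetical plugin table with an undersized IP column (illustrative only).
    global $wpdb;
    $table = $wpdb->prefix . 'myplugin_votes';

    $wpdb->query( "CREATE TABLE IF NOT EXISTS {$table} (
        id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        post_id BIGINT UNSIGNED NOT NULL,
        rating TINYINT NOT NULL,
        ip_address VARCHAR(15) NOT NULL, -- room for IPv4 only
        PRIMARY KEY (id)
    )" );

    // IPv4 visitor: 15 characters, fits the column, row is stored.
    $ok = $wpdb->insert( $table, array(
        'post_id'    => 42,
        'rating'     => 5,
        'ip_address' => '216.123.123.123',
    ) );

    // IPv6 visitor: 39 characters, exceeds VARCHAR(15). WordPress's pre-insert
    // length check rejects the ENTIRE row -- no row, no MySQL error, no log entry.
    $fail = $wpdb->insert( $table, array(
        'post_id'    => 42,
        'rating'     => 4,
        'ip_address' => '2001:0db8:85a3:0000:0000:8a2e:0370:7334',
    ) );

    var_dump( $ok, $fail ); // expected: int(1), bool(false)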

Yes, this currently affects only data coming from IPv6 addresses (about 17 percent of users). But while IPv6 use may be in the minority right now, it won’t be for long, and as it becomes the majority, these unexplained data loss issues will reach pandemic proportions if left untreated.

Just how widespread is this?

1.02 million active WordPress plugin installs are silently discarding real visitor logs, content submissions curated by users, and more, right now, all because IPv6 addresses are present in the data being stored. Here are some other interesting stats:

  • 50,336 plugins are available at wordpress.org today
  • 200 plugins (~1 in 250) create IP address fields that are too short
  • Those 200 plugins have over 1 million active installs — a total of 1,023,280.
  • Here’s a publicly-accessible Google Sheet my team created that lists all known offending plugins. For each plugin, that sheet includes one example where that plugin declares an IP address field that is too short.

The fix is easy peasy: you simply need to change the table schema so the column that stores IP addresses allows 39 characters (or more) instead of 15.
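A minimal sketch of that fix, continuing the hypothetical wp_myplugin_votes table from the earlier example; VARCHAR(45), rather than 39, also leaves room for IPv4-mapped IPv6 notation such as ::ffff:216.123.123.123.

    <?php
    // Widen the column so both IPv4 and full IPv6 addresses fit.
    // A real plugin would typically run this once from a schema-upgrade routine.
    global $wpdb;
    $table = $wpdb->prefix . 'myplugin_votes';

    $wpdb->query( "ALTER TABLE {$table} MODIFY ip_address VARCHAR(45) NOT NULL" );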

This problem can affect applications other than WordPress; really, any application that utilizes IP addresses and stores them in MySQL/PostgreSQL tables (especially in STRICT mode, which would prevent row inserts) where the column max is expecting a 15-character IPv4 IP address.

Debuggin’ the plugin

I uncovered this situation while recently working on a site that needed a rating system that allowed authenticated users to vote on specific post types. So naturally, I did a search of existing plugins that could meet the requirements and found one fairly quickly, CBX Rating, and it was a breeze to configure and get working. Then came the intermittent reports of the form submissions not going through.

I spent hours deactivating other plugins, digging through code, and guiding users via screenshare. I was unable to narrow it down or find any smoking gun. No success message, no error message, no errors in the console log, nothing in the server logs. How could form submissions be failing without errors?

I remembered something I had seen in WordPress before: row inserts silently failing if the data strings were longer than the table column maximums. So I shifted my attention to the back end, and that’s where I found the problem and my boss, Erik Neff (the company’s CTO), helped identify exactly why it was happening.

MySQL databases that are not in STRICT mode will truncate values that exceed the maximum character count for a column and will insert the new record with a warning. In STRICT mode, MySQL will not accept the record and will return an error. WordPress, on the other hand, won’t execute the query at all if it determines a value is longer than the column maximum; it simply returns false, with no error or warning.

When using the WordPress $wpdb->insert method, you get back the number of rows inserted on success and false on failure. But a function is called before any MySQL statements are executed, and that’s where the problem lies. That function, process_field_lengths, checks whether each value’s length is within the maximum allowed for its table column. If a value is too long, the entire insert is aborted and false is returned, with no error message or explanation. This is a known issue with WordPress core, and it makes debugging that much harder.
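Because WordPress raises no error of its own, a plugin author (or a site owner debugging one) can at least surface these failures by checking the return value of the insert call. Here is a rough sketch of such a guard; the function name and logging destination are illustrative.

    <?php
    // Defensive wrapper around $wpdb->insert(): WordPress returns false on
    // failure without logging anything, so log it ourselves.
    function myplugin_logged_insert( $table, array $data ) {
        global $wpdb;

        $result = $wpdb->insert( $table, $data );

        if ( false === $result ) {
            // $wpdb->last_error is often empty here because the query was
            // never sent to MySQL -- the length check aborted it first.
            error_log( sprintf(
                'myplugin: insert into %s failed silently; value lengths: %s',
                $table,
                wp_json_encode( array_map( 'strlen', array_map( 'strval', $data ) ) )
            ) );
        }

        return $result;
    }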

The CBX Rating plugin we were using didn’t account for this failure point. I checked the plugin’s table schema and started increasing varchar max lengths across the board. Touchdown! Soon after, I got wind from users of all types that all forms were now being submitted successfully.

My mind raced to how this could be an epidemic, so Erik and I set out to determine the scale. The result of a (rather lengthy) check of WordPress plugins yielded a list of every place an IP address field was declared with an incorrect length. You can find those results in the Google sheet that I’ve made public.

Brett Exnowski is senior developer at Primitive Spark and specializes in complex web applications.


Welcome to the Hitachi-Specific z13

This post is an update of an article about the Hitachi-Specific z13 that originally appeared on the Dancing Dinosaur blog.

Old timers will remember when the computer industry boasted multiple vendors offering mainframe computers compatible with the IBM mainframe OS at that time. They were called plug-compatible and they constituted their own acronym, BUNCH (Burroughs, Univac, NCR, Control Data, and Honeywell). The BUNCH served two purposes: 1) they stopped the Feds from hassling IBM over monopolistic practices and 2) they gave mainframe users alternatives that still ran their core IBM systems.

The BUNCH is long gone as mainframe providers. IBM revived them in spirit, however, with a recent joint IBM-Hitachi announcement of what, in effect, amounts to a plug-compatible z13 that also runs Hitachi’s operating system, VOS3.

Start of a New BUNCH

The new Hitachi deal, as this reporter previously wrote, adds a new dimension to the z. The plans call for using Hitachi’s operating system, VOS3, running on the latest IBM z13 hardware to provide Hitachi users with better performance while sustaining their previous investments in business-critical Hitachi data and software. VOS3 started as a fork of MVS and has been repeatedly modified since as Hitachi keeps updating its own software.

According to IBM, Hitachi will exclusively adopt the IBM z Systems high-performance mainframe hardware as the only hardware for the next generation of Hitachi’s AP series. These systems primarily serve major organizations in Japan. This work expands Hitachi’s previous cooperation with IBM to make mainframe development more efficient through IBM’s global capabilities in developing and manufacturing mainframe systems.

Mainframe Innovation

The collaboration, IBM further noted, reinforces its commitment to delivering new innovations in mainframe technology and fostering an open ecosystem for the mainframe to support a broad range of software and applications. IBM recently launched offerings for IBM z Systems that leverage the platform’s capabilities for high capacity, speed, scale and security to deliver cloud-based blockchain services for building new transaction systems and machine learning for analyzing large amounts of data.

IBM also is opening up the mainframe through its Open Mainframe Project, which is a Linux initiative. That’s unlikely to deliver many innovations that bolster Hitachi’s specific z customer base, but others certainly should.

A New OS for z

If you count VOS3, the mainframe now runs a variety of operating systems, including z/OS, z/TPF, VSE, and z/VM, as well as Linux. Reportedly, Hitachi plans to integrate its new mainframe with its Lumada Internet of Things (IoT) offerings. With z scalability, security, massive I/O, and performance, the z makes an ideal IoT platform, and IoT is a market IBM targets today. Now IBM is seeding a competitor with a z running whatever appealing capabilities Hitachi’s Lumada offers. One hopes the revenue or royalties IBM gets are worth it.

IBM and Hitachi, as explained in the announcement, have a long history of cooperation and collaboration in enterprise computing technologies. Hitachi decided to expand this cooperation to utilize IBM’s most advanced mainframe technologies. Hitachi will continue to provide its customers with a highly reliable, high-performance, highly secure mainframe environment but running the Hitachi VOS3 operating system. Hitachi also agrees to strengthen mainframe functionality and services, which, they claim, should lower TCO, improve ease of system introduction and operation, and boost serviceability.

Syncsort’s mainframe optimization solutions, such as its ZP Saver Suite, are also targeted at lowering TCO. To learn more, watch the recorded webcast Mainframe Optimization in 2017, which discusses key mainframe optimization problems, opportunities and use cases spanning DB2 and network management on z/OS, as well as new ways to save on your monthly IBM MLC charges and new options for long-standing mainframe issues.


LiveStories raises $10 million to help you access public health and census data

LiveStories, which provides software that simplifies access to civic data on poverty, health, economics, and more, today announced that it has raised $10 million in funding. Ignition Partners led the round, with participation from returning investors True Ventures and Founders Co-Op.

The Seattle-based startup sources data from federal, state, and local governments, including the Bureau of Labor Statistics, the U.S. Census Bureau, and the Centers for Disease Control and Prevention.

“The civic data workflow is fragmented across multiple tools and vendors,” wrote LiveStories founder and CEO Adnan Mahmud, in an email to VentureBeat. “For example, you might use Google to find the data, Excel to clean it up, Tableau to explore it, and Word to create a static report.”

According to Mahmud, LiveStories’ software allows customers to find and communicate civic data in a more interactive way — across charts, videos, and images. “Our platform automatically visualizes the data, down to city and county localities,” wrote Mahmud. The data can then be shared on social media networks like Facebook and Twitter.

LiveStories claims to have more than 120 customers, which include LA County, CDPH, San Diego County, UCLA, and the Gates Foundation.

Today’s funding will be used to further develop the product and increase sales and marketing. Founded in 2015, LiveStories has raised a total of $14 million and currently has 20 employees.

Sign up for Funding Daily: Get the latest news in your inbox every weekday.


Alation raises $23 million for its data cataloging software

Alation, which provides software that allows businesses to have a single source of reference for their data, today announced funding of $23 million, in a round led by Icon Ventures. New investor Harmony Partners, as well as existing investors Costanoa Ventures and Data Collective, also joined.

Alation indexes enterprise databases like Teradata, business intelligence (BI) tools like Tableau, and file systems like Cloudera to simplify the search for data. “Much like Google crawls the public internet, the Alation data catalog automatically crawls, parses, and indexes all of an organization’s data and the logs of how data is used,” wrote Alation cofounder and CEO Satyen Sangani, in an email to VentureBeat.

Above: Alation platform (Image Credit: Alation)

The Redwood City, California-based startup uses artificial intelligence (AI) to understand patterns in usage — how the data is being used, what techniques are being applied to derive insight, and who has used the data to drive that insight. “In essence, what Alation does for data is similar to pagerank in Google,” wrote Sangani.

The chief executive claims to have more than 70 customers — including eBay, Pfizer, Pinterest, Square, and Tesla — who purchase a yearly subscription to access the software. The price point can go up to millions of dollars a year, says Sangani.

Alation has raised a total of $32 million to date, from investors that include Andreessen Horowitz, Bloomberg Beta, and General Catalyst Partners. The new money will be used to increase sales and marketing, further research and development (R&D), and grow the team of 85.

The company was founded in 2013.

Sign up for Funding Daily: Get the latest news in your inbox every weekday.


Data Quality Study Guide – A Review of Use Cases & Trends

Our summer school series continues with today’s fully loaded study session. Have you been taking note of all the use cases and current trends for data quality? Maybe now is a good time for a review!

Data Quality Saves You Money

A big reason to pay attention to data quality is that it can save you money. First and foremost, it can help you maximize the return on your Big Data investments. And there are additional cost-related benefits (areas that we will discuss below) to help you save even more.

It Builds Trust

Business leaders rely on Big Data analytics to make informed decisions. But according to figures presented at the recent Gartner Data and Analytics Summit, C-Level executives believe that 33% of their data is inaccurate. Ensuring quality data can help organizations trust the data.

And further, customers can trust businesses who are confident in their data. If your data is inaccurate, inconsistent or otherwise of low quality, you risk misunderstanding your customers and doing things that undermine their trust in you.

It appears there is an abundance of data, but a scarcity of trust, and the need for data literacy. It’s important to understand what your data MEANS to your organization. Defining data’s value wedge may be key to developing confidence in your enterprise data.

For more information, watch this educational webcast, hosted by ASG and Trillium Software, which explores the importance – and challenge – of determining what data MEANS to your organization, as well as solutions to empower both your technical (IS) and business users (DOES) to collaborate in an efficient, zero-gap-lineage user interface.

Data Quality’s Link to Data Governance

Data quality is essential for data governance because ensuring data quality is the only way to be certain that your data governance policies are consistently followed and enforced.

During her Enterprise Data World presentation, Laura Sebastian-Coleman, the Data Quality Center of Excellence Lead for Cigna, noted specifically that data quality depends on fitness for purpose, representational effectiveness and data knowledge. And, without this knowledge, which depends on the data context, our data lakes or even our data warehouses are doomed to become “data graveyards.”

At this year’s Data Governance and Information Quality Conference (DGIQ), our own Keith Kohl led a session on how data governance and data quality are intrinsically linked – and how, as the strategic importance of data grows in an organization, the intersection of these practices grows in importance, too.

Data Quality and Your Customers

Engaging your customers is vital to driving your business. Data quality can help you improve your customer records by verifying and enriching the information you already have. And beyond contact info, you can manage customer interaction by storing additional customer preferences, such as the time of day they visit your site and which content topics and types they are most interested in.

The more customer information you have, the better you can understand your customers and achieve “Customer 360,” or full-view of your customer. But you need to be aware that more data means more complexity – creating a data integration paradox.

For a more detailed overview of the different sources of this data, which data points are critical in obtaining, and tips for customer 360 success, download our eBook Getting Closer to Your Customers in a Big Data World.

Its Role in Cyber Security

You may be aware of all the ways you can leverage Big Data to detect fraud, but maybe you’re wondering how data quality can fight security breaches?

Think about it. If the machine data that your intrusion-detection tools collect about your software environments is filled with incomplete or inaccurate information, then you cannot expect your security tools to effectively detect dangerous threats.

Keep in mind, too, that when it comes to fraud detection, real-time results are key. By extension, your data quality tools covering fraud analysis data will also need to work in real time.

Additional Data Quality Trends

Of course, we’re always thinking about what’s next for data quality. In March, Syncsort’s CEO Josh Rogers was interviewed on theCUBE, where he discussed his vision for its future.

One additional area of interest that’s gaining momentum is machine learning. While machine learning may seem like a “silver bullet,” because of the technologies it enables for us today, it’s important to understand that without high-quality data on which to operate, it is less magical.

Download the Gartner Magic Quadrant Report to learn how leading solutions including Trillium can help you achieve your long-term data quality objectives.


Best Practices in Data Storage (Part 1): What Types of Data Should be Retained?

Even if you analyze your data in real time, storing data for extended periods is important for compliance and other reasons. But what types of data should be retained and how long should you keep it? Keep reading for some insights on data storage.

Data Analytics Requires Data Storage

These days, real-time data analytics should be the foundation of most organizations’ approach to working with data. (For more on just how important real time has become, check out Syncsort’s Hadoop Market Adoption Survey report, which explains best practices for data management and analysis.) But that doesn’t mean you should interpret data as it streams in, then delete it forever.

On the contrary, keeping data around for a while – even after you’ve interpreted it – is important. It helps keep you compliant by ensuring that data remains available for audits or other reviews. It also provides you an opportunity to review historical data to identify long-term trends, or investigate incidents that you may not discover until long after the data related to them has been generated and processed.

Types of Data to Retain

The first step in building an effective data storage policy is to answer the question: Which types of data should I store for an extended period, and which can I delete instantly?

The short answer is that, to the extent possible, you should retain as much data as your storage capacity can support.

But since most organizations must prioritize some data types for long-term data storage, here’s a general hierarchy that outlines which types of data to keep on hand. The data at the top of the list is the most important to store for as long as possible, while the data at the bottom is least important:

  1. Data that is required to be retained by compliance or regulatory policies. If you’re required by law to store a certain type of data, you should definitely keep that data around.
  2. Data that relates to your customers and helps you engage with them by achieving “customer 360.” Understanding your customers is hard, and you don’t want to give up the data that helps you with that challenge.
  3. Business documents, contracts, and so on. These are important to store for as long as possible.
  4. Data that is generated by everyday business operations but is not regulated. This data can be helpful to have on hand for historical reviews or planning purposes, but it’s not essential.
  5. Machine data generated by your networking equipment, servers, sensors or other types of automated sources. Machine data tends to be the least useful type of data to store long term. It is sometimes useful to be able to review machine data when researching a technical incident or planning infrastructure expansions, but for the most part, machine data is only useful in real time, because the state of your infrastructure changes so quickly.

The exact types of data to prioritize for long-term storage will vary from organization to organization, of course. This hierarchy is just a general guide.

But there are practical limitations on how long you can store data. In tomorrow’s blog, we’ll discuss just how long you need to keep it.

Discover the new rules for today’s data landscape – Download this eBook today!
