
Tag Archives: bias

Studies find bias in AI models that recommend and diagnose diseases

February 19, 2021   Big Data

Research into AI- and machine learning model-driven methods for health care suggests that they hold promise in the areas of phenotype classification, mortality and length-of-stay prediction, and intervention recommendation. But models have traditionally been treated as black boxes in the sense that the rationale behind their suggestions isn’t explained or justified. This lack of interpretability, in addition to bias in their training datasets, threatens to hinder the effectiveness of these technologies in critical care.

Two studies published this week underline the challenges yet to be overcome when applying AI to point-of-care settings. In the first, researchers at the University of Southern California evaluated the fairness of models trained with Medical Information Mart for Intensive Care IV (MIMIC-IV), the largest publicly available medical records dataset. The other, which was coauthored by scientists at Queen Mary University of London, explores the technical barriers to training unbiased health care models. Both arrive at the conclusion that ostensibly “fair” models designed to diagnose illnesses and recommend treatments are susceptible to unintended and undesirable racial and gender prejudices.

As the University of Southern California researchers note, MIMIC-IV contains the de-identified data of 383,220 patients admitted to an intensive care unit (ICU) or the emergency department at Beth Israel Deaconess Medical Center in Boston, Massachusetts between 2008 and 2019. The coauthors focused on a subset of 43,005 ICU stays, filtering out patients younger than 15 years old who hadn’t visited the ICU more than once or who stayed less than 24 hours. Represented among the samples were married or single male and female Asian, Black, Hispanic, and white hospital patients with Medicaid, Medicare, or private insurance.

In one of several experiments to determine to what extent bias might exist in the MIMIC-IV subset, the researchers trained a model to recommend one of five categories of mechanical ventilation. Alarmingly, they found that the model’s suggestions varied across different ethnic groups. Black and Hispanic cohorts were less likely to receive ventilation treatments, on average, while also receiving a shorter treatment duration.

Insurance status also appeared to have played a role in the ventilator treatment model’s decision-making, according to the researchers. Privately insured patients tended to receive longer and more frequent ventilation treatments than Medicare and Medicaid patients, presumably because patients with generous insurance could afford better treatment.
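
A simple way to check for the disparities described above is to group the model’s outputs by a demographic or insurance attribute and compare recommendation rates and durations. The sketch below illustrates that kind of audit; the file and column names are hypothetical, and this is not the study’s code.

```python
import pandas as pd

# Hypothetical export of the model's outputs: one row per ICU stay, with the
# recommended ventilation category/duration and patient attributes.
preds = pd.read_csv("ventilation_predictions.csv")  # columns assumed below

def audit(by: str) -> pd.DataFrame:
    """Recommendation rate and mean duration per group (e.g., ethnicity, insurance)."""
    return (
        preds.assign(recommended=preds["vent_category"] > 0)
             .groupby(by)
             .agg(recommend_rate=("recommended", "mean"),
                  mean_duration_hours=("vent_duration_hours", "mean"),
                  n=("recommended", "size"))
    )

print(audit("ethnicity"))
print(audit("insurance"))
```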

The researchers caution that there exist “multiple confounders” in MIMIC-IV that might have led to the bias in ventilator predictions. However, they point to this as motivation for a closer look at models in health care and the datasets used to train them.

In the study published by Queen Mary University researchers, the focus was on the fairness of medical image classification. Using CheXpert, a benchmark dataset for chest X-ray analysis comprising 224,316 annotated radiographs, the coauthors trained a model to predict one of five pathologies from a single image. They then looked for imbalances in the predictions the model gave for male versus female patients.

Prior to training the model, the researchers implemented three types of “regularizers” intended to reduce bias. This had the opposite of the intended effect — when trained with the regularizers, the model was even less fair than when trained without regularizers. The researchers note that one regularizer, an “equal loss” regularizer, achieved better parity between males and females. This parity came at the cost of increased disparity in predictions among age groups, though.

“Models can easily overfit the training data and thus give a false sense of fairness during training which does not generalize to the test set,” the researchers wrote. “Our results outline some of the limitations of current train time interventions for fairness in deep learning.”
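
As a rough illustration of what an “equal loss” regularizer like the one mentioned above can look like in practice, the sketch below penalizes the spread of per-group average losses during training. It assumes group labels are available for each batch and is not the authors’ implementation.

```python
import torch
import torch.nn.functional as F

def equal_loss_objective(logits, targets, groups, lam=1.0):
    """Cross-entropy plus a penalty on the spread of per-group mean losses.

    groups is an integer tensor (e.g., 0 = male, 1 = female). The penalty is the
    variance of group-wise mean losses, nudging training toward equal average
    loss across groups. Illustrative sketch only, not the paper's regularizer.
    """
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    group_means = [per_sample[groups == g].mean() for g in torch.unique(groups)]
    group_means = torch.stack(group_means)
    penalty = ((group_means - group_means.mean()) ** 2).mean()
    return per_sample.mean() + lam * penalty
```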

The two studies build on previous research showing pervasive bias in predictive health care models. Because researchers are often reluctant to release code, datasets, and techniques, biases in the data used to train algorithms for diagnosing and treating diseases can go undetected and perpetuate inequalities.

Recently, a team of U.K. scientists found that almost all eye disease datasets come from patients in North America, Europe, and China, meaning eye disease-diagnosing algorithms are less certain to work well for racial groups from underrepresented countries. In another study, Stanford University researchers claimed that most of the U.S. data for studies involving medical uses of AI come from California, New York, and Massachusetts. A study of a UnitedHealth Group algorithm determined that it could underestimate by half the number of Black patients in need of greater care. Researchers from the University of Toronto, the Vector Institute, and MIT showed that widely used chest X-ray datasets encode racial, gender, and socioeconomic bias. And a growing body of work suggests that skin cancer-detecting algorithms tend to be less precise when used on Black patients, in part because AI models are trained mostly on images of light-skinned patients.

Bias isn’t an easy problem to solve, but the coauthors of one recent study recommend that health care practitioners apply “rigorous” fairness analyses prior to deployment as one solution. They also suggest that clear disclaimers about the dataset collection process and the potential resulting bias could improve assessments for clinical use.

Cashierless tech could detect shoplifting, but bias concerns abound

January 24, 2021   Big Data

As the pandemic continues to rage around the world, it’s becoming clear that COVID-19 will endure longer than some health experts initially predicted. Owing in part to slow vaccine rollouts, rapidly spreading new strains, and politically charged rhetoric around social distancing, the novel coronavirus is likely to become endemic, necessitating changes in the ways we live our lives.

Some of those changes might occur in brick-and-mortar retail stores, where touch surfaces like countertops, cash, credit cards, and bags are potential viral spread vectors. The pandemic appears to have renewed interest in cashierless technology like Amazon Go, Amazon’s chain of stores that allow shoppers to pick up and purchase items without interacting with a store clerk. Indeed, Walmart, 7-Eleven, and cashierless startups including AiFi, Standard, and Grabango have expanded their presence over the past year.

But as cashierless technology becomes normalized, there’s a risk it could be used for purposes beyond payment, particularly shoplifting detection. While shoplifting detection isn’t problematic on its face, case studies illustrate that it’s susceptible to bias and other flaws that could, at worst, result in false positives.

Synthetic datasets

The bulk of cashierless platforms rely on cameras, among other sensors, to monitor the individual behaviors of customers in stores as they shop. Video footage from the cameras feeds into machine learning classification algorithms, which identify when a shopper picks up and places an item in a shopping cart, for example. During a session at Amazon’s re:Mars conference in 2019, Dilip Kumar, VP of Amazon Go, explained that Amazon engineers use errors like missed item detections to train the machine learning models that power its Go stores’ cashierless experiences. Synthetic datasets boost the diversity of the training data and ostensibly the robustness of the models, which use both geometry and deep learning to ensure transactions are associated with the right customer.

The problem with this approach is that synthetic datasets, if poorly audited, might encode biases that machine learning models then learn to amplify. Back in 2015, a software engineer discovered that the image recognition algorithms deployed in Google Photos, Google’s photo storage service, were labeling Black people as “gorillas.” Google’s Cloud Vision API recently mislabeled thermometers held by people with darker skin as guns. And countless experiments have shown that image-classifying models trained on ImageNet, a popular (but problematic) dataset containing photos scraped from the internet, automatically learn humanlike biases about race, gender, weight, and more.

Jerome Williams, a professor and senior administrator at Rutgers University’s Newark campus, told NBC that a theft-detection algorithm might wind up unfairly targeting people of color, who are routinely stopped on suspicion of shoplifting more often than white shoppers. A 2006 study of toy stores found that not only were middle-class white women often given preferential treatment, but also that the police were never called on them, even when their behavior was aggressive. And in a recent survey of Black shoppers published in the Journal of Consumer Culture, 80% of respondents reported experiencing racial stigma and stereotypes when shopping.

“The people who get caught for shoplifting is not an indication of who’s shoplifting,” Williams told NBC. In other words, Black shoppers who feel they’ve been scrutinized in stores might be more likely to appear nervous while shopping, which might be perceived by a system as suspicious behavior. “It’s a function of who’s being watched and who’s being caught, and that’s based on discriminatory practices.”

Some solutions explicitly designed to detect shoplifting track gait, or patterns of limb movement, among other physical characteristics. It’s a potentially problematic measure considering that disabled shoppers, among others, might have gaits that appear suspicious to an algorithm trained on footage of able-bodied shoppers. As the U.S. Department of Justice’s Civil Rights Division, Disability Rights Section notes, some people with disabilities have a stagger or slurred speech related to neurological disabilities, mental or emotional disturbance, or hypoglycemia, and these characteristics may be misperceived as intoxication, among other states.

Tokyo startup Vaak’s anti-theft product, VaakEye, was reportedly trained on more than 100 hours of closed-circuit television footage to monitor the facial expressions, movements, hand movements, clothing choices, and over 100 other aspects of shoppers. AI Guardsman, a joint collaboration between Japanese telecom company NTT East and tech startup Earth Eyes, scans live video for “tells” like when a shopper looks for blind spots or nervously checks their surroundings.

NTT East, for one, makes no claims that its algorithm is perfect. It sometimes flags well-meaning customers who pick up and put back items and salesclerks restocking store shelves, a spokesperson for the company told The Verge. Despite this, NTT East claimed its system couldn’t be discriminatory because it “does not find pre-registered individuals.”

Walmart’s AI- and camera-based anti-shoplifting technology, which is provided by Everseen, came under scrutiny last May over its reportedly poor detection rates. In interviews with Ars Technica, Walmart workers said their top concern with Everseen was false positives at self-checkout. The employees believe that the tech frequently misinterprets innocent behavior as potential shoplifting.

Industry practices

Trigo, which emerged from stealth in July 2018, aims to bring checkout-less experiences to existing “medium to small” brick-and-mortar convenience stores. For a monthly subscription fee, the company supplies both high-resolution, ceiling-mounted cameras and an on-premises “processing unit” that runs machine learning-powered tracking software. Data is beamed from the unit to a cloud processing provider, where it’s analyzed and used to improve Trigo’s algorithms.

Trigo claims that it anonymizes the data it collects, that it can’t identify individual shoppers beyond the products they’ve purchased, and that its system is 99.5% accurate on average at identifying purchases. But when VentureBeat asked about what specific anti-shoplifting detection features the product offers and how Trigo trains algorithms that might detect theft, the company declined to comment.

Grabango, a cashierless tech startup founded by Pandora cofounder Will Glaser, also declined to comment for this article. Zippin says it requires shoppers to check in with a payment method and that staff is alerted only when malicious actors “sneak in somehow.” And Standard Cognition, which claims its technology can account for changes like when a customer puts back an item they initially considered purchasing, says it doesn’t and hasn’t ever offered shoplifting detection capabilities to its customers.

“Standard does not monitor for shoplifting behavior and we never have … We only track what people pick up or put down so we know what to charge them for when they leave the store. We do this anonymously, without biometrics,” CEO Jordan Fisher told VentureBeat via email. “An AI-driven system that’s trained responsibly with diverse sets of data should in theory be able to detect shoplifting without bias. But Standard won’t be the company doing it. We are solely focused on the checkout-free aspects of this technology.”

Above: OTG’s Cibo Express is the first confirmed brand to deploy Amazon’s “Just Walk Out” cashierless technology.

Separate interviews with The New York Times and Fast Company in 2018 tell a different story, however. Michael Suswal, Standard Cognition’s cofounder and chief operating officer, told The Times that Standard’s platform could look at a shopper’s trajectory, gaze, and speed to detect and alert a store attendant to theft via text message. (In the privacy policy on its website, Standard says it doesn’t collect biometric identifiers but does collect information about “certain body features.”) He also said that Standard hired 100 actors to shop for hours in its San Francisco demo store in order to train its algorithms to recognize shoplifting and other behaviors.

“We learn behaviors of what it looks like to leave,” Suswal told The Times. “If they’re going to steal, their gait is larger, and they’re looking at the door.”

A patent filed by Standard in 2019 would appear to support the notion that Standard developed a system to track gait. The application describes an algorithm trained on a collection of images that can recognize the physical features of customers moving in store aisles between shelves. The algorithm is designed to identify 19 different on-body points, including necks, noses, eyes, ears, shoulders, elbows, wrists, hips, ankles, and knees.
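
Pose-based gait analysis of this kind generally reduces to tracking such keypoints frame by frame and deriving motion features from them. The sketch below is purely illustrative (the keypoint names and input format are assumptions, not Standard’s system); it computes a simple stride proxy from ankle positions over time.

```python
import numpy as np

# 19 on-body points of the kind listed in the patent (names are illustrative).
KEYPOINTS = ["neck", "nose", "left_eye", "right_eye", "left_ear", "right_ear",
             "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
             "left_wrist", "right_wrist", "left_hip", "right_hip",
             "left_knee", "right_knee", "left_ankle", "right_ankle", "mid_hip"]
IDX = {name: i for i, name in enumerate(KEYPOINTS)}

def stride_lengths(frames: np.ndarray) -> np.ndarray:
    """Per-frame stride proxy: pixel distance between the two ankles.

    frames has shape (num_frames, 19, 2) holding (x, y) coordinates, assumed to
    come from an upstream pose-estimation model.
    """
    left = frames[:, IDX["left_ankle"]]
    right = frames[:, IDX["right_ankle"]]
    return np.linalg.norm(left - right, axis=1)
```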

Santa Clara-based AiFi also says its cashierless solution can recognize “suspicious behavior” inside of stores within a defined set of shopping behaviors. Like Amazon, the company uses synthetic datasets to generate a set of training and testing data without requiring customer data. “With simulation, we can randomize hairstyle, color, clothing, and body shape to ensure that we have a diverse and unbiased datasets,” a spokesperson told VentureBeat. “We respect user privacy and do not use facial recognition or personally identifiable information. It is our mission to change the future of shopping to make it automated, privacy-conscious, and inclusive.”
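
Balancing a synthetic dataset in the way the AiFi spokesperson describes usually comes down to sampling appearance attributes uniformly before rendering each simulated shopper. A toy sketch of that sampling step, with illustrative attribute values rather than AiFi’s actual pipeline:

```python
import random

# Illustrative attribute pools; a real simulator would feed these into a 3D renderer.
ATTRIBUTES = {
    "hairstyle":  ["short", "long", "curly", "braided", "bald"],
    "skin_tone":  ["I", "II", "III", "IV", "V", "VI"],  # Fitzpatrick-style scale
    "clothing":   ["hoodie", "suit", "dress", "t-shirt", "coat"],
    "body_shape": ["slim", "average", "broad", "tall", "petite"],
}

def sample_shopper(rng: random.Random) -> dict:
    """Draw one synthetic shopper with uniformly sampled attributes so that
    no single appearance dominates the generated training data."""
    return {name: rng.choice(values) for name, values in ATTRIBUTES.items()}

rng = random.Random(0)
synthetic_batch = [sample_shopper(rng) for _ in range(10_000)]
```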

A patent filed in 2019 by Accel Robotics reveals the startup’s proposed anti-shoplifting solution, which optionally relies on anonymous tags that don’t reveal a person’s identity. By analyzing camera images over time, a server can attribute motion to a person and purportedly infer whether they took items from a shelf with malintent. Shopper behavior can be tracked over multiple visits if “distinguishing characteristics” are saved and retrieved for each visitor, which could be used to identify shoplifters who’ve previously stolen from the store.

“[The system can be] configured to detect shoplifting when the person leaves the store without paying for the item. Specifically, the person’s list of items on hand (e.g., in the shopping cart list) may be displayed or otherwise observed by a human cashier at the traditional cash register screen,” the patent description reads. “The human cashier may utilize this information to verify that the shopper has either not taken anything or is paying/showing for all items taken from the store. For example, if the customer has taken two items from the store, the customer should pay for two items from the store.”

Lack of transparency

For competitive reasons, cashierless tech startups are generally loath to reveal the technical details of their systems. But this does a disservice to the shoppers subjected to them. Without transparency regarding the applications of these platforms and the ways in which they’re developed, it will likely prove difficult to engender trust among shoppers, shoplifting detection capabilities or no.

Zippin was the only company VentureBeat spoke with that volunteered information about the data used to train its algorithms. It said that depending on the particular algorithm to be trained, the size of the dataset varies from a few thousand to a few million video clips, with training performed in the cloud and models deployed to the stores after training. But the company declined to say what steps it takes to ensure the datasets are sufficiently diverse and unbiased, whether it uses actors or synthetic data, and whether it continuously retrains algorithms to correct for errors.

Systems like AI Guardsman learn from their mistakes over time by letting store clerks and managers flag false positives as they occur. It’s a step in the right direction, but without more information about how these systems work, it’s unlikely to allay shoppers’ concerns about bias and surveillance.

Experts like Christopher Eastham, a specialist in AI at the law firm Fieldfisher, call for frameworks to regulate the technology. And even Ryo Tanaka, the founder of Vaak, argues there should be notice before customers enter stores so that they can opt out. “Governments should operate rules that make stores disclose information — where and what they analyze, how they use it, how long they use it,” he told CNN.

Salesforce’s Simulation Cards spell out uses, risks, and bias to make AI models more transparent

October 20, 2020   Big Data

Salesforce recently open-sourced Foundation (formerly AI Economist), an AI research simulation for exploring tax policies. To accompany its release, the company this week published what it’s calling a Simulation Card, a file to document the use, risks, and sources of bias in published versions of the simulation.

Simulation Cards join ongoing efforts to bring transparency to historically black-box systems. Over the past year, Google launched Model Cards, which sprang from a Google AI whitepaper published in October 2018. Model Cards specify model architectures and provide insight into factors that help ensure optimal performance for given use cases. The idea of Model Cards emerged following Microsoft’s work on “datasheets for data sets,” or datasheets intended to foster trust and accountability through documenting datasets’ creation, composition, intended uses, maintenance, and other properties. Two years ago, IBM proposed its own form of model documentation in voluntary factsheets called “Supplier’s Declaration of Conformity” (DoC) to be completed and published by companies developing and providing AI.

The objective of Simulation Cards is similar to that of Model Cards and Data Sheets. However, Simulation Cards reflect the fact that simulations differ from trained models and datasets because they’re designed to create scenarios of interest, according to Salesforce. These scenarios can contain bias, which might be purposefully built-in or an unexpected side effect of the design choices made during creation. Because simulations create many datasets of various shapes and sizes, the potential for misuse is greater than that of a single fixed dataset that might contain bias.

Above: The Simulation Card for Foundation.

Image Credit: Salesforce

The Simulation Card for Foundation is divided into several sections: Simulation Details, Basic Information, Intended Use, Factors, Metrics, Quantitative Analyses, Ethical Considerations, and Caveats and Recommendations. Simulation Details provides the date of the simulation and the name of the publishing organization, as well as any keywords, licenses, contact information, and relevant version numbers. The Basic Information and Intended Use sections cover top-level info about the simulation and the applications for it that the coauthors had in mind. Factors canvases the modeling assumptions the simulations make, while Metrics and Quantitative Analyses outline the metrics used to measure the results. Finally, Ethical Considerations and Caveats and Recommendations provide guidelines for (or warnings against) applying the outputs to real-world systems.
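
Those sections map naturally onto a structured document. The sketch below shows one way such a card could be represented programmatically; the field names simply mirror the sections listed above and are not Salesforce’s published schema.

```python
from dataclasses import dataclass, field

@dataclass
class SimulationCard:
    """Illustrative container mirroring the sections described above;
    not Salesforce's actual format."""
    simulation_details: dict   # date, organization, keywords, license, versions
    basic_information: str     # top-level description of the simulation
    intended_use: str          # applications the authors had in mind
    factors: list = field(default_factory=list)   # modeling assumptions
    metrics: list = field(default_factory=list)   # how results are measured
    quantitative_analyses: dict = field(default_factory=dict)
    ethical_considerations: str = ""
    caveats_and_recommendations: str = ""

card = SimulationCard(
    simulation_details={"date": "2020-10", "organization": "Salesforce Research"},
    basic_information="Foundation: a multi-agent economic simulation.",
    intended_use="Research into tax policy design in simulated economies.",
)
```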

It remains to be seen what sort of third-party adoption Simulation Cards might gain, if any, but Salesforce itself says it’s committed to releasing cards alongside future simulations. “We encourage researchers and developers to publish similar Simulation Cards for software releases, to broadly promote transparency and the ethical use of simulation frameworks. AI simulations offer researchers the power to generate data and evaluate outcomes of virtual economies that capture a part of the real world,” Salesforce wrote in a blog post. “An unethical simulation poses an order-of-magnitude larger ethical risk. As a result, our commitment to transparency is all that much more critical.”


AI Weekly: Nvidia’s Maxine opens the door to deepfakes and bias in video calls

October 10, 2020   Big Data

Will AI power video chats of the future? That’s what Nvidia implied this week with the unveiling of Maxine, a platform that provides developers with a suite of GPU-accelerated AI conferencing software. Maxine brings AI effects including gaze correction, super-resolution, noise cancellation, face relighting, and more to end users, while in the process reducing how much bandwidth videoconferencing consumes. Quality-preserving compression is a welcome innovation at a time when videoconferencing is contributing to record bandwidth usage. But Maxine’s other, more cosmetic features raise uncomfortable questions about AI’s negative — and possibly prejudicial — impact.

A quick recap: Maxine employs AI models called generative adversarial networks (GANs) to modify faces in video feeds. Top-performing GANs can create realistic portraits of people who don’t exist, for instance, or snapshots of fictional apartment buildings. In Maxine’s case, they can enhance the lighting in a video feed and recomposite frames in real time.

Bias in computer vision algorithms is pervasive, with Zoom’s virtual backgrounds and Twitter’s automatic photo-cropping tool disfavoring people with darker skin. Nvidia hasn’t detailed the datasets or AI model training techniques it used to develop Maxine, but it’s not outside of the realm of possibility that the platform might not, for instance, manipulate Black faces as effectively as light-skinned faces. We’ve reached out to Nvidia for comment.

Beyond the bias issue, there’s the fact that facial enhancement algorithms aren’t necessarily good for mental health. Studies by Boston Medical Center and others show that filters and photo editing can take a toll on people’s self-esteem and trigger disorders like body dysmorphia. In response, Google earlier this month said it would turn off by default its smartphones’ “beauty” filters that smooth out pimples, freckles, wrinkles, and other skin imperfections. “When you’re not aware that a camera or photo app has applied a filter, the photos can negatively impact mental wellbeing,” the company said in a statement. “These default filters can quietly set a beauty standard that some people compare themselves against.”

That’s not to mention how Maxine might be used to get around deepfake detection. Several of the platform’s features analyze the facial points of people on a call and then algorithmically reanimate the faces in the video on the other side, which could interfere with the ability of a system to identify whether a recording has been edited. Nvidia will presumably build in safeguards to prevent this — currently, Maxine is available to developers only in early access — but the potential for abuse is a question the company has yet to address.

None of this is to suggest that Maxine is malicious by design. Gaze correction, face relighting, upscaling, and compression seem useful. But the issues Maxine raises point to a lack of consideration for the harms its technology might cause, a tech industry misstep so common it’s become a cliche. The best-case scenario is that Nvidia takes steps (if it hasn’t already) to minimize the ill effects that might arise. The fact that the company didn’t reserve airtime to spell out these steps at Maxine’s unveiling, however, doesn’t instill confidence.

For AI coverage, send news tips to Khari Johnson, Kyle Wiggers, and Seth Colaner — and be sure to subscribe to the AI Weekly newsletter and bookmark our AI Channel.

Thanks for reading,

Kyle Wiggers

AI Staff Writer

Researchers create dataset to advance U.S. Supreme Court gender bias analysis

September 22, 2020   Big Data

University of Washington language researchers and legal professionals recently created a labeled dataset for detection of interruptions and competitive turn-taking in U.S. Supreme Court oral arguments. They then used the corpus of “turn changes” to train AI models to experiment with ways to automatically classify turn changes as competitive or cooperative as a way to analyze gender bias.

“In-depth studies of gender bias and inequality are critical to the oversight of an institution as influential as the Supreme Court,” reads the paper University of Washington researchers Haley Lepp and Gina-Anne Levow published on preprint repository arXiv one week ago. “We find that as the first person in an exchange, female speakers and attorneys are spoken to more competitively than are male speakers and justices. We also find that female speakers and attorneys speak more cooperatively as the second person in an exchange than do male speakers and justices.”

Attorneys who speak before the Supreme Court are allotted 30 minutes of oral argument and are expected to stop talking when a justice speaks. Linguists have observed men interrupting women routinely in professional environments and other settings.

Turn changes are defined as instances when one person stops speaking and another person starts speaking. Short audio clips of each turn change were annotated as competitive or cooperative by 77 members of the U.S. legal community who identify as an attorney, judge, legal scholar, or law student in their second year or higher. Lepp and Levow’s work focuses on measuring whether the turn change was cooperative or competitive, based on oral argument audio the Supreme Court made available, in part because previous work by Deborah Tannen found that interruptions in speech can be part of regular discourse and that the context of the conversation can be a factor.
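
Once the clips are annotated, the modeling step is a fairly standard supervised setup: extract features from each turn-change clip and train a binary classifier to separate competitive from cooperative exchanges. The sketch below is a generic stand-in with assumed file names and features, not the paper’s actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical inputs: one row per annotated turn change, with acoustic features
# (e.g., overlap duration, pitch, intensity) extracted upstream; labels are
# 1 = competitive, 0 = cooperative, taken from the expert annotations.
X = np.load("turn_change_features.npy")  # shape (n_clips, n_features), assumed
y = np.load("turn_change_labels.npy")    # shape (n_clips,), assumed

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
print(f"5-fold F1: mean {scores.mean():.3f}, std {scores.std():.3f}")
```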

The paper devoted to gender bias analysis was published days before the death of Supreme Court Justice Ruth Bader Ginsburg at the age of 87. Ginsburg was the second woman ever appointed to the U.S. Supreme Court. As a litigator for the American Civil Liberties Union (ACLU), Ginsburg successfully argued cases before the Supreme Court that greatly extended women’s rights in the United States. On Wednesday and Thursday, she will be the first woman and the first Jewish person in U.S. history to lie in state at the U.S. Capitol building for members of the public to say goodbye. She was the longest-serving female justice in U.S. history.

Although voting has already begun in some parts of the country and Ginsburg pleaded in her final days to let the winner of the presidential election fill her vacancy, President Trump is expected to nominate a pick to fill her seat Friday or Saturday. Two Republican Senators pledged not to vote until the presidential election is decided, but Senate Majority Leader Mitch McConnell said just hours after her death that the president’s nominee will get a vote.

Details of the turn changes corpus follow a 2017 study that used automation to identify the number of interruptions that occurred from 2004 to 2015. The study “Justice, Interrupted: The Effect of Gender, Ideology and Seniority at Supreme Court Oral Arguments” by Tonja Jacobi and Dylan Schweers found that female Supreme Court justices are interrupted three times as often as their male colleagues. Female Supreme Court justices were interrupted by attorneys as well as other Supreme Court justices, led by Anthony Kennedy, Antonin Scalia, and William Rehnquist. Scalia and Stephen Breyer also interrupted each other a lot.

A producer of the podcast More Perfect noticed people repeatedly interrupting Ginsburg, which led to an episode on the subject. Jacobi spoke on the podcast and said Ginsburg developed tactics to adapt to frequent interruptions, first by asking to ask a question, then pivoting to ask questions more like male justices who interrupt.

The episode also highlighted that Justice Sonia Sotomayor was found to speak as often as men in the Jacobi study, but she has still drawn criticism from media commentators at times for being aggressive. Gender bias is pervasive in coverage of supreme courts, according to a 2016 analysis of media coverage in five democratic countries. The analysis found that women who ask questions the way male justices do are generally labeled abrasive, militant, or mean by critics.

Last year, the U.S. Supreme Court introduced a rule that justices will try to give attorneys two minutes to speak without interruption at the start of oral arguments.

Researchers quantify bias in Reddit content sometimes used to train AI

August 9, 2020   Big Data

In a paper published on the preprint server arXiv.org, scientists at the King’s College London Department of Informatics used natural language processing to show evidence of pervasive gender and religious bias in Reddit communities. This alone isn’t surprising, but the problem is that data from these communities are often used to train large language models like OpenAI’s GPT-3. That in turn is important because, as OpenAI itself notes, this sort of bias leads to placing words like “naughty” or “sucked” near female pronouns and “Islam” near words like “terrorism.”

The scientists’ approach uses representations of words called embeddings to discover and categorize language biases, which could enable data scientists to trace the severity of bias in different communities and take steps to counteract it. Given a language model and two sets of words representing concepts to compare, the method identifies the words most biased toward each concept in a given community, spotlighting examples of potentially offensive content in Reddit subcommunities. It also ranks the words from least to most biased using an equation, providing an ordered list and an overall view of the bias distribution in that community.
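
A minimal sketch of that general idea scores every vocabulary word by its relative similarity to two concept word sets in embeddings trained on a community’s text, then sorts the result. It assumes gensim-style KeyedVectors and is not the authors’ exact formulation.

```python
import numpy as np
from gensim.models import KeyedVectors

def rank_biased_words(kv: KeyedVectors, concept_a, concept_b, vocab):
    """Score each word by (similarity to concept_a) minus (similarity to concept_b);
    sorting the scores gives an ordered view of which words lean toward which
    concept in this community's embeddings."""
    def centroid(words):
        vecs = [kv[w] for w in words if w in kv]
        v = np.mean(vecs, axis=0)
        return v / np.linalg.norm(v)

    a, b = centroid(concept_a), centroid(concept_b)
    scores = {}
    for w in vocab:
        if w in kv:
            v = kv[w] / np.linalg.norm(kv[w])
            scores[w] = float(v @ a - v @ b)
    return sorted(scores.items(), key=lambda item: item[1])

# Example (assumed file path): words most biased toward "women" vs. "men"
# in embeddings trained on a single subreddit's posts.
# kv = KeyedVectors.load("subreddit_word2vec.kv")
# ranking = rank_biased_words(kv, ["woman", "she", "her"], ["man", "he", "his"], kv.index_to_key)
```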

Reddit has long been a popular source for machine learning model training data, but it’s an open secret that some groups within the network are unfixably toxic. In June, Reddit banned roughly 2,000 communities for consistently breaking its rules by allowing users to harass others with hate speech. But in accordance with the site’s policies on free speech, Reddit’s admins maintain they don’t ban communities solely for featuring controversial content, such as those advocating white supremacy, mocking perceived liberal bias, and promoting demeaning views on transgender women, sex workers, and feminists.

To further specify the biases they encountered, the researchers took the negativity and positivity (also called “sentiment polarity”) of biased words into account. And to facilitate analyses of biases, they combined semantically related terms under broad rubrics like “Relationship: Intimate/sexual” and “Power, organizing” that they modeled on the UCREL Semantic Analysis System (USAS) framework for automatic semantic and text tagging. (USAS has a multi-tier structure, with 21 major discourse fields subdivided into fine-grained categories like “People,” “Relationships,” or “Power.”)

One of the communities the researchers examined — /r/TheRedPill, ostensibly a forum for the “discussion of sexual strategy in a culture increasingly lacking a positive identity for men” — had 45 clusters of biased words. (/r/TheRedPill is currently “quarantined” by Reddit’s admins, meaning users have to bypass a warning prompt to visit or join.) Sentiment scores indicated that the clusters most biased toward women (“Anatomy and Physiology,” “Intimate sexual relationships,” and “Judgement of appearance”) carried negative sentiments, whereas most of the clusters related to men contained neutral or positively connotated words. Perhaps unsurprisingly, labels such as “Egoism” and “Toughness; strong/weak” weren’t even present in female-biased labels.

Another community — /r/Dating_Advice — exhibited negative bias toward men, according to the researchers. Biased clusters included the words “poor,” “irresponsible,” “erratic,” “unreliable,” “impulsive,” “pathetic,” and “stupid,” with words like “abusive” and “egotistical” among the most negative in terms of sentiment. Moreover, the category “Judgment of appearance” was more frequently biased toward men than women, and physical stereotyping of women was “significantly” less prevalent than in /r/TheRedPill.

The researchers chose the community /r/Atheism, which calls itself “the web’s largest atheism forum,” to evaluate religious biases. They note that all the mentioned biased labels toward Islam had an average negative polarity except for geographical names. Categories such as “Crime, law and order,” “Judgement of appearance,” and “Warfare, defense, and the army” aggregated words with evidently negative connotations like “uncivilized,” “misogynistic,” “terroristic,” “antisemitic,” “oppressive,” “offensive,” and “totalitarian.” By contrast, none of the labels were relevant in Christianity-biased clusters, and most of the words in Christianity-biased clusters (e.g., “Unitarian,” “Presbyterian,” “Episcopalian,” “unbaptized,” “eternal”) didn’t carry negative connotations.

The coauthors assert their approach could be applied by legislators, moderators, and data scientists to trace the severity of bias in different communities and to take steps to actively counteract this bias. “We view the main contribution of our work as introducing a modular, extensible approach for exploring language biases through the lens of word embeddings,” they wrote. “Being able to do so without having to construct a-priori definitions of these biases renders this process more applicable to the dynamic and unpredictable discourses that are proliferating online.”

There’s a real and present need for tools like these in AI research. Emily Bender, a professor at the University of Washington’s NLP group, recently told VentureBeat that even carefully crafted language data sets can carry forms of bias. A study published last August by researchers at the University of Washington found evidence of racial bias in hate speech detection algorithms developed by Google parent company Alphabet’s Jigsaw. And Facebook AI head Jerome Pesenti found a rash of negative statements from AI created to generate humanlike tweets that targeted Black people, Jewish people, and women.

“Algorithms are like convex mirrors that refract human biases, but do it in a pretty blunt way. They don’t permit polite fictions like those that we often sustain our society with,” Kathryn Hume, Borealis AI’s director of product, said at the Movethedial Global Summit in November. “These systems don’t permit polite fictions. … They’re actually a mirror that can enable us to directly observe what might be wrong in society so that we can fix it. But we need to be careful, because if we don’t design these systems well, all that they’re going to do is encode what’s in the data and potentially amplify the prejudices that exist in society today.”

Researchers discover evidence of gender bias in major computer vision APIs

August 7, 2020   Big Data

The computer vision APIs offered by Google, Microsoft, and IBM exhibit gender bias when tested on self-portraits of people wearing partial face masks. That’s according to data scientists at marketing communications agency Wunderman Thompson, who found that popular computer vision services like Cloud Vision API and Azure Cognitive Services Computer Vision more often misidentify the kinds of masks worn during the pandemic as “duct tape” and “fashion accessories” on women as opposed to “beards” and “facial hair” on men.

Ilinca Barsan, director of data science at Wunderman Thompson, wasn’t looking for bias in commercial computer vision APIs. She had intended to build a tool that would allow users to connect to thousands of street cameras around the country and determine the proportion of pedestrians wearing masks at any given time. Google’s Cloud Vision API was supposed to power the tool’s mask detection component, providing labels for elements of images, along with confidence scores associated with those labels.

When Barsan uploaded a photo of herself wearing a mask to test Cloud Vision API’s accuracy, she noticed one unexpected label — “duct tape” — surfaced to the top with 96.57% confidence. (A high confidence score indicates the model believes the label is highly relevant to the image.) Donning a ruby-colored mask returned 87% confidence for “duct tape” and dropped the “mask” label — which had been 73.92% — from the list of labels. A blue surgical mask yielded “duct tape” once again (with a 66% confidence score) and failed to elicit the “mask” label for the second time.
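
Reproducing this kind of check is mostly a matter of sending an image to the label-detection endpoint and reading back the labels with their confidence scores. A minimal sketch, assuming the google-cloud-vision Python client, valid credentials, and an illustrative file name:

```python
from google.cloud import vision

# Assumes GOOGLE_APPLICATION_CREDENTIALS points at a service account key.
client = vision.ImageAnnotatorClient()
with open("masked_selfie.jpg", "rb") as f:
    image = vision.Image(content=f.read())

response = client.label_detection(image=image)
for label in response.label_annotations:
    # e.g., "Mask 0.74" or, in the cases described above, "Duct tape 0.97"
    print(f"{label.description} {label.score:.2f}")
```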

Barsan took this as a sign of bias within the computer vision models underlying Cloud Vision API. She theorized they might be drawing on sexist portrayals of women in the data set on which they were trained — women who had perhaps been victims of violence.

It’s not an unreasonable assumption. Back in 2015, a software engineer pointed out that the image recognition algorithms in Google Photos were labeling his Black friends as “gorillas.” A University of Washington study found women were significantly underrepresented in Google Image searches for professions like “CEO.” More recently, nonprofit AlgorithmWatch showed Cloud Vision API automatically labeled a thermometer held by a dark-skinned person as a “gun” while labeling a thermometer held by a light-skinned person as an “electronic device.”

In response, Google says it adjusted the confidence scores to more accurately reflect when a firearm is in a photo. The company also removed the ability to label people in images as “man” or “woman” with Cloud Vision API because errors had violated Google’s AI principle of not creating biased systems.

To test whether Cloud Vision API might classify appearances differently for mask-wearing men versus mask-wearing women, Barsan and team solicited mask images from friends and colleagues, which they added to a data set of photos found on the web. The final corpus consisted of 265 images of men in masks and 265 images of women in masks in varying contexts, from outdoor pictures and office snapshots with DIY cotton masks to stock images and iPhone selfies showing N95 respirators.

According to Barsan, out of the 265 images of men in masks, Cloud Vision API correctly identified 36% as containing personal protective equipment (PPE) and seemed to make the association that something covering a man’s face was likely to be facial hair (27% of the images had the label “facial hair”). Around 15% of images were misclassified as “duct tape” with a 92% average confidence score, suggesting it might be an issue for both men and women. But out of the 265 images of women in masks, Cloud Vision API mistook 28% as depicting duct tape with an average confidence score of 93%. It returned “PPE” 19% of the time and “facial hair” 8% of the time.

“At almost twice the number for men, ‘duct tape’ was the single most common ‘bad guess’ for labeling masks [for women],” Barsan said. “The model certainly made an educated guess. Which begged the question — where exactly did it go to school?”

In a statement provided to VentureBeat, Cloud AI director of product strategy Tracy Frey said Google has reached out to Wunderman directly to learn more about the team’s research, methodology, and findings. “Fairness is one of our core AI principles, and we’re committed to making progress in this area. We’ve been working on the challenge of accurately detecting objects for several years, and will continue to do so,” Frey said. “In the last year, we’ve developed tools and data sets to help identify and reduce bias in machine learning models, and we offer these as open source for the larger community so their feedback can help us improve.”

Google isn’t the only vendor with apparent bias in its computer vision models. After testing Cloud Vision API, Barsan and team ran the same data set through IBM’s Watson Visual Recognition service, which returned the label “restraint chains” for 23% of the images of masked women (compared with 10% of the images of men) and “gag” for 23% (compared with 10% of the male images). Furthermore, Watson correctly identified 12% of the men to be wearing masks, while it was only right 5% of the time for the women.

The average confidence score for the “gag” label hovered around 79% for women compared to 75% for men, suggesting that Watson Visual Recognition was more hesitant than Cloud Vision API to assign those labels. IBM declined to comment, but it took issue with the way the data set was compiled, and a spokesperson says the company is conducting tests to find evidence of the bias Barsan claims to have uncovered.

In a final experiment, Barsan and colleagues tested Microsoft’s Azure Cognitive Services Computer Vision API, which two years ago received an update ostensibly improving its ability to recognize gender across different skin tones. The service struggled to correctly tag masks in pictures, correctly labeling only 9% of images of men and 5% of images of women as featuring a mask. And while it didn’t return labels like “duct tape,” “gag,” or “restraint,” Azure Cognitive Services identified masks as “fashion accessories” for 40% of images of women (versus only 13% of images of men), as “lipstick” for 14% of images of women, and as a beard for 12% of images of men.

Microsoft also declined to comment.

“In terms of research contribution or anything like that, it’s sort of repeating a point that’s been said,” Mike Cook, an AI researcher with a fellowship at Queen Mary University of London, told VentureBeat. “But it’s an interesting point … It made me think a lot about the myth of the ‘good’ data set. Honestly, I feel like some things just cannot hope to have data sets built around them without being hopelessly narrow or biased. It’s all very well to remove the ‘man’ label from a data set, but are there any photos of women with facial hair in that data set, or men with lipstick on? Probably not, because the data set reflects certain norms and expectations that are always aging and becoming less relevant.”

Barsan doesn’t believe the results to be indicative of malicious intent on the part of Google, IBM, or Microsoft, but she says this is yet another example of the prejudices that can emerge in unbalanced data sets and machine learning models. They have the potential to perpetuate harmful stereotypes, she says, reflecting a culture in which violence against women is often normalized and exploited.

“A simple image search of ‘duct tape man’ and ‘duct tape woman’ respectively revealed images of men mostly (though not exclusively) pictured in full-body duct tape partaking in funny pranks, while women predominantly appeared with their mouth duct-taped, many clearly in distress,” Barsan noted. When it came to masks, “Across the board, all three computer vision models performed poorly at the task at hand. However, they were consistently better at identifying masked men than women.”

That’s certainly not surprising in the context of computer vision, which countless studies have shown to be susceptible to bias. A study last fall by University of Colorado, Boulder researchers showed that AI from Amazon, Clarifai, Microsoft, and others maintained accuracy rates above 95% for cisgender men and women but misidentified trans men as women 38% of the time. Separate benchmarks of major vendors’ systems by the Gender Shades project and the National Institute of Standards and Technology (NIST) suggest that facial recognition technology exhibits racial and gender bias and facial recognition programs can be wildly inaccurate, misclassifying people upwards of 96% of the time.

“Beyond damage control and Band-Aid solutions, we must work diligently to ensure that the artificial intelligences we build have the full benefit of our own natural intelligence,” Barsan said. “If our machines are to work accurately and to responsibly reflect society, we must help them understand the social dynamics that we live in, to stop them from reinforcing existing inequalities through automation, and put them to work for good instead … After all, we’d quite like our street-cam analyzer to suggest that 56% of people on the street are staying safe — not being gagged and restrained.”

Via email, Barsan later clarified that the street-cam analyzer project was “an internal hypothetical exercise” to provide feedback to people in high-risk categories regarding how safe it might be to go to public places. Out of concern over privacy implications and in light of the bias research she ended up conducting, Barsan decided against pursuing it further.

Researchers find evidence of bias in facial expression data sets

July 25, 2020   Big Data

Researchers claim the data sets often used to train AI systems to detect expressions like happiness, anger, and surprise are biased against certain demographic groups. In a preprint study published on Arxiv.org, coauthors affiliated with the University of Cambridge and Middle East Technical University find evidence of skew in two open source corpora: Real-world Affective Faces Database (RAF-DB) and CelebA.

Machine learning algorithms become biased in part because they’re provided training samples that optimize their objectives toward majority groups. Unless explicitly modified, they perform worse for minority groups — i.e., people represented by fewer samples. In domains like facial expression classification, it’s difficult to compensate for skew because the training sets rarely contain information about attributes like race, gender, and age. But even those that do provide attributes are typically unevenly distributed.

RAF-DB contains tens of thousands of images from the internet with facial expression and attribute annotations, while CelebA has over 202,599 images of 10,177 people with 40 types of attribute annotations. To determine the extent to which bias existed in either, researchers sampled a random subset and aligned and cropped the images so the faces were consistent with respect to orientation. Then, they used classifiers to measure the accuracy (the fraction of the predictions the model got correct) and fairness (whether the classifier was fair to attributes like gender, age, and ethnicity) — the idea being that the classifiers should provide similar results across different demographic groups.
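
The fairness check described here boils down to computing accuracy within each demographic group and comparing the results. The sketch below is one common formulation (per-group accuracy plus a worst-to-best ratio); the paper’s exact metric may differ, and the example values are toy numbers.

```python
import numpy as np

def group_accuracy_report(y_true, y_pred, groups):
    """Accuracy per demographic group plus a simple fairness score: the worst
    group accuracy divided by the best. Values near 1.0 mean the classifier
    behaves similarly across groups."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    accs = {g: float(np.mean(y_pred[groups == g] == y_true[groups == g]))
            for g in np.unique(groups)}
    return accs, min(accs.values()) / max(accs.values())

# Toy example with illustrative group labels:
accs, fairness = group_accuracy_report(
    y_true=[1, 0, 1, 1, 0, 1],
    y_pred=[1, 0, 0, 1, 0, 1],
    groups=["Caucasian", "Caucasian", "Asian", "Asian", "African American", "African American"],
)
```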

In the subset of images from RAF-DB, the researchers report the vast majority of subjects — 77.4% — were Caucasian, while 15.5% were Asian and only 7.1% were African American. The subset showed gender skew as well, with 56.3% female and 43.7% male subjects. Accuracy unsurprisingly ranged from low for some minority groups (59.1% for Asian females and 61.6% for African American females) to high for majorities (65.3% for Caucasian males), and on the fairness metric, the researchers found it to be low for race (88.1%) but high overall for gender (97.3%).

On the CelebA subset, the researchers trained a simpler classifier to distinguish between two classes of people: smiling and non-smiling. They note that the data set had substantial skew, with only 38.6% of the non-smiling subjects being male compared with 61.4% female. As a result, the classifier was 93.7% accurate for younger females but less so for older males (90.7%) and older females (92.1%), a gap that, while not statistically significant, the researchers say points to the skewed distribution.

“To date, there exists a large variety and number of data sets for facial expression recognition tasks. However, virtually none of these data sets have been acquired with consideration of containing images and videos that are evenly distributed across the human population in terms of sensitive attributes such as gender, age and ethnicity,” the coauthors wrote.

The evident bias in facial expression data sets underlines the need for regulation, many would argue. At least one AI startup specializing in affect recognition — Emteq — has called for laws to prevent misuse of the tech. A study commissioned by the Association for Psychological Science noted that because emotions are expressed in a range of ways, it’s hard to infer how someone feels from their expressions. And the AI Now Institute, a research institute based at New York University studying AI’s impact on society, warned in a 2019 report that facial expression classifiers were being unethically used to make hiring decisions and set insurance prices.

“At the same time as these technologies are being rolled out, large numbers of studies are showing that there is … no substantial evidence that people have this consistent relationship between the emotion that you are feeling and the way that your face looks,” AI Now cofounder Kate Crawford told the BBC in a recent interview.

AI needs systemic solutions to systemic bias, injustice, and inequality

July 18, 2020   Big Data

At the Diversity, Equity, and Inclusion breakfast at VentureBeat’s AI-focused Transform 2020 event, a panel of AI practitioners, leaders, and academics discussed the changes that need to happen in the industry to make AI safer, more equitable, and more representative of the people to whom AI is applied.

The wide-ranging conversation was hosted by Krystal Maughan, a Ph.D. candidate at the University of Vermont, who focuses on machine learning, differential privacy, and provable fairness. The group discussed the need for higher accountability from tech companies, inclusion of multiple stakeholders and domain experts in AI decision making, practical ways to adjust AI project workflows, and representation at all stages of AI development and at all levels — especially where the power brokers meet. In other words, although there are systemic problems, there are systemic solutions as well.

Tech company accountability

The old Silicon Valley mantra “move fast and break things” has not aged well in the era of AI. It presupposes that tech companies exist in some sort of amoral liminal space, apart from the rest of the world where everything exists in social and historical contexts.

“We can see all around the world that tech is being deployed in a way that’s pulling apart the fabric of our society. And I think the reason why is because … tech companies historically don’t see that they’re part of that social compact that holds society together,” said Will Griffin, chief ethics officer at Hypergiant.

Justin Norman, vice president of data science at Yelp, agreed, pointing out the power that tech companies wield because they possess tools that can be incredibly dangerous. “And so not only do they have an ethical responsibility, which is something they should do before they’ve done anything wrong, they also have a responsibility to hold themselves accountable when things go wrong.”

But, Norman added, we — all of us, the global community — have a responsibility here as well. “We don’t want to simply accept that any kind of corporation has unlimited power against us, any government has unlimited power over us,” he said, asserting that people need to educate themselves about these technologies so when they encounter something dubious, they know when to push back.

Both Griffin and Ayodele Odubela, a data scientist at SambaSafety, pointed out the strength of the accountability that communities can bring to bear on seemingly immovable institutions. Griffin called Black Lives Matter activists “amazing.” He said, “Those kids are right now the leaders in AI as well, because they’re the ones who identified that law enforcement was using facial recognition, and through that pressure on institutional investors — who were the equity holders of these large corporations — it forced IBM to pull back on facial recognition, and that forced Microsoft and Amazon to follow suit.” That pressure, which surged in the wake of the police killing of George Floyd, has apparently also begun to topple the institution of law enforcement as we know it by amplifying the movement to defund the police.

Odubela sees the specter of law enforcement’s waning power as an opportunity for good. Defunding the police actually means funding things like social services, she argues. “One of the ideas I really like is trying to take some of these biased algorithms and really repurpose them to understand the problems that we may be putting on the wrong kind of institutions,” she said. “Look at the problems we’re putting on police forces, like mental illness. We know that police officers are not trained to deal with people who have mental illnesses.”

These social and political victories should ideally lead to policy changes. In response to Maughan’s question about what policy changes could encourage tech companies to get serious about addressing bias in AI, Norman pulled it right back to the responsibility of citizens in communities. “Policy and law tell us what we must do,” he said. “But community governance tells us what we should do, and that’s largely an ethical practice.”

“I think that when people approach issues of diversity, or they approach issues of ethics in the discipline, they don’t appreciate the challenge that we’re up against, because … engineering and computer science is the only discipline that has this much impact on so many people that does not have any ethical reasoning, any ethical requirements,” Griffin added. He contrasted tech with fields like medicine and law, which have made ethics a core part of their educational training for centuries, and where practitioners are required to hold a license issued by a governing body.

Where it hurts

Odubela took these thoughts a step beyond the need for policy work by saying, “Policy is part of it, but a lot of what will really force these companies into caring about this is if they see financial damages.”

For businesses, their bottom line is where it hurts. One could argue that it’s almost crass to think about effecting change through capitalist means. On the other hand, if companies are profiting from questionable or unjust artificial intelligence products, services, or tools, it follows that justice could come by eliminating that incentive.

Griffin illustrated this point by talking about facial recognition systems that big tech companies have sold, especially to law enforcement agencies — how none of them were vetted, and now the companies are pulling them back. “If you worked on computer vision at IBM for the last 10 years, you just watched your work go up in smoke,” he said. “Same at Amazon, same at Microsoft.”

Another example Griffin gave: A company called Practice Fusion digitizes electronic health records (EHR) for smaller doctors’ offices and medical practices, runs machine learning on those records along with outside data, and helps provide prescription recommendations to caregivers. Allscripts bought Practice Fusion for $100 million in January 2018. But a Department of Justice (DoJ) investigation discovered that Practice Fusion had been getting kickbacks from a major opioid company in exchange for recommending those opioids to patients. In January 2020, the DoJ levied a $145 million fine in the case. On top of that, as a result of the scandal, “Allscripts’ market cap dropped in half,” Griffin said.

“They walked themselves straight into the opioid crisis. They used AI really in the worst way you can use AI,” he added.

He said that although that’s one specific case that was fully litigated, there are more out there. “Most companies are not vetting their technologies in any way. There are land mines — AI land mines — in use cases that are currently available in the marketplace, inside companies, that are ticking time bombs waiting to go off.”

There’s a reckoning growing on the research side, too, as in recent weeks both the ImageNet and 80 Million Tiny Images data sets have been called to account over bias concerns.

It takes time, thought, and expense to ensure that your company is building AI that is just, accurate, and as free of bias as possible, but the “bottom line” argument for doing so is salient. Any AI system failures, especially around bias, “cost a lot more than implementing this process, I promise you,” Norman said.

Practical solutions: workflows and domain experts

These problems may seem intractable, but they are not. There are practical solutions companies can employ, right now, to radically improve equity and safety in the ideation, design, development, testing, and deployment of AI systems.

A first step is bringing in more stakeholders to projects, like domain experts. “We have a pretty strong responsibility to incorporate learnings from multiple fields,” Norman said, noting that adding social science experts is a great complement to the skill sets that practitioners and developers possess. “What we can do as a part of our own power as people who are in the field is incorporate that input into our designs, into our code reviews,” he said. At Yelp, teams require that a project pass an ethics and diversity check at all levels of the process. Norman said that as they go, they’ll pull in a data expert, someone from user research, statisticians, and those who work on the actual algorithms to add some interpretability. If they don’t have the right expertise in-house, they’ll work with a consultancy.

“From a developer standpoint, there actually are tools available for model interpretability, and they’ve been around for a long time. The challenge isn’t necessarily always that there isn’t the ability to do this work — it’s that it’s not emphasized, invested in, or part of the design development process,” Norman said. He added that it’s important to make space for the researchers who are studying the algorithms themselves and are the leading voices in the next generation of design.
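
One widely available example of such tooling is permutation importance in scikit-learn, which measures how much a model’s held-out performance drops when each feature is shuffled. The sketch below is a generic illustration on synthetic data, not a description of Yelp’s internal process:

```python
# Minimal sketch: off-the-shelf interpretability tooling on a generic
# tabular classifier (synthetic data; not tied to any company's workflow).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle one feature at a time and measure the drop in held-out accuracy;
# large drops flag the features the model leans on most heavily.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for idx in result.importances_mean.argsort()[::-1]:
    print(f"feature_{idx}: {result.importances_mean[idx]:.3f}")
```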

Griffin said that Hypergiant has a heuristic for its AI projects called “TOME,” for “top of mind ethics,” which they break down by use case. “With this use case, is there a positive intent behind the way we intend to use the technology? Step two is where we challenge our designers, our developers, [our] data scientists … to broaden their imaginations. And that is the categorical imperative,” he said. They ask what the world would look like if everyone in their company, the industry, and in the world used the technology for this use case — and they ask if that is desirable. “Step three requires people to step up hardcore in their citizenship role, which is [asking the question]: Are people being used as a means to an end, or is this use case designed to benefit people?”

Yakaira Núñez, a senior director at Salesforce, said there’s an opportunity right now to change the way we do software development. “That change needs to consider the fact that anything that involves AI is now a systems design problem,” she said. “And when you’re embarking upon a systems design problem, then you have to think of all of the vectors that are going to be impacted by that. So that might be health care. That might be access to financial assistance. That might be impacts from a legal perspective, and so on and so forth.”

Her advice is to “increase the discovery and the design time that’s allocated to these projects and these initiatives to integrate things like consequence scanning, like model cards, and actually hold yourself accountable to the findings … during your discovery and your design time. And to mitigate the risks that are uncovered when you’re doing the systems design work.”
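
Model cards, which Núñez mentions, are short structured documents that record what a model is for, what data it was trained and evaluated on, and how it performs across subgroups, so that disparities are visible before deployment. The sketch below is a hypothetical illustration of that structure; the field names and values are assumptions for illustration, not a standard schema or anything specific to Salesforce:

```python
# Hypothetical, minimal model-card record; field names and values are
# illustrative only, not a standard schema.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelCard:
    name: str
    intended_use: str
    out_of_scope_uses: List[str]
    training_data: str
    evaluation_data: str
    # Metrics broken out by subgroup, so disparities are visible up front.
    subgroup_metrics: Dict[str, float] = field(default_factory=dict)
    known_limitations: List[str] = field(default_factory=list)


card = ModelCard(
    name="loan-approval-v2",
    intended_use="Rank applications for human review; not for automated denial.",
    out_of_scope_uses=["employment screening", "criminal justice decisions"],
    training_data="2015-2019 application outcomes, US only",
    evaluation_data="Held-out 2020 applications",
    subgroup_metrics={"auc_overall": 0.84, "auc_group_a": 0.79, "auc_group_b": 0.86},
    known_limitations=["Under-represents applicants under 25"],
)
print(card)
```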

Odubela brought up the issue of how to uncover the blind spots we all have. “Sometimes it does take consulting with people who aren’t like us to point these [blind spots] out,” she said. “That’s something that I’ve personally had to do in the past, but taking that extra time to make sure we’re not excluding groups of people, and we’re not baking these prejudices that already exist in society straight into our models — it really does come [down] to relying on other people, because there are some things we just can’t see.”

Núñez echoed Odubela, noting that “As a leader you’re responsible for understanding and reflecting, and being self-aware enough to know that you have your biases. It’s also your responsibility to build a board of advisors that keeps you in check.”

“The key is getting it into the workflows,” Griffin noted. “If it doesn’t get into the workflow, it doesn’t get into the technology; if it doesn’t get into the technology, it won’t change the culture.”

Representation

Not much of this is possible, though, without improved representation of underrepresented groups in critical positions. As Griffin pointed out, this particular panel comprises leaders who have the decision-making power to implement practical changes in workflows right away. “Assuming that [the people on this panel] are in a position to flat-out stop a use case, and say ‘Listen, nope, this doesn’t pass muster, not happening’ — when developers, designers, data scientists know that they can’t run you over, they think differently,” he said. “All of a sudden everyone becomes a brilliant philosopher. Everybody’s a social scientist. They figure out how to think about people when they know their work will not go forward.”

But that’s not the case within enough companies, even though it’s critically important. “The subtext here is that in order to execute against this, this also means that you have to have a very diverse team applying the lens of the end user, the lens of those impacted into that development lifecycle. Checks and balances have to be built in from the start,” Núñez said.

Griffin offered an easy-to-understand benchmark to aim for: “For diversity and inclusion, when you have African Americans who have equity stakes in your company — and that can come in the form of founders, founding teams, C-suite, board seats, allowed to be investors — when you have diversity at the cap table, you have success.”

And that needs to happen fast. Griffin said that although he’s seeing lots of good programs and initiatives coming out of the companies whose boards he sits on, like boot camps, college internships, and mentorship programs, they’re not going to be immediately transformative. “Those are marathons,” he said. “But nobody on these boards I’m with got into tech to run a marathon — they got in to run a sprint. … They want to raise money, build value, and get rewarded for it.”

But we are in a unique moment that portends a wave of change. Griffin said, “I have never in my lifetime seen a time like the last 45 days, where you can actually come out, use your voice, have it be amplified, without the fear that you’re going to be beaten back by another voice saying, ‘We’re not thinking about that right now.’ Now everybody’s thinking about it.”


New benchmark measures gender bias in speech translation systems

June 12, 2020   Big Data
 New benchmark measures gender bias in speech translation systems

A preprint paper published by University of Trento researchers proposes a benchmark — MuST-SHE — to evaluate whether translation systems fed only textual data are constrained by the fact that sentences sometimes omit gender identity clues. The coauthors assert that these systems can and do exhibit gender bias, and that signals beyond text (like audio) provide contextual clues that might reduce it.

In machine translation, gender bias is at least partially attributable to differences in how languages express female and male gender. Languages with a grammatical gender system, such as the Romance languages, rely on a rich set of inflection and agreement devices that apply to individual parts of speech; in Italian, for example, the English sentence “I am tired” must surface as either “Sono stanco” (male speaker) or “Sono stanca” (female speaker). That’s untrue of English, which is a “natural gender” language — it reflects the distinction only via pronouns, inherently gendered words (e.g., “boy,” “girl”), and marked nouns (“actor,” “actress”).

AI translation systems that fail to pick up on the nuances threaten to perpetuate under- or misrepresentation of demographic groups. Motivated by this, the researchers created MuST-SHE, a multilingual test set designed to uncover gender bias in speech translation.

MuST-SHE is a subset of TED talks comprising roughly 1,000 audio recordings, transcripts, and translations in English-French and English-Italian pairs from the open source MuST-C corpus, annotated with qualitatively differentiated and balanced gender-related phenomena. It’s subdivided into two categories:

  • Category 1: Samples where the necessary information to disambiguate gender can be recovered from the audio signal, when gender agreement depends only on the speaker’s gender.
  • Category 2: Samples where the disambiguating information can be recovered from the utterance content, where contextual hints such as gender-exclusive words (“mom”), pronouns (“she,” “his”), and proper nouns (“Paul”) inform about gender.

For each reference in the corpus, the researchers created a “wrong” counterpart identical to the original except for the morphological signals conveying gender agreement. The result was a parallel set of references that differ from the correct ones only in the formal expression of gender, the idea being that the gap between the two can be used to measure a speech translation system’s ability to handle gender phenomena.
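
The resulting comparison can be sketched with an off-the-shelf BLEU implementation: score the system output against both the correct and the gender-swapped references, and read the gap as a rough signal of gender accuracy. The sentences below are illustrative Italian examples, not taken from MuST-SHE, and the recipe is a simplification of the paper’s evaluation:

```python
# Sketch of the correct-vs-"wrong"-reference comparison; the sentences are
# illustrative and not drawn from the actual MuST-SHE corpus.
import sacrebleu

system_outputs = ["Sono stanco e sono arrivato tardi."]  # model output (masculine forms)
correct_refs = ["Sono stanca e sono arrivata tardi."]    # speaker is female
wrong_refs = ["Sono stanco e sono arrivato tardi."]      # gender-swapped reference

bleu_correct = sacrebleu.corpus_bleu(system_outputs, [correct_refs]).score
bleu_wrong = sacrebleu.corpus_bleu(system_outputs, [wrong_refs]).score

# A positive gap means the output matches the correct gender forms more closely;
# a negative gap signals a lean toward the "wrong" (here masculine) forms.
print(f"gender gap: {bleu_correct - bleu_wrong:.2f}")
```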

In a series of experiments, the researchers built three speech translation systems:

  • End2End, which was trained on the MuST-C and open source Librispeech data sets, augmented by automatically translating the original English transcripts into target languages.
  • Cascade, which shares the same core technology as End2End but which was trained on 70 million sentence pairs for English-Italian and 120 million for English-French from the OPUS repository, after which it was fine-tuned on MuST-C training data.
  • Cascade+Tag, a model identical to Cascade except that tags indicating the speaker’s gender were added to the training data (a minimal sketch of this tagging step follows the list).
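
The tagging step behind Cascade+Tag can be sketched as a simple preprocessing pass over the source side of the training data; the tag tokens below are assumptions for illustration, not the exact format used in the paper:

```python
# Hypothetical sketch of speaker-gender tagging on source sentences;
# the tag tokens are illustrative, not the paper's exact format.
def add_gender_tag(source_sentence: str, speaker_gender: str) -> str:
    """Prepend a pseudo-token so the model can condition on speaker gender."""
    tag = "<speaker:F>" if speaker_gender.lower().startswith("f") else "<speaker:M>"
    return f"{tag} {source_sentence}"


print(add_gender_tag("I am tired and I arrived late.", "female"))
# -> <speaker:F> I am tired and I arrived late.
```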

Interestingly, the researchers found that injecting gender information into Cascade didn’t have a measurable effect when evaluated on MuST-SHE. The difference values between the original and “wrong” references in the data set implied that all three systems were biased toward masculine forms.

When it came to the categories, Cascade performed the worst on Category 1 because it couldn’t access the speaker’s gender information needed for a correct translation. End2End, by contrast, leveraged audio features to translate gender accurately, but it showed the worst performance on Category 2 data, perhaps because it was trained on a fraction of the data used for Cascade and Cascade+Tag.

“If, like human beings, ‘machine learning is what it eats,’ the different ‘diet’ of machine translation and speech translation models can help them develop different skills,” wrote the researchers. “By ‘eating’ audio-text pairs, speech translation has a potential advantage: the possibility to infer speakers’ gender from input audio signals.”

The paper’s publication comes after Google introduced gender-specific translations in Google Translate chiefly to address gender bias. Scientists have proposed a range of approaches to mitigate and measure it, most recently with a leaderboard, challenge, and set of metrics dubbed StereoSet.
