Tag Archives: clips

Researchers design AI that can infer whole floor plans from short video clips

January 7, 2021   Big Data

Floor plans are useful for visualizing spaces, planning routes, and communicating architectural designs. A robot entering a new building, for instance, can use a floor plan to quickly sense the overall layout. Creating floor plans typically requires a full walkthrough so 3D sensors and cameras can capture the entirety of a space. But researchers at Facebook, the University of Texas at Austin, and Carnegie Mellon University are exploring an AI technique that leverages visuals and audio to reconstruct a floor plan from a short video clip.

The researchers assert that audio provides spatial and semantic signals complementing the mapping capabilities of images. They say this is because sound is inherently driven by the geometry of objects. Audio reflections bounce off surfaces and reveal the shape of a room, far beyond a camera’s field of view. Sounds heard from afar — even multiple rooms away — can reveal the existence of “free spaces” where sounding objects might exist (e.g., a dog barking in another room). Moreover, hearing sounds from different directions exposes layouts based on the activities or things those sounds represent. A shower running might suggest the direction of the bathroom, for example, while microwave beeps suggest a kitchen.

The researchers’ approach, which they call AV-Map, aims to convert short videos with multichannel audio into 2D floor plans. A machine learning model leverages sequences of audio and visual data to reason about the structure and semantics of the floor plan, finally fusing information from audio and video using a decoder component. The floor plans AV-Map generates, which extend significantly beyond the area directly observable in the video, show free space and occupied regions divided into a discrete set of semantic room labels (e.g., family room and kitchen).
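
To make that pipeline concrete, here is a minimal PyTorch sketch of the encode-fuse-decode flow. It is illustrative only: the layer sizes, the GRU fusion, and the single-shot grid decoder are assumptions for exposition, not the authors’ AV-Map architecture.

# Minimal sketch of the AV-Map idea, not the authors' code: per-step visual
# and audio features are fused and decoded into a top-down semantic grid.
# All module names and sizes here are illustrative assumptions.
import torch
import torch.nn as nn

class AVMapSketch(nn.Module):
    def __init__(self, n_rooms=10, feat=256, grid=32):
        super().__init__()
        self.grid = grid
        self.n_rooms = n_rooms
        # Visual encoder: one RGB frame -> feature vector.
        self.visual = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat))
        # Audio encoder: multichannel spectrogram -> feature vector.
        self.audio = nn.Sequential(
            nn.Conv2d(2, 32, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat))
        # Temporal fusion over the clip, then a decoder to a 2D map whose
        # channels are occupancy plus per-room semantic labels.
        self.fuse = nn.GRU(2 * feat, feat, batch_first=True)
        self.decode = nn.Linear(feat, (1 + n_rooms) * grid * grid)

    def forward(self, frames, spects):
        # frames: (B, T, 3, H, W); spects: (B, T, 2, freq, time)
        B, T = frames.shape[:2]
        v = self.visual(frames.flatten(0, 1)).view(B, T, -1)
        a = self.audio(spects.flatten(0, 1)).view(B, T, -1)
        h, _ = self.fuse(torch.cat([v, a], dim=-1))
        out = self.decode(h[:, -1])  # decode from the final fused state
        return out.view(B, 1 + self.n_rooms, self.grid, self.grid)

m = AVMapSketch()
pred = m(torch.randn(1, 8, 3, 64, 64), torch.randn(1, 8, 2, 64, 64))
print(pred.shape)  # torch.Size([1, 11, 32, 32])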

The team experimented with two settings, active and passive, in digital environments from the popular Matterport3D and SoundSpaces datasets loaded into Facebook’s AI Habitat. In the first, they used a virtual camera to emit a known sound while it moved through the rooms of a model home. In the second, they relied only on naturally occurring sounds made by objects and people inside the home.

Across videos recorded in 85 large, real-world, multiroom environments within AI Habitat, the researchers say AV-Map not only consistently outperformed traditional vision-based mapping but also improved on the state-of-the-art technique for extrapolating occupancy maps beyond visible regions. With just a few glimpses spanning 26% of an area, AV-Map could estimate the whole area with 66% accuracy.
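
As a rough illustration of what an accuracy figure like that measures, the sketch below scores a predicted floor-plan grid cell by cell against ground truth, including cells the camera never saw. The arrays and the free/occupied/unknown labels are made up; the paper’s actual evaluation protocol may differ.

# Hypothetical illustration of per-cell map accuracy; data is made up.
import numpy as np

def map_accuracy(pred, truth):
    """Fraction of grid cells whose predicted label matches ground truth."""
    assert pred.shape == truth.shape
    return (pred == truth).mean()

truth = np.random.randint(0, 3, size=(32, 32))   # 0=free, 1=occupied, 2=unknown
pred = truth.copy()
noise = np.random.rand(32, 32) < 0.3             # corrupt ~30% of cells
pred[noise] = np.random.randint(0, 3, size=noise.sum())
print(f"accuracy: {map_accuracy(pred, truth):.2%}")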

“A short video walk through a house can reconstruct the visible portions of the floorplan but is blind to many areas. We introduce audio-visual floor plan reconstruction, where sounds in the environment help infer both the geometric properties of the hidden areas as well as the semantic labels of the unobserved rooms (e.g., sounds of a person cooking behind a wall to the camera’s left suggest the kitchen),” the researchers wrote in a paper detailing AV-Map. “In future work, we plan to consider extensions to multi-level floor plans and connect our mapping idea to a robotic agent actively controlling the camera … To our knowledge, ours is the first attempt to infer floor plans from audio-visual data.”

Big Data – VentureBeat

Read More

Microsoft’s AI determines whether statements about video clips are true

March 28, 2020   Big Data

In a paper published on the preprint server arXiv.org, researchers affiliated with Carnegie Mellon, the University of California at Santa Barbara, and Microsoft’s Dynamics 365 AI Research describe a challenge — video-and-language inference — that tasks AI with inferring whether a statement is entailed or contradicted by a given video clip. The idea is to spur investigations into video-and-language understanding, they say, which could enhance tools used in the enterprise for automatic meeting transcription.

As the researchers explain, video-and-language inference requires a thorough interpretation of both visual and textual clues. To this end, they introduce a video data set comprising realistic scenes paired with statements written by crowdsourced workers on Amazon Mechanical Turk, who watched the videos along with their subtitles. The workers wrote statements based on their understanding of both the videos and the subtitles; these statements not only describe explicit information in the video (e.g., objects, locations, characters, and social activity) but also reveal comprehension of complex plots (understanding events, interpreting human emotions and relations, and inferring causal relations between events).

In total, the data set contains 95,322 video-statement pairs and 15,887 video clips from YouTube and TV series — including Friends, Desperate Housewives, How I Met Your Mother, and Modern Family — spanning over 582 hours. Each roughly 30-second video is paired with six statements, either positive or negative, that identify characters, recognize actions, reason about conversations, infer reasons, or make reference to human dynamics. (To prevent bias from creeping in when collecting negative statements, the researchers asked annotators to use a positive statement as a reference and modify only a small portion of it to make it negative.)

To benchmark the data set, the coauthors used a bi-directional long short-term memory model, a type of AI model capable of learning long-term dependencies, to encode video features as numerical representations. A separate model encoded statements and subtitles. Given a video, subtitle, and statement, yet another model — which was trained on 80% of the data set, with 10% reserved for validation and 10% for testing — determined whether the statement entailed or contradicted the video and subtitles. They say that the best-performing baseline achieved 59.45% accuracy, compared with human evaluators’ 85.20% accuracy.
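
A loose sketch of this kind of baseline is shown below. It assumes pre-extracted per-frame video features, a shared text encoder for subtitles and statements, and simple concatenation of the three encodings before a binary entailed/contradicted head; the dimensions and fusion are illustrative guesses, not the paper’s exact model.

# Illustrative baseline: BiLSTM encoders plus a binary entailment head.
import torch
import torch.nn as nn

class EntailmentBaseline(nn.Module):
    def __init__(self, video_dim=2048, vocab=30000, emb=300, hid=256):
        super().__init__()
        # Bi-directional LSTM over per-frame video features.
        self.video_enc = nn.LSTM(video_dim, hid, batch_first=True,
                                 bidirectional=True)
        self.embed = nn.Embedding(vocab, emb)
        # Shared encoder for subtitle and statement token sequences.
        self.text_enc = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)
        self.classify = nn.Sequential(
            nn.Linear(6 * hid, hid), nn.ReLU(), nn.Linear(hid, 2))

    def encode_text(self, tokens):
        _, (h, _) = self.text_enc(self.embed(tokens))
        return torch.cat([h[-2], h[-1]], dim=-1)  # fwd+bwd final states

    def forward(self, video_feats, subtitles, statement):
        _, (h, _) = self.video_enc(video_feats)
        v = torch.cat([h[-2], h[-1]], dim=-1)
        fused = torch.cat([v, self.encode_text(subtitles),
                           self.encode_text(statement)], dim=-1)
        return self.classify(fused)  # logits: entailed vs. contradicted

model = EntailmentBaseline()
logits = model(torch.randn(2, 30, 2048),           # 30 frames of features
               torch.randint(0, 30000, (2, 50)),   # subtitle tokens
               torch.randint(0, 30000, (2, 12)))   # statement tokens
print(logits.shape)  # torch.Size([2, 2])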

“The gap between the baseline models and human performance is significant. We encourage the community to participate in this task and invent stronger methods to push the state of the art on multimodal inference,” wrote the researchers. “Possible future directions include developing models to localize key frames, as well as better utilizing the alignment between video and subtitles to improve reasoning ability.”

The research follows a study by Microsoft Research Asia and Harbin Institute of Technology that sought to generate live video captions with AI by capturing the representations among comments, video, and audio. The system — the code for which is available on GitHub — matches the most relevant comments with videos from a candidate set so that it jointly learns cross-modal representations.

Big Data – VentureBeat

Read More

DeepMind’s AI learns to generate realistic videos by watching YouTube clips

July 20, 2019   Big Data

Perhaps you’ve heard of FaceApp, the mobile app that taps AI to transform selfies, or This Person Does Not Exist, which surfaces computer-generated photos of fictional people. But what about an algorithm whose videos are wholly novel? One of the newest papers from Google parent company Alphabet’s DeepMind (“Efficient Video Generation on Complex Datasets”) details recent advances in the budding field of AI clip generation. Thanks to “computationally efficient” components and techniques and a new custom-tailored data set, researchers say their best-performing model — Dual Video Discriminator GAN (DVD-GAN) — can generate coherent 256×256-pixel videos of “notable fidelity” up to 48 frames in length.

“Generation of natural video is an obvious further challenge for generative modeling, but one that is plagued by increased data complexity and computational requirements,” wrote the coauthors. “For this reason, much prior work on video generation has revolved around relatively simple data sets, or tasks where strong temporal conditioning information is available. We focus on the tasks of video synthesis and video prediction … and aim to extend the strong results of generative image models to the video domain.”

The team built their system around a cutting-edge AI architecture and introduced video-specific tweaks that enabled it to train on Kinetics-600, a data set of natural videos “an order of magnitude” larger than commonly used corpora. Specifically, the researchers leveraged scaled-up generative adversarial networks, or GANs — two-part AI systems consisting of generators that produce samples and discriminators that attempt to distinguish between the generated samples and real-world samples — which have historically been applied to tasks like converting captions to scene-by-scene storyboards and generating images of artificial galaxies. The flavor used here was BigGAN, distinguished by its large batch sizes and millions of parameters.
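
For readers new to the setup, the sketch below plays out one step of that two-part game in miniature, on single images rather than video and with made-up sizes. DVD-GAN itself is a far larger BigGAN-style video model; this only shows the adversarial mechanic.

# Minimal generator/discriminator game on images; sizes are illustrative.
import torch
import torch.nn as nn

G = nn.Sequential(                      # generator: noise -> fake image
    nn.Linear(128, 64 * 8 * 8), nn.ReLU(),
    nn.Unflatten(1, (64, 8, 8)),
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh())

D = nn.Sequential(                      # discriminator: image -> real/fake
    nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Flatten(), nn.Linear(64 * 8 * 8, 1))

loss = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
real = torch.randn(16, 3, 32, 32)       # stand-in for real samples

# One adversarial step: D learns to separate real from generated samples,
# then G learns to produce samples D scores as real.
fake = G(torch.randn(16, 128))
d_loss = (loss(D(real), torch.ones(16, 1)) +
          loss(D(fake.detach()), torch.zeros(16, 1)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

g_loss = loss(D(fake), torch.ones(16, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()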

Above: A set of four-second synthesized video clips trained on 12 128×128 frames from Kinetics-600.

Image Credit: DeepMind

DVD-GAN contains dual discriminators: a spatial discriminator that critiques a single frame’s content and structure by randomly sampling full-resolution frames and processing them individually, and a temporal discriminator that provides a learning signal to generate movement. A separate module — a Transformer — allowed learned information to propagate across the entire AI model.
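
The dual-discriminator idea is compact enough to sketch. In the illustrative PyTorch module below, a spatial critic scores a handful of randomly sampled frames individually while a temporal critic scores the whole clip; all sizes are assumptions, and DVD-GAN’s real discriminators differ (its temporal critic, for instance, sees spatially downsampled video).

# Hedged sketch of the dual-discriminator idea; shapes are illustrative.
import torch
import torch.nn as nn

class DualDiscriminator(nn.Module):
    def __init__(self, k=8):
        super().__init__()
        self.k = k
        self.spatial = nn.Sequential(   # per-frame content/structure critic
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1))
        self.temporal = nn.Sequential(  # whole-clip movement critic
            nn.Conv3d(3, 32, (3, 4, 4), stride=(1, 2, 2), padding=1),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(32, 1))

    def forward(self, video):
        # video: (B, T, 3, H, W)
        B, T = video.shape[:2]
        idx = torch.randperm(T)[:self.k]          # sample k frames per clip
        frames = video[:, idx].flatten(0, 1)      # judge each frame alone
        spatial_score = self.spatial(frames).view(B, self.k).mean(dim=1)
        # Conv3d expects (B, C, T, H, W); DVD-GAN downsamples spatially
        # here so the temporal critic focuses on motion.
        temporal_score = self.temporal(video.transpose(1, 2)).squeeze(1)
        return spatial_score, temporal_score

d = DualDiscriminator()
s, t = d(torch.randn(2, 16, 3, 64, 64))
print(s.shape, t.shape)  # torch.Size([2]) torch.Size([2])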

As for the training data set (Kinetics-600), which was compiled from 500,000 10-second high-resolution YouTube clips originally curated for human action recognition, the researchers describe it as “diverse” and “unconstrained,” which they claim obviated concerns about overfitting. (In machine learning, overfitting refers to models that correspond too closely to a particular set of data and as a result fail to predict future observations reliably.)

The team reports that after being trained on Google’s AI-accelerating third-generation Tensor Processing Units for between 12 and 96 hours, DVD-GAN managed to create videos with object composition, movement, and even complicated textures like the side of an ice rink. It struggled to create coherent objects at higher resolutions where movement consisted of a much larger number of pixels, but the researchers note that, evaluated on UCF-101 (a smaller data set of 13,320 videos of human actions), DVD-GAN produced samples with a state-of-the-art Inception Score of 32.97.
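
For context on that figure: Inception Score rewards samples that a pretrained classifier labels confidently while the labels stay varied across samples, so higher is better. The sketch below computes it from stand-in class probabilities; it is not the exact UCF-101 evaluation protocol.

# Generic Inception Score computation over made-up classifier outputs.
import numpy as np

def inception_score(probs, eps=1e-12):
    """probs: (N, classes) class probabilities for N generated samples."""
    marginal = probs.mean(axis=0)                      # p(y)
    kl = (probs * (np.log(probs + eps) -
                   np.log(marginal + eps))).sum(axis=1)  # KL(p(y|x) || p(y))
    return float(np.exp(kl.mean()))

logits = np.random.randn(1000, 101)                    # stand-in outputs
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(inception_score(probs))  # near 1 for random inputs; higher is better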

“We further wish to emphasize the benefit of training generative models on large and complex video data sets, such as Kinetics-600,” wrote the coauthors. “We envisage the strong baselines we established on this data set with DVD-GAN will be used as a reference point by the generative modeling community moving forward. While much remains to be done before realistic videos can be consistently generated in an unconstrained setting, we believe DVD-GAN is a step in that direction.”

Big Data – VentureBeat

Read More

Paramore’s ‘Misery Business’ in movie clips

July 10, 2016   Humor

Posted by Krisgo

About Krisgo

I’m a mom who has worn many different hats in this life; from scout leader, camp craft teacher, parents group president, colorguard coach, member of the community band, and stay-at-home mom to full-time worker, I’ve done it all – almost! I still love learning new things, especially creating and cooking. Most of all, I love to laugh! Thanks for visiting – come back soon!

Deep Fried Bits

Read More