Python PyAudio Voice Activity Detection

Target-Speaker Voice Activity Detection with Chunk-Level Speaker Queries

Abstract: Target speaker voice activity detection (TS-VAD) is a powerful approach for refining the outputs of diarization systems by re-estimating each speaker’s activity conditioned on that speaker’s ...

IEEE

EEND-SAA: Enrollment-Less Main Speaker Voice Activity Detection Using Self-Attention Attractors

Abstract: Voice activity detection (VAD) is essential in speech-based systems, but traditional methods detect only speech presence without identifying speakers. Target-speaker VAD (TS-VAD) extends ...

digitalmarketreports

Speechify Launches Windows App With On Device AI For Dictation And Text To Speech

Speechify has released a native Windows application that enables dictation and text-to-speech features using locally stored AI models, expanding its platform to desktop users. The app allows users to ...

TechCrunch

Speechify’s Windows app uses local models for transcription and dictation

Voice AI company Speechify just launched a native Windows app that employs locally stored models to enable dictation across apps, and reading aloud articles, documents, or PDFs using its library of ...

marktechpost

Google Releases Gemini 3.1 Flash Live: A Real-Time Multimodal Voice Model for Low-Latency Audio, Video, and Tool Use for AI Agents

Google has released Gemini 3.1 Flash Live in preview for developers through the Gemini Live API in Google AI Studio. This model targets low-latency, more natural, and more reliable real-time voice ...

Semiconductor Engineering

Rethinking Voice AI At The Edge: A Practical Offline Pipeline

Cloud-based AI dominates the headlines, but responsive and private interaction lies at the edge. This blog post shows how to build a fully offline, real-time voice assistant using the Arm-based NVIDIA ...

Putnam Daily Voice

Retired Explosions Detection K9 Who Helped US Secret Service, Served Westchester PD Dies

The department announced Monday, March 9, the passing of Ellwood, a retired K9 who served with the Westchester County Police Department from 2013 to 2021. Ellwood worked alongside his handler, ...

GitHub

ElBruno.Realtime

A pluggable real-time audio conversation framework for .NET, following Microsoft.Extensions.AI patterns. Build voice-powered apps with local STT, TTS, VAD, and any LLM — all running on your machine, ...

Forbes

How Large Scale Speech Models Will Impact Voice AI

Gautam Jha is the Co-Founder & CTO of Kalpa Labs, an SF-based YC backed startup building large scale Foundational speech models. Voice is quickly becoming a primary interface for enterprise software, ...

GitHub

️ Voice Activity Detector

You can't feed a 10-minute audio file to most AI/ML models at once. You need to cut it into small pieces of 3–10 seconds. Doing this manually is painful and error-prone.

Some results have been hidden because they may be inaccessible to you

Show inaccessible results