Roborregos@Home Docs

Improved speech-to-text module

- Migration to Faster-Whisper after benchmarking:
  - Faster transcription: on several of the benchmark clips, processing time was roughly half of Whisper's.
  - Higher accuracy in noisy environments.
- Dynamic integration of "hot words":
  - Context-specific vocabulary is adjusted dynamically.
  - Improves robustness and accuracy for uncommon terms.
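A minimal sketch of how context-dependent hot words could be fed to Faster-Whisper, assuming a faster-whisper release whose `WhisperModel.transcribe` accepts the `hotwords` argument. The vocabulary table `TASK_VOCABULARY`, its contents, and the helpers `build_hotwords`, `load_model`, and `transcribe` are hypothetical illustrations, not the module's actual code; the model size (`base.en`), device, and compute type are also assumptions.

```python
"""Sketch: biasing faster-whisper toward context-specific vocabulary (hypothetical helpers)."""
from collections.abc import Iterable

# Hypothetical per-context vocabulary; real lists would come from the task's
# known names, locations, and objects.
TASK_VOCABULARY: dict[str, list[str]] = {
    "receptionist": ["Angel", "Charlie", "living room", "couch"],
    "storing_groceries": ["cabinet", "cereal box", "pringles"],
}


def build_hotwords(context: str, extra: Iterable[str] = (), max_terms: int = 30) -> str:
    """Join the vocabulary for `context` plus `extra` terms into one hint string.

    Duplicates are dropped case-insensitively (first spelling wins) and the
    list is capped at `max_terms` so the hint stays short.
    """
    seen: set[str] = set()
    terms: list[str] = []
    for term in [*TASK_VOCABULARY.get(context, []), *extra]:
        cleaned = term.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            terms.append(cleaned)
    return ", ".join(terms[:max_terms])


def load_model(size: str = "base.en"):
    """Load a faster-whisper model once; the import is lazy so the helper above works without it."""
    from faster_whisper import WhisperModel  # requires `pip install faster-whisper`

    return WhisperModel(size, device="cpu", compute_type="int8")


def transcribe(model, audio_path: str, context: str) -> str:
    """Transcribe `audio_path`, passing the context vocabulary as hot words (None if empty)."""
    segments, _info = model.transcribe(
        audio_path,
        beam_size=5,
        hotwords=build_hotwords(context) or None,
    )
    return " ".join(segment.text.strip() for segment in segments)
```

Loading the model once and swapping only the hot-word string per request keeps the per-utterance cost to the transcription itself; the vocabulary can change with the robot's current task without reloading anything.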

STT Benchmark

File (10 s)	Size (MB)	Faster-Whisper accuracy	Faster-Whisper time (s)	Whisper accuracy	Whisper time (s)
test1.wav	1.22	85.7%	0.64	71.4%	1.25
test2.wav	1.22	77.8%	0.71	33.3%	1.44
test3.wav	1.22	71.4%	0.66	57.1%	1.13
test4.wav	1.22	80%	0.70	60%	1.36
test5.wav	1.53	71.4%	4.68	71.4%	4.5
test6.wav	1.83	42.9%	0.63	28.6%	1.03
test7.wav	1.83	90%	0.64	90%	0.87
test8.wav	1.83	83.3%	0.61	66.7%	0.99
test9.wav	1.83	100%	0.62	100%	0.94
test10.wav	1.83	100%	0.58	100%	0.77
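The table can be condensed into a few summary figures with a short script. This is only arithmetic over the ten rows copied from the table; it does not rerun the benchmark, and the helper name `summarize` and the "time ratio" figure (Faster-Whisper time divided by Whisper time per file) are illustrative choices, not part of the original evaluation.

```python
"""Summary statistics over the STT benchmark rows (values copied from the table)."""
from statistics import mean, median

# (file, Faster-Whisper accuracy %, Faster-Whisper time s, Whisper accuracy %, Whisper time s)
RESULTS = [
    ("test1.wav", 85.7, 0.64, 71.4, 1.25),
    ("test2.wav", 77.8, 0.71, 33.3, 1.44),
    ("test3.wav", 71.4, 0.66, 57.1, 1.13),
    ("test4.wav", 80.0, 0.70, 60.0, 1.36),
    ("test5.wav", 71.4, 4.68, 71.4, 4.50),
    ("test6.wav", 42.9, 0.63, 28.6, 1.03),
    ("test7.wav", 90.0, 0.64, 90.0, 0.87),
    ("test8.wav", 83.3, 0.61, 66.7, 0.99),
    ("test9.wav", 100.0, 0.62, 100.0, 0.94),
    ("test10.wav", 100.0, 0.58, 100.0, 0.77),
]


def summarize(rows):
    """Return mean accuracies, median times, the median per-file time ratio, and slower files."""
    ratios = [fw_time / w_time for _, _, fw_time, _, w_time in rows]  # < 1.0: Faster-Whisper quicker
    return {
        "faster_whisper_mean_accuracy": mean(r[1] for r in rows),
        "whisper_mean_accuracy": mean(r[3] for r in rows),
        "faster_whisper_median_time": median(r[2] for r in rows),
        "whisper_median_time": median(r[4] for r in rows),
        "median_time_ratio": median(ratios),
        "slower_files": [r[0] for r in rows if r[2] > r[4]],
    }


if __name__ == "__main__":
    for name, value in summarize(RESULTS).items():
        print(f"{name}: {value}")
```

On these rows, Faster-Whisper averages about 80.3% accuracy against about 67.9% for Whisper, with a median time of 0.64 s versus 1.08 s (a median per-file ratio of about 0.61). `test5.wav` is the only clip where Faster-Whisper was slower. Medians are reported alongside means because that single long-running clip would otherwise dominate the timing average.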