Introduction
Welcome to the official website of DuRT.
The name DuRT stands for Du (Do) Speech Recognition & Transcription.
DuRT is a software application that provides various features based on speech recognition.
Use Cases
Learning foreign languages, online courses, online meetings, file transcription, assisting with video editing, and more.
Break language barriers: Both the recognition and text processing features of DuRT support dozens of mainstream languages, helping with your learning, work, and daily life.
Enhance productivity and learning: DuRT pairs accurate speech recognition with text processing features such as translation and text polishing.
Installation
Install from the Mac App Store
Core Features
Real-Time Speech Recognition
- Real-time speech recognition: Convert speech to text as it is spoken, using either of two recognition engines (Apple Recognition or Whisper)
- Audio sources: Capture audio from system output, built-in microphone, or external microphone
- Floating window display: Customize floating window settings according to your preferences
- Extensive language support: Covers dozens of mainstream languages
- Result saving: Save both audio and recognition results for later review
- History records: Store recognition logs for future reference
- Text processing integration: Translate or polish recognition results to break language barriers

File Transcription
- Multi-format support: Convert audio from various video/audio files into subtitles
- Broad language coverage: Supports dozens of mainstream languages
- Subtitle editing: Intuitive subtitle editing tools
- History records: Store transcription logs for future access
- Text processing integration: Enable bilingual subtitles and more

Text Processing
- Translation: Convert real-time recognition or file transcription results between languages
- Text polishing: Fix typos, add punctuation, and refine recognition outputs
- Custom configurations: Flexible settings for creative workflows

System Requirements
Currently available only for Mac: requires macOS 13.0 or later and Apple Silicon (M-series chips).
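To check these requirements on a given Mac, one option is Python's standard `platform` module. The sketch below is a hypothetical helper, not something DuRT ships; it assumes `platform.mac_ver()` reports the macOS release (an empty string on other systems) and the machine architecture, where `arm64` indicates Apple Silicon.

```python
import platform

MIN_MACOS = (13, 0)  # macOS 13.0 or later


def parse_version(version: str) -> tuple[int, int]:
    """Turn a release string such as "14.5" or "15" into (major, minor)."""
    parts = [int(p) for p in version.split(".")[:2]]
    while len(parts) < 2:
        parts.append(0)  # treat "15" as (15, 0)
    return parts[0], parts[1]


def meets_requirements(mac_version: str, machine: str) -> bool:
    """Return True for macOS 13.0+ on an Apple Silicon (arm64) Mac."""
    if not mac_version:
        return False  # platform.mac_ver() returns "" when not running on macOS
    return parse_version(mac_version) >= MIN_MACOS and machine == "arm64"


if __name__ == "__main__":
    release, _, machine = platform.mac_ver()
    # Note: an Intel (x86_64) Python running under Rosetta reports "x86_64"
    # even on Apple Silicon, so run this with a native arm64 Python.
    print("Meets DuRT requirements:", meets_requirements(release, machine))
```

The check is split into a pure `meets_requirements` function so it can be tested with version strings directly, independent of the machine it runs on.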
Two speech recognition modes:
- Apple Recognition: Uses macOS built-in speech recognition (low resource usage)
- Whisper Recognition: Uses local Whisper models (requires significant memory/CPU/GPU resources). Memory usage is roughly 1.5 to 2 times the model size.
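The 1.5x to 2x rule of thumb for Whisper memory usage can be turned into a quick estimate before choosing a model. The sketch below uses hypothetical helpers (`estimate_whisper_memory_gb`, `fits_in_ram`) that are not part of DuRT; they simply multiply a model's file size by 1.5 and 2 to bracket the expected memory use, and compare the upper bound against installed RAM.

```python
def estimate_whisper_memory_gb(model_size_gb: float) -> tuple[float, float]:
    """Return a (low, high) memory estimate in GB for a local Whisper model.

    Hypothetical helper: applies the 1.5x to 2x rule of thumb; actual usage
    depends on the app, the model, and the hardware.
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return model_size_gb * 1.5, model_size_gb * 2.0


def fits_in_ram(model_size_gb: float, ram_gb: float) -> bool:
    """Compare the upper (2x) estimate against installed RAM.

    This ignores memory already used by macOS and other apps, so treat a
    True result as optimistic.
    """
    _, high = estimate_whisper_memory_gb(model_size_gb)
    return high <= ram_gb


if __name__ == "__main__":
    low, high = estimate_whisper_memory_gb(3.0)  # hypothetical 3 GB model file
    print(f"Expected memory use: {low:.1f}-{high:.1f} GB")  # 4.5-6.0 GB
    print(fits_in_ram(3.0, 8.0))  # True
```

For example, a hypothetical 3 GB model would be expected to use about 4.5 to 6 GB of memory, which is why smaller models may suit Macs with less RAM.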