
User Guide

DuRT currently offers real-time speech recognition, file transcription, text processing, subtitle editing, result saving and other features.

This guide provides a brief introduction to using DuRT, including pre-use considerations and explanations of its main functionalities.

Real-time Speech Recognition

Preview:

Speech Recognition Interface

DuRT currently supports two real-time speech recognition methods: Apple Speech Recognition and Whisper Recognition.

| Feature | Apple | Whisper |
| --- | --- | --- |
| Quality | Good | Good |
| Model Download | Not needed | Required |
| Punctuation | Supported | Supported |
| Speed | Real-time | Near real-time |
| Supported Languages | Limited | 30+ |
| Language Switching Mid-recognition | Not supported | Supported |
| Local-only Operation | Yes | Yes |

Apple Speech Recognition

Apple Recognition utilizes macOS's built-in speech recognition service.

This method provides real-time speech-to-text conversion. The main limitation is restricted language support. Users need to download language packs in system settings to access supported languages.

Whisper Recognition

DuRT implements Whisper Recognition to achieve near real-time speech recognition.

Its advantages are support for dozens of languages and seamless recognition of multiple languages within a single video. The trade-offs are that a model must be downloaded first and that results lag the audio by 2-3 seconds.
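The choice between the two methods can be sketched as a small decision helper. This is only an illustration of the comparison above, not part of DuRT: the `Engine` class, the `choose_engine` function, and its parameters are hypothetical names, and the 3-second figure is the upper end of the 2-3 second delay mentioned for Whisper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Engine:
    """Hypothetical summary of one recognition method from the comparison table."""
    name: str
    needs_model_download: bool
    typical_delay_seconds: float  # 0 means real-time
    switches_language_mid_recognition: bool


# Values taken from the comparison table; 3.0 is the upper end of the 2-3 s delay.
APPLE = Engine("Apple", False, 0.0, False)
WHISPER = Engine("Whisper", True, 3.0, True)


def choose_engine(mixed_languages: bool,
                  apple_language_pack_installed: bool,
                  max_delay_seconds: float) -> Optional[Engine]:
    """Return a suitable engine, or None if neither fits the requirements."""
    # Apple works only for a single language whose language pack is installed.
    apple_ok = apple_language_pack_installed and not mixed_languages
    # Whisper works whenever its processing delay is acceptable.
    whisper_ok = max_delay_seconds >= WHISPER.typical_delay_seconds

    if apple_ok:
        return APPLE  # preferred: no model download, no delay
    if whisper_ok:
        return WHISPER
    return None


if __name__ == "__main__":
    picked = choose_engine(mixed_languages=True,
                           apple_language_pack_installed=True,
                           max_delay_seconds=5.0)
    print(picked.name if picked else "no suitable engine")
```

Preferring Apple when both qualify is a design choice of this sketch, made because Apple needs no model download and adds no delay; a user who values broader language coverage might reverse that order.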

File Transcription

In the transcription interface, users can select audio/video files through file selection or drag-and-drop operations.

File Transcription exclusively uses Whisper speech recognition technology.

Preview:

File Transcription Interface

Text Processing

The text processing service uses large language models for tasks such as translation, text polishing, and Q&A.

Both real-time recognition and file transcription results can be processed through this service for enhanced output display and saving.

Service Configuration

In the service configuration interface, users can manage speech services and text processing services.


Permission Requirements

Speech recognition requires system permissions. DuRT supports both system audio capture and microphone input.

System Audio Recording requires Screen Recording & System Audio permissions: System Settings > Privacy & Security > Screen Recording & System Audio > Enable DuRT


Microphone Input requires Microphone permission: System Settings > Privacy & Security > Microphone > Enable DuRT


To save recordings and processing results, users need to specify a storage directory in DuRT's settings.

Security Note: DuRT uses its recording permissions only while recognition is active.

Additional Requirements for Apple Recognition

Apple Speech Recognition requires two additional settings to be enabled:

  1. Enable Speech Recognition: System Settings > Privacy & Security > Speech Recognition > Enable DuRT
  2. Enable Keyboard Dictation: System Settings > Keyboard > Dictation > Turn On

Memory Requirements

Whisper recognition needs roughly twice as much memory as the size of the loaded model.
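As a rough illustration of this rule of thumb, the sketch below estimates the memory needed for a given model file and checks it against what is available. The function names and the 1.5 GiB example size are hypothetical and not taken from DuRT; substitute the actual size of the model file you downloaded.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def estimate_whisper_memory_bytes(model_size_bytes: int, factor: float = 2.0) -> int:
    """Estimate memory needed: roughly `factor` times the model size (2x per this guide)."""
    if model_size_bytes < 0:
        raise ValueError("model size must be non-negative")
    return int(model_size_bytes * factor)


def fits_in_memory(model_size_bytes: int, available_bytes: int) -> bool:
    """Return True if the estimated requirement fits in the available memory."""
    return estimate_whisper_memory_bytes(model_size_bytes) <= available_bytes


if __name__ == "__main__":
    example_model = int(1.5 * GIB)  # hypothetical model file size
    needed = estimate_whisper_memory_bytes(example_model)
    print(f"Estimated memory: {needed / GIB:.1f} GiB")
    print("Fits in 8 GiB:", fits_in_memory(example_model, 8 * GIB))
```

The estimate is approximate: it covers only the model itself and ignores memory used by other applications.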

Model Details

Whisper models are available in different sizes: tiny, base, small, medium, large, and turbo. Larger models generally provide better accuracy.

Supported languages include: Arabic, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Italian, Japanese, Korean, Macedonian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese.
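For scripts that prepare files before transcription, the list above can be kept as a simple lookup. This is a minimal sketch: the `SUPPORTED_LANGUAGES` set is copied from the list in this guide, while the `is_supported` helper is a hypothetical name and not a DuRT API.

```python
# Languages listed in this guide as supported by Whisper recognition.
SUPPORTED_LANGUAGES = frozenset({
    "Arabic", "Bulgarian", "Catalan", "Chinese", "Croatian", "Czech",
    "Danish", "Dutch", "English", "Finnish", "French", "Galician",
    "German", "Greek", "Italian", "Japanese", "Korean", "Macedonian",
    "Polish", "Portuguese", "Romanian", "Russian", "Slovak", "Spanish",
    "Swedish", "Tamil", "Thai", "Turkish", "Ukrainian", "Urdu",
    "Vietnamese",
})


def is_supported(language: str) -> bool:
    """Case-insensitive check against the list above (hypothetical helper)."""
    return language.strip().title() in SUPPORTED_LANGUAGES


if __name__ == "__main__":
    for name in ("english", "Tamil", "Hindi"):
        print(name, "->", "supported" if is_supported(name) else "not listed")
```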

We recommend using the whisper-large-v3-turbo model.

For model downloads, see Model Download Guide.