In-Depth Practice of HarmonyOS 5 Intelligent Voice

1•zhxwork•4h ago
## Introduction

As education goes digital, HarmonyOS 5 opens up new modes of intelligent interaction for educational software through its distributed capabilities and AI technology stack. Taking the *K12 oral training scenario* as an entry point, this article walks through how to use the ArkUI framework and AI voice services to build smart education features such as real-time speech evaluation and intelligent transcription of classroom content. The solution rests on three technical highlights:

*Technical Highlights*

- *Multimodal Interaction*: Dual-channel input of voice and touch, supporting teaching scenarios such as classroom quick response and oral follow‑up
- *Educational‑Level Latency*: 1.2‑second edge‑side speech recognition response to ensure smooth classroom interaction
- *Accessibility Support*: Real‑time subtitle generation technology to assist in special education scenarios

*Value in Educational Scenarios*

- *Language Learning*: AI speech evaluation enables real‑time scoring of pronunciation accuracy
- *Classroom Recording*: Automatically generates timestamped text of teaching content
- *Homework Grading*: Quickly invokes question bank resources via voice commands

The example in this article builds a real‑time speech‑to‑text feature: the user long‑presses a button to record, and the recognition results are displayed as they arrive. It suits scenarios such as voice input and real‑time subtitles.

---

## Detailed Development Process

### 1. Environment Preparation

- *System Requirements*: HarmonyOS 5 API 9+
- *Device Support*: Requires verification of device microphone hardware capabilities

```typescript
import { promptAction } from '@kit.ArkUI';

// Device capability detection: warn the user when speech recognition is unavailable
if (!canIUse('SystemCapability.AI.SpeechRecognizer')) {
  promptAction.showToast({ message: 'Device does not support speech recognition' });
}
```

### 2. Permission Configuration

*Step Description*:

1. Declare permissions: Add to `module.json5`:

```json
"requestPermissions": [
  {
    "name": "ohos.permission.MICROPHONE",
    "reason": "$string:microphone_permission_reason",
    "usedScene": {
      "abilities": ["EntryAbility"],
      "when": "always"
    }
  }
]
```

2. Dynamic permission request:

```typescript
import { abilityAccessCtrl } from '@kit.AbilityKit';

// Ask the user for microphone access at runtime and record the outcome
private async requestPermissions() {
  const atManager = abilityAccessCtrl.createAtManager();
  try {
    const result = await atManager.requestPermissionsFromUser(
      getContext(),
      ['ohos.permission.MICROPHONE']
    );
    // Recording is allowed only if every requested permission was granted
    this.hasPermissions = result.authResults.every(
      status => status === abilityAccessCtrl.GrantStatus.PERMISSION_GRANTED
    );
  } catch (err) {
    console.error(`Permission request failed: ${err.code}, ${err.message}`);
  }
}
```
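The grant check in the dynamic request can be isolated into a pure helper, which makes it easy to tell the user exactly which permission was refused. The following is a minimal sketch rather than part of the original implementation: `PermissionResult` and `deniedPermissions` are hypothetical names that mirror the `permissions` and `authResults` arrays returned by `requestPermissionsFromUser`, and the constant assumes that `PERMISSION_GRANTED` has the numeric value 0.

```typescript
// Hypothetical stand-in for the result of requestPermissionsFromUser:
// permissions[i] was requested and authResults[i] is its grant status.
interface PermissionResult {
  permissions: string[];
  authResults: number[];
}

// Assumed to match abilityAccessCtrl.GrantStatus.PERMISSION_GRANTED.
const PERMISSION_GRANTED = 0;

// Returns the names of all requested permissions that were not granted.
function deniedPermissions(result: PermissionResult): string[] {
  return result.permissions.filter(
    (_name, index) => result.authResults[index] !== PERMISSION_GRANTED
  );
}

// Example: the user refused microphone access.
const sample: PermissionResult = {
  permissions: ['ohos.permission.MICROPHONE'],
  authResults: [-1]
};
const denied = deniedPermissions(sample);
console.log(denied.length === 0 ? 'All granted' : `Denied: ${denied.join(', ')}`);
```

An empty result means recording can proceed; otherwise the list can be used to build a specific message instead of a generic failure toast.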

### 3. Speech Engine Management

*Lifecycle Control*:

```typescript
import { speechRecognizer } from '@kit.CoreSpeechKit';

// Engine initialization
private async initEngine() {
  this.asrEngine = await speechRecognizer.createEngine({
    language: 'zh-CN', // Supports multiple languages like en-US
    online: 1          // Online recognition mode
  });

  this.configureCallbacks();
}

// Resource release ('10000' is the session ID used throughout this example)
private releaseEngine() {
  this.asrEngine?.finish('10000');
  this.asrEngine?.cancel('10000');
  this.asrEngine?.shutdown();
  this.asrEngine = undefined;
}
```
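Because initialization is asynchronous and release can be triggered from several places (recognition end, errors, component teardown), it helps if both operations are safe to call more than once. The sketch below shows one way to do that under stated assumptions: `EngineLike`, `EngineHolder`, and the `create` factory are hypothetical names, with `EngineLike` covering only the three calls used in `releaseEngine()` and `create` standing in for `speechRecognizer.createEngine()`.

```typescript
// Hypothetical minimal view of the engine: only the calls used in releaseEngine().
interface EngineLike {
  finish(sessionId: string): void;
  cancel(sessionId: string): void;
  shutdown(): void;
}

class EngineHolder {
  private engine?: EngineLike;
  private pending?: Promise<EngineLike>;

  // `create` is a hypothetical factory standing in for speechRecognizer.createEngine().
  async init(create: () => Promise<EngineLike>): Promise<void> {
    if (this.engine) {
      return; // Already initialized: do not create a second engine.
    }
    if (!this.pending) {
      this.pending = create(); // Overlapping callers share one creation.
    }
    try {
      this.engine = await this.pending;
    } finally {
      this.pending = undefined;
    }
  }

  // Releases the engine once; later calls are no-ops.
  release(sessionId: string): void {
    const engine = this.engine;
    if (!engine) {
      return;
    }
    this.engine = undefined; // Clear first so a repeated call does nothing.
    engine.finish(sessionId);
    engine.cancel(sessionId);
    engine.shutdown();
  }

  isReady(): boolean {
    return this.engine !== undefined;
  }
}

// Usage with a fake engine that only logs its calls.
const fakeEngine: EngineLike = {
  finish: id => console.log(`finish ${id}`),
  cancel: id => console.log(`cancel ${id}`),
  shutdown: () => console.log('shutdown')
};
const holder = new EngineHolder();
holder.init(async () => fakeEngine).then(() => {
  holder.release('10000'); // Logs finish, cancel, shutdown.
  holder.release('10000'); // No output: already released.
});
```

The sketch does not cover a release that arrives while creation is still in flight; a production version would need to decide whether to cancel or to release immediately after creation completes.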

### 4. Core Configuration Parameters

*Audio Parameters*:

```typescript
const audioConfig: speechRecognizer.AudioInfo = {
  audioType: 'pcm',   // Recommended lossless format
  sampleRate: 16000,  // Standard speech sampling rate (Hz)
  soundChannel: 1,    // Monophonic recording
  sampleBit: 16       // 16‑bit sampling depth
};
```

*Recognition Parameters*:

```typescript
const recognitionParams = {
  recognitionMode: 0,      // 0: streaming recognition, 1: single-sentence recognition
  vadBegin: 2000,          // Voice start detection threshold (ms)
  vadEnd: 3000,            // Voice end silence judgment (ms)
  maxAudioDuration: 60000  // Maximum recording time (ms)
};
```

### 5. Callback Event Handling

```typescript
private configureCallbacks() {
  const _this = this;

  this.asrEngine.setListener({
    onResult(sessionId, result) {
      _this.text = result.result;  // Incrementally update recognition results

      if (result.isLast) {
        _this.handleRecognitionEnd();
      }
    },

    onError(sessionId, code, msg) {
      promptAction.showToast({ message: `Recognition error: ${msg}` });
      _this.resetState();
    }
  });
}
```

```typescript
private handleRecognitionEnd() {
  this.isRecording = false;
  this.releaseEngine();
  promptAction.showToast({ message: 'Recognition completed' });
}
```

## Complete Implementation Code

### View Component

```typescript
// @State and @Link belong to the @Component state model, so @Component is used here
@Entry
@Component
struct SpeechRecognitionView {
  @State private displayText: string = '';
  @State private recordingStatus: boolean = false;
  private recognitionEngine?: speechRecognizer.SpeechRecognitionEngine;

  build() {
    Column() {
      // Result display area
      Scroll() {
        Text(this.displayText)
          .fontSize(18)
          .textAlign(TextAlign.Start)
      }
      .layoutWeight(1)
      .padding(12)

      // Voice control button
      VoiceButton({
        recording: this.recordingStatus,
        onStart: () => this.startRecognition(),
        onEnd: () => this.stopRecognition()
      })
    }
    .height('100%')
    .backgroundColor(Color.White)
  }
}
```

### Custom Voice Button Component

```typescript
@Component
struct VoiceButton {
  @Link recording: boolean;
  // Default to no-ops so the callbacks are always initialized
  onStart: () => void = () => {};
  onEnd: () => void = () => {};

  build() {
    Button(this.recording ? 'Release to End' : 'Long Press to Speak')
      .size({ width: '80%', height: 80 })
      .backgroundColor(this.recording ? '#FF6B81' : '#3498DB')
      .gesture(
        LongPressGesture()
          // onAction fires when the long press is recognized
          .onAction(() => {
            this.onStart();
            this.recording = true;
          })
          // onActionEnd fires when the finger is lifted
          .onActionEnd(() => {
            this.onEnd();
            this.recording = false;
          })
      )
  }
}
```

## Best Practice Recommendations

### Performance Optimization

*Resource Management*: Ensure engine release when components are unloaded

```typescript
aboutToDisappear(): void {
  this.releaseEngine();
}
```

*Throttling Processing*: Avoid frequent state updates

```typescript
private updateText(newText: string) {
  // Apply at most one UI update every 200 ms
  if (Date.now() - this.lastUpdate > 200) {
    this.displayText = newText;
    this.lastUpdate = Date.now();
  }
}
```

### User Experience Enhancement

*Add audio waveform animation*:

```typescript
// Add dynamic effects to the button
@Builder
WaveEffect() {
  Circle()
    .width(this.recording ? 30 : 0)
    .height(this.recording ? 30 : 0)
    .opacity(0.5)
    .animation({ duration: 1000, iterations: -1 })  // -1 repeats indefinitely
}
```

*Error recovery mechanism*:

```typescript
private async retryRecording() {
  this.releaseEngine();
  // Give the system a short pause before re-creating the engine
  await new Promise<void>(resolve => setTimeout(resolve, 500));
  await this.initEngine();
  this.startRecognition();
}
```

## Technical Key Points Summary

| Module | Key Technical Points |
| --- | --- |
| Permission Management | Dynamic permission request mechanism + exception fallback handling |
| Audio Processing | PCM audio stream configuration + VAD silence detection parameter optimization |
| State Management | UI and logic state synchronization via @State/@Link |
| Performance Optimization | Engine lifecycle management + throttling update strategy |
| Exception Handling | Error code mapping table + automatic retry mechanism |
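One caveat with a purely time-based throttle such as `updateText` is that an update arriving inside the 200 ms window is dropped, so the last partial result can be lost when recognition ends. A minimal sketch of a throttle that keeps the most recent skipped value and applies it on demand follows. `ThrottledText`, `push`, and `flush` are hypothetical names, and the clock is injected as a parameter so the logic can be exercised without real timers.

```typescript
class ThrottledText {
  private apply: (text: string) => void;
  private intervalMs: number;
  private now: () => number;
  private lastUpdate = Number.NEGATIVE_INFINITY;
  private pending?: string;

  constructor(
    apply: (text: string) => void,
    intervalMs: number = 200,
    now: () => number = () => Date.now()
  ) {
    this.apply = apply;
    this.intervalMs = intervalMs;
    this.now = now;
  }

  // Applies the text immediately if the interval has elapsed, otherwise buffers it.
  push(text: string): void {
    const current = this.now();
    if (current - this.lastUpdate >= this.intervalMs) {
      this.lastUpdate = current;
      this.pending = undefined;
      this.apply(text);
    } else {
      this.pending = text; // Keep only the newest skipped value.
    }
  }

  // Applies the buffered value, if any; intended for the end of recognition.
  flush(): void {
    if (this.pending !== undefined) {
      this.apply(this.pending);
      this.pending = undefined;
    }
  }
}

// Usage with a manual clock in place of Date.now().
const shown: string[] = [];
let clock = 0;
const throttled = new ThrottledText(text => { shown.push(text); }, 200, () => clock);
throttled.push('he');           // Applied: first update.
clock = 50;
throttled.push('hello');        // Buffered: only 50 ms have elapsed.
clock = 120;
throttled.push('hello world');  // Replaces the buffered value.
throttled.flush();              // Applies 'hello world'.
console.log(shown);             // shown is now ['he', 'hello world']
```

In the component, `flush()` would naturally be called where the last result is handled, so the final text is always displayed even if it arrived inside a throttle window.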

Through this case, developers can master the core development model of HarmonyOS 5 voice services and quickly build high‑quality voice interaction functions.
