audio_chat

command

Version: v1.0.0 (latest). Published: Nov 11, 2025. License: MIT. Imports: 4. Imported by: 0.

Repository

github.com/liuzl/ai

README

Audio Analysis Example

This example demonstrates how to send audio input to an AI model and ask questions about it. Of the providers listed below, only Google Gemini currently supports audio analysis.

Supported Providers

✅ Gemini - Full support for audio analysis
❌ OpenAI - Not supported in chat completions API (use Whisper API instead)
❌ Anthropic - Not currently supported

Supported Audio Formats

MP3
WAV
AIFF
AAC
OGG
FLAC

Maximum audio length: about 9.5 hours per prompt
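
Before sending a local file, it can help to check its extension against the list above. The following is a minimal sketch using only the standard library. The names supportedFormats and audioFormat are hypothetical helpers, not part of github.com/liuzl/ai, and the check looks only at the file extension, not at the file contents.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supportedFormats mirrors the format list in this README.
// Keys are lowercase file extensions without the leading dot.
var supportedFormats = map[string]bool{
	"mp3": true, "wav": true, "aiff": true,
	"aac": true, "ogg": true, "flac": true,
}

// audioFormat returns the lowercase extension of path (without the dot)
// and whether that extension appears in supportedFormats.
func audioFormat(path string) (string, bool) {
	ext := strings.ToLower(strings.TrimPrefix(filepath.Ext(path), "."))
	return ext, supportedFormats[ext]
}

func main() {
	for _, p := range []string{"song.MP3", "voice.flac", "clip.wma"} {
		format, ok := audioFormat(p)
		fmt.Printf("%s: format=%q supported=%v\n", p, format, ok)
	}
}
```

The returned format string could then be passed as the second argument to ai.NewAudioPartFromBase64, as in the base64 example further down.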

Prerequisites

# Set environment variables
export AI_PROVIDER=gemini
export GEMINI_API_KEY="your-gemini-api-key"
export GEMINI_MODEL="gemini-2.0-flash-exp"  # or gemini-1.5-pro, gemini-1.5-flash
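
A program can fail early with a clear message when these variables are missing. The sketch below assumes AI_PROVIDER and GEMINI_API_KEY are required and GEMINI_MODEL is optional; that split is an assumption, and missingVars is a hypothetical helper, not part of the library.

```go
package main

import (
	"fmt"
	"os"
)

// missingVars returns the names in keys for which lookup reports
// either no value or an empty value.
func missingVars(lookup func(string) (string, bool), keys ...string) []string {
	var missing []string
	for _, k := range keys {
		if v, ok := lookup(k); !ok || v == "" {
			missing = append(missing, k)
		}
	}
	return missing
}

func main() {
	// Assumption: these two variables are required; GEMINI_MODEL is treated as optional.
	missing := missingVars(os.LookupEnv, "AI_PROVIDER", "GEMINI_API_KEY")
	if len(missing) > 0 {
		fmt.Fprintf(os.Stderr, "missing environment variables: %v\n", missing)
		os.Exit(1)
	}
	fmt.Println("environment looks complete")
}
```

Passing the lookup function as a parameter keeps the helper easy to exercise without touching the real environment.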

Running the Example

go run main.go

Use Cases

Music Analysis: Identify instruments, tempo, genre, mood
Speech Transcription: Convert speech to text
Sound Classification: Identify environmental sounds
Audio Quality Assessment: Analyze audio quality issues
Language Detection: Identify languages in multilingual audio
Emotion Detection: Analyze emotional tone in speech

Example Output

=== Audio Analysis Example ===
This example requires Gemini API

Analyzing audio file...
Audio URL: https://www2.cs.uic.edu/~i101/SoundFiles/BabyElephantWalk60.wav

AI Response:
This is a playful and whimsical piece of music. I can hear:

Instruments:
- Trumpet or brass section playing the main melody
- Tuba providing the bass line
- Light percussion (possibly tambourine or bells)
- Possibly a xylophone or marimba

The mood is cheerful, lighthearted, and child-friendly. It has a bouncy, walking rhythm that sounds like it could be from a cartoon or children's program. The melody is repetitive and catchy, with a comedic quality.

=== Detailed Audio Analysis ===
Detailed Analysis:
1. **Instruments**:
   - Trumpet/Brass: Carries the main melodic theme
   - Tuba: Provides a bouncing bass line
   - Percussion: Light, possibly wooden blocks or claves
   - High-pitched melodic instrument: Possibly glockenspiel

2. **Tempo and Rhythm**:
   - Moderate tempo, approximately 120-130 BPM
   - Bouncy, march-like rhythm
   - Strong emphasis on the downbeat
   - Syncopated melody creates a "walking" feeling

3. **Emotions**:
   - Playful and amusing
   - Lighthearted and carefree
   - Nostalgic (reminiscent of classic cartoons)
   - Whimsical and slightly silly

4. **Duration**: Approximately 60 seconds

Code Example with Base64

To analyze a local audio file, read it from disk and pass its contents as a base64 string:

package main

import (
	"context"
	"encoding/base64"
	"fmt"
	"log"
	"os"

	"github.com/liuzl/ai"
)

func main() {
	// Read local audio file
	audioData, err := os.ReadFile("path/to/audio.mp3")
	if err != nil {
		log.Fatal(err)
	}

	// Convert to base64
	base64Audio := base64.StdEncoding.EncodeToString(audioData)

	// Create client
	client, err := ai.NewClientFromEnv()
	if err != nil {
		log.Fatal(err)
	}

	// Use base64 audio
	req := &ai.Request{
		Messages: []ai.Message{
			ai.NewMultimodalMessage(ai.RoleUser, []ai.ContentPart{
				ai.NewTextPart("What's in this audio?"),
				ai.NewAudioPartFromBase64(base64Audio, "mp3"),
			}),
		},
	}

	resp, err := client.Generate(context.Background(), req)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(resp.Text)
}

Notes

The example uses a publicly hosted audio file for demonstration.
When an audio part is given as a URL, the file is automatically downloaded and converted to base64 before it is sent to Gemini.
Large audio files take longer to download and process.
Ensure your API key has sufficient quota for audio processing.
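
To make the download-and-convert step concrete, here is a rough sketch of the same idea using only the standard library. It is not the library's actual implementation; fetchBase64 and encodeAudio are hypothetical names, and the 60-second timeout is an arbitrary choice for illustration.

```go
package main

import (
	"context"
	"encoding/base64"
	"fmt"
	"io"
	"log"
	"net/http"
	"time"
)

// encodeAudio returns the standard base64 encoding of raw audio bytes.
func encodeAudio(data []byte) string {
	return base64.StdEncoding.EncodeToString(data)
}

// fetchBase64 downloads url and returns the response body base64-encoded.
// The context bounds the whole download, which matters for large files.
func fetchBase64(ctx context.Context, url string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return encodeAudio(data), nil
}

func main() {
	// Arbitrary timeout chosen for this sketch.
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()

	url := "https://www2.cs.uic.edu/~i101/SoundFiles/BabyElephantWalk60.wav"
	encoded, err := fetchBase64(ctx, url)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("encoded %d base64 characters\n", len(encoded))
}
```

Note that base64 output is roughly a third larger than the raw bytes, so a large audio file produces an even larger request payload.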
