More intuitive Microphone helper #2308

@davidgilbertson

Description

Confirm this is a feature request for the Python library and not the underlying OpenAI API.

  • This is a feature request for the Python library

Describe the feature or improvement you're requesting

I'm looking at the Microphone class, and it seems to favour the use case where you want to record a specific amount of audio (where you know that time in advance).

I would have thought the most common use case is one where a user decides when to start and stop recording, and I can't work out how to use this class in that case.

I can wrap it to turn it from async to sync, something like this:

import asyncio
import threading
import time

from openai.helpers import Microphone  # the helper under discussion


class MicrophoneSync:
    def __init__(self):
        self.do_rec = threading.Event()
        self.loop = asyncio.new_event_loop()
        self.loop_thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self.loop_thread.start()
        self.future = None
        self.mic = Microphone(should_record=self.should_record)

    def should_record(self):
        return self.do_rec.is_set()

    def start(self):
        self.do_rec.set()
        self.future = asyncio.run_coroutine_threadsafe(self.mic.record(), self.loop)

    def stop(self):
        self.do_rec.clear()
        return self.future.result()[1]  # record() resolves to a tuple; the WAV bytes are the second element


mic = MicrophoneSync()

mic.start()
time.sleep(2)
wav_bytes = mic.stop()
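(For what it's worth, the async-to-sync bridging pattern itself can be exercised without any audio hardware by substituting a dummy coroutine for `Microphone.record()`. `fake_record` below is a made-up stand-in, not part of the SDK; it just collects chunks while the event is set:)

```python
import asyncio
import threading
import time


class SyncBridge:
    """Runs an async recording coroutine on a background event loop,
    exposing synchronous start()/stop() -- same shape as MicrophoneSync."""

    def __init__(self, record_coro_factory):
        self.do_rec = threading.Event()
        self.loop = asyncio.new_event_loop()
        threading.Thread(target=self.loop.run_forever, daemon=True).start()
        self.record_coro_factory = record_coro_factory
        self.future = None

    def start(self):
        self.do_rec.set()
        self.future = asyncio.run_coroutine_threadsafe(
            self.record_coro_factory(self.do_rec), self.loop
        )

    def stop(self):
        self.do_rec.clear()
        return self.future.result()  # blocks until the coroutine returns


# Dummy stand-in for Microphone.record(): appends a chunk per loop iteration.
async def fake_record(flag):
    chunks = []
    while flag.is_set():
        chunks.append(b"\x00\x01")
        await asyncio.sleep(0.01)
    return b"".join(chunks)


bridge = SyncBridge(fake_record)
bridge.start()
time.sleep(0.1)
data = bridge.stop()
print(len(data) > 0)  # True
```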

But that's almost as complex as just creating a synchronous one from scratch:

import io
import wave

import numpy as np
from sounddevice import InputStream


class MicrophoneSync:
    def __init__(self, sample_rate=24_000):
        self.frames = []
        self.stream = InputStream(
            samplerate=sample_rate,
            channels=1,  # mono
            dtype="int16",  # 16-bit
            callback=self._callback,
        )

    def _callback(self, indata, frames, time, status):
        self.frames.append(indata.copy())

    def start(self):
        self.frames = []
        self.stream.start()

    def stop(self):
        self.stream.stop()

        buffer = io.BytesIO()

        with wave.open(buffer, "wb") as wave_file:
            wave_file.setframerate(self.stream.samplerate)
            wave_file.setnchannels(self.stream.channels)
            wave_file.setsampwidth(self.stream.samplesize)
            wave_file.writeframes(np.concatenate(self.frames, axis=0).tobytes())

        return buffer.getvalue()


mic = MicrophoneSync()

mic.start()
time.sleep(2)
wav_bytes = mic.stop()
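As a sanity check, the WAV-framing step in stop() can be run on its own with synthetic int16 frames (stdlib only, no microphone; the sample values are made up):

```python
import array
import io
import wave

# Synthetic capture: two "callback" buffers of 16-bit mono samples,
# standing in for the frames collected by the stream callback.
frames = [
    array.array("h", [0, 1000, -1000, 500]).tobytes(),
    array.array("h", [250, -250, 0, 0]).tobytes(),
]

buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setframerate(24_000)  # matches the sample_rate default above
    wf.setnchannels(1)       # mono
    wf.setsampwidth(2)       # 16-bit samples = 2 bytes
    wf.writeframes(b"".join(frames))

wav_bytes = buf.getvalue()
print(wav_bytes[:4], len(wav_bytes))  # b'RIFF' 60  (44-byte header + 16 bytes of audio)
```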

So I have two questions:

  1. Am I missing something? Is there a simple way to use the Microphone class so that I can start and stop it in response to a user interaction?
  2. Is it worth adding a sync version (either of the above) to the package?

Additional context

No response
