Speech-To-Text

Overview

What is this project about?

ICS' Role

What is ICS' Role in the project?

Notes

background info to add to the text to speech summary
- also include these things in the summary
- per surgeon template for each surgery
test the tool on 3 procedures one template for each
needs to review the summary for inaccuracies
- how to review and correct
- ideally would not need to be edited
patient comes in
- do pre-op consult
  - how, risks benefits etc
  - pretty much a script
  - Q&A
  - recorded or realtime
- if realtime
  - be part of discussion
  - find images?
  - take small clips and send for running transcription
- summary the transcript send to LLM
- send to AI
- review and correct if needed
- then print to document
- after patient goes home with summary
HIPAA compliant
integrated with epic
installed on a specific device
- streaming transcript as translated
review page
save page

challenges

open web ui does not do realtime text-to-speech, requires button clicking to handle sending audio
not a great review/edit experience

Next Steps

investigate building a demo env
investigate other available tools
send follow up email
- instructions on testing open web ui
  - https://lmstudio.ai/model/llama-3.2-1b-instruct
  - https://lmstudio.ai
- - https://www.docker.com/products/docker-desktop/
  - https://docs.openwebui.com/getting-started/#installing-open-webui-with-bundled-ollama-support
  - Sign up with your information
  - Go to admin - settings - models
  - pull model llama3.2:1b
  - go to audio and pull the base whisper model
- next steps
  - eval open web ui and/or other tools
  - create SOW for custom tool

Table of Contents

Updated on August 7, 2025