Using the OpenAI Text-to-speech API with Rails


In this post we’re going to create a simple CMS that automatically sends our posts to the OpenAI Speech API, which creates an MP3 file of the post’s content.

We’ll start with a simple implementation first and refine it afterwards with background jobs, error handling and live updates with Turbo.

By the way, you can check out the full project with tests in my GitHub repository for this blog’s project. You can add your own OpenAI access token and run this demo locally on your own machine.

Creating the article

We’re starting with a simple CRUD scaffold for an article record and installing ActiveStorage in a new Rails app.

$ rails g scaffold article content:text
$ rails active_storage:install

We’re adding one attachment to the article record next:

class Article < ApplicationRecord
  has_one_attached :audio
end

We’re going to use the ruby-openai gem and its support for the new Text-to-speech (TTS) API to create an audio version of the article’s content.

First let’s create a simple client to speak with the API:

class TextToSpeech
  def initialize
    @client = OpenAI::Client.new(access_token: Rails.credentials.open_ai_access_token!)
  end

  def speech(text)
    @client.audio.speech(
      parameters: {
        model: "tts-1",
        input: text,
        voice: "alloy"
      }
    )
  end
end
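For tests or local experiments without an API key, the client can be swapped for a stand-in. The sketch below is an assumption and not part of the original project: it adds a `client:` keyword argument to the class and uses a hypothetical `FakeOpenAIClient` that mimics only the `audio.speech(parameters:)` call used above.

```ruby
# Hypothetical stand-in for OpenAI::Client; it only mimics audio.speech.
class FakeOpenAIClient
  FakeAudio = Struct.new(:calls) do
    # Records the parameters and returns made-up binary data.
    def speech(parameters:)
      calls << parameters
      "FAKE-MP3-BYTES".b
    end
  end

  attr_reader :audio

  def initialize
    @audio = FakeAudio.new([])
  end
end

class TextToSpeech
  # In the real app, the OpenAI::Client from above would be passed in here.
  def initialize(client:)
    @client = client
  end

  def speech(text)
    @client.audio.speech(
      parameters: { model: "tts-1", input: text, voice: "alloy" }
    )
  end
end

fake = FakeOpenAIClient.new
audio = TextToSpeech.new(client: fake).speech("Hello from Rails")
```

Injecting the client keeps the request parameters easy to inspect in a test without sending anything over the network.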

In our first version I’m going to use a simple callback in the article that runs when saving the article, speaks with the OpenAI endpoint and attaches the returned file to our article.

class Article < ApplicationRecord
  has_one_attached :audio

  before_save :generate_audio_mp3

  private

  def generate_audio_mp3
    response = TextToSpeech.new.speech(self.content)
    self.audio.attach(
      io: StringIO.new(response, 'rb'),
      filename: audio_filename
    )
  end

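  def audio_filename
    # Note: id is still nil for a new record in before_save,
    # so the first file is named "article-.mp3".
    "article-#{id}.mp3"
  end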
end
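The attach call expects an IO-like object, while the code above treats the API response as a plain string of bytes. As a small sketch of what StringIO does with such a string (the bytes here are made up for illustration):

```ruby
require "stringio"

# Made-up bytes standing in for the MP3 data returned by the API.
response = "ID3\x04\x00fake-audio".b

# Wrap the string so it can be read like a file.
io = StringIO.new(response, "rb")

first_three = io.read(3) # the first three bytes
rest = io.read           # everything after them
io.rewind                # back to the start before handing it on
```

Nothing is written to disk here: the audio stays in memory until ActiveStorage uploads it.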

Creating an article with some content will send the content to the API and attach the returned file right away. All of this happens in the same request cycle.

Let’s modify the HTML to display a simple audio-player for our content:

# _article.html.erb
<%= audio_tag(article.audio, controls: true) %>

Nice! After creating the article, there will be an audio player visible as well, which reads our whole article out loud. We can improve the code, but we’ll come back to that later.

In the next step we’re first going to improve the audio quality.

Increasing the audio quality of the speech API

The audio API supports parameters to fine-tune the returned audio file. After playing around with the API, I found two changes that can increase the audio quality a little bit:

Change the model tts-1 to tts-1-hd
Change the speed to 0.95

Let’s also change the voice to echo, which I found to deliver great results, although this is surely a matter of taste. In general, you should adapt the parameters to your specific use case and test the results regularly.

Our API client’s #speech method looks like this after the parameter changes:

def speech(text)
  @client.audio.speech(
    parameters: {
      model: "tts-1-hd",
      input: text,
      voice: "echo",
      speed: 0.95,
    }
  )
end
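When the parameters become configurable, a small guard can catch bad values before a request is sent. This is a sketch with a hypothetical helper name, `speech_parameters`. The accepted speed range of 0.25 to 4.0 is an assumption taken from OpenAI's API reference and may change.

```ruby
# Assumed range for the speed parameter (see OpenAI's API reference).
SPEED_RANGE = (0.25..4.0)

# Hypothetical helper: builds the parameters hash for the speech call.
def speech_parameters(text, model: "tts-1-hd", voice: "echo", speed: 0.95)
  unless SPEED_RANGE.cover?(speed)
    raise ArgumentError, "speed must be within #{SPEED_RANGE}, got #{speed}"
  end

  { model: model, input: text, voice: voice, speed: speed }
end

params = speech_parameters("Hello")
```

The resulting hash can be passed as `parameters:` to the client call shown above.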

The generated audio files are a little bit slower, have a higher quality and a different tone of voice.

Some problems

Every time an article is saved, we’re regenerating the audio file, even if the article’s content hasn’t changed. We’re also waiting for the OpenAI API response inside the request lifecycle instead of handling it in the background. You’ll notice that saving an article takes quite some time!

There are even more problems with the current implementation:

Any exception will roll back the database transaction. Instabilities in the OpenAI API will break saving articles 🥶
Requests to the API are not retried in case of a failure
Database transactions wait for the API, which will be very taxing on our database performance at high load
The request takes a long time to complete, which is suboptimal for our webserver and for our user experience
Just touching the record will regenerate the audio file, which will cost us a lot of money in the long run

Let’s use a different approach and use a background job with conditional callbacks to avoid those issues.

TTS in a background job and retrying errors

To avoid talking with external services in a transaction, we can run the request after the transaction has finished.

Instead of the before_save callback, we’re going to use the after_commit callback to talk with the speech endpoint. This way, our article is saved even if the API request fails, and we’re not blocking the database transaction with a long-running request.

We’re going to make use of ActiveModel::Dirty and its content_previously_changed? method. It’s a predicate that checks whether the most recent save of this record instance changed a specific attribute. With this in place, we can tell whether the content of an article changed, even after the transaction has completed.

class Article < ApplicationRecord
  after_commit :generate_audio_mp3, if: :content_previously_changed?

  ...
end
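To make the difference between "has unsaved changes" and "was changed by the last save" concrete, here is a plain-Ruby sketch. `TrackedArticle` is a simplified, hypothetical stand-in and not the real ActiveModel::Dirty implementation: it tracks a single attribute and only mimics the two predicates.

```ruby
# Simplified stand-in for dirty tracking on one attribute.
class TrackedArticle
  attr_accessor :content

  def initialize(content)
    @content = content
    @saved_content = content
    @previous_change = nil
  end

  # True while there are unsaved changes.
  def content_changed?
    @content != @saved_content
  end

  # True if the most recent save changed the content.
  def content_previously_changed?
    !@previous_change.nil?
  end

  # Remembers what this save changed, then marks the record as clean.
  def save
    @previous_change = content_changed? ? [@saved_content, @content] : nil
    @saved_content = @content
    true
  end
end

article = TrackedArticle.new("Draft")
article.content = "Final"
article.save
changed_after_edit = article.content_previously_changed?  # true

article.save # a save without edits, comparable to a touch
changed_after_touch = article.content_previously_changed? # false
```

With the conditional callback, only the first of these two saves would trigger a new audio file.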

We’re fixing our retry issue by enqueuing a background job instead of talking with the API inline in the after_commit callback.

Let’s create a new job with rails g job text_to_speech. In this job we’re going to run our API request and buffer the response in an IO class, which we’ll attach directly to the article.

# text_to_speech_job.rb
class TextToSpeechJob < ApplicationJob
  queue_as :default
  retry_on Faraday::Error, wait: :polynomially_longer, attempts: 10

  def perform(article)
    response = TextToSpeech.new.speech(article.content)
    article.audio.attach(
      io: StringIO.new(response, 'rb'),
      filename: "article-#{article.id}.mp3"
    )
  end
end

We need to adapt our model method to enqueue the job:

# article.rb
def generate_audio_mp3
  TextToSpeechJob.perform_later(self)
end

The retry_on call in the job takes care of retrying failed API requests. Since the ruby-openai gem uses Faraday under the hood, we can retry on the most general Faraday::Error, which Faraday currently raises for most HTTP errors.
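To get a feeling for how long the job keeps retrying, the waits can be estimated by hand. The sketch below assumes the formula documented for the :polynomially_longer option, (executions**4) + 2 seconds, and ignores the random jitter that Rails adds on top. `polynomial_wait` is a hypothetical helper, not a Rails method.

```ruby
# Delay in seconds before the next try after `executions` runs, without jitter.
def polynomial_wait(executions)
  (executions**4) + 2
end

# attempts: 10 allows up to 10 executions, so there are at most 9 waits.
waits = (1..9).map { |n| polynomial_wait(n) }
total_hours = (waits.sum / 3600.0).round(1)
```

The first waits are only a few seconds (3, 18, 83), but the last ones dominate: all nine add up to a bit more than four hours, which comfortably covers a short API outage.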

We solved our problems and can continue with fixing some bugs we introduced by moving the API request to the background.

Fixing the UI and refreshing the player

Moving things to background jobs has some downsides. If we create a new article in the UI now, we’ll run into an error page.

Since generating the audio file and attaching it to the article record takes longer than it takes our controller to respond, article.audio in _article.html.erb will often not have a file attached yet when the page renders.

We avoided this problem earlier by attaching the file synchronously during the request. Let’s handle the missing attachment and refresh the articles/show.html.erb page with Turbo once the audio file has been attached successfully.

We’ll add the following condition to our _article.html.erb template and render the audio player only if the audio file has been attached successfully.

<% if article.audio.attached? %>
  <%= audio_tag article.audio, controls: true %>
<% else %>
  <p>Audio is generating...</p>
<% end %>
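The two branches of this template can be tried outside of Rails with the ERB class from Ruby's standard library. This is only a sketch: `FakeArticle` and `FakeAudio` are hypothetical stand-ins for the model and its attachment, and a plain audio element replaces the audio_tag helper, which is only available inside Rails views.

```ruby
require "erb"

# Stand-in for the attachment: "attached" as soon as it has a URL.
FakeAudio = Struct.new(:url) do
  def attached?
    !url.nil?
  end
end

# Stand-in for the article record.
FakeArticle = Struct.new(:audio)

TEMPLATE = ERB.new(<<~HTML, trim_mode: "-")
  <% if article.audio.attached? -%>
  <audio controls src="<%= article.audio.url %>"></audio>
  <% else -%>
  <p>Audio is generating...</p>
  <% end -%>
HTML

# Renders the template with the given article as a local variable.
def render_player(article)
  TEMPLATE.result_with_hash(article: article).strip
end

pending = render_player(FakeArticle.new(FakeAudio.new(nil)))
ready = render_player(FakeArticle.new(FakeAudio.new("/audio/article-1.mp3")))
```

Here `pending` contains the placeholder paragraph and `ready` contains the player markup, which mirrors what the user sees before and after the job has finished.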

We’re not getting any errors anymore, but a user will have to refresh the page to see the updated file. On the articles/show.html.erb page, we’re going to add

<%= turbo_stream_from(@article) %>

to create a turbo-stream-source for the current article, and in the article we’re going to add the following:

class Article < ApplicationRecord
  broadcasts
  ...
end

Because our job will attach the audio file as soon as it’s ready, and attaching a file will touch our article, a turbo-stream with action="replace" will be broadcast, which will replace our article’s container.

Of course, there’s one last gotcha we need to solve:

Since the Turbo::Streams::ActionBroadcastJob, which is responsible for rendering the replace action, renders the _article.html.erb partial through the ApplicationController renderer outside of a request context, it’s missing the correct host for the audio file’s URL that we need in the src attribute of the audio tag.

It defaults to http://example.org, which will result in a 404. One way to fix this (at least in development) is to set the default_url_options for ActionController in development.rb.

config.action_controller.default_url_options = {
  host: 'localhost',
  port: '3000',
}

Saving an article will enqueue a job to create an audio version and the UI will automatically update with a player as soon as the job finishes.

Conclusion

We implemented a simple demo for the OpenAI Text-to-speech API, learned a super simple way to run callbacks conditionally in ActiveRecord models, and handled an asynchronous page refresh with two lines of Turbo code.

Also many thanks to Alex Rudall who merged my TTS PR for the ruby-openai gem, which allowed me to write this tutorial without implementing the http client for the API myself.

As always, please reach out to me with questions or feedback on X (@ModernRails) or send me an e-mail.