In this post we’re going to create a simple CMS that automatically sends our posts to the OpenAI Speech API, which creates an MP3 file of the post’s content.
We’ll start with a simple implementation first and refine it afterwards with background jobs, error handling and live updates with Turbo.
By the way, you can check out the full project, including tests, in my GitHub repository for this blog post. You can add your own OpenAI access token and run this demo locally on your own machine.
Creating the article
We’re starting with a simple CRUD scaffold for an article record and installing Active Storage in a new Rails app.
$ rails g scaffold article content:text
$ rails active_storage:install
$ rails db:migrate
We’re adding one attachment to the article record next:
class Article < ApplicationRecord
  has_one_attached :audio
end
We’re going to use the ruby-openai gem and the new text-to-speech (TTS) API to create an audio version of the article’s content.
First, let’s create a simple client to talk to the API:
class TextToSpeech
  def initialize
    # The bang variant raises if the credential is missing or blank.
    @client = OpenAI::Client.new(access_token: Rails.application.credentials.open_ai_access_token!)
  end

  def speech(text)
    @client.audio.speech(
      parameters: {
        model: "tts-1",
        input: text,
        voice: "alloy"
      }
    )
  end
end
In our first version I’m going to use a simple callback in the article that runs when the article is saved, talks to the OpenAI endpoint and attaches the returned file to our article.
class Article < ApplicationRecord
  has_one_attached :audio
  before_save :generate_audio_mp3

  private

  def generate_audio_mp3
    response = TextToSpeech.new.speech(self.content)
    self.audio.attach(
      io: StringIO.new(response, 'rb'),
      filename: audio_filename
    )
  end

  def audio_filename
    "article-#{id}.mp3"
  end
end
Creating an article with some content now sends the content to the API and attaches the returned file right away, all within the same request cycle.
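A quick note on the StringIO wrapper in the callback: as far as I can tell, the client hands us the MP3 as a plain string of bytes, while attach wants an IO-like object for its io: option. The sketch below uses a made-up byte string instead of a real API response (fake_mp3_bytes is a placeholder, not real audio) to show what the wrapper gives us.

```ruby
require "stringio"

# Placeholder bytes standing in for the API response (assumption: not real MP3 data).
fake_mp3_bytes = "ID3\x04\x00fake-audio".b

# Wrap the string so it can be read like a file opened in binary read mode.
io = StringIO.new(fake_mp3_bytes, "rb")

first_chunk = io.read(3) # read the first three bytes
rest        = io.read    # read everything that is left
io.rewind                # jump back to the start so the bytes can be read again
everything  = io.read
```

The string itself stays untouched; the wrapper only adds a read position on top of it, which is what an upload needs to stream the data.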
Let’s modify the HTML to display a simple audio player for our content:
# _article.html.erb
<%= audio_tag(article.audio, controls: true) %>
Nice! After creating the article, there will be an audio player visible as well, which reads our whole article out loud. We can improve the code, but we’ll come back to that later.
In the next step we’re first going to improve the audio quality.
Increasing the audio quality of the speech API
The audio API supports parameters to fine-tune the returned audio file. After playing around with the API, I found that there are two parameters that can increase the audio quality a little bit:
- Change the model from tts-1 to tts-1-hd
- Change the speed to 0.95
Let’s also change the voice to echo, which I found to deliver great results, although this is surely a matter of taste. In general, you should probably adapt the parameters to your specific use case and test them regularly.
Our API client’s #speech method looks like this after the parameter changes:
def speech(text)
  @client.audio.speech(
    parameters: {
      model: "tts-1-hd",
      input: text,
      voice: "echo",
      speed: 0.95,
    }
  )
end
The generated audio files are a little bit slower, have a higher quality and a different tone of voice.
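Since these values are a matter of taste, it can help to keep them in one place with sensible defaults. Here is a minimal sketch of that idea in plain Ruby. speech_parameters and SPEED_RANGE are hypothetical names of mine, not part of ruby-openai, and the speed range of 0.25 to 4.0 is what OpenAI documented when I last checked, so treat it as an assumption.

```ruby
# Hypothetical helper, not part of ruby-openai: builds the parameters hash
# that a #speech method could pass on to the client.
SPEED_RANGE = (0.25..4.0) # assumption: the range documented by OpenAI

def speech_parameters(text, model: "tts-1-hd", voice: "echo", speed: 0.95)
  unless SPEED_RANGE.cover?(speed)
    raise ArgumentError, "speed must be between #{SPEED_RANGE.begin} and #{SPEED_RANGE.end}"
  end

  { model: model, input: text, voice: voice, speed: speed }
end

# Defaults match the values chosen in this post.
params = speech_parameters("Hello from my article")

# Individual values can be overridden for experiments.
fast_params = speech_parameters("Hello from my article", voice: "alloy", speed: 1.25)
```

Failing early on an out-of-range speed saves a round trip to the API, which would otherwise reject the request.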
Some problems
Every time an article is saved, we’re regenerating the audio file, even if the article’s content hasn’t changed. We’re also waiting for the OpenAI API response in the request lifecycle instead of handling it in the background. You’ll notice that saving an article takes quite some time!
There are even more problems with the current implementation:
- Any exception will roll back the database transaction. Instabilities in the OpenAI API will break saving articles 🥶
- Requests to the API are not retried in case of a failure
- Database transactions wait for the API, which will be very taxing on our database performance at high load
- The request takes a long time to complete, which is suboptimal for our webserver and for our user experience
- Just touching the record will regenerate the audio file, which will cost us a lot of money in the long run
Let’s use a different approach and use a background job with conditional callbacks to avoid those issues.
TTS in a background job and retrying errors
To avoid talking to external services inside a transaction, we can do it after the transaction has finished.
Instead of the before_save callback, we’re going to use the after_commit callback to talk to the speech endpoint.
This way, our article is saved even if the API request fails, and we’re not blocking the database transaction with a long-running request.
We’re going to make use of ActiveModel::Dirty and the method content_previously_changed?. It’s a predicate method that checks whether the most recent save of the record instance changed a specific column value. With this in place, we’re able to see if the content of an article changed, even after the transaction has already completed.
class Article < ApplicationRecord
  after_commit :generate_audio_mp3, if: :content_previously_changed?
  ...
end
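To make the predicate a bit less magical, here is a toy model of the idea in plain Ruby. It is not how Rails implements ActiveModel::Dirty; ToyArticle and its save method are made up for illustration. The point is only that the record remembers what the last save changed, so the question can still be answered after the save has finished.

```ruby
# Toy stand-in for an Active Record model (hypothetical, for illustration only).
class ToyArticle
  attr_accessor :content

  def initialize(content)
    @content = content
    @saved_content = nil    # what the pretend database currently holds
    @previous_changes = {}  # what the most recent save changed
  end

  # Pretend to persist the record and remember what this save changed.
  def save
    @previous_changes = {}
    @previous_changes[:content] = [@saved_content, @content] if @saved_content != @content
    @saved_content = @content
    true
  end

  # True only if the most recent save changed the content.
  def content_previously_changed?
    @previous_changes.key?(:content)
  end
end

article = ToyArticle.new("First draft")
article.save
changed_on_create = article.content_previously_changed? # content went from nil to "First draft"

article.save
changed_on_resave = article.content_previously_changed? # this save changed nothing

article.content = "Second draft"
article.save
changed_on_edit = article.content_previously_changed?   # content was edited before this save
```

Only the first and the last save would trigger our callback, so re-saving an unchanged article no longer costs an API call.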
We’re fixing our retry issue by enqueuing a background job instead of talking to the API inline in the after_commit callback.
Let’s create a new job with rails g job text_to_speech. In this job we’re going to run our API request and buffer the response in an IO class, which we’ll attach directly to the article.
# text_to_speech_job.rb
class TextToSpeechJob < ApplicationJob
  queue_as :default
  # Retry failed API requests with increasing wait times.
  retry_on Faraday::Error, wait: :polynomially_longer, attempts: 10

  def perform(article)
    response = TextToSpeech.new.speech(article.content)
    article.audio.attach(
      io: StringIO.new(response, 'rb'),
      filename: "article-#{article.id}.mp3"
    )
  end
end
We need to adapt our model method to enqueue the job:
# article.rb
def generate_audio_mp3
  TextToSpeechJob.perform_later(self)
end
The retry_on call in the job takes care of retrying failed API requests. Since the ruby-openai gem uses Faraday under the hood, we can retry on the most general Faraday::Error, which Faraday currently raises for most HTTP errors.
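To get a feeling for what wait: :polynomially_longer means for our ten attempts: if I read the Active Job documentation correctly, the delay before each retry is roughly executions**4 + 2 seconds, plus some random jitter. The sketch below reproduces that formula without the jitter. polynomial_wait is a made-up helper, not a Rails method, so take the numbers as an approximation.

```ruby
# Hypothetical helper mirroring the documented formula, with the jitter left out.
def polynomial_wait(executions)
  (executions**4) + 2
end

attempts = 10

# With 10 attempts there are at most 9 retries, scheduled after executions 1..9.
waits = (1...attempts).map { |executions| polynomial_wait(executions) }

# Total time spent waiting if every single attempt fails.
total_hours = (waits.sum / 3600.0).round(1)
```

The first retries happen within seconds, while the last one waits close to two hours, so a longer outage of the API has a good chance of being over before the job gives up.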
We solved our problems and can continue with fixing some bugs we introduced by moving the API request to the background.
Fixing the UI and refreshing the player
Moving things to background jobs has some downsides. If we are creating a new article in the UI, we’ll run into an error page.
Since generating the audio file and attaching it to the article record takes longer than it takes our controller to respond, article.audio in _article.html.erb will sometimes have no file attached yet.
We avoided this problem earlier because the callback attached the file within the request itself. Let’s handle the missing attachment and refresh the articles/show.html.erb page with Turbo once the audio file has been attached successfully.
We’ll add the following condition to our _article.html.erb template and render the audio player only if the audio file has been attached successfully.
<% if article.audio.attached? %>
  <%= audio_tag article.audio, controls: true %>
<% else %>
  <p>Audio is generating...</p>
<% end %>
We’re not getting any errors anymore, but a user will have to refresh the page to see the updated file. On the articles/show.html.erb page, we’re going to add
<%= turbo_stream_from(@article) %>
to create a turbo-stream-source for the current article, and in the article we’re going to add the following:
class Article < ApplicationRecord
  broadcasts
  ...
end
Because our job will attach the audio file as soon as it’s ready, and attaching a file will touch our article, a turbo-stream action="replace" will be broadcast, which will replace our article’s container.
Of course, there’s one last gotcha we need to solve:
Since the Turbo::Streams::ActionBroadcastJob that is responsible for rendering the replace action renders the _article.html.erb partial with the ApplicationController renderer outside of the request context, it’s missing the correct host for the audio file’s URL that we need in the src attribute of the audio tag.
It defaults to http://example.org, which will result in a 404. One way to fix this (at least in development) is to set the default_url_options for ActionController in development.rb.
config.action_controller.default_url_options = {
  host: 'localhost',
  port: '3000',
}
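If you want the same fix outside of development, one option is to read the host from the environment instead of hardcoding it. This is only a sketch: url_options_from is a made-up helper, and APP_HOST and APP_PORT are variable names I picked for illustration, not something Rails knows about.

```ruby
# Hypothetical helper: builds default_url_options from an ENV-like object.
# APP_HOST and APP_PORT are made-up variable names (assumption).
def url_options_from(env)
  {
    host: env.fetch("APP_HOST", "localhost"),
    port: Integer(env.fetch("APP_PORT", "3000"))
  }
end

# Nothing set: falls back to the development values used above.
development_options = url_options_from({})

# Values set: they take precedence over the fallbacks.
staging_options = url_options_from({ "APP_HOST" => "staging.example.com", "APP_PORT" => "8080" })
```

In an environment file this could then be used as config.action_controller.default_url_options = url_options_from(ENV).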
Saving an article will enqueue a job to create an audio version and the UI will automatically update with a player as soon as the job finishes.
Conclusion
We implemented a simple demo for the OpenAI text-to-speech API, learned a super simple way to run callbacks conditionally in Active Record models, and handled an asynchronous page refresh with two lines of Turbo code.
Also many thanks to Alex Rudall, who merged my TTS PR for the ruby-openai gem, which allowed me to write this tutorial without implementing the HTTP client for the API myself.
As always, please reach out to me with questions or feedback on X (@ModernRails) or send me an e-mail.