The first Laravel queue job I ever shipped to production ran 47 times. It was a “send welcome email” job. The user got 47 welcome emails. I learned about queue retries that day, in the loudest possible way.
That was 2019. I’ve gotten better. The queue I run today isn’t fancier, just less wrong. Most of the improvements were unlearning defaults I’d never actually thought about. Laravel’s queue API is good. The defaults are calibrated for “you’re playing around”, not for “you’re paying AWS for SQS by the message”.
This is the short list of habits I’ve changed, with the actual config I run now. If you’ve ever had a job run more times than it should have, this one’s for you.
tries vs maxExceptions, and why I always set both
For years I set --tries=3 on php artisan queue:work and called it a day. I assumed it meant “retry up to 3 times, then stop”. It does. Sort of.
Here’s the catch. tries counts every attempt, whether the job failed because of an exception or because the worker crashed mid-run. So a job that hits an OOM kill three times in a row gets marked permanently failed without ever actually throwing. That’s fine. But a job that throws a recoverable exception (network blip, transient DB error) also burns a try. Three transient errors in an hour and you’re in the failed_jobs table forever.
What I do now: I set tries high (10) and maxExceptions low (3). tries is the hard cap on attempts including crashes. maxExceptions is the cap on raised exceptions. The pair gives me “give up after 3 real errors, but let me reboot the worker without losing the job”.
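use Illuminate\Contracts\Queue\ShouldQueue;

class ProcessPayment implements ShouldQueue
{
    public int $tries = 10;         // hard cap on attempts, crashes included
    public int $maxExceptions = 3;  // cap on unhandled exceptions
    public int $timeout = 60;       // seconds before the worker kills the job
    public int $backoff = 30;       // seconds between retries
}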
I learned about backoff separately. The default is zero. Zero means “retry immediately, possibly hammering the upstream that just rejected you”. 30s with jitter is fine for most cases. For external APIs, I use the array form: public array $backoff = [10, 30, 60, 300]; so I back off harder if it keeps failing.
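Laravel does not add jitter on its own, so here is a minimal sketch of one way to get it. The helper name jitteredBackoff and the 20% spread are my assumptions, not anything the framework ships. It takes the base delays and nudges each one by a random amount so a burst of failed jobs does not retry in lockstep.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: spread each base delay by +/- $spread so retries
 * from many jobs do not all hit the upstream at the same instant.
 *
 * @param  int[]  $baseDelays  delays in seconds, e.g. [10, 30, 60, 300]
 * @return int[]               jittered delays, never below 1 second
 */
function jitteredBackoff(array $baseDelays, float $spread = 0.2): array
{
    return array_map(function (int $seconds) use ($spread): int {
        // Largest allowed deviation for this delay, in whole seconds.
        $delta = (int) round($seconds * $spread);

        return max(1, $seconds + random_int(-$delta, $delta));
    }, $baseDelays);
}
```

On a job class this would replace the $backoff property with a backoff() method, which Laravel calls when it needs the delay: public function backoff(): array { return jitteredBackoff([10, 30, 60, 300]); }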
One more piece I always set: timeout. The queue worker default is 60 seconds, so a job that legitimately takes 4 minutes gets killed mid-run, and the failure looks identical to a crash. Worse, if the connection’s retry_after is shorter than the job’s real runtime, the job can be handed to a second worker while the first is still running it, so you get two workers on the same job at the same time. I set timeout per job to the realistic upper bound plus 30%, and I set the worker’s --timeout to one second higher than the highest job timeout. retry_after has to sit above both.
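For the 4-minute job, the numbers work out like this. It is a sketch, assuming a Redis connection named redis. The values 312, 313 and 330 are my arithmetic from the rule above (240 seconds plus 30%), not recommended constants. SQS has no retry_after key; it relies on the queue's visibility timeout, which you set on the AWS side.

```php
// config/queue.php (excerpt)
//
// Job:    public int $timeout = 312;   // 240s realistic upper bound + 30%
// Worker: php artisan queue:work redis --timeout=313
'connections' => [
    'redis' => [
        'driver' => 'redis',
        'connection' => 'default',
        'queue' => env('REDIS_QUEUE', 'default'),
        // Must be longer than the longest job timeout. If it is shorter,
        // the job becomes available again while the first worker is
        // still running it.
        'retry_after' => 330,
        'block_for' => null,
    ],
],
```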
Bus::chain vs Bus::batch, when each one earns its keep
This is where I see the most confusion in code review. Chains and batches are both ways to coordinate jobs, but they have different semantics, and people pick whichever feels familiar.
Chains are for sequential, dependent work. Each job runs after the previous one succeeds. If any job in the chain fails, the rest don’t run.
Bus::chain([
    new ProvisionAccount($user),
    new SeedDefaultProjects($user),
    new SendWelcomeEmail($user),
])->dispatch();
If ProvisionAccount fails, the welcome email doesn’t go out. Good. That user doesn’t have an account yet.
Batches are for parallel work with a shared completion signal. Jobs in a batch can run concurrently. You can hook then(), catch(), and finally() callbacks.
Bus::batch([
    new SyncStripeCustomer($user),
    new SyncMailchimpContact($user),
    new IndexInAlgolia($user),
])->then(function ($batch) {
    Log::info('Synced user '.$batch->id);
})->dispatch();
The trap I see in code review: people use a chain because the jobs “should run in order”, when the order doesn’t actually matter and a batch would be twice as fast. Or people use a batch and then are confused that one job depends on another’s output. Pick by data dependency, not by what feels organized.
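When only some of the jobs depend on each other, the two can be combined: an inner array inside a batch is treated as a chain. A sketch of that shape follows. AttachStripeInvoices is a hypothetical job I made up to stand in for "needs the previous job's output". Jobs dispatched in a batch also need the Illuminate\Bus\Batchable trait, and the app needs the job_batches table.

```php
use Illuminate\Support\Facades\Bus;

Bus::batch([
    // Inner array = a chain: these two run in order, one after the other.
    [
        new SyncStripeCustomer($user),
        new AttachStripeInvoices($user), // hypothetical: needs the Stripe customer to exist
    ],
    // No data dependency, so these run alongside the chain.
    new SyncMailchimpContact($user),
    new IndexInAlgolia($user),
])->dispatch();
```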
Idempotency is the part Laravel won’t do for you
This one took me longest to internalize. The Laravel queue docs mention it briefly and move on. Every retry should be safe to run.
That means: a job that charges a credit card needs an idempotency key. A job that sends an email needs a “have I sent this already?” check. A job that updates a database row needs to be idempotent or use firstOrCreate/updateOrCreate. The framework cannot do this for you. The framework will happily run your job twice and let your business logic decide what that means.
The pattern I default to now, especially around payments:
public function handle(Stripe $stripe): void
{
    $key = "charge:{$this->order->id}";
    if (Cache::has($key)) {
        return; // already charged this order
    }
    $stripe->charge($this->order, idempotency_key: $key);
    Cache::put($key, true, now()->addDays(7));
}
That’s the minimum. For more serious systems, the idempotency check has to live in the same transaction as the side effect, so a worker crash between “I just charged” and “let me record that I charged” doesn’t leave you double-charging. Laravel hints at this in the events section of the docs, but the responsibility is yours.
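To make the "same transaction" idea concrete, here is a minimal sketch in plain PDO against an in-memory SQLite database, so it runs outside Laravel. The table names processed_jobs and ledger and the function recordChargeOnce are hypothetical. It only covers side effects that live in the same database as the marker. A call to an external API such as Stripe still needs the provider's own idempotency key.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical example: record a charge at most once per order.
 * The "already done" marker and the ledger row commit together,
 * so a crash between them leaves neither behind.
 */
function recordChargeOnce(PDO $db, int $orderId, int $amountCents): bool
{
    $db->beginTransaction();
    try {
        // The primary key makes the claim atomic: a second attempt inserts nothing.
        $claim = $db->prepare('INSERT OR IGNORE INTO processed_jobs (job_key) VALUES (?)');
        $claim->execute(["charge:{$orderId}"]);

        if ($claim->rowCount() === 0) {
            $db->rollBack();
            return false; // an earlier attempt already did the work
        }

        $ledger = $db->prepare('INSERT INTO ledger (order_id, amount_cents) VALUES (?, ?)');
        $ledger->execute([$orderId, $amountCents]);

        $db->commit();
        return true;
    } catch (Throwable $e) {
        if ($db->inTransaction()) {
            $db->rollBack(); // marker and ledger row disappear together
        }
        throw $e;
    }
}

// Throwaway schema for the sketch.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE processed_jobs (job_key TEXT PRIMARY KEY)');
$db->exec('CREATE TABLE ledger (order_id INTEGER, amount_cents INTEGER)');

recordChargeOnce($db, 42, 1999); // first run records the charge
recordChargeOnce($db, 42, 1999); // a retry finds the marker and does nothing
```

Inside a Laravel job the same shape would probably be a DB::transaction() closure with insertOrIgnore() on the marker table, but treat that as a direction rather than a drop-in.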
Horizon defaults I always change
I love Laravel Horizon. I do not love its defaults.
Things I change on every new project:
- balanceMaxShift and balanceCooldown. The default auto-balance is aggressive. On low-traffic queues it churns workers up and down constantly. I set 'balance' => 'auto', 'balanceMaxShift' => 1, 'balanceCooldown' => 10 so it adjusts slowly and predictably.
- Per-queue worker counts. Horizon will happily put your high priority queue on the same supervisor as low. I split them into separate supervisors with explicit min/max process counts. Email and webhook jobs go on medium. Anything user-facing goes on high with a smaller max. Reporting goes on low.
- tries at the supervisor level. Horizon respects the per-job tries, but the supervisor default is 1, and it silently applies to every job where you forgot to set it. I set the supervisor tries to 5 as a fallback.
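'environments' => [
    'production' => [
        'supervisor-high' => [
            'connection' => 'redis',
            'queue' => ['high'],
            'balance' => 'auto',
            'balanceMaxShift' => 1,   // processes added or removed per adjustment
            'balanceCooldown' => 10,  // seconds between adjustments
            'minProcesses' => 1,      // explicit floor; pick your own
            'maxProcesses' => 5,
            'tries' => 5,
            'timeout' => 60,
        ],
    ],
],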
I covered this kind of config debt in more detail in my Laravel 12 in production post, but the queue config is the bit I tweak most often.
Failed jobs are a queue. Treat them like one
The failed_jobs table is not an archive. It is an inbox. I monitor its size and alert when it grows faster than I can investigate. If I ignore it, it grows until either I miss a real outage hiding inside what I assumed was noise, or I hit some other operational issue I should have caught earlier.
What I run now:
- A daily report of distinct exception messages from failed_jobs, with counts. Most of them tell me about a buggy job or a flaky dependency I’d forgotten about.
- A retry policy by exception class, not by blanket retry. Network errors get retried automatically with a script. Validation errors do not, because retrying a validation error just produces the same failure.
- A 14-day TTL. After two weeks, failed jobs are either fixed or genuinely lost. Either way they shouldn’t sit in the table forever.
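The daily report can be as small as a grouping function. This is a sketch, not my exact script: summarizeFailures is a hypothetical name, and it assumes each value in the exception column starts with a "Class: message" line followed by the stack trace.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: count failed jobs by the first line of their
 * stored exception text, most frequent first.
 *
 * @param  string[]  $exceptions  raw values of failed_jobs.exception
 * @return array<string,int>      summary line => number of failures
 */
function summarizeFailures(array $exceptions): array
{
    $counts = [];

    foreach ($exceptions as $trace) {
        // Keep only the first line and drop the stack trace below it.
        $summary = trim(explode("\n", $trace, 2)[0]);
        $counts[$summary] = ($counts[$summary] ?? 0) + 1;
    }

    arsort($counts); // highest count first, keys preserved

    return $counts;
}
```

In a Laravel app the input would come from something like DB::table('failed_jobs')->where('failed_at', '>=', now()->subDay())->pluck('exception')->all(). For the TTL, php artisan queue:prune-failed --hours=336 on a daily schedule covers the 14 days.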
You can read more about how I think about production reliability across stacks on my about page.
What to do this week
If you only do one thing after reading this: open your Laravel app’s jobs directory, find your most-dispatched job, and ask one question. What happens if this job runs twice? If you have a clear answer (it’s idempotent, or you have an idempotency key, or it’s a no-op on second run), you’re fine. If you don’t, you have one bug. Fix that one before you do anything else.
Then go set maxExceptions on every job that talks to an external API. Then go to bed.