Implementing circuit breaker pattern from scratch in Python

Bhavesh Praveen
5 min read · Dec 13, 2020

We’ll briefly look into the circuit breaking pattern before jumping to code.

What is circuit breaking?

In real-world applications, services go down and come back up (or sometimes just stay down). Whenever you make a remote call (an HTTP request or an RPC) to another service, there is a chance that the call will fail. With a circuit breaker, after a certain number of failed remote calls we stop making remote calls and return a cached response or an error instead. After a specified delay, we allow one remote call to the failing service. If it succeeds, subsequent remote calls are allowed again. If it fails, we keep returning a cached response or an error and make no remote calls to the failing service for some time.

When all services are working and remote calls return without errors, we call this state “Closed”.

When remote calls keep failing and we stop making any more remote calls to the failing service, we call this state “Open”.

After a certain delay, when we make a remote call to the failing service, the state transitions from “Open” to “Half-Open”. If the remote call does not fail, we transition from “Half-Open” to “Closed” and subsequent remote calls are allowed. If the remote call fails, we transition from “Half-Open” back to “Open” and wait for a certain period of time before we can make the next remote call (again in the “Half-Open” state).

State Transition Diagram; image src: https://martinfowler.com/bliki/CircuitBreaker.html

To know more, read this and this

Why do you need it?

  • To prevent a network or service failure from cascading to other services.
  • To save bandwidth by not making requests over the network when the service you’re requesting is down.
  • To give the failing service time to recover.

Code Marathon

Let’s now try to build a simple circuit-breaker using Python

Disclaimer: This is in no way production ready. There are some excellent libraries that are available online and well tested. I’ve mentioned two of them here: circuit-breaker and pybreaker.

Let’s first decide on the API for the circuit breaker that we are going to build, and define the expected behavior.

I’m a big fan of the retry library’s syntax. We’ll change the API towards the end of the blog post.

Let’s define all the possible states
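The embedded snippet is not reproduced in this text, so here is a minimal sketch of what the state definitions might look like. The string values match the state names that appear in the log output later in the post ('closed', 'open', 'half_open'); the class name StateChoices is an assumption for illustration.

```python
class StateChoices:
    """Names for the three circuit breaker states (class name is illustrative)."""

    OPEN = "open"            # remote calls are blocked
    CLOSED = "closed"        # remote calls go through as usual
    HALF_OPEN = "half_open"  # a single trial call is allowed
```

Plain string constants keep the state readable when you inspect the object with vars().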

Let’s create a class that handles all of the circuit breaker logic.

Constructor takes the following parameters

  • func - method/function that makes the remote call
  • exceptions - an exception or a tuple of exceptions to catch (ideally should be network exceptions)
  • threshold - number of failed attempts before the state is changed to "Open"
  • delay - time in seconds to wait in the "Open" state before moving to the "Half-Open" state

make_remote_call takes the parameters that the underlying remote call (func) needs and passes them through.

If it seems confusing, please take a look at the following snippet

make_request is passed as a first-class function to the CircuitBreaker class. The parameters required by make_request are sent through make_remote_call.
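Putting the constructor parameters and make_remote_call together, a skeleton of the class might look like the sketch below. The attribute names follow the vars(obj) output shown later in the post. The two handlers are left as stubs here, and make_request is a hypothetical stand-in that makes no real network call.

```python
class CircuitBreaker:
    def __init__(self, func, exceptions, threshold, delay):
        self.func = func                       # function that makes the remote call
        self.exceptions_to_catch = exceptions  # exception or tuple of exceptions
        self.threshold = threshold             # failures allowed before opening
        self.delay = delay                     # seconds to wait while open
        self.state = "closed"
        self.last_attempt_timestamp = None
        self._failed_attempt_count = 0

    def handle_closed_state(self, *args, **kwargs):
        raise NotImplementedError  # stub, completed in the next step

    def handle_open_state(self, *args, **kwargs):
        raise NotImplementedError  # stub, completed in the next step

    def make_remote_call(self, *args, **kwargs):
        # Forward whatever arguments func needs to the handler for the current state.
        if self.state == "closed":
            return self.handle_closed_state(*args, **kwargs)
        return self.handle_open_state(*args, **kwargs)


def make_request(url):
    # Hypothetical stand-in for a function that performs the remote call.
    return f"response from {url}"


# make_request is handed over as a plain function object; it is not called here.
breaker = CircuitBreaker(make_request, exceptions=(Exception,), threshold=5, delay=10)
```

Once the handlers are filled in, breaker.make_remote_call(url) would end up calling make_request(url).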

Let’s now try to complete handle_closed_state and handle_open_state

handle_closed_state makes the remote call. If it succeeds, we update last_attempt_timestamp and return the result of the remote call.

If the remote call fails, _failed_attempt_count is incremented. If _failed_attempt_count is less than threshold, we simply raise an exception. If _failed_attempt_count is greater than or equal to threshold, we change the state to "Open" and then raise an exception.

handle_open_state first checks whether delay seconds have elapsed since the last attempt to make a remote call. If not, it raises an exception. If delay seconds have elapsed since the last attempt, we change the state to "Half-Open" and try to make one remote call to the failing service. If the remote call succeeds, we change the state to "Closed", reset _failed_attempt_count to 0 and return the response of the remote call. If the remote call fails while in the "Half-Open" state, the state is set to "Open" again and we raise an exception.
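The behavior described above can be sketched as follows. This is one possible reading of the description, not necessarily the author's exact code. RemoteCallFailedException and the log messages mirror the session output later in the post, while set_state is a hypothetical helper. Updating last_attempt_timestamp on failures as well is an assumption, made so that the "Retry after" countdown has a starting point.

```python
import logging
import time

logger = logging.getLogger(__name__)
CLOSED, OPEN, HALF_OPEN = "closed", "open", "half_open"


class RemoteCallFailedException(Exception):
    """Raised when the remote call fails or the circuit is open."""


class CircuitBreaker:
    def __init__(self, func, exceptions, threshold, delay):
        self.func = func
        self.exceptions_to_catch = exceptions
        self.threshold = threshold
        self.delay = delay
        self.state = CLOSED
        self.last_attempt_timestamp = None
        self._failed_attempt_count = 0

    def set_state(self, state):
        # Hypothetical helper: switch state and log the transition.
        previous, self.state = self.state, state
        logger.info("Changed state from %s to %s", previous, state)

    def handle_closed_state(self, *args, **kwargs):
        try:
            result = self.func(*args, **kwargs)
        except self.exceptions_to_catch as exc:
            self._failed_attempt_count += 1
            self.last_attempt_timestamp = time.time()  # assumption, see lead-in
            if self._failed_attempt_count >= self.threshold:
                self.set_state(OPEN)
            raise RemoteCallFailedException("Remote call failed") from exc
        logger.info("Success: Remote call")
        self.last_attempt_timestamp = time.time()
        return result

    def handle_open_state(self, *args, **kwargs):
        elapsed = time.time() - self.last_attempt_timestamp
        if elapsed < self.delay:
            # Still cooling down: refuse without touching the remote service.
            raise RemoteCallFailedException(f"Retry after {self.delay - elapsed} secs")
        self.set_state(HALF_OPEN)
        try:
            result = self.func(*args, **kwargs)  # the single trial call
        except self.exceptions_to_catch as exc:
            self.last_attempt_timestamp = time.time()
            self.set_state(OPEN)
            raise RemoteCallFailedException("Remote call failed") from exc
        self.set_state(CLOSED)
        self._failed_attempt_count = 0
        self.last_attempt_timestamp = time.time()
        return result

    def make_remote_call(self, *args, **kwargs):
        if self.state == CLOSED:
            return self.handle_closed_state(*args, **kwargs)
        return self.handle_open_state(*args, **kwargs)
```

Following the description literally, the failure counter is reset only when a Half-Open trial call succeeds, so in this sketch it counts total failures in the Closed state rather than consecutive ones.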

Complete code

Now to test it out. Let’s create a mock server.

Install Flask and requests. IPython is optional.

pip install requests
pip install Flask
pip install ipython

Let’s create some endpoints to mock the server
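The original endpoint code is not reproduced in this text. A minimal sketch of a main.py could look like this. The /success path comes from the session output later in the post; the /failure path and the response bodies are assumptions for illustration.

```python
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/success")
def success_endpoint():
    # Always answers with HTTP 200.
    return jsonify(msg="Call to this endpoint succeeded."), 200


@app.route("/failure")
def faulty_endpoint():
    # Hypothetical path: always answers with HTTP 500 to simulate a broken service.
    return jsonify(msg="This endpoint always fails."), 500
```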

Run the development server

export FLASK_APP=main.py; flask run

By default it runs on port 5000

You can use these snippets to exercise the circuit breaker.
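The snippets themselves are not reproduced in this text. Judging by the imports and output in the session below, snippets.py exposes make_request, success_endpoint and faulty_endpoint. The sketch below is an approximation: the /failure URL, the timeout value and the use of raise_for_status are assumptions.

```python
import requests

success_endpoint = "http://localhost:5000/success"
faulty_endpoint = "http://localhost:5000/failure"  # hypothetical path


def make_request(url):
    # A short timeout (assumed value) so that a hanging server also counts as a failure.
    response = requests.get(url, timeout=0.3)
    # Turn 4xx/5xx responses into exceptions so the circuit breaker can count them.
    response.raise_for_status()
    print(f"Call to {url} succeed with status code = {response.status_code}")
    return response
```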

Now open up a terminal and run the following commands.

(circuit-breaker) ➜  circuit-breaker git:(master) ✗ ipython
In [1]: from circuit_breaker import CircuitBreaker
In [2]: from snippets import make_request, faulty_endpoint, success_endpoint
In [3]: obj = CircuitBreaker(make_request, exceptions=(Exception,), threshold=5, delay=10)
In [4]: obj.make_remote_call(success_endpoint)
Call to http://localhost:5000/success succeed with status code = 200
06:07:51,255 INFO: Success: Remote call
Out[4]: <Response [200]>
In [5]: obj.make_remote_call(success_endpoint)
Call to http://localhost:5000/success succeed with status code = 200
06:07:53,610 INFO: Success: Remote call
Out[5]: <Response [200]>
In [6]: vars(obj)
Out[6]:
{'func': <function snippets.make_request(url)>,
'exceptions_to_catch': (Exception,),
'threshold': 5,
'delay': 10,
'state': 'closed',
'last_attempt_timestamp': 1607800073.610199,
'_failed_attempt_count': 0}

In [1] and In [2] are just imports. In [3] creates a CircuitBreaker object for make_request. Here we set exceptions=(Exception,), which catches all exceptions. Ideally we should narrow this down to the exceptions we actually want to catch, in this case network exceptions, but we're going to leave it as is for this demo.

Now make successive calls to the faulty endpoint.

In [7]: obj.make_remote_call(faulty_endpoint)
In [8]: obj.make_remote_call(faulty_endpoint)
In [9]: obj.make_remote_call(faulty_endpoint)
In [10]: obj.make_remote_call(faulty_endpoint)
In [11]: obj.make_remote_call(faulty_endpoint)
In [12]: obj.make_remote_call(faulty_endpoint)
---------------------------------------------------------------------------
Traceback data ..........
RemoteCallFailedException: Retry after 8.688776969909668 secs
In [13]: obj.make_remote_call(success_endpoint)
---------------------------------------------------------------------------
Traceback data......
RemoteCallFailedException: Retry after 6.096494913101196 secs

Try to make these calls as fast as possible. After the first five calls to faulty_endpoint, the next call (In [12]) will not make an API request to the Flask server. Instead, it raises an exception telling you to retry after a specified number of seconds. Even a call to success_endpoint (In [13]) still raises an error, because the circuit breaker is in the "Open" state.

Now, after the delay has elapsed, if we make a call to the faulty endpoint, the state transitions from Open to Half-Open and then, because the call fails, back to Open.

In [18]: obj.make_remote_call(faulty_endpoint)
06:21:24,959 INFO: Changed state from open to half_open
...
06:21:24,964 INFO: Changed state from half_open to open

Once the delay has elapsed again, if we make a call to success_endpoint, the state transitions from Open to Half-Open and then, because the call succeeds, to Closed.

In [19]: obj.make_remote_call(success_endpoint)
06:25:10,673 INFO: Changed state from open to half_open
...
06:25:10,678 INFO: Changed state from half_open to closed
Out[19]: <Response [200]>

Finally, improving the API shouldn’t take a lot of time. I’ve added a quick-and-dirty version here.
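The improved API is not shown in this text. Assuming it follows the decorator style of the retry library mentioned earlier, a sketch could look like this. The decorator name circuit_breaker, its default values and the breaker attribute on the wrapper are all hypothetical, and the class is a condensed version of the CircuitBreaker described above so that the block runs on its own.

```python
import functools
import time


class RemoteCallFailedException(Exception):
    pass


class CircuitBreaker:
    """Condensed version of the class described earlier in the post."""

    def __init__(self, func, exceptions, threshold, delay):
        self.func, self.exceptions_to_catch = func, exceptions
        self.threshold, self.delay = threshold, delay
        self.state, self.last_attempt_timestamp = "closed", None
        self._failed_attempt_count = 0

    def make_remote_call(self, *args, **kwargs):
        if self.state == "open":
            elapsed = time.time() - self.last_attempt_timestamp
            if elapsed < self.delay:
                raise RemoteCallFailedException(f"Retry after {self.delay - elapsed} secs")
            self.state = "half_open"  # delay is over: allow one trial call
        try:
            result = self.func(*args, **kwargs)
        except self.exceptions_to_catch as exc:
            self.last_attempt_timestamp = time.time()
            if self.state == "closed":
                self._failed_attempt_count += 1
            if self.state == "half_open" or self._failed_attempt_count >= self.threshold:
                self.state = "open"
            raise RemoteCallFailedException("Remote call failed") from exc
        if self.state == "half_open":
            self.state, self._failed_attempt_count = "closed", 0
        self.last_attempt_timestamp = time.time()
        return result


def circuit_breaker(exceptions=(Exception,), threshold=5, delay=60):
    """Hypothetical decorator factory: one breaker per decorated function."""

    def decorator(func):
        breaker = CircuitBreaker(func, exceptions, threshold, delay)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return breaker.make_remote_call(*args, **kwargs)

        wrapper.breaker = breaker  # exposed so callers can inspect the state
        return wrapper

    return decorator


@circuit_breaker(exceptions=(ConnectionError,), threshold=3, delay=10)
def fetch_greeting(name):
    # Hypothetical stand-in for a remote call; a real one would hit the network.
    return f"hello {name}"
```

With this shape, callers invoke fetch_greeting("world") directly and never touch the CircuitBreaker object unless they want to inspect fetch_greeting.breaker.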

All code samples can be found here

Now we have a working circuit breaker. We could introduce response caching, monitoring and make it thread-safe. Errors could be handled better. More Exception types could help. All of these features are left as an exercise for the readers.
