Our focus on making OneSignal more reliable
One of our engineering team's top priorities this quarter has been ensuring that OneSignal is fast and reliable.
Regrettably, over the last few days we've fallen short of our expectations. We've experienced several incidents that caused notifications to be delayed instead of being delivered immediately.
We'd like to explain the causes of these issues and how our team is resolving them:
- We identified cases where parts of our notification queue management have O(n^2) complexity. As the number of notifications we send has grown recently, these cases significantly increased the computational load on our backend. As of today, all of these cases have been resolved by replacing them with linear- or constant-time algorithms.
- Our notification queuing system is beginning to reach the limit of the processing power available to it. As a result, even brief issues can cause extended notification delays. We are adding several new servers so that when an event does create a backlog, these delays are significantly reduced or eliminated.
- OneSignal uses Redis for our delivery queue, and we have begun to approach the maximum number of queries per second supported by our Redis instance. This week we partially addressed this by implementing a caching layer to reduce the number of queries required, and we are now implementing support for running multiple Redis processes so that operations are distributed across them.
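To illustrate the kind of change described in the first item above, here is a minimal Python sketch of turning a quadratic operation into a linear one. The operation itself (dropping cancelled notifications from a queue) and the function names are hypothetical and chosen only for illustration; they are not the actual code paths that were fixed.

```python
def drop_cancelled_quadratic(queue, cancelled):
    """O(n * m): each `in` test scans the whole `cancelled` list."""
    return [n for n in queue if n not in cancelled]


def drop_cancelled_linear(queue, cancelled):
    """O(n + m): build a set once; each lookup is then O(1) on average."""
    cancelled_set = set(cancelled)
    return [n for n in queue if n not in cancelled_set]


# Hypothetical notification IDs, for illustration only.
queue = ["a", "b", "c", "d"]
cancelled = ["b", "d"]

remaining = drop_cancelled_linear(queue, cancelled)
# Both versions return the same result; only the cost differs as inputs grow.
assert remaining == drop_cancelled_quadratic(queue, cancelled)
```

With a few thousand items the difference is hard to notice, but when both lists grow together the quadratic version's cost grows with the product of their sizes, which is why this kind of code tends to become a problem only after traffic increases.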
These improvements will be fully rolled out over the coming days, and we will be carefully monitoring our system to ensure that we have minimized the frequency and duration of service issues. We report all service issues on our Twitter status account.
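As a rough picture of the two Redis changes described above (a caching layer in front of Redis, and operations spread across multiple Redis processes), the following Python sketch may help. It is an illustration under stated assumptions, not our production code: `FakeRedis` is an in-memory stand-in rather than a real Redis client, and `ShardedCachedStore`, the key name, and the shard count of three are all hypothetical.

```python
import zlib


class FakeRedis:
    """In-memory stand-in for one Redis process; counts the queries it receives."""

    def __init__(self):
        self.data = {}
        self.queries = 0

    def get(self, key):
        self.queries += 1
        return self.data.get(key)

    def set(self, key, value):
        self.queries += 1
        self.data[key] = value


class ShardedCachedStore:
    """Routes each key to one shard and serves repeated reads from a local cache."""

    def __init__(self, shards):
        self.shards = shards
        self.cache = {}  # no eviction or expiry in this sketch

    def _shard(self, key):
        # CRC32 is stable across processes, unlike Python's built-in str hash.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def set(self, key, value):
        self._shard(key).set(key, value)
        self.cache[key] = value  # keep the cache consistent with the write

    def get(self, key):
        if key in self.cache:
            return self.cache[key]  # cache hit: no query reaches any shard
        value = self._shard(key).get(key)
        if value is not None:
            self.cache[key] = value
        return value


shards = [FakeRedis() for _ in range(3)]  # hypothetical shard count
store = ShardedCachedStore(shards)
store.set("app:42:pending", 7)  # hypothetical key name
for _ in range(1000):
    store.get("app:42:pending")

total_queries = sum(s.queries for s in shards)  # 1: the write; all reads hit the cache
```

A real caching layer also needs an invalidation or expiry policy, and simple modulo routing moves most keys to a different shard whenever the shard count changes, so this sketch only shows the basic idea of reducing and spreading queries.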
Longer-Term Solutions
OneSignal has been growing very rapidly. We currently have over 4,000 new clients signing up per week, which has increased the pace at which we need to scale our system. With the completion of our recent Series A financing, we have begun to establish a dedicated SRE team to keep OneSignal running smoothly 24/7.
Our team is committed to making sure OneSignal is both the most powerful and, more importantly, the most reliable notification delivery platform around. Thank you for bearing with us as we work to reach this goal.