Should I Get a Second Opinion on My Cloud Architecture?
Will my solution scale to meet the rising demand?
Has my cloud architect missed something?
Is Serverless computing a cheaper way to do things?
Will my team get offended?
As the owner of the technology in your organisation, these are all questions that hound you as the need to scale arises. As long as this is not yet a problem, you have time to be proactive and largely everything will turn out fine.
One thing to keep in mind is that the relationship between load and resource utilisation is NOT linear. If your system is running at 20% load when serving 100 concurrent users, it does not follow that you will be able to serve 500 users at 100% utilisation. The real ceiling is generally lower, and the most probable cause is contention for some of the shared resources.
At this point we assume you have already done the performance testing or are planning on doing it. These tasks are elaborate and intense, and covering them here could very well make us lose sight of our main objective.
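To make the non-linearity concrete, here is a minimal sketch using the Universal Scalability Law, one commonly used model of how throughput flattens as concurrency grows. The coefficients `alpha` and `beta` below are made-up values chosen for illustration, not measurements; in practice you would fit them to your own performance test results.

```python
# Illustrative only: alpha and beta are assumed coefficients, not measured values.

def relative_capacity(n_users: int, alpha: float = 0.002, beta: float = 0.000001) -> float:
    """Universal Scalability Law: throughput relative to a single user.

    alpha models contention (requests queueing on a shared resource),
    beta models coherency cost (keeping shared state consistent).
    With alpha = beta = 0 the result is perfectly linear.
    """
    return n_users / (1 + alpha * (n_users - 1) + beta * n_users * (n_users - 1))


if __name__ == "__main__":
    base = relative_capacity(100)
    for users in (100, 200, 300, 400, 500):
        ratio = relative_capacity(users) / base
        print(f"{users} users -> {ratio:.2f}x the throughput of 100 users")
```

With any non-zero contention, five times the users yields noticeably less than five times the throughput, which is why extrapolating from a 20% reading is risky.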
If you are still planning one, there is a ton of literature on how to do this right, and here are a few links to some that we have used as well.
If things did not look very good in the performance test, there are some fundamental things you can check to ensure the basics are covered. From a cloud architecture as well as a general scalable architecture perspective, there are two broad categories:
You have isolated the point of failure to a single process / service
The problem lies somewhere in a complex set of processes, and exactly which one is not known
Under what conditions do these services start to fail?
In the first case, you will need to assess whether this is the best possible implementation or whether there are alternatives.
Is there a possibility to decouple this service / process by introducing a message queue or a similar system?
Is there a way to scale only these services horizontally?
All of these will cost you time and money, and you will be the best judge of the trade-off.
What were the conditions under which it failed? This could provide a hint as to where the melt-down occurred.
In some cases, scaling horizontally buys you time, but it is always better to fix the underlying problem.
What would the effort be in splitting the app server into microservices? More often than not, this is not as complex as it sounds. The developers are the best ones to answer this question.
As a general rule of thumb, it is always better to make the application logic asynchronous. More on this a little later.
Okay, here are some tips to make applications more scalable. Asynchrony is your best friend.
Split functionality into microservices so that they can be individually scaled. Why is this? Because not all functionality / services are resource intensive, and hence they need not all be scaled equally.
Make the inter-dependency asynchronous. This is easier said than done, but this is your best friend. The standard industry example is transcoding of media to different formats when uploaded by the user. In this case, you don't want the user to wait until their upload has been transcoded into all formats. This is not just bad user experience but also blocks new users as the concurrency increases.
Certain flow logic of the application needs to change in order to accommodate the asynchronous approach. For example, adding some kind of progress indicator on the user interface until the user is notified that the transcoding process is complete.
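The transcoding flow described above can be sketched roughly as follows. This is a minimal, in-process illustration only: Python's standard `queue.Queue` stands in for a real message queue, a dictionary stands in for a database, and the names `handle_upload`, `get_progress`, `transcode` and `FORMATS` are all hypothetical. The point is simply that the upload handler returns straight away while workers do the slow part in the background.

```python
import queue
import threading
import time
import uuid

FORMATS = ("480p", "720p", "1080p")  # hypothetical target formats

jobs = {}                       # job_id -> progress record (stand-in for a database)
jobs_lock = threading.Lock()
work_queue = queue.Queue()      # stand-in for a managed message queue


def handle_upload(filename: str) -> str:
    """Request handler: record the job, enqueue it, and return immediately."""
    job_id = uuid.uuid4().hex
    with jobs_lock:
        jobs[job_id] = {"file": filename, "done": 0, "total": len(FORMATS)}
    work_queue.put(job_id)
    return job_id


def get_progress(job_id: str) -> float:
    """Polled by the user interface to drive a progress indicator (0.0 to 1.0)."""
    with jobs_lock:
        job = jobs[job_id]
        return job["done"] / job["total"]


def transcode(filename: str, fmt: str) -> None:
    """Placeholder for the real, slow transcoding step."""
    time.sleep(0.01)


def worker() -> None:
    """Background worker: runs independently of the request handler."""
    while True:
        job_id = work_queue.get()
        with jobs_lock:
            filename = jobs[job_id]["file"]
        for fmt in FORMATS:
            transcode(filename, fmt)
            with jobs_lock:
                jobs[job_id]["done"] += 1
        work_queue.task_done()


def start_workers(count: int = 2) -> None:
    """Start `count` daemon worker threads; more workers means more parallel jobs."""
    for _ in range(count):
        threading.Thread(target=worker, daemon=True).start()


if __name__ == "__main__":
    start_workers()
    job = handle_upload("holiday.mp4")
    print("upload accepted, progress:", get_progress(job))
    work_queue.join()  # a real UI would poll get_progress instead of blocking
    print("progress after transcoding:", get_progress(job))
```

Because the handler and the workers only share the queue and the progress record, the number of workers can be changed without touching the upload path.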
With this you will be able to scale each of the services individually and hence get more fine-grained control over scaling. This kind of architecture also lets you utilise cloud features such as auto-scaling and large-scale, robust queuing services like Kafka, Amazon Simple Queue Service (SQS) etc., which are commodity services these days and cost very little.
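One way to picture how a queue and auto-scaling work together: the backlog in the queue can drive how many workers you run. The sketch below is an assumption-laden illustration, not any cloud provider's actual scaling policy; the function name `desired_workers`, its parameters, and the default limits are all hypothetical.

```python
import math


def desired_workers(backlog: int, per_worker_rate: float, target_drain_seconds: float,
                    min_workers: int = 1, max_workers: int = 20) -> int:
    """Number of workers needed to drain `backlog` messages within the target time.

    per_worker_rate is messages per second that one worker can process
    (an assumed figure you would take from your own performance tests).
    """
    if per_worker_rate <= 0 or target_drain_seconds <= 0:
        raise ValueError("rate and target time must be positive")
    needed = math.ceil(backlog / (per_worker_rate * target_drain_seconds))
    # Clamp so we never scale to zero or beyond the budgeted maximum.
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    for backlog in (0, 1200, 100_000):
        print(backlog, "messages ->", desired_workers(backlog, 5.0, 60.0), "workers")
```

Only the service reading from that queue is scaled; the rest of the application stays at its normal size.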
Another advantage is hot-patching, where you do not necessarily have to go through downtime to patch / update the software.
Another quick way to do this is to get a fresh pair of eyes to look into it. Typically a Cloud Architect will be able to look at the reports and identify a probable cause.
We at Mezzlink will also be able to help. Please click here and fill out your details, and we will be more than happy to provide some feedback at no cost.
Hope this was helpful. Cheers!!