AWS S3 cost-cutting strategies
AWS services have been a big boon to many companies like ours, especially when it comes to IT infrastructure. Thanks to the ease that AWS and its services have brought to the table, we no longer spend troubled nights over capacity planning for an application. Deploying an application with auto-scaling in AWS, and later using the resulting benchmarks for better planning, has become the de facto practice today.
Along with the flexibility AWS services provide, they also come with a pain point: billing. As the number of services you use grows, billing becomes a complex process. A few AWS services are used very frequently by most users, and at the top of that list is S3. The security, scalability and durability it offers are the primary reasons for its wide adoption. At Factweavers, as one of the early adopters of cloud services, we also use S3 heavily for a wide range of purposes, from archiving to static website hosting. So I thought it would be a good idea to share some of the cost optimisation strategies that can be employed when using S3.
AWS S3 Categories
Most users new to AWS are not aware of the different S3 storage classes that exist. There are several storage classes available for data in S3, and it is worth knowing each of them along with its typical use cases. Let us go through them one by one.
1. S3 Standard
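S3 Standard is designed for 99.999999999% (eleven nines) durability of objects over a given year. In layman's terms, if you store 100 billion objects, you might expect to lose about one of them per year.
It is also designed for 99.99% availability over a given year. This essentially means that, out of 10,000 hours, data might be unavailable for about one hour.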
Keeping the above points in mind, the S3 Standard storage class is ideal for use cases like website hosting, content distribution, cloud applications/services, big data etc.
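The two rules of thumb above (about one object in 100 billion lost per year, about one hour of unavailability in 10,000) are simple expected-value arithmetic on S3 Standard's design targets of 99.999999999% durability and 99.99% availability. A minimal sketch, assuming durability can be treated as each object's independent annual survival probability (a simplification of how AWS states it):

```python
from decimal import Decimal


def expected_objects_lost_per_year(object_count: int, durability: str) -> Decimal:
    """Expected objects lost per year, treating durability as each object's
    annual survival probability (a simplification)."""
    return object_count * (Decimal(1) - Decimal(durability))


def expected_downtime_hours(total_hours: int, availability: str) -> Decimal:
    """Expected hours of unavailability over a period for a given availability."""
    return total_hours * (Decimal(1) - Decimal(availability))


# S3 Standard design targets: eleven nines durability, four nines availability.
# Decimal avoids floating-point error when subtracting from 1.
print(expected_objects_lost_per_year(100_000_000_000, "0.99999999999"))
print(expected_downtime_hours(10_000, "0.9999"))
```

Both calls come out at one, matching the figures in the text.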
2. S3 RRS (Reduced Redundancy Storage)
As the name suggests, this class of S3 comes with reduced redundancy, meaning fewer replicas of your data. S3 RRS is designed for 99.99% durability, which is lower than that of S3 Standard, and hence it was priced lower than S3 Standard. Since durability is lower, it suits data that is easily reproducible. Note that AWS now recommends against using RRS because S3 Standard has become the more cost-effective option, so check current pricing before choosing it.
3. S3 Standard infrequent access
S3 Standard-IA, as the name suggests, is ideal for infrequently accessed data. Let us see why. While it offers the same high throughput, low latency and durability as S3 Standard, it is designed for lower availability: 99.9%, compared with 99.99% for S3 Standard.
Also, data retrieval from S3 Standard-IA is charged per GB, and it has a minimum storage duration of 30 days, whereas S3 Standard has none. It also has a minimum billable object size of 128 KB, which means that if you store an object smaller than 128 KB, you will be charged as though it were 128 KB.
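To see how the 128 KB minimum object size and the 30-day minimum storage duration affect what you actually pay for, here is a minimal sketch that computes billed GiB-days for one object. It is a simplified model (retrieval and request charges are ignored), and the function name is hypothetical. The same helper can be reused for Glacier's 90-day minimum by changing `min_days`.

```python
MIN_BILLABLE_BYTES = 128 * 1024  # Standard-IA bills objects under 128 KB as 128 KB
MIN_BILLABLE_DAYS = 30           # Standard-IA minimum storage duration


def billable_gib_days(size_bytes: int, days_stored: int,
                      min_bytes: int = MIN_BILLABLE_BYTES,
                      min_days: int = MIN_BILLABLE_DAYS) -> float:
    """Storage actually billed, in GiB-days, after applying the minimums.

    Deleting an object before min_days still bills the remaining days,
    and small objects are rounded up to min_bytes.
    """
    billed_bytes = max(size_bytes, min_bytes)
    billed_days = max(days_stored, min_days)
    return billed_bytes / 1024 ** 3 * billed_days


# A 10 KB object deleted after 5 days is billed as 128 KB for 30 days.
print(billable_gib_days(10 * 1024, 5))
# The same helper with Glacier's 90-day minimum (no size minimum modelled here).
print(billable_gib_days(1024 ** 3, 10, min_bytes=0, min_days=90))
```

Many tiny, short-lived objects are therefore a poor fit for Standard-IA even if they are rarely read.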
These characteristics are designed for data that is accessed infrequently but, when accessed, needs to be available with low latency. This covers use cases such as disaster recovery data, data backups etc.
4. S3 Z-IA
This is a recent addition to the storage classes. S3 Z-IA (One Zone-IA) stores your data in just one Availability Zone instead of the minimum of three used by the other classes, so the data is not resilient to the loss of that zone. It costs roughly 20% less than S3 Standard-IA and can be used where you are willing to keep your data in a single zone.
5. AWS Glacier
Glacier is well suited to storing data for long periods and to purposes like archiving. This is because it is designed so that uploading is simple and cheap, while retrieval is slower (typically minutes to hours) and less seamless. Glacier also has a minimum storage duration (90 days), and retrieval charges apply.
The main use cases involve archiving, backup copies of databases, research data etc.
A summary of these storage classes is given in the table below:
| Feature | S3 Standard | S3 RRS | S3 Standard-IA | S3 Z-IA | Glacier |
|---|---|---|---|---|---|
| Minimum storage duration | None | None | 30 days | 30 days | 90 days |
AWS S3 Billing – the full picture
Another important factor that tends to be overlooked is the pricing structure of S3. Most of us presume that, since S3 is a storage service, only storage is billed. This is not the case: storage may account for only around 80 per cent of the bill, with the rest coming from other components. Let us look at the different components of S3 pricing.
1. Storage costs
2. API request costs
3. Data transfer costs (out to the internet and between regions)
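A quick way to see where a bill actually goes is to estimate each of these three components separately. The sketch below is a simplified model, assuming flat per-unit rates. All prices are placeholders rather than real AWS rates, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class S3Usage:
    storage_gb_month: float   # average GB stored over the month
    write_requests: int       # PUT/COPY/POST/LIST requests
    read_requests: int        # GET and other read requests
    transfer_out_gb: float    # GB transferred out (internet or other regions)


@dataclass
class S3Prices:
    # Placeholder rates only: look up real prices for your region and class.
    per_gb_month: float
    per_1000_writes: float
    per_1000_reads: float
    per_gb_transfer_out: float


def monthly_bill(usage: S3Usage, prices: S3Prices) -> dict:
    """Split an estimated monthly bill into its three main components."""
    storage = usage.storage_gb_month * prices.per_gb_month
    requests = (usage.write_requests / 1000 * prices.per_1000_writes
                + usage.read_requests / 1000 * prices.per_1000_reads)
    transfer = usage.transfer_out_gb * prices.per_gb_transfer_out
    total = storage + requests + transfer
    return {
        "storage": storage,
        "requests": requests,
        "transfer": transfer,
        "total": total,
        "storage_share": storage / total if total else 0.0,
    }


# Illustrative numbers in arbitrary currency units.
bill = monthly_bill(S3Usage(80, 10_000, 0, 10), S3Prices(1.0, 1.0, 1.0, 1.0))
print(bill)
```

With real prices plugged in, `storage_share` shows how much of your bill is not storage at all.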
Cost optimisation strategies (AWS S3)
Now that we have a good overview of the storage classes and the billing components, let us move on to the optimisation strategies for S3.
1. Select the right category
The first and foremost factor to consider is selecting the right storage class. Examine your requirements thoroughly and then select the most appropriate S3 storage class for your use case. As a simple example, at the time of writing S3 RRS was roughly 20 per cent cheaper than S3 Standard, so anyone who opted for S3 Standard when their requirements suited RRS would end up paying about 20% extra on their bill. This shows how important it is to select the right class. Being aware that the different S3 storage classes exist, and of how they differ, is the proactive step here.
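One way to make this selection repeatable across a team is to write the decision down as a small function. The sketch below is a hypothetical heuristic: the thresholds are illustrative assumptions, not AWS guidance, although the returned strings are real S3 storage class identifiers. RRS is deliberately left out because AWS no longer recommends it.

```python
def suggest_storage_class(reads_per_month: float, retention_days: int,
                          reproducible: bool = False,
                          needs_fast_retrieval: bool = True) -> str:
    """Rough, hypothetical heuristic mapping access patterns to an S3 storage class.

    The thresholds are illustrative assumptions, not AWS guidance.
    """
    # Long-lived data that can tolerate slow retrieval: archive it.
    if retention_days >= 90 and not needs_fast_retrieval:
        return "GLACIER"
    # Rarely read data kept at least as long as the 30-day IA minimum.
    if reads_per_month < 1 and retention_days >= 30:
        # Easily recreated data can accept the single-zone trade-off.
        return "ONEZONE_IA" if reproducible else "STANDARD_IA"
    # Frequently accessed or short-lived data.
    return "STANDARD"


print(suggest_storage_class(reads_per_month=0.1, retention_days=180))
print(suggest_storage_class(reads_per_month=0, retention_days=365,
                            needs_fast_retrieval=False))
```

Adjust the thresholds to your own access patterns and pricing; the value lies in having the criteria written down.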
2. Organising the buckets
Another simple optimisation is to organise buckets according to the nature of their contents. This applies when the contents have varying lifespans: segregate contents with similar lifecycles as far as possible. Giving a bucket a special identifier indicating that its contents will go to Glacier greatly helps everyone who uses that S3 account.
Too often, buckets are organised haphazardly, so people may move contents to the wrong storage class and incur extra charges, either on upload or on retrieval.
3. Timing of archiving
Planning to archive data at the appropriate time is another good cost-reduction measure. At Factweavers we deal with a lot of log data, and we often see this data become irrelevant after a certain point. However, we still need to store it long term for compliance reasons or as required by clients. So we store the log data in S3 Standard/RRS for the required amount of time and, when creating the buckets for these logs, we add lifecycle rules that automatically move the data to Glacier (Glacier usually suits our long-term storage requirements). This frees up space in S3 and also gives us a cheaper storage option. Thus, using lifecycle rules or moving data to appropriate buckets will save you money.
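A lifecycle rule of the kind described above can be sketched with boto3, the AWS SDK for Python. The dictionary follows the shape that `put_bucket_lifecycle_configuration` expects. The prefix, the 30-day transition and the roughly seven-year expiry are hypothetical values, and boto3 is imported only when the rule is applied. Be aware that this call replaces the bucket's entire existing lifecycle configuration, so merge in any rules you already have.

```python
from typing import Optional


def archive_lifecycle(prefix: str, glacier_after_days: int,
                      expire_after_days: Optional[int] = None) -> dict:
    """Build a lifecycle configuration that moves objects under `prefix`
    to Glacier after `glacier_after_days` and optionally deletes them later."""
    rule = {
        "ID": "archive-" + (prefix.strip("/") or "all"),
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": glacier_after_days, "StorageClass": "GLACIER"}],
    }
    if expire_after_days is not None:
        if expire_after_days <= glacier_after_days:
            raise ValueError("expiration must come after the Glacier transition")
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}


def apply_lifecycle(bucket: str, config: dict) -> None:
    """Attach the configuration to a bucket (requires boto3 and AWS credentials)."""
    import boto3  # imported here so building the config needs no AWS setup
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config)


# Hypothetical log retention policy: Glacier after 30 days, delete after ~7 years.
config = archive_lifecycle("logs/", glacier_after_days=30, expire_after_days=2555)
print(config)
# apply_lifecycle("example-log-bucket", config)  # hypothetical bucket name
```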
4. Saving on data costs
Data transferred between EC2 instances in different Availability Zones is charged in AWS, whereas data transferred between EC2 and S3 within the same region is free, whichever Availability Zone the instance is in. So architecting data flows so that data moves between EC2 and S3 in the same region, rather than directly between EC2 instances across Availability Zones, can be a good way to save money.
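The trade-off can be estimated with simple arithmetic. The sketch below assumes that cross-AZ EC2 traffic is billed per GB on both the sending and the receiving side, and that staging data in S3 in the same region costs only one write and one read request per object (short-lived storage is ignored). All rates are placeholders, and the function name is hypothetical.

```python
def sharing_transfer_costs(gb: float, objects: int, cross_az_rate_per_gb: float,
                           write_per_1000: float, read_per_1000: float) -> dict:
    """Compare moving `gb` of data (split into `objects` objects) directly
    between EC2 instances in different Availability Zones with staging it
    in S3 in the same region. All rates are placeholders."""
    # Cross-AZ EC2 traffic is billed on both the sending and receiving side.
    direct = gb * cross_az_rate_per_gb * 2
    # Same-region EC2 <-> S3 transfer has no transfer charge; each object
    # costs one write (upload) and one read (download) request instead.
    via_s3 = objects / 1000 * (write_per_1000 + read_per_1000)
    return {"direct_cross_az": direct, "via_s3": via_s3}


print(sharing_transfer_costs(gb=10, objects=2000, cross_az_rate_per_gb=1.0,
                             write_per_1000=1.0, read_per_1000=1.0))
```

The saving is largest when data is moved as a few large objects, since request charges then stay small relative to per-GB transfer charges.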
5. Static website hosting using AWS S3
Another smart cost optimisation is to use S3 to host static websites. This avoids the cost overruns that come from running EC2 instances for the same purpose, since hosting a static website on S3 can cost a small fraction of an EC2-based alternative.
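Static website hosting is switched on per bucket. Here is a minimal sketch using boto3's `put_bucket_website`, assuming the site's entry page is `index.html` and its error page is `error.html` (both hypothetical names). Serving the site publicly also requires a bucket policy that allows public reads, with the bucket's Block Public Access settings adjusted accordingly (or a CloudFront distribution in front). That part is not shown here.

```python
def website_configuration(index_document: str = "index.html",
                          error_document: str = "error.html") -> dict:
    """Website configuration in the shape S3's PutBucketWebsite API expects."""
    return {
        "IndexDocument": {"Suffix": index_document},
        "ErrorDocument": {"Key": error_document},
    }


def enable_static_website(bucket: str, config: dict) -> None:
    """Turn on static website hosting (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the config builder needs no AWS setup
    boto3.client("s3").put_bucket_website(Bucket=bucket, WebsiteConfiguration=config)


print(website_configuration())
# enable_static_website("example-site-bucket", website_configuration())  # hypothetical bucket
```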
6. Object key naming
The names of the object keys in a bucket can also play a role in both speed and cost. According to Amazon's official documentation (at the time of writing):
“Amazon S3 maintains an index of object key names in each AWS region. Object keys are stored in UTF-8 binary ordering across multiple partitions in the index. The key name dictates which partition the key is stored in.
Using a sequential prefix, such as time stamp or an alphabetical sequence, increases the likelihood that Amazon S3 will target a specific partition for a large number of your keys, overwhelming the I/O capacity of the partition.”
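So prefixing each key name with a short hash value can significantly change this scenario: keys fan out across partitions instead of targeting a single one, which avoids overwhelming one partition and improves speed. Any cost benefit is indirect, for example fewer throttled and retried requests and shorter job run times. Note that in 2018 AWS raised S3 request-rate performance to at least 3,500 write and 5,500 read requests per second per prefix, and withdrew its earlier advice to randomise prefixes, so this technique now matters mainly for workloads that exceed those rates.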
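A minimal sketch of the hash-prefix idea, assuming a four-character hex prefix taken from a SHA-256 digest of the original key (the prefix length and the function name are illustrative choices). Because the hash is deterministic, the stored key can always be recomputed from the original name.

```python
import hashlib


def hashed_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short, deterministic hash so sequential names spread out."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{key}"


# Sequential, date-based names that would otherwise share one prefix.
sequential = [f"logs/2024-01-{day:02d}.log" for day in range(1, 31)]
for key in sequential[:3]:
    print(hashed_key(key))
```

The trade-off is that listing objects by a natural prefix (for example all logs from one month) no longer works directly, so keep an index or recompute the hashed keys when you need to find objects.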
7. Delete unwanted files
The cloud team should always be judicious about deleting unwanted files from S3. What we have noticed is that, as data grows, we often forget to clean up expired data, and this can prove costly.
Another way to free up space is to delete the parts left behind by incomplete multipart uploads, which are billed as storage and can occupy a significant amount of space.
Deletion can also be considered for files that can easily be recreated.
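Both kinds of clean-up can be automated with a single lifecycle rule: an `Expiration` action for expired data and an `AbortIncompleteMultipartUpload` action for leftover upload parts. The sketch below builds such a rule in the shape boto3's `put_bucket_lifecycle_configuration` accepts. The prefix, the 30-day expiry and the 7-day abort window are hypothetical values. As with any lifecycle update, that call replaces the bucket's existing configuration, so combine this rule with any rules you already have.

```python
def cleanup_lifecycle(prefix: str, expire_after_days: int,
                      abort_multipart_after_days: int = 7) -> dict:
    """Lifecycle rule that deletes expired objects under `prefix` and removes
    parts left behind by multipart uploads that never completed."""
    return {
        "Rules": [{
            "ID": "cleanup-" + (prefix.strip("/") or "all"),
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Expiration": {"Days": expire_after_days},
            "AbortIncompleteMultipartUpload": {
                "DaysAfterInitiation": abort_multipart_after_days
            },
        }]
    }


# Hypothetical scratch area: delete after 30 days, abort stale uploads after 7.
cleanup = cleanup_lifecycle("tmp/", expire_after_days=30)
print(cleanup)
```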
8. Confidentiality of AWS S3 user credentials
Another important nuance is the confidentiality of AWS S3 user credentials. If you are an admin-level user who controls access for team members who need AWS S3, it is advisable to give them temporary access keys/credentials that expire within the estimated duration of their task. This enables better tracking and helps prevent bad practices, such as provisioning the wrong storage classes.
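One way to issue such short-lived credentials is AWS STS `AssumeRole`, here via boto3. The sketch assumes an IAM role scoped to the S3 actions the task needs already exists; the role ARN and user name are hypothetical. `DurationSeconds` must be at least 900 seconds and cannot exceed the role's maximum session duration (one hour unless the role is configured for longer), so the helper clamps the estimated task time to that range.

```python
def session_duration_seconds(task_minutes: int, max_session_seconds: int = 3600) -> int:
    """Credential lifetime matching the estimated task, clamped to STS limits.

    AssumeRole accepts at least 900 seconds; the upper bound is the role's
    maximum session duration (one hour unless the role is configured longer).
    """
    return max(900, min(task_minutes * 60, max_session_seconds))


def temporary_s3_credentials(role_arn: str, user: str, task_minutes: int,
                             max_session_seconds: int = 3600) -> dict:
    """Request short-lived credentials for a role scoped to S3 (needs boto3)."""
    import boto3  # imported here so the duration helper needs no AWS setup
    response = boto3.client("sts").assume_role(
        RoleArn=role_arn,
        RoleSessionName=f"s3-task-{user}",  # identifies the session in CloudTrail
        DurationSeconds=session_duration_seconds(task_minutes, max_session_seconds),
    )
    # Contains AccessKeyId, SecretAccessKey, SessionToken and Expiration.
    return response["Credentials"]


print(session_duration_seconds(30))
# temporary_s3_credentials("arn:aws:iam::123456789012:role/example-s3-task", "alice", 30)
```

Because each session carries its own name, activity can be traced back to the person and task it was issued for.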
9. Data compression
When dealing with big data, compressing the data can be a huge cost saver. Human-readable formats such as JSON can occupy considerably more space (often two times or more) than compact binary formats, and such text compresses well. So a compressed file is significantly smaller, which reduces both storage costs and the volume of data transferred.
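A minimal sketch of the effect, using only the Python standard library: repetitive JSON records (hypothetical log entries here) are serialised and gzip-compressed before upload. If such objects are served over HTTP, they can be uploaded with a `gzip` content encoding so clients decompress them transparently.

```python
import gzip
import json


def gzip_json(records: list) -> bytes:
    """Serialise records as JSON and gzip-compress the result."""
    return gzip.compress(json.dumps(records).encode("utf-8"))


# Hypothetical, repetitive log records, typical of data that compresses well.
records = [
    {"timestamp": f"2024-01-01T00:{i // 60 % 60:02d}:{i % 60:02d}Z",
     "level": "INFO", "message": "request served"}
    for i in range(1000)
]
raw_size = len(json.dumps(records).encode("utf-8"))
compressed = gzip_json(records)
print(raw_size, "bytes raw ->", len(compressed), "bytes gzipped")
```

The actual ratio depends heavily on the data; highly repetitive logs compress far better than already-compressed media.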
10. Batching contents
Since pricing is also based on the number of requests, it is advisable to batch documents and transfer them together, rather than transferring them one document (object) at a time.
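One straightforward way to batch is to pack many small files into a single compressed archive and upload that as one object, turning many PUT requests into one. A minimal in-memory sketch with the standard library's `tarfile` follows; the file names and contents are hypothetical. The trade-off is that reading any single file later means fetching the whole archive, so this suits data that is written and read together, such as a day's logs. It also helps with the 128 KB minimum object size of the IA classes.

```python
import io
import tarfile
from typing import Dict


def bundle(files: Dict[str, bytes]) -> bytes:
    """Pack many small files into one gzip-compressed tar archive in memory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# Hypothetical batch: 100 small log files become one object (one PUT).
files = {f"logs/{i:03d}.log": b"request served\n" * 10 for i in range(100)}
archive = bundle(files)
print(len(files), "files ->", len(archive), "bytes in one archive")
```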
In conclusion, by carefully analysing and planning the S3 use cases we deal with, and by paying some attention to S3 configuration, we can significantly reduce our monthly AWS storage bill and hence our cloud infrastructure costs.