In this article, you will learn how to set up an S3 bucket, launch a SageMaker Notebook Instance, and run your first model on SageMaker. Amazon SageMaker is a fully managed machine learning platform that enables data scientists and developers to build and train machine learning models and deploy them into production applications.
Table of Contents:
- What is an Amazon S3 bucket?
- How to create an S3 bucket?
- Differences between AWS regions and Availability Zones
- How to launch a SageMaker Notebook Instance
- Run a SageMaker example
- Last action before closing SageMaker
What is an Amazon S3 bucket?
In simple terms, Amazon Simple Storage Service (Amazon S3) is storage for the Internet, designed to make web-scale computing easier for developers. A bucket is the container in which S3 stores your objects (files). Amazon S3 has a simple web services interface that you can use to store and retrieve any amount of data, at any time, from anywhere on the web.
A dedicated S3 bucket is used for:
- Data Store for training models
- Model Artifact storage (SageMaker)
How to create an S3 bucket?
After you have signed in to the AWS console and switched to a role with the right permissions, you can create an S3 bucket. Search for 'S3' in Services (it may already be listed under 'Recently visited services').
The screenshot below will appear when 'S3' is selected. Select the 'Create bucket' button.
The screen shot below should appear.
- Enter a bucket name; the console shows the naming rules to follow.
- The Region should preferably be the one nearest to your current location. Use the drop-down list to find it.
- Select the 'Next' button in the bottom right-hand corner.
Please note that if you are planning to use SageMaker, the data has to be in the same region as the one where you train the model. Otherwise SageMaker will throw an error saying that the data is not available.
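The same bucket can also be created from code. The sketch below is a minimal illustration using boto3, not the console's own logic: the name check covers only a simplified subset of the S3 naming rules, and the bucket name and region in the usage note are hypothetical. It assumes boto3 is installed and AWS credentials are configured before `create_bucket` is called.

```python
import re

# Simplified subset of the S3 naming rules: 3-63 characters, lowercase letters,
# digits, dots and hyphens, starting and ending with a letter or digit.
_BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def is_plausible_bucket_name(name: str) -> bool:
    """Cheap local check; S3 itself remains the final authority."""
    return bool(_BUCKET_NAME_RE.match(name)) and ".." not in name


def build_create_bucket_kwargs(bucket_name: str, region: str) -> dict:
    """Build the keyword arguments for boto3's S3 create_bucket call."""
    if not is_plausible_bucket_name(bucket_name):
        raise ValueError(f"Invalid bucket name: {bucket_name!r}")
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be passed as a constraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(bucket_name: str, region: str) -> None:
    """Create the bucket (needs boto3 and valid AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**build_create_bucket_kwargs(bucket_name, region))
```

For example, `create_bucket("my-sagemaker-demo-bucket", "eu-central-1")` would create a bucket in Frankfurt, provided that (hypothetical) name is still free, since bucket names are globally unique.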
After you click 'Next', the following options appear:
- Versioning: enable it if you want to keep multiple versions of an object in the bucket.
- Server access logging: provides detailed records for the requests that are made to a bucket. Each access log record provides details about a single access request, such as the requester, bucket name, request time, request action, response status, and an error code, if relevant. When logging is enabled, logs are saved to a bucket in the same AWS Region as the source bucket.
- Tags: used to track the storage cost for individual projects or groups of projects (cost allocation tags). Tags associated with an S3 bucket appear on the cost allocation report. For example, in the 'Key' field you can enter 'created by' and in the 'Value' field your name, as below.
- Default encryption: encrypts objects automatically when they are stored on disks in Amazon S3 data centers (data travelling to and from Amazon S3 is protected separately, by SSL/TLS). If you select 'Default encryption', the options below will appear.
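The options above map onto separate S3 API calls. As a rough sketch (assuming boto3, an existing bucket, and SSE-S3 "AES256" as the chosen encryption type), the request payloads could be built like this. The helper names are made up for illustration, and server access logging is left out to keep the example short.

```python
def versioning_request(bucket: str) -> dict:
    """Payload for put_bucket_versioning: keep multiple versions of objects."""
    return {"Bucket": bucket, "VersioningConfiguration": {"Status": "Enabled"}}


def tagging_request(bucket: str, tags: dict) -> dict:
    """Payload for put_bucket_tagging: cost allocation tags as Key/Value pairs."""
    tag_set = [{"Key": key, "Value": value} for key, value in tags.items()]
    return {"Bucket": bucket, "Tagging": {"TagSet": tag_set}}


def encryption_request(bucket: str) -> dict:
    """Payload for put_bucket_encryption: default encryption with SSE-S3."""
    rule = {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
    return {
        "Bucket": bucket,
        "ServerSideEncryptionConfiguration": {"Rules": [rule]},
    }


def apply_bucket_options(s3_client, bucket: str, tags: dict) -> None:
    """Apply all three options; s3_client is expected to be boto3.client('s3')."""
    s3_client.put_bucket_versioning(**versioning_request(bucket))
    s3_client.put_bucket_tagging(**tagging_request(bucket, tags))
    s3_client.put_bucket_encryption(**encryption_request(bucket))
```

A call such as `apply_bucket_options(boto3.client("s3"), "my-sagemaker-demo-bucket", {"created by": "your name"})` would mirror the console choices described above; the bucket name is again hypothetical.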
After you select the 'Next' button in the bottom right-hand corner, you can specify permissions as follows:
You can leave this screen as is and select the 'Next' button.
The above screen allows you to check the settings you have chosen.
Select the 'Create bucket' button if you are happy with your configuration.
Success!! The S3 bucket was successfully created.
Any file saved in this bucket is automatically replicated across multiple availability zones within the selected region.
Differences between AWS regions and Availability Zones
- Region: a separate geographic area, completely independent from the other regions. At the time of writing, Amazon Web Services provided 16 regions spread worldwide (the number has grown since). By default, resources aren’t replicated across regions unless you set that up explicitly. So a problem in one region can affect the services running in that region, but can’t affect other regions.
- Availability Zone (AZ): Each AWS region has multiple, isolated locations known as Availability Zones. AZs in the same region are connected to each other through low-latency links, so communication between them is fast and easy. Many managed AWS services, S3 among them, replicate by default between AZs in the same region to offer high availability.
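To see which Availability Zones a region offers to your account, you can query the EC2 API. The following is a small sketch that assumes boto3 and configured credentials; the helper names are illustrative only.

```python
def zone_names(response: dict) -> list:
    """Extract the names of available zones from a describe_availability_zones response."""
    zones = response.get("AvailabilityZones", [])
    return sorted(z["ZoneName"] for z in zones if z.get("State") == "available")


def list_availability_zones(region: str) -> list:
    """Ask EC2 for the Availability Zones of one region (needs AWS credentials)."""
    import boto3  # lazy import so zone_names() can be used without boto3

    ec2 = boto3.client("ec2", region_name=region)
    return zone_names(ec2.describe_availability_zones())
```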
How to launch a SageMaker Notebook Instance
Now that we have created our S3 bucket, we can start on SageMaker.
To access SageMaker do the following:
- Select the 'Services' button and a list of all the services will appear in the main part of the screen.
- Enter 'SageMaker' in the Find Services box, as below.
You have now entered the SageMaker service, as displayed below. Please note that the Region is Frankfurt (red box), which is the same region as the one where the S3 bucket is located. Keep in mind that the notebook instance has to be created in the same region as the S3 bucket, otherwise you are not going to be able to access the data.
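If you are unsure where a bucket lives, the region check can be done in code before you create the notebook instance. This is a simplified sketch assuming boto3; it relies on the fact that `get_bucket_location` reports no location constraint for buckets in us-east-1, and it ignores other legacy values.

```python
def bucket_region(location_constraint) -> str:
    """Translate the LocationConstraint value into a region name."""
    # Buckets in us-east-1 report None instead of a region name.
    return location_constraint or "us-east-1"


def same_region(location_constraint, notebook_region: str) -> bool:
    """True when the bucket and the notebook instance share a region."""
    return bucket_region(location_constraint) == notebook_region


def check_bucket_region(bucket: str, notebook_region: str) -> bool:
    """Look up the bucket's region with boto3 and compare it (needs credentials)."""
    import boto3  # lazy import so the helpers above work without boto3

    s3 = boto3.client("s3")
    constraint = s3.get_bucket_location(Bucket=bucket)["LocationConstraint"]
    return same_region(constraint, notebook_region)
```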
Create a 'Notebook instance'. Select the 'Notebook instance' button on the left-hand side.
The screen below should appear. Select the 'Create notebook instance' button, circled in red.
Then the following screenshot will be displayed:
Configure the notebook instance by doing the following:
- Give your instance a name in the 'Notebook instance name' field.
- Specify the 'Notebook instance type'. The cost depends on the size of the instance chosen. Useful link: https://aws.amazon.com/ec2/pricing/on-demand/
- Click on 'Additional Configuration', where you can specify a Lifecycle Configuration and the Volume size, as shown below. The Volume size determines the storage limit for the files stored in the Jupyter Notebook.
- A Lifecycle Configuration lets you run a script when the notebook instance is created or started, for example to install additional libraries. It can be created separately. Useful link: https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-lifecycle-config.html An example is:
- Specify permissions by selecting the appropriate 'IAM role' value or by creating a new one.
- When creating a new role, you can grant access to a specific S3 bucket (like the one created beforehand) or to all S3 buckets in your account. The role automatically grants access to any S3 bucket or object containing 'sagemaker' in its name. Click 'Create role' and a success message will pop up.
- VPC stands for Virtual Private Cloud. It enables you to launch AWS resources into a virtual network that you've defined. This virtual network closely resembles a traditional network that you'd operate in your own data center, with the benefits of using the scalable infrastructure of AWS.
- In the Tags section you can specify tags, as shown below.
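As an illustration of the Lifecycle Configuration item above, the sketch below prepares a start-up script for the SageMaker API, which expects the script content to be base64-encoded. The script body, the library it installs, and the configuration name are hypothetical; treat this as one possible shape, assuming boto3 is available.

```python
import base64

# Hypothetical start-up script: installs one extra library on every start.
ON_START_SCRIPT = """#!/bin/bash
set -e
pip install --upgrade nltk
"""


def encode_script(script: str) -> str:
    """Base64-encode a shell script, as the lifecycle config API expects."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


def lifecycle_config_request(name: str, on_start_script: str) -> dict:
    """Payload for create_notebook_instance_lifecycle_config."""
    return {
        "NotebookInstanceLifecycleConfigName": name,
        "OnStart": [{"Content": encode_script(on_start_script)}],
    }


def create_lifecycle_config(name: str, on_start_script: str) -> None:
    """Register the configuration with SageMaker (needs AWS credentials)."""
    import boto3  # lazy import so the helpers above work without boto3

    sagemaker = boto3.client("sagemaker")
    sagemaker.create_notebook_instance_lifecycle_config(
        **lifecycle_config_request(name, on_start_script)
    )
```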
After reviewing all the settings, select the 'Create notebook instance' button. The screenshot below will be displayed. The notebook instance needs some time to spin up: while it is being created its status is 'Pending', and once it is ready the status changes to 'InService'.
Now select the option 'Open Jupyter', as circled in blue above.
This will open in a new browser tab or window, as below.
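For readers who prefer code over the console, creating the instance and waiting for it to leave 'Pending' could look roughly like the sketch below. It assumes a boto3 SageMaker client is passed in; the instance name, instance type and role ARN in the usage note are placeholders, not values from this article.

```python
def notebook_request(name, instance_type, role_arn, volume_gb=5, tags=None) -> dict:
    """Payload for create_notebook_instance, mirroring the console fields."""
    request = {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,
        "VolumeSizeInGB": volume_gb,
    }
    if tags:
        request["Tags"] = [{"Key": k, "Value": v} for k, v in tags.items()]
    return request


def create_and_wait(sm_client, request: dict) -> str:
    """Create the instance, block until it is running, and return its status."""
    name = request["NotebookInstanceName"]
    sm_client.create_notebook_instance(**request)
    # The waiter polls describe_notebook_instance until the status is InService.
    sm_client.get_waiter("notebook_instance_in_service").wait(NotebookInstanceName=name)
    description = sm_client.describe_notebook_instance(NotebookInstanceName=name)
    return description["NotebookInstanceStatus"]
```

A hypothetical call would be `create_and_wait(boto3.client("sagemaker"), notebook_request("demo-notebook", "ml.t2.medium", "arn:aws:iam::123456789012:role/demo-sagemaker-role"))`.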
Run a SageMaker example
Select the option 'SageMaker Examples', as outlined in red below
The following should be displayed
- Expand the section 'Introduction to Amazon Algorithms' and select the option 'blazingtext_hosting_pretrained_fasttext.ipynb', as shown in the screenshot below.
- When you select the 'Use' button, the screen below will appear.
- After you select the 'Create copy' button, a new tab will open with the screen below.
In the section below, we will amend the code to:
- Point to the S3 bucket
- Specify the folder prefix
- When you select the <RUN> command, a '*' will appear in the [ ] of the section being executed. When the section completes successfully, a number for that section will appear in the [ ].
- When executing the section below, we download the pretrained FastText language model and package it as a .tar archive.
- Once you have downloaded the model, a SageMaker Model needs to be created. This is done by executing the command below.
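The bucket, prefix and packaging steps above can be sketched in plain Python as follows. This is not the notebook's own code: the helper names are invented, and storing the file in the archive as `model.bin` is an assumption about what the BlazingText container expects. The upload step assumes boto3 and credentials.

```python
import os
import tarfile
import tempfile


def s3_uri(bucket: str, prefix: str, filename: str) -> str:
    """Compose the S3 location from the bucket and folder prefix."""
    return f"s3://{bucket}/{prefix.strip('/')}/{filename}"


def package_model(model_path: str, archive_path: str) -> str:
    """Pack the downloaded model file into a gzipped tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # Assumption: the hosting container looks for a file named model.bin.
        tar.add(model_path, arcname="model.bin")
    return archive_path


def upload_model(archive_path: str, bucket: str, prefix: str) -> str:
    """Upload the archive to S3 and return its URI (needs AWS credentials)."""
    import boto3  # lazy import so the helpers above work without boto3

    filename = os.path.basename(archive_path)
    key = f"{prefix.strip('/')}/{filename}"
    boto3.client("s3").upload_file(archive_path, bucket, key)
    return s3_uri(bucket, prefix, filename)
```

The returned URI is what a SageMaker Model would point to as its model data location.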
Now you should be able to see the newly created model by selecting the 'Model' tab on the left-hand side, as circled in red in the diagram below. The newly created model is displayed in the listing in the middle.
- Once the model has been created successfully, an Endpoint needs to be created. This is done by executing the command below.
- While the Endpoint command is executing, you can check on its progress by selecting the 'Endpoints' tab, as circled in red below. The status will be 'Creating', as below.
- Once the 'Endpoint' has been created the status will be as below, 'InService'
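Instead of refreshing the console, the status change from 'Creating' to 'InService' can be polled from code. Below is a minimal sketch; the `describe` argument is expected to be the `describe_endpoint` method of a boto3 SageMaker client, and the polling interval and limit are arbitrary choices.

```python
import time


def wait_for_endpoint(describe, endpoint_name, poll_seconds=30, max_polls=60, sleep=time.sleep):
    """Poll the endpoint status until it is InService, or fail loudly."""
    for _ in range(max_polls):
        status = describe(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        sleep(poll_seconds)  # still Creating (or Updating): wait and retry
    raise TimeoutError(f"Endpoint {endpoint_name} not InService after {max_polls} polls")
```

Usage would be along the lines of `wait_for_endpoint(boto3.client("sagemaker").describe_endpoint, "my-endpoint")`, where the endpoint name is a placeholder. boto3 also ships a ready-made `endpoint_in_service` waiter that does the same job.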
Now that the model is ready for use, we just need to pass it the sentences whose language we want it to identify. This is done simply by running the following commands.
That's it: the model correctly identified the language of the sentences. P.S. The sentences can be amended and the three sections above re-run to display the different output.
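Outside the notebook, the same request can be sent through the SageMaker runtime API. The sketch below assumes boto3 and a running endpoint. The request and response shapes (a JSON body with an `instances` list, and predictions carrying `label` values prefixed with `__label__`) are assumptions about the BlazingText container, so check them against the output you actually get.

```python
import json


def build_payload(sentences: list) -> str:
    """Serialize the sentences into the JSON body sent to the endpoint."""
    return json.dumps({"instances": sentences})


def top_languages(response_body) -> list:
    """Pull the first predicted language code out of each prediction."""
    predictions = json.loads(response_body)
    return [p["label"][0].replace("__label__", "") for p in predictions]


def detect_languages(endpoint_name: str, sentences: list) -> list:
    """Invoke the endpoint and return one language code per sentence."""
    import boto3  # lazy import so the helpers above work without boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_payload(sentences),
    )
    return top_languages(response["Body"].read())
```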
Last action before closing SageMaker
Once you have finished with this exercise, it is very important to remember to shut down the 'Endpoint' and then stop or delete the instance.
There are two ways to shut down the Endpoint.
- One is to execute the command below.
- The other way is to select the Endpoint you have created in the Endpoints window, select the 'Actions' button, and choose 'Delete' from the drop-down menu (an endpoint cannot be paused, only deleted).
Now that we have shut down the 'Endpoint', we need to stop the instance we created. Select the 'Notebook instance' tab, select the instance you created, and select the 'Actions' button circled in red.
- From the drop down menu, select 'Stop', as below;
- Once the instance has been stopped it should appear as below
Now that the instance has been stopped, you can delete the instance by selecting it again, selecting the 'Actions' button, and from the drop down window selecting the delete option.
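The clean-up steps can also be scripted, which makes them harder to forget. The sketch below assumes a boto3 SageMaker client is passed in and is best run from outside the notebook instance that is being stopped; the resource names in the usage note are placeholders.

```python
def clean_up(sm_client, endpoint_name: str, notebook_name: str, delete_notebook: bool = False) -> None:
    """Delete the endpoint, stop the notebook instance, and optionally delete it."""
    # Endpoints are billed while they exist, so the endpoint goes first.
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.stop_notebook_instance(NotebookInstanceName=notebook_name)
    if delete_notebook:
        # An instance can only be deleted once it has reached the Stopped state.
        waiter = sm_client.get_waiter("notebook_instance_stopped")
        waiter.wait(NotebookInstanceName=notebook_name)
        sm_client.delete_notebook_instance(NotebookInstanceName=notebook_name)
```

For example: `clean_up(boto3.client("sagemaker"), "my-endpoint", "demo-notebook", delete_notebook=True)`.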
Please remember to always shut down the 'Endpoint' and stop the instance, as there is a cost associated with keeping them running.
That's it! This should help you get up and running quickly on SageMaker, so you can focus your time on building your ML model.