Today (17/08/2018) I had the privilege of delivering a 2-hour guest lecture on “Expectations of the IT industry” for SLIIT Matara Branch BSc IT students. I hope they were able to learn something from this presentation. You can access the slide deck at the following link:
Today (08/07/2018) I had the privilege of delivering a 1-hour presentation on “Towards a Cloud Enabled Data Intensive Digital Transformation” for Jaffna University IT students. I hope they were able to learn something from this presentation. You can access the slide deck at the following link:
With the advent of Big Data, enterprise applications nowadays are following a data-intensive, microservices-based architecture, moving away from the more monolithic architectures we have been used to for decades.
These data-intensive applications should meet the following requirements:
1. Ingest data at scale without loss
2. Analyze data in real time
3. Trigger actions based on the analyzed data
4. Store the data at cloud scale
5. Run on a distributed and highly resilient cloud platform
SMACK is one such stack for building modern enterprise applications. It meets each of the above objectives with a loosely coupled tool chain of technologies that are all open source and production-proven at scale.
(S – Spark, M – Mesos, A – Akka, C – Cassandra, K – Kafka)
- Spark – A general engine for large-scale data processing, enabling analytics from SQL queries to machine learning, graph analytics, and stream processing
- Mesos – Distributed systems kernel that provides resourcing and isolation across all the other SMACK stack components. Mesos is the foundation on which other SMACK stack components run.
- Akka – A toolkit and runtime to easily create concurrent and distributed apps that are responsive to messages.
- Cassandra – Distributed database management system that can handle large amounts of data across servers with high availability.
- Kafka – A high throughput, low-latency platform for handling real-time data feeds with no data loss.
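To make the division of labour concrete, here is a toy, single-process sketch in which plain Python objects stand in for the SMACK layers: a deque for a Kafka topic (ingest), a running aggregate for Spark (analyze), an alert list for actions handed to Akka actors (act), and a dict for a Cassandra table (store). No real client libraries are used, Mesos (the resource layer) has no counterpart in a single process, and all names are hypothetical.

```python
from collections import deque


def run_pipeline(events, threshold):
    """Toy stand-in for an ingest -> analyze -> act -> store flow.

    events: iterable of (sensor_id, reading) pairs.
    Returns (table, alerts): the maximum reading seen per sensor, and the
    sensors whose readings crossed the threshold, in arrival order.
    """
    topic = deque(events)   # stands in for a Kafka topic: ordered, buffered ingest
    table = {}              # stands in for a Cassandra table keyed by sensor_id
    alerts = []             # stands in for actions dispatched to Akka actors
    while topic:
        sensor_id, reading = topic.popleft()
        # "Analyze" step (Spark's role here): keep the running maximum per sensor
        table[sensor_id] = max(reading, table.get(sensor_id, reading))
        # "Act" step: trigger an action when a reading crosses the threshold
        if reading > threshold:
            alerts.append(sensor_id)
    return table, alerts


if __name__ == "__main__":
    table, alerts = run_pipeline([("a", 1), ("b", 5), ("a", 7)], threshold=4)
    print(table, alerts)
```

In a real deployment each of these steps runs as a separate, independently scalable service, which is exactly the loose coupling the stack is chosen for.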
Developer Tools for Serverless Applications on AWS
AWS and its ecosystem provide frameworks and tools that help you develop serverless applications on AWS Lambda and other AWS services, so that you can rapidly build, test, deploy, and monitor them.
There are multiple AWS and open source frameworks available today that simplify serverless application development and deployment:
1. AWS Serverless Application Model (SAM)
2. Open source third-party frameworks (Apex, Chalice, Claudia.js, Serverless Express, Serverless Framework, Serverless Java Container, Sparta, Zappa)
1) AWS SAM
For simple applications, the standard Lambda console is good enough. For complex applications, however, AWS SAM is recommended. AWS SAM is an abstraction over AWS CloudFormation (Infrastructure as Code) that is optimized for serverless applications. It supports everything CloudFormation supports, and it is an open specification released under the Apache 2.0 License.
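To show what the “abstraction” looks like, here is a sketch that builds a minimal SAM template as a Python dict and prints it as JSON (CloudFormation, and therefore SAM, accepts JSON as well as YAML). The Transform line is what marks the file as a SAM template; the function name HelloFunction, the handler string and the ./src code path are illustrative assumptions.

```python
import json


def sam_template(function_name, handler, runtime, code_uri):
    """Build a minimal AWS SAM template with a single function, as a dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        # This transform tells CloudFormation to expand AWS::Serverless::*
        # resources into plain CloudFormation resources at deploy time
        "Transform": "AWS::Serverless-2016-10-31",
        "Resources": {
            function_name: {
                "Type": "AWS::Serverless::Function",
                "Properties": {
                    "Handler": handler,    # module.function inside the code bundle
                    "Runtime": runtime,
                    "CodeUri": code_uri,   # local folder packaged and uploaded on deploy
                },
            }
        },
    }


if __name__ == "__main__":
    template = sam_template("HelloFunction", "handler.hello", "python2.7", "./src")
    print(json.dumps(template, indent=2))
```

The same function written directly in CloudFormation would need a separate IAM role resource and more properties; SAM fills those in for you.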
AWS SAM Local Client
AWS SAM Local is a complementary CLI tool that lets you test Lambda functions defined by AWS SAM templates locally. You can plug it into your favorite IDE for higher-fidelity testing and debugging.
AWS has also introduced a new IDE for serverless development, AWS Cloud9, which integrates all the components required for serverless development and testing without relying on any other tool or IDE.
Deployment support was previously missing from AWS SAM, but it has recently been added to automate incremental deployments to AWS Lambda. This also lets you roll out new versions to production incrementally.
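In a SAM template, gradual roll-out is switched on per function with the AutoPublishAlias and DeploymentPreference properties (the traffic shifting itself is carried out by AWS CodeDeploy). As a sketch, the function below adds both properties to a template held as a Python dict; the alias name live, the Canary10Percent5Minutes type and the function name HelloFunction are illustrative choices, not values from this post.

```python
import copy


def with_gradual_deployment(template, function_name,
                            alias="live", deployment_type="Canary10Percent5Minutes"):
    """Return a copy of a SAM template dict with gradual deployment enabled."""
    tpl = copy.deepcopy(template)  # leave the caller's template untouched
    props = tpl["Resources"][function_name]["Properties"]
    # Publish a new Lambda version when the function changes and point this alias at it
    props["AutoPublishAlias"] = alias
    # Shift alias traffic to the new version gradually instead of all at once
    props["DeploymentPreference"] = {"Type": deployment_type}
    return tpl


base = {
    "Transform": "AWS::Serverless-2016-10-31",
    "Resources": {
        "HelloFunction": {
            "Type": "AWS::Serverless::Function",
            "Properties": {"Handler": "handler.hello", "Runtime": "python2.7",
                           "CodeUri": "./src"},
        }
    },
}
updated = with_gradual_deployment(base, "HelloFunction")
```

With a canary type, a small share of traffic goes to the new version first, so a bad release can be rolled back before it reaches every caller.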
2) Open source third-party frameworks (Serverless Framework)
Please have a look at my previous blog post on the Serverless Framework.
You may use the following Linux commands to try the above. If you are new to Linux, especially on cloud infrastructure such as AWS, the following should be useful.
Operating system (AMI): Amazon Linux (Red Hat based)
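1. lsblk – lists all block devices (volumes) attached to the instance and where they are mounted
2. Then use the following command to create a file system on the newly created volume (this erases any existing data on it)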
>> sudo mke2fs /dev/xvdf
3. Mount the volume on an existing directory
>> sudo mount /dev/xvdf /mnt
4. Now run lsblk again. You can see that /dev/xvdf is mounted on the /mnt directory.
5. Now you can copy files into the mounted directory
6. If you want to unmount the volume, you can use the following
>> sudo umount /mnt
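If you want to confirm step 4 from a script rather than by reading lsblk, you can look the mount point up in /proc/mounts, which lists one mount per line as device, mount point, file system type, options and two numeric fields. A minimal sketch (Linux only); the /mnt and /dev/xvdf names are the ones used above.

```python
import os


def mounts_by_point(mounts_text):
    """Map mount point -> device from the contents of /proc/mounts.

    Mount points containing spaces appear escaped (e.g. \\040) in that file
    and are returned as-is.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            device, mount_point = fields[0], fields[1]
            result[mount_point] = device
    return result


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            table = mounts_by_point(f.read())
        # Prints e.g. the device mounted on /mnt, or "not mounted" after umount
        print("/mnt ->", table.get("/mnt", "not mounted"))
```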
Spark is a fast, general-purpose cluster computing system for Big Data. It is written in Scala and provides high-level APIs in Scala, Java, Python, and R, along with an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools, including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing. Apache Spark is built around one main concept: the Resilient Distributed Dataset (RDD).
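To make the RDD idea concrete without needing a cluster, here is a plain-Python sketch of the classic word count, written as the same three transformations an RDD program would chain (flatMap, map, reduceByKey). No Spark is involved: in Spark each step would be lazy and distributed across partitions. The PySpark equivalent is shown in the docstring, assuming `sc` is an existing SparkContext.

```python
from collections import defaultdict
from operator import add


def word_count(lines):
    """Plain-Python mirror of the classic RDD word count (no Spark involved).

    The PySpark version of the same pipeline would read roughly:
        (sc.textFile(path)
           .flatMap(lambda line: line.split())
           .map(lambda word: (word, 1))
           .reduceByKey(add))
    """
    # flatMap: each line becomes zero or more words
    words = [word for line in lines for word in line.split()]
    # map: pair every word with the count 1
    pairs = [(word, 1) for word in words]
    # reduceByKey: combine the counts of equal keys with `add`
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] = add(counts[word], one)
    return dict(counts)


if __name__ == "__main__":
    print(word_count(["to be or", "not to be"]))
```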
Which language to use with Spark: Python, Scala, or Java?
The languages most commonly used with Spark are Python and Scala. Both are more concise than Java and quite easy to follow. However, Scala tends to be faster than Python, mainly because Spark itself is written in Scala: Scala code runs directly on the JVM and avoids the overhead of passing data between the Python interpreter and the JVM. In general, though, both are capable of doing the task in almost all use cases.
Installing Apache Spark with Python
To complete this task, follow the steps below one by one.
Step 1: Install Java
- I assume you already have the Java Development Kit (JDK) installed on your machine. As of March 2018, the JDK version supported by Spark is JDK 8.
- You may verify the Java installation
$ java -version
Step 2: Install Scala
- If you do not have Scala installed on your system, use this link to install it.
- Get the “tgz” bundle and extract it to the /usr/local/scala folder (as a best practice)
$ tar xvf scala-2.11.12.tgz                 # extract the archive in the current directory
$ sudo mv scala-2.11.12 /usr/local/scala    # move it to /usr/local/scala
- Then update ~/.bashrc to set SCALA_HOME and add $SCALA_HOME/bin to $PATH.
- Finally, verify the Scala installation
$ scala -version
Step 3: Install Spark
After installing both Java and Scala, you are ready to download Spark. Use this link to download the “tgz” file.
P.Note: When downloading Apache Spark, be sure to choose the latest Spark 2.3 release.
$ tar xvf spark-2.3.0-bin-hadoop2.7.tgz                 # extract the archive in the current directory
$ sudo mv spark-2.3.0-bin-hadoop2.7 /usr/local/spark    # move it to /usr/local/spark
- Then update ~/.bashrc to set SPARK_HOME and add $SPARK_HOME/bin to $PATH.
- Now verify the Spark installation
$ spark-shell
If all goes well, you will see a Spark prompt. (Use :quit or Ctrl+D to exit the shell.)
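The ~/.bashrc changes in Steps 2 and 3 are easy to get slightly wrong. Here is a small Python sketch that checks them; it assumes the variable names used above, and since .bashrc only affects new shells, it should be run from a fresh terminal.

```python
import os


def check_spark_env(env):
    """Return a list of problems with the SCALA_HOME / SPARK_HOME setup.

    env is a mapping such as os.environ. An empty list means both variables
    are set and each <home>/bin directory appears on PATH.
    """
    problems = []
    # Normalise PATH entries so a trailing slash does not cause a false alarm
    path_entries = {os.path.normpath(p) for p in env.get("PATH", "").split(os.pathsep) if p}
    for var in ("SCALA_HOME", "SPARK_HOME"):
        home = env.get(var)
        if not home:
            problems.append(var + " is not set")
        elif os.path.normpath(os.path.join(home, "bin")) not in path_entries:
            problems.append(var + "/bin is not on PATH")
    return problems


if __name__ == "__main__":
    print(check_spark_env(os.environ) or "SCALA_HOME and SPARK_HOME look fine")
```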
Step 4: Install Python
You can install Python using Enthought Canopy.
Use this link to download the binaries to your system. (For this blog, I used the Linux 64-bit Python 3.5 download.)
Once Canopy is installed, you have a Python development environment for working with Spark; the pyspark command itself ships with the Spark distribution installed in Step 3.
Once all of these are installed, you can try PySpark by typing “pyspark” in a terminal window.
From there, you can execute your Python scripts on Spark.
The Default Security – (Permissions)
By default, Lambda functions are “not” authorized to access other AWS services. Hence, you must explicitly grant access (permissions) for each AWS service a function uses (e.g. accessing S3 to store images, or accessing databases such as DynamoDB). These permissions are managed through AWS IAM roles.
Changing the Default Security – (Permissions)
If you are using the Serverless Framework, you can customize the default permissions in the serverless.yml file (in the “iamRoleStatements:” block under “provider:”).
iamRoleStatements:
  - Effect: "Allow"
    Action:
      - "lambda:*"
    Resource:
      - "*"
The above allows the Lambda function to perform all Lambda actions (“lambda:*”) on all resources (“*”).
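Each entry under iamRoleStatements ends up, roughly, as a statement in the IAM policy attached to the function's role. For contrast with the broad “lambda:*” grant, here is a sketch that builds the policy document for a narrower case mentioned above, storing images in S3; the bucket name my-image-bucket is hypothetical. The same Effect/Action/Resource keys could be written as an iamRoleStatements entry in serverless.yml.

```python
import json


def s3_put_policy(bucket_name):
    """Build an IAM policy document that only allows writing objects into one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Object-level ARN: every key inside this bucket, nothing else
                "Resource": ["arn:aws:s3:::" + bucket_name + "/*"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(s3_put_policy("my-image-bucket"), indent=2))
```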
The Default Security – (Network)
By default, Lambda functions are not launched inside a VPC, but you can change this by configuring a function to run within a VPC. You can then go further and apply security groups as an additional layer of security within the VPC.
Changing the Default Security – (Network)
If you are using the Serverless Framework, you can customize the default settings in the serverless.yml file. Here is a code snippet you might use for this:
provider:
  name: aws
  runtime: python2.7
  profile: serverless-admin
  region: us-east-1
  vpc:
    securityGroupIds:
      - <security-group-id>
    subnetIds:
      - <subnet-1>
      - <subnet-2>
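For comparison outside the framework, the same setting can be applied to an already-deployed function with the boto3 Lambda client, whose update_function_configuration call takes a VpcConfig dict with SubnetIds and SecurityGroupIds. This is a sketch assuming boto3 and credentials for the serverless-admin profile; the function name in the usage comment is hypothetical, and the function's role also needs permission to create network interfaces in the VPC. The client is passed in as a parameter so the helper can be tried without AWS.

```python
def attach_to_vpc(lambda_client, function_name, subnet_ids, security_group_ids):
    """Place an existing Lambda function in a VPC via a boto3 Lambda client.

    lambda_client is expected to behave like boto3.client("lambda").
    """
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        VpcConfig={
            # The function gets network interfaces in these subnets...
            "SubnetIds": list(subnet_ids),
            # ...and these security groups control its traffic inside the VPC
            "SecurityGroupIds": list(security_group_ids),
        },
    )


# Usage sketch (needs boto3 installed and AWS credentials; IDs are placeholders
# and the function name is hypothetical):
#   import boto3
#   session = boto3.Session(profile_name="serverless-admin", region_name="us-east-1")
#   attach_to_vpc(session.client("lambda"), "hello-world-python-dev-hello",
#                 ["<subnet-1>", "<subnet-2>"], ["<security-group-id>"])
```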
The Serverless Framework (https://serverless.com/framework/) is an open-source CLI for building and deploying serverless architectures on cloud providers (AWS, Microsoft Azure, IBM OpenWhisk, Google Cloud Platform, etc.).
This article walks you through the important steps required to get started on the AWS platform. The framework works well with CI/CD tools and fully supports AWS CloudFormation, which it uses to provision your AWS Lambda functions, events, and infrastructure resources.
Step 1: Installing NodeJS
Serverless is a Node.js CLI tool, so the first thing you need to do is install Node.js on your machine. Refer to the official Node.js website, then download and follow the instructions to install Node.js.
Serverless Framework runs on Node v6.5.0 or higher. You can verify that NodeJS is installed successfully by executing node -v in your terminal.
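If you would rather script the v6.5.0 check than eyeball the output of node -v, here is a small Python sketch. It assumes node -v prints a string such as v8.11.3, which is its usual format.

```python
import re
import subprocess

MIN_NODE = (6, 5, 0)


def parse_node_version(text):
    """Turn output such as 'v8.11.3' into a tuple of ints (8, 11, 3)."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError("unrecognised version string: %r" % text)
    return tuple(int(part) for part in match.groups())


def node_is_supported(text):
    # Tuples compare element by element, so (8, 11, 3) >= (6, 5, 0) is True
    return parse_node_version(text) >= MIN_NODE


if __name__ == "__main__":
    try:
        out = subprocess.check_output(["node", "-v"]).decode()
        print(out.strip(), "OK" if node_is_supported(out) else "is too old")
    except OSError:
        print("node was not found on PATH")
```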
If all is fine, we can proceed to the second step.
Step 2: Installing Serverless Framework
$ npm install -g serverless
Once installed, you may verify it.
$ serverless --version
Step 3: Setting up Cloud Provider (AWS) Credentials
The Serverless Framework needs access to your cloud provider's account so that it can create and manage resources on your behalf. You can follow this YouTube link to set it up.
Once the above is completed, add the AWS credentials to your client machine so that the CLI can use them. You can use the following command to do that.
$ serverless config credentials --provider aws --key XXXXXXXXXXXXXXXXX --secret XXXXXXXXXXXXXXXXX --profile serverless-admin
This adds an entry to the credentials file, which is located in the <home-folder>/.aws folder (assuming the profile name is serverless-admin):
[serverless-admin]
aws_access_key_id = XXXXXXXXXXXXXXXX
aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXX
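Because the credentials file uses INI syntax, you can confirm that the profile was written by parsing it with Python's standard configparser module. A sketch, assuming the default location ~/.aws/credentials and the serverless-admin profile name used above:

```python
import configparser
import os


def has_profile(credentials_text, profile):
    """True if the credentials text defines the profile with both key entries."""
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    return (parser.has_section(profile)
            and parser.has_option(profile, "aws_access_key_id")
            and parser.has_option(profile, "aws_secret_access_key"))


if __name__ == "__main__":
    path = os.path.expanduser("~/.aws/credentials")
    if os.path.exists(path):
        with open(path) as f:
            print("serverless-admin configured:", has_profile(f.read(), "serverless-admin"))
```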
If all of the above is OK, you are ready to create your first serverless function (Lambda function) on AWS.
Step 3: Creating your Serverless Project
You can build your projects based on the templates (archetypes) provided by the framework.
Several templates are provided out of the box (e.g. “aws-nodejs”, “aws-python”, “aws-python3”, “aws-groovy-gradle”, “aws-java-maven”, “aws-java-gradle”, “aws-scala-sbt”, “aws-csharp”, etc.).
So let's create an “aws-python” project for fun…
$ serverless create --template aws-python --path hello-world-python
The above will create a folder named “hello-world-python”.
Browse the folder and you will see two files.
1. handler.py – (This is the Serverless Function. Your Business Logic goes here)
Edit handler.py to return a simple output.
def hello(event, context):
    # print() output goes to the function's CloudWatch log; the return value is the response
    print("Hello Crishantha")
    return "Hello World!"
2. serverless.yml – (The Serverless Function Configuration.)
P.Note: Check the following configuration, especially before executing the rest of the key commands.
If you are new to YAML but know JSON well, you can use https://www.jason2yaml.com to convert JSON to YAML and vice versa.
provider:
  name: aws
  runtime: python2.7
  profile: serverless-admin
  region: us-east-1
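Before deploying, it can be handy to exercise the hello handler from handler.py locally. The sketch below calls it the way Lambda would, with an event dict and a context object; since hello ignores both, an empty dict and None are enough for a smoke test. (The framework also offers `serverless invoke local -f hello` for running a function locally.)

```python
def hello(event, context):
    # Same body as handler.py; print() also works on the python2.7 runtime
    print("Hello Crishantha")
    return "Hello World!"


if __name__ == "__main__":
    # Lambda passes an event dict and a context object; hello ignores both
    result = hello({}, None)
    assert result == "Hello World!"
    print("handler returned:", result)
```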
If all of the above is OK, you are good to go and can deploy the function on AWS. So let's move to the next step (Step 4).
Step 4: Deploy the Serverless Function
Move to the “hello-world-python” folder and execute the following command.
$ serverless deploy -v
The above runs an automated process that packages the function, generates the required resources (including a CloudFormation template), and deploys the application. It is pretty awesome!
Step 5: Invoke the Serverless Function
Use the following command to invoke the function and see its output (-l also prints the function's logs).
$ serverless invoke -f hello -l
The above returns “Hello World!”, the value returned in handler.py, and the log output includes the “Hello Crishantha” line printed by the function.
It is that simple!!!
Step 6: Verify
If you want to verify all of this, you can log in to the AWS console and check that the function appears in the AWS Lambda section. It will be there.
Step 7: Remove All
OK. We just did some testing, so you probably want to remove the serverless function and all its dependencies (IAM roles, CloudWatch log groups, etc.).
- Move to the folder of the function that you want to delete.
- Execute the following
$ serverless remove
The above cleans the whole thing up!
So, if you are an AWS developer, you may find it as useful as I do. Happy Coding!
If you have a “self-managed” MySQL server on an EC2 instance, you can allow other EC2 instances in the same VPC, or even other remote machines, to connect to it. To do this, there are a few configuration changes you need to carry out.
Here are the steps:
1. Connect to the remote MySQL EC2 instance. By default, you can access MySQL as the “root” user. However, for security reasons it is not advisable to access a MySQL instance remotely as “root”.
[P.Note: Please make sure port 3306 is added to the inbound rules of the EC2 security group before attempting this.]
2. Change the bind-address parameter to 0.0.0.0 so that MySQL listens on all network interfaces and accepts connections from remote addresses. This needs to be changed in the /etc/mysql/mysql.conf.d/my.cnf file.
3. Restart the MySQL instance
mysql-ec2-instance>> sudo /etc/init.d/mysqld restart
4. Create a new MySQL user, since remote access as “root” is not advisable. To do this, sign in to MySQL and execute the following commands.
mysql-ec2-instance>> mysql -u root -p<root-password>
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> CREATE USER 'user'@'localhost' IDENTIFIED BY 'user123';
mysql> CREATE USER 'user'@'%' IDENTIFIED BY 'user123';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'user'@'localhost' WITH GRANT OPTION;
mysql> GRANT ALL PRIVILEGES ON *.* TO 'user'@'%' WITH GRANT OPTION;
mysql> FLUSH PRIVILEGES;
mysql> EXIT;
5. Now exit the EC2 instance and try to log in to MySQL on it from your local machine.
your-local-machine>> mysql -h <ec2-public-dns-name> -u user -puser123
If all is fine, you should be able to sign in to the MySQL server on the remote EC2 instance without any issues!
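If the login fails, a useful first check is whether port 3306 is reachable from your machine at all, which separates security-group or bind-address problems from MySQL user problems. A minimal sketch using only Python's standard socket module; the host name in the usage comment is the same placeholder used above.

```python
import socket


def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names
        return False


# Usage sketch (replace the placeholder with your instance's public DNS name):
#   print(port_open("<ec2-public-dns-name>", 3306))
```

A False result usually points at the security group or bind-address; a True result with a failed login points at the MySQL user or password.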
Docker is currently the most popular and widely used container management system. Many enterprise applications nowadays have components running in separate containers. In such an architecture, “container orchestration” (starting and shutting down containers and setting up links between them) is an important concern, and the Docker community came up with a solution called Fig to handle it. Fig uses a single YAML file to orchestrate all your Docker containers and their configuration. Fig's popularity led Docker to bring it into its own code base as a separate component called “Docker Compose”.
1. Installing Docker Compose
You are required to follow the steps below:
$ sudo curl -o /usr/local/bin/docker-compose -L "https://github.com/docker/compose/releases/download/1.11.2/docker-compose-$(uname -s)-$(uname -m)"
Set the permissions:
$ sudo chmod +x /usr/local/bin/docker-compose
Now check whether it is installed properly:
$ docker-compose -v
2. Running a Container with Docker Compose
Create a directory called “ubuntu”. In this example, Docker Compose will download the latest Ubuntu image from Docker Hub to your local machine.
$ mkdir ubuntu
$ cd ubuntu
Once you have done the above, create a configuration file (docker-compose.yml) that describes the container to create:
docker-compose-test:
  image: ubuntu
Now execute the following:
$ docker-compose up      # run in the foreground, attached to the container output
$ docker-compose up -d   # run in the background (detached mode)
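The above reads docker-compose.yml, pulls the relevant image, and starts the corresponding container. Because the ubuntu image's default command (/bin/bash) has no interactive input here, the container exits straight away with code 0: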
Pulling docker-compose-test (ubuntu:latest)...
latest: Pulling from library/ubuntu
e0a742c2abfd: Pull complete
486cb8339a27: Pull complete
dc6f0d824617: Pull complete
4f7a5649a30e: Pull complete
672363445ad2: Pull complete
Digest: sha256:84c334414e2bfdcae99509a6add166bbb4fa4041dc3fa6af08046a66fed3005f
Status: Downloaded newer image for ubuntu:latest
Creating ubuntu_docker-compose-test_1
Attaching to ubuntu_docker-compose-test_1
ubuntu_docker-compose-test_1 exited with code 0
Now execute the following to check that the ubuntu:latest image has been downloaded and the container created.
$ docker images
REPOSITORY   TAG      IMAGE ID       CREATED      SIZE
ubuntu       latest   14f60031763d   4 days ago   120 MB
$ docker ps -a
CONTAINER ID   IMAGE    COMMAND       CREATED         STATUS                     PORTS   NAMES
5705871fe7ed   ubuntu   "/bin/bash"   2 minutes ago   Exited (0) 2 minutes ago           ubuntu_docker-compose-test_1
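If you want to check the result from a script instead of reading the table, docker ps accepts a Go-template --format option. The sketch below parses output produced with the format {{.Names}}\t{{.Status}} and reports whether a named container exited with code 0; it assumes the Status column starts with “Exited (0)”, as in the output above.

```python
def exited_ok(ps_output, name):
    r"""Report whether container `name` exited with code 0.

    ps_output is the text printed by:
        docker ps -a --format "{{.Names}}\t{{.Status}}"
    """
    for line in ps_output.splitlines():
        container, _, status = line.partition("\t")
        if container == name:
            return status.startswith("Exited (0)")
    return False  # container not listed at all


# Usage sketch (requires Docker on the machine):
#   import subprocess
#   out = subprocess.check_output(
#       ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"]).decode()
#   print(exited_ok(out, "ubuntu_docker-compose-test_1"))
```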