Cellery  is an Open Source Framework, which facilitates you to implement the Cell Based Architecture , which was drafted by WSO2.
Cellery is a code-first approach to building, integrating, running and managing composite microservice applications on Kubernetes.
Using Cellery we are able to create/ composite microservices apps with code, push and pull them from docker hub and other registries, deploy into k8s and monitor. Cells have automatic API security and Web SSO.
. Cellery – The Cell based Architecture Implementation (WSO2) – https://wso2-cellery.github.io/
. Cell Based Reference Architecture (WSO2) – https://github.com/wso2/reference-architecture
If you are using VS Code for your development, the ENOSPC Error can be a common one. (Especially if you are Linux Debian user).
This basically happens due to the fact that, VS Code file watcher is running out of handles because the workspace is large and contains many files .
You can see the current limit by executing the following command:
$ cat /proc/sys/fs/inotify/max_user_watches
The limit can be increased to its maximum by editing
/etc/sysctl.conf and adding this line to the end of the file:
The above value then can be loaded to the system by executing the following:
$ sudo sysctl -p
If all goes well, you will have no issues with the watches on VS Code.
What is a Bastion Host?
Bastion hosts are instances that sit within your public subnet and are typically accessed using SSH (for Linux) or RDP (for Windows). It acts as a ‘jump’ server, allowing you to use SSH or RDP to login to other instance in private subnet.
High Availability (HA) can be ensured for Bastion hosts by having multiple bastion hosts in each availability zone, with each bastion host is mapped to an Auto scaling group
A NAT instance is, like a bastion host, lives in your public subnet. A NAT instance, however, allows your private instances outgoing connectivity to the Internet (to get updates), while at the same time blocking inbound traffic from the Internet.
It is required to use Elastic IP addresses for bastion hosts mainly if you are using high availability scenarios.
The following are the best practices while configuring a bastion host
1. Never place your SSH private keys within a bastion hosts/ server. As suggested, use SSH Agent Forwarding for this task to connect first to the bastion host then to other instances on the private subnets. This lets you keep the private keys only with your servers.
2. Make sure the security group on the bastion host to allow SSH (port 22) to connect only from your trusted hosts and never from 0.0.0.0/0 mask.
3. Always have more than one bastion. For example, having a bastion host for each Availability Zone (AZ).
4. Make sure to configure security groups on private subnets to accept SSH traffic only from the bastion hosts.
How to handle Bastion hosts via SSH Agent Forwarding?
The SSH agent handles signing of authentication data for you. When authenticating to a server, you are required to sign some data using your private key, to prove that you are. As a security measure most people sensibly protect their private keys with a pass phrase, so any authentication attempt would require you to enter this pass-phrase. This can be undesirable, so the ssh agent caches they key for you and you only need to enter the password once, when the agent wants to decrypt it.
The SSH agent never hands these keys to client programs, but merely presents a socket over which clients can send it data and over which it responds with signed data. A side benefit of this is that you can use your private key even with programs you don’t fully trust.
Another benefit of the SSH agent is that it can be forwarded over SSH. So when you ssh to host A, while forwarding your agent, you can then ssh from A to another host B without needing your key present (not even in encrypted form) on host A.
These SSH Agents can not only be used when the paraphrase is being used. This can be successfully used in Bastion hosts. Rather copying the PEM (rather the private key) to the Bastion host, it is more secure to hand this process to SSH Agents. That would be more secure and easy!. So here are the simple steps to follow if you are to do this task. However, if you are running this on heavily secured environment with well designed Security groups and NACLs, it is always good to have a complete idea before executing this. Otherwise you will end up having too many confusions. If all well, this works like a charm!
Step 1: Adding the private key (PEM file) to the key chain. This allows the user to access the private instances without copying to the bastion host. This adds an additional layer of security.
$ ssh-add -k <PEM_file_name>
Step 2: Check whether the private key is properly added to the key chain
$ ssh-add -L
The above will list all the keys added to the chain. Check whether the key you added is listed there.
Step 3: Access the Bastion Host (Public instance)
$ ssh -A ec2-user@<bastion-host-elastic-ip> [Here ec2-user is the user for the Linux instance]
Step 4: Access the private instance
$ ssh ec2-user@<private-instance-ip>
Today (17/08/2018) I had the privilege to do a 2 hour guest lecture on “Expectations of the IT industry” for SLIIT Matara Branch BSc IT students. I hope they were able to learn something out of this presentation. You can reach the slide deck using the following link:
Today (08/07/2018) I had the privilege to do a 1 hour presentation on “Towards a Cloud Enabled Data Intensive Digital Transformation” for Jaffna University IT students. I hope they were able to learn something out of this presentation. You can reach the slide deck using the following link:
With the advent of Big Data, the enterprise applications nowadays are following a Data Intensive microservices based enterprise application architecture deviating more monolithic architectures, which we have been used to decades.
These data intensive applications should meet a set of requirements.
1. Ingest Data at Scale without a loss
2. Analyze data in real-time
3. Trigger action based on the analyzed data
4. Store the data at cloud-scale.
5. Need to run in a distributed and highly resilient cloud platform
The SMACK is such a stack, which can be used for building modern enterprise applications because it can performs each of the above objectives with a loosely coupled tool chain of technologies that are are all open source, and production-proven at scale.
(S – Spark, M – Mesos, A – Akka, C – Cassendra, K – Kafka)
- Spark – A general engine for large-scale data processing, enabling analytics from SQL queries to machine learning, graph analytics, and stream processing
- Mesos – Distributed systems kernel that provides resourcing and isolation across all the other SMACK stack components. Mesos is the foundation on which other SMACK stack components run.
- Akka – A toolkit and runtime to easily create concurrent and distributed apps that are responsive to messages.
- Cassandra – Distributed database management system that can handle large amounts of data across servers with high availability.
- Kafka – A high throughput, low-latency platform for handling real-time data feeds with no data loss.
Developer Tools for Serverless Applications on AWS
AWS and its ecosystem provide frameworks/ tools, which help you develop serverless applications on AWS Lambda and other AWS services. These will help you rapidly build, test, deploy, and monitor serverless applications.
There are multiple AWS / open source frameworks available in the market today to simplify serverless application development and deployment.
1. AWS Server Application Model (SAM)
2. Open Source third party frameworks (Apex, Chalice, Clauda.js, Serverless Express, Serverless Framework, Serverless Java Container, Sparta, Zappa)
1.) AWS SAM
For simple applications it is good to use normal Lambda console. However, for complex applications, it is recommended to use AWS SAM. AWS SAM is an “abstraction” of Cloudformation (Infrastructure As Code), which is optimized for serverless applications. It supports anything that Cloudformation supports and it is an Open Specification under Apache 2.0 License.
AWS SAM Local Client
AWS SAM Local is a complementary CLI tool that lets you locally test Lambda functions defined by AWS SAM templates.You can plug this client tool into any of your favorite IDE for higher fidelity testing and debugging.
Now AWS introduced a new IDE for serverless development called AWS Cloud9. This has integrated all the required components for serverless development and testing without relying on any other tool/ IDE.
However, the deployment aspect was missing in AWS SAM and recently that was also added to the AWS SAM to automate the incremental deployments into AWS Lambda. This further allows to roll-out new versions to production in an incremental manner.
2). Open Source third party frameworks (Serverless Framework)
Please do have a look at my previous blog for an article on the Serverless Framework.
1. Developer Tools for Serverless Applications – https://aws.amazon.com/serverless/developer-tools/
2, Comparing AWS SAM with the Serverless Framework – https://sanderknape.com/2018/02/comparing-aws-sam-with-serverless-framework/
4. AWS SAM Local – Build and Test Serverless Applications Locally – https://aws.amazon.com/blogs/aws/new-aws-sam-local-beta-build-and-test-serverless-applications-locally/
1. Authoring and Deploying Serverless Applications with AWS SAM: – https://www.youtube.com/watch?v=pMyniSCOJdA
2. Serverless Architecture Patterns and Best Practices – https://www.youtube.com/watch?v=_mB1JVlhScs
3. Building CI/CD Pipelines for Serverless Applications – https://www.youtube.com/watch?v=9uOl3B88bcY
4. AWS Serverless Application Model (SAM) Implementation is Now Open-Source – Apr 10, 2018 – AWS Launchpad San Francisco – https://www.youtube.com/watch?v=uxv1dOExq5U
You may use following Linux commands to try above. If you are new to Linux especially on a cloud infrastructure like AWS, the following would be useful.
AWS Instance Type: Amazon Linux (Redhat version)
1. lsblk – To check all volumes mounted
2. Then use the following to create a file system within the volume created
>> sudo mke2fs /dev/xvdf
3. Mount the created volume to an existing folder
>> sudo mount /dev/xvdf /mnt
4. Now check lsblk. You can see /mnt directory is mounted to /dev/xdvf folder.
5. Now you can copy files to the mounted folder
6. Id you want to unmount the volume you can use the following
>> sudo umount /mnt
Spark is a fast and general cluster computing system for Big Data. It is written in Scala Language. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing. Apache Spark is built on one main concept, which is “Resilient Distributed Data (RDD)”.
(Python vs Scala vs Java) with Spark?
The most popular languages that Spark associated are Python and Scala. Both languages follow a similar syntax and compared to Java they are quite easy to follow. However compared to Python, Scala seems more faster mainly Spark is written in Scala and it overcomes the delay of having to go through another set of libraries to interpret if you chose to use Python. However, in general both are capable of doing the task in almost all the use cases.
Installing Apache Spark with Python
In order to complete this task, you are required to follow the following steps one by one.
Step 1: Install Java
- I assume you already have Java Development Kit (JDK) installed in your machines. In March 2018, the Spark supported JDK version is JDK 8.
- You may verify the Java installation
$ java -version
Step 2: Install Scala
- If you do not have Scala installed in your system, use this link to install it.
- Get the “tgz” bundle and extract it to the /usrlocal/scala folder (This is as a best practice)
$ tar xvf scala-2.11.12.tgz // Extract the scala into /usr/local folder $ su - $ mv scala-2.11.12 /usr/local/scala $ exit
- Then update the .bashrc to have SCALA_HOME and $SCALA_HOME/bin to the $PATH.
- After all, verify the scala installation
$ scala -version
Step 3: Install Spark
After installing both Java and Scala, now you are ready to download the Spark version. Use this link to download the “tgz” file.
P.Note: Also when downloading Apache Spark itself, be sure to install the latest version of Spark 2.3
$ tar xvf spark-2.3.0-bin-hadoop2.7.tgz // Extract the Spark into /usr/local/spark $ su - $ mv spark-2.3.0-bin-hadoop2.7 /usr/local/spark $ exit
- Then update the .bashrc to have SPARK_HOME and $SPARK_HOME/bin to the $PATH.
- Now you may verify the Spark installation
if all goes well, you will see a Spark prompt being displayed!. (Use :quit OR Ctrl+D to exit from the shell)
Step 4: Install Python
You may install Python using Canopy
Use this link to download the binaries to your system. (Use Linux(64-bit Python 3.5 Download for this blog)
Once you installed,Canopy, you have a Python development environment to work with Spark with all the libraries including PySpark.
Once all these installed you can try PySpark by just typing “pyspark” on the terminal window.
This will allow you to continue to execute your Python scripts on Spark.
The Default Security – (Permissions)
By default Lambda functions are “not” authorized to do access other AWS services. Hence, it is required to explicitly give access (permissions) to each and every AWS service.(i.e. accessing S3 to store images, accessing external databases such as DynamoDB, etc). These permissions are managed by AWS IAM roles.
Changing the Default Security – (Permissions)
If you are using the Serverless Framework you can customize the default settings by changing the serverless.yaml file (in the “iamRoleStatements:” block).
iamRoleStatements: - Effect: "Allow" Action: - "lambda:*" Resource: - "*"
The above will “Allow” all (“*”) to be invoked from the Lambda Function.
The Default Security – (Network)
By default, Lambda functions are not launched in a VPC. But you can change this by creating a Lambda function within a VPC. Furthermore, you can extend further by applying “Security Groups” as an additional layer of security within a VPC.
Changing the Default Security – (Network)
If you are using the Serverless Framework you can customize the default settings by changing the serverless.yaml file. Here is the code snippet that might use for this.
provider: name: aws runtime: python2.7 profile: serverless-admin region: us-east-1 vpc: securityGroupIds: - <security-group-id> subnetIds: - <subnet-1> - <subnet-2>