I am Crishantha Nanayakkara from Sri Lanka. I have a BSc (Hons) in Computing and Information Systems from the University of London and an MBA in IT from the University of Moratuwa. I have been in the IT industry for nearly 15 years and now work at the apex IT body of Sri Lanka (ICTA). My main research interest is e-Government and its evolution in developing countries.

Home page:


Connecting to a remote MySQL instance on an AWS EC2 instance

If you have a “self-managed” MySQL instance on EC2, it can be made accessible to other EC2 instances in the same VPC, or even to other remote machines. To do this, you need to make a few configuration changes.

Here are the steps:

1. Connect to the MySQL EC2 instance. – By default, you can access MySQL using the “root” user. However, for security reasons it is not advisable to access a MySQL instance remotely as “root”.

[P.Note: Please make sure port 3306 is added to the inbound rules of the EC2 Security Group before attempting this.]

2. Change the bind-address parameter so that MySQL accepts connections from all remote addresses. This needs to be changed in the /etc/mysql/mysql.conf.d/my.cnf file.

3. Restart the MySQL instance

mysql-ec2-instance>> sudo /etc/init.d/mysqld restart

4. Next, create a new MySQL user for remote access. – For this, sign in to MySQL and execute the following commands.

mysql-ec2-instance>> mysql -u root -p<root-password>

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE USER 'user'@'localhost' IDENTIFIED BY 'user123';

mysql> CREATE USER 'user'@'%' IDENTIFIED BY 'user123';

mysql> GRANT ALL PRIVILEGES ON *.* TO 'user'@'localhost' WITH GRANT OPTION;

mysql> GRANT ALL PRIVILEGES ON *.* TO 'user'@'%' WITH GRANT OPTION;

mysql> EXIT;

5. Now log out of the EC2 instance and try to connect to its MySQL server from your local machine.

your-local-machine>> mysql -h <ec2-public-dns-name> -u user -puser123

If everything is fine, you should be able to sign in to the remote MySQL server without any issues!
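The bind-address change in step 2 can also be scripted. The sketch below is a minimal example that assumes GNU sed and a file that already has a bind-address line (active or commented out). The value 0.0.0.0 (listen on all IPv4 interfaces) is my own assumption, since it is the usual way to accept remote connections, and the function name set_bind_address is purely illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the bind-address line of a MySQL option file.
# Assumes GNU sed (-i -E) and that the file already contains a
# bind-address line, either active or commented out with '#'.
set_bind_address() {
  local cnf="$1" addr="$2"
  # Do not guess where to add a new line if none exists.
  if ! grep -Eq '^[#[:space:]]*bind-address[[:space:]]*=' "$cnf"; then
    echo "no bind-address line found in $cnf" >&2
    return 1
  fi
  # Replace the (possibly commented-out) line with an active setting.
  sed -i -E "s/^[#[:space:]]*bind-address[[:space:]]*=.*/bind-address = ${addr}/" "$cnf"
}
```

Run it as root (for example from a sudo -i shell) after backing up the file, e.g. set_bind_address /etc/mysql/mysql.conf.d/my.cnf 0.0.0.0, then restart MySQL as in step 3. Because MySQL then listens on every interface, the Security Group rule for port 3306 is what limits who can connect, so it is worth restricting that rule to known source addresses.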


Docker on Ubuntu 16.04 LTS – [Part 04] Docker Compose

Docker is currently the most popular and widely used container management system. Many enterprise applications nowadays have components running in separate containers. In such an architecture, “container orchestration” (starting and shutting down containers and setting up the links between them) is an important concern, and the Docker community came up with a solution called Fig to handle this requirement. Fig uses a single YAML file to orchestrate all your Docker containers and their configuration. Its popularity led Docker to bring it into its own code base as a separate component called “Docker Compose”.

1. Installing Docker Compose

You are required to follow the steps below:

$ sudo curl -o /usr/local/bin/docker-compose -L "$(uname -s)-$(uname -m)"

Set the permissions:

$ sudo chmod +x /usr/local/bin/docker-compose

Now check whether it is installed properly:

$ docker-compose -v

2. Running a Container with Docker Compose

Create a directory called “ubuntu” for this example. Docker Compose will pull the latest Ubuntu image from Docker Hub to the local machine.

$ mkdir ubuntu
$ cd ubuntu

Inside that directory, create a configuration file (docker-compose.yml) that describes the container to run:

docker-compose-test:
  image: ubuntu

Now execute the following:

$ docker-compose up      # runs attached, in the foreground
$ docker-compose up -d   # runs detached, in the background

The above reads docker-compose.yml, pulls the required image and starts the corresponding container.

Pulling docker-compose-test (ubuntu:latest)...
latest: Pulling from library/ubuntu
e0a742c2abfd: Pull complete
486cb8339a27: Pull complete
dc6f0d824617: Pull complete
4f7a5649a30e: Pull complete
672363445ad2: Pull complete
Digest: sha256:84c334414e2bfdcae99509a6add166bbb4fa4041dc3fa6af08046a66fed3005f
Status: Downloaded newer image for ubuntu:latest
Creating ubuntu_docker-compose-test_1
Attaching to ubuntu_docker-compose-test_1
ubuntu_docker-compose-test_1 exited with code 0

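Now run the following to check that the ubuntu:latest image has been downloaded and the container created. (The container exits straight away with code 0 because its default command, /bin/bash, has no interactive input, so it ends immediately.)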

$ docker images
REPOSITORY                    TAG                 IMAGE ID            CREATED             SIZE
ubuntu                        latest              14f60031763d        4 days ago          120 MB
$ docker ps -a
CONTAINER ID        IMAGE                         COMMAND                  CREATED             STATUS                     PORTS                    NAMES
5705871fe7ed        ubuntu                        "/bin/bash"              2 minutes ago       Exited (0) 2 minutes ago                            ubuntu_docker-compose-test_1



Docker on Ubuntu 16.04 LTS – [Part 03] Docker Networking

In my previous post on Docker images, we ran a container in detached mode (-d), with Apache running in the foreground inside it. To recall, here is the command:

$ docker run -d -p 80 --name static_web crishantha/static_web  /usr/sbin/apache2ctl -D FOREGROUND

However, with -p 80 alone, Docker publishes the container's port 80 on a randomly assigned high port on the host, so the web server is not available on the host's port 80. To make it available on a port of your choice, bind a host port explicitly to the container port using -p <host-port>:<container-port>. For example, to map host port 80 to the container's port 80, run the command as follows:

$ docker run -d -p 80:80 --name static-web crishantha/static_web /usr/sbin/apache2ctl -D FOREGROUND

Once you do this, the web server can be reached on port 80 using the host's external IP address.
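When a container is started with -p 80 alone, as in the first command above, Docker picks the host port itself, and docker port static_web 80 reports it as an address:port pair such as 0.0.0.0:32768 (newer Docker versions may print an extra IPv6 line). The small helper below, a hypothetical name of my own, keeps just the port number from the first line so it can be reused in a script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "docker port" output on stdin and print the
# host port of the first mapping, e.g. "0.0.0.0:32768" -> "32768".
host_port() {
  # On line 1 only, drop everything up to and including the last ':'.
  sed -n '1s/.*://p'
}
```

For example: port=$(docker port static_web 80 | host_port), after which the site is reachable at http://localhost:$port/.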


Docker on Ubuntu 16.04 LTS – [Part 02] – Images

In my previous article, I stopped at Docker container management. In this article, I will cover Docker images.

To run, a typical Linux system needs two file systems:

  1. boot file system (bootfs)
  2. root file system (rootfs)

The bootfs contains the boot loader and the kernel. The user never makes any changes to the boot file system. In fact, soon after the boot process is complete, the entire kernel is in memory, and the boot file system is unmounted to free up the RAM associated with the initrd disk image.

The rootfs includes the typical directory structure we associate with Unix-like operating systems: /dev, /proc, /bin, /etc, /lib, /usr, and /tmp plus all the configuration files, binaries and libraries required to run user applications.

On a traditional Linux system, the root file system is mounted read-only at boot and then switched to read-write. In Docker, the root file system stays read-only, and Docker uses a union mount to add more read-only file systems on top of it, so that they all appear as a single file system. This gives Docker full control over all the file systems that make up a container. Finally, when a container is created or launched, Docker mounts a read-write file system on top of all the image layers. Any changes made to files from the underlying images are stored in this read-write layer, while the original copies in the underlying layers remain unchanged. This read-write layer, the image layers underneath it and the base image layer together form a Docker container.
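The layering described above can be illustrated with ordinary directories. The sketch below is a toy model of my own, not how Docker's storage drivers (aufs, overlay2) are implemented: each layer is a directory, a lookup returns the copy from the topmost layer that has the file, and a write first copies the file up into the read-write layer so the lower layers stay unchanged (copy-on-write). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of a union mount (not Docker's real storage driver):
# every layer is a plain directory, listed top-first, i.e. the
# read-write layer first and the base image layer last.

# Print the path a union view would show for file $1:
# the copy in the topmost layer that has it wins.
union_lookup() {
  local rel="$1" layer
  shift
  for layer in "$@"; do
    if [ -e "$layer/$rel" ]; then
      printf '%s\n' "$layer/$rel"
      return 0
    fi
  done
  return 1
}

# Append line $2 to file $1 through the union view, with $3 as the
# read-write layer and the remaining arguments as read-only layers.
# Copy-on-write: the file is first copied up into the read-write layer,
# so the read-only layers are never modified.
union_append() {
  local rel="$1" line="$2" rw="$3" src
  shift 3
  mkdir -p "$(dirname "$rw/$rel")"
  if [ ! -e "$rw/$rel" ] && src=$(union_lookup "$rel" "$@"); then
    cp "$src" "$rw/$rel"
  fi
  printf '%s\n' "$line" >> "$rw/$rel"
}
```

A file that only exists in a lower layer is read from there; once it is modified, reads are served from the read-write layer while the base layer keeps its original contents, just as an image layer stays unchanged while a container writes to its own layer.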

In Part 01 of this series, we created a container from the ubuntu image. You can list all the available images with:

$ sudo docker images

ubuntu     latest 07f8e8c5e660 4 weeks ago 188.3 MB

You now have the “latest” ubuntu image. If you want a specific version of an image, you need to specify it as a tag, e.g. ubuntu:12.04. Let's try that now.

$ sudo docker run -t -i --name new_container ubuntu:12.04 /bin/bash

Now, check the image status

$ sudo docker images

ubuntu     latest 07f8e8c5e660 4 weeks ago 188.3 MB
ubuntu     12.04  ac6b0eaa3203 4 weeks ago 132.5 MB

To delete one of these images, use:

$ sudo docker rmi <image-id>

While working with multiple images, many unnamed and unused (dangling) images can accumulate. These can take up a lot of disk space, so it is worth purging them periodically. The following does the trick:

$ docker rmi $(docker images -q -f dangling=true)

So far, we have used the docker run command to create containers. If the image is not available locally, docker run first downloads it from Docker Hub, which takes some time. To avoid that wait when creating the container, you can first pull the required image from Docker Hub and then create the container from the downloaded image. Here are the steps:

// Pulling the image from Docker Hub
$ sudo docker pull fedora

// Creating a Docker Container using the pulled image
$ sudo docker run -i -t fedora /bin/bash

If you now list all the containers, you will see three:

$ sudo docker ps -a

86476cec9907 fedora:latest ---
4d8b96d1f8b1 ubuntu:12.04  ---
c607547adce2 ubuntu:latest ---
Building your own Docker Images

There are two ways to do this.

Method (1). Via docker commit command

Method (2). Via docker build command with a Dockerfile (This is the recommended method)

To try method (1), first create a container from an image you have already pulled, make some changes inside the container, and then run docker commit.

// Creating a Docker Container using an image
$ sudo docker run -i -t ubuntu:14.04 /bin/bash

// Make changes inside the container
// (run these at the container's root prompt)
$ apt-get -yqq update
$ apt-get -y install apache2

// Committing the changes to the image
// Here, crishantha is the account created
// in the Docker Hub repository
// you may use Docker Hub or any other Docker repo
// 9b48a2b8850f is the Container ID of the container
// (run this on the host, not inside the container)

$ sudo docker commit 9b48a2b8850f crishantha/apache2

// List the Docker images
// Here the Docker altered image ID is shown
$ sudo docker images crishantha/apache2
crishantha/apache2 latest 0a33454e78e4 ....

To try method (2), create a Dockerfile in a directory and specify in it the changes you need for the image. For example, for an Ubuntu 14.04 image, the Dockerfile can contain the following lines. FROM pulls the ubuntu:14.04 base image, and each RUN instruction executes a command and adds a new layer to the image. EXPOSE declares that the container listens on port 80; the port is still only published when you run the container with -p.

Before building, it is good practice to create a new directory and create the Dockerfile inside it. Here the directory is called static_web.

FROM ubuntu:14.04
RUN apt-get update
RUN apt-get install -y apache2
EXPOSE 80

Once this is done, build the image by running the following from inside the static_web directory:

$ sudo docker build -t="crishantha/static_web" .

If the build succeeds, it returns an image ID, and you can see the new image using docker images crishantha/static_web.

Checking the Docker Image History

You can also check the history of the image, i.e. its layers and the instructions that created them, by executing docker history <image name / image ID>:

$ sudo docker history crishantha/static_web

Now you can run a container from the image:

$ sudo docker run -d -p 80 --name static_web crishantha/static_web  /usr/sbin/apache2ctl -D FOREGROUND

The container runs detached (-d), so it keeps running in the background as a Docker process; you can see it by executing docker ps.

If you use Nginx instead of Apache2 as the web server, you can add nginx -g “daemon off;” to the command. The daemon off; directive tells Nginx to stay in the foreground. This matters in containers because a container only runs as long as its main process does; if the web server forked into the background, the container would stop. It also fits the best practice of one process per container, i.e. each container runs only one service.

Pushing Docker Images

Once an image is created, we can push it to a Docker registry. If you are registered with Docker Hub, it is quite easy to push your image there. If the repository is public, anyone interested can then pull the image to their own Docker host.

// If you have not already logged in,
// Here the username is the one you registered
// with Docker Hub
$ sudo docker login
Username: crishantha
Password: xxxxxxxxxx

// If login is successful
$ sudo docker push crishantha/static_web

If the push succeeds, you will see the image available on Docker Hub.

Pulling Docker Images

Once it is pushed to Docker Hub, you can pull it to any other machine that runs Docker.

$ sudo docker pull crishantha/static_web
Automated Builds in Docker Hub Repositories

In addition to pushing images from our own machines, Docker Hub lets us automate image builds by connecting it to external source repositories (private or public).

You can test this out by connecting your GitHub repository or Bitbucket repositories to Docker Hub. (Use the Add Repository –> Automated Build option in the Docker Hub to follow this process)

However, a Docker Hub automated build needs a Dockerfile in the specified build folder of the source repository; Docker Hub builds the image from that Dockerfile. Once the build is complete, you can see the build log as well.


Docker on Ubuntu 16.04 LTS – [Part 01] – Installation and Containers

Docker is an open-source engine, released under the Apache 2 License, that automates the deployment of applications into containers. It adds an application deployment engine on top of a virtualized container execution environment. Docker aims to reduce the cycle time between code being written and code being tested, deployed, and used.

Core components of Docker:

  1. The Docker client and server
  2. Docker images
  3. Registries
  4. Docker Containers

Docker has a client-server architecture. The docker binary acts as both the client and the server. As a client, the docker binary sends requests to the Docker daemon, which processes them and returns the results.

Docker images are the building blocks, or the packaging aspect, of Docker: containers are launched from Docker images. Docker images can be easily shared, stored and updated, and are considered highly portable.

Registries store the Docker images you create. There are two types of Docker registries: 1) private and 2) public. Docker Hub is the public Docker registry maintained by Docker Inc.

Containers are the running and execution aspect of Docker.

Docker does not care what software resides within a container. Each container is loaded in the same way as any other container. You can compare this to a shipping container: a shipping container does not care what it carries inside, and it treats all the goods inside in the same way.

So the Docker containers are interchangeable, stackable, portable, and as generic as possible.

Docker can run on any x64 host running a modern Linux kernel (kernel version 3.10 or later is recommended).

The native Linux container format that Docker uses is libcontainer.

Linux kernel namespaces provide the isolation (file system, processes, network) that Docker containers require.

  • File System Isolation – Each container is running its own “root” file system
  • Process Isolation – Each container is running its own process environment
  • Network Isolation – Separate virtual interfaces and IP addressing

CPU and memory allocation for each container is handled using cgroups (cgroups is a Linux kernel feature that limits, accounts for and isolates the resource usage, such as CPU, memory, disk I/O and network, of a collection of processes).

Installing Docker on Ubuntu

Docker is currently supported on a wide variety of Linux platforms, including Ubuntu, Red Hat (RHEL), Debian, CentOS, Fedora, Oracle Linux, etc.


1. A 64-bit architecture (x86_64 and amd64 only); 32-bit is not supported.

2. Linux kernel 3.8 or later.

3. Kernel features such as cgroups and namespaces should be enabled.

Step 1 – Checking the Linux Kernel

In order to check the current Linux Kernel

$ uname -r


My Linux kernel is 4.4 on x86_64, so it supports Docker easily (since it is later than 3.8).
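The version comparison can also be scripted, which is handy when preparing several hosts. This is a minimal sketch assuming GNU sort (for -V, version ordering); kernel_at_least is a hypothetical name, and the 3.8 threshold simply mirrors the prerequisite listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if kernel release $1 is at least version $2.
# Example: kernel_at_least "$(uname -r)" 3.8
kernel_at_least() {
  local current="${1%%-*}" required="$2"   # "4.4.0-66-generic" -> "4.4.0"
  # sort -V orders versions component by component (3.8 < 3.10), so if the
  # smaller of the two is the required version, the kernel is new enough.
  [ "$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)" = "$required" ]
}
```

For example, kernel_at_least "$(uname -r)" 3.8 && [ "$(uname -m)" = x86_64 ] && echo "Kernel OK for Docker". A plain string comparison would wrongly rank 3.10 below 3.8, which is why version sorting is used.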

If your Ubuntu kernel is older than 3.8, you can try installing the 3.8 kernel:

$ sudo apt-get update
$ sudo apt-get install linux-headers-3.8.0-27-generic linux-image-3.8.0-27-generic linux-headers-3.8.0-27

If the above packages are not available, refer to the Docker installation documentation on the web.

Once this is done, update GRUB and reboot the system:

$ sudo update-grub
$ sudo reboot

After rebooting, check the Linux kernel version again by typing uname -a or uname -r.

Step 2 – Installing Docker

Make sure APT can use “https” and that the CA certificates are installed:

$ sudo apt-get update
$ sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common

Add the new GPG Key

$ curl -fsSL | sudo apt-key add -

Add Docker APT repositories to /etc/apt/sources.list.d/docker.list file

$ sudo add-apt-repository "deb [arch=amd64] $(lsb_release -cs) stable"

Now update the APT sources

$ sudo apt-get update
// Make sure you are installing from the Docker repositories and not from the Ubuntu repositories
$ apt-cache policy docker-ce

Finally, install Docker:

$ sudo apt-get install -y docker-ce

Docker is now installed, the daemon is started, and the process is enabled to start on boot. You can check its status using:

$ sudo systemctl status docker

You can avoid typing “sudo” with every command by adding your user to the docker group. Note that members of the docker group effectively have root privileges on the host.

$ sudo usermod -aG docker $(whoami)

After this, log out and log back in for the group change to take effect.

If all is OK, check whether Docker was installed properly using:

$ docker info

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 17.03.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 977c511eda0925a723debdc94d09459af49d082a
runc version: a01dafd48bc1c7cc12bdb01206f9fea7dd6feb70
init version: 949e6fa
Security Options:
 Profile: default
Kernel Version: 4.4.0-66-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 990.6 MiB
Step 6 – Creating a Docker Container

There are two types of Docker containers.

1. Interactive Docker Containers

2. Daemonized Docker Containers.

1. Interactive Docker Container

Once Docker is installed successfully, we can try to create a Docker container. Before that, it is good to check the Docker status by typing the sudo docker info command.

If everything is alright, you can go ahead and create the Docker “interactive” container instance.

$ sudo docker run --name crish_container -i -t ubuntu /bin/bash

root@c607547adce2:/#

The above creates a container named crish_container from the ubuntu image and gives you an “interactive shell” inside it, as shown above. If you do not specify a name, Docker generates a random name for the container; every container also gets a unique container ID.

Here c607547adce2 is the container ID. Type exit to leave the container's interactive session. Once you exit, the container stops: it only runs as long as its interactive session (/bin/bash) is running. That is why these are called “interactive” Docker containers.

Now again you can check the docker status by,

$ docker info

Containers: 1
Images: 4
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 6
 Dirperm1 Supported: false
Execution Driver: native-0.2
Kernel Version: 3.13.0-32-generic
Operating System: Ubuntu 14.04.1 LTS
CPUs: 2
Total Memory: 1.955 GiB
Name: crishantha
WARNING: No swap limit support

2. Daemonized Docker Container

Besides interactive containers, there is another type called “daemonized containers”. These can be used to run long-running jobs, and they do not give you an interactive session.

You can create a daemonized container by,

$ docker run --rm --name crish_daemon -d ubuntu /bin/sh -c "while true; do echo hello world; sleep 1; done"

Such a container keeps running in the background. Since it has no interactive shell, you cannot reattach to it as an “interactive” session; use docker logs crish_daemon to see its output.

As another example, if you want to run an “nginx” container on host port 8080, execute the following:

$ docker run -d -p 8080:80 --restart=always nginx

Now try docker ps

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
65438050ceb8        nginx               "nginx -g 'daemon of…"   5 seconds ago       Up 2 seconds>80/tcp     fervent_gates

You can see that the container's port 80 is mapped to host port 8080, and that the container is running as a daemon (in the background).

You can now reach nginx from your web browser at localhost:8080.

You can run the same image on any other host port as well. For example, to use port 8000:

$ docker run -d -p 8000:80 --restart=always nginx

Note that this creates a second, separate container from the same nginx image, so the two URLs are served by two different containers. Run docker ps again and you will see two entries, one for each container.

P.Note: --restart=always tells the Docker daemon to restart the container whenever it exits, including when the system comes back up after a shutdown. If you do not specify this, you have to restart the container manually.

Step 7 – Display the Container List

To show all containers in the system (running and not running)

$ docker ps -a

If you do not need the containers to stay around after they exit, you can use the --rm flag with the docker run command.

To show all the running containers,

$ docker ps
Step 8 – Attach to a container

A container created with docker run keeps the options you specified, and when you start it again (see Step 10) it runs with those same options. To reconnect to the interactive session of a running container, attach to it by name or by container ID:

$ docker attach crish_container
$ docker attach c607547adce2

Here c607547adce2 is the <container_ID>

Note: Sometimes you need to press ENTER to see the bash prompt after executing the attach command.

Step 9 – Extract the Container IP

There is no dedicated command to get the IP address of a running container, but you can extract it from the docker inspect output:

$ docker inspect <CONTAINER ID> | grep -w "IPAddress" | awk '{ print $2 }' | head -n 1 | cut -d "," -f1
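The grep/awk/cut pipeline above returns the first IPAddress line, which can be an empty string when the container is attached only to a user-defined network, and it keeps the surrounding quotes. A slightly sturdier sketch reads the docker inspect JSON and prints the first non-empty IPAddress value. container_ip is a hypothetical name, and it relies on docker inspect printing one "key": "value" pair per line, as it does by default. Docker's own template syntax is an alternative: docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' <CONTAINER ID>.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "docker inspect" JSON on stdin and print the
# first non-empty "IPAddress" value, without quotes or trailing comma.
container_ip() {
  # Splitting on '"' turns:    "IPAddress": "172.17.0.2",
  # into fields where $2 = IPAddress and $4 = 172.17.0.2
  awk -F'"' '!found && $2 == "IPAddress" && $4 != "" { print $4; found = 1 }'
}
```

For example: docker inspect crish_container | container_ip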
Step 10 – Starting and Stopping a container

To start

$ docker start crish_container

To stop

$ docker stop crish_container
Step 11 – Deleting a container
$ docker rm crish_container

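Step 12 – Deleting all stopped containers
$ docker container prune

(docker system prune goes further: besides stopped containers, it also removes unused networks and dangling images.)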

Hadoop 2.6 (Part 2) – Running the Mapreduce Job

This is the continuation of my previous article on “Installing Hadoop 2.6 on Ubuntu 16.04”. This article explains how to run one of the examples bundled with the Hadoop distribution.

Once the Hadoop installation is completed, you can run the “wordcount” example provided with Hadoop to test a MapReduce job. This example is bundled in the hadoop-mapreduce-examples jar file in the distribution (see the steps below for details).

Step 1: Start the Hadoop Cluster, if not already started.

$ /usr/local/hadoop/sbin/
$ /usr/local/hadoop/sbin/

Step 2: Copy the text files whose words you want to count to a local folder (/home/hadoop/textfiles).

Step 3: Copy the text files (in the local folder) to HDFS.

$ echo "Word Count Text File" > textFile.txt
$ hdfs dfs -mkdir -p /user/hduser/dfs
$ hdfs dfs -copyFromLocal textFile.txt /user/hduser/dfs

Step 4: List the contents of the HDFS folder.

$ hdfs dfs -ls /user/hduser/dfs

Step 5: If you were able to complete Step 4, you are ready to run the MapReduce job. (The output directory, /user/hduser/dfs-output, must not already exist, or the job will fail.)

$ cd /usr/local/hadoop
$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.1.jar wordcount /user/hduser/dfs /user/hduser/dfs-output

If the job completed successfully, congratulations!

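You can use either the command line or the web interface to display the contents of the HDFS directories. From the command line, you can list the output directory and print the result file (part-r-00000 when the job runs with a single reducer, which is the default):

$ hdfs dfs -ls /user/hduser/dfs-output
$ hdfs dfs -cat /user/hduser/dfs-output/part-r-00000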
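To sanity-check the job on a small input, the same map, shuffle and reduce steps can be imitated with ordinary shell tools. This is only a local sketch under my own assumptions, not Hadoop code: it splits words on whitespace, which is how I understand the bundled example tokenizes text, so the counts should match for plain text, but punctuation handling may differ for your input. local_wordcount is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical local imitation of the wordcount example (no Hadoop needed):
#   map     -> split the text into words, one per line
#   shuffle -> sort, so identical words end up next to each other
#   reduce  -> count each run of identical words
# Output mimics the job's "word<TAB>count" lines, sorted by word.
local_wordcount() {
  tr -s ' \t\r\f' '\n' |   # map: whitespace-separated tokens, one per line
    grep -v '^$' |         # drop empty lines left by leading blanks
    LC_ALL=C sort |        # shuffle: group identical words (byte order)
    uniq -c |              # reduce: count each group
    awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

For example, in bash: diff <(local_wordcount < textFile.txt) <(hdfs dfs -cat /user/hduser/dfs-output/part-r-00000) should print nothing when the two agree.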



Hadoop 2.6 (Part 01) – Installing on Ubuntu 16.04 (Single-Node Cluster)

Some time ago I wrote a blog post on “Setting up Hadoop 1.x on Ubuntu 12.04”. Since it covers version 1.x, it is no longer the right post to refer to. So I thought I would update it for version 2.x, running on the latest Ubuntu release, 16.04 LTS.


1. Make sure that you have Java installed.

(Version 2.7 and later of Apache Hadoop requires Java 7. It is built and tested on both OpenJDK and Oracle JDK/JRE. Earlier versions (2.6 and earlier) support Java 6.)

You may visit Hadoop Wiki for more information.

2. Add a separate user and a group dedicated to Hadoop work. Here the group is called “hadoop” and the user is called “hduser”

Adding hduser to the “sudo” group gives it super user privileges.

$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
$ sudo adduser hduser sudo

3. Enable SSH access to localhost for hduser. (Hadoop requires SSH to manage its nodes, so you need to enable it. Since this is a single-node setup, you only need to enable SSH to localhost.)

// Ubuntu ships with the SSH client, but to run SSHD (the server daemon)
// you need to install the ssh package.
$ sudo apt-get install ssh

$ su - hduser

// Create a key pair for the instance.
$ ssh-keygen -t rsa -P ""

// Append the public key to authorized_keys so that SSH logins to localhost do not ask for a password
$ cat /home/hduser/.ssh/ >> /home/hduser/.ssh/authorized_keys

// Now you can check SSH on localhost
$ ssh localhost

The ssh-keygen command above creates a public/private key pair for secure communication via SSH. It creates the directory ‘/home/hduser/.ssh’; the private key is stored in ‘/home/hduser/.ssh/id_rsa’ and the public key in ‘/home/hduser/.ssh/’.

Installing Hadoop

1. Download Hadoop from the Apache Hadoop mirrors and store in a folder of your choice. I am using hadoop-2.6.1.tar.gz distribution here.

// Now copy the hadoop tar to /usr/local and execute the following commands
$ cd /usr/local
$ sudo tar xzf hadoop-2.6.1.tar.gz
$ sudo mv hadoop-2.6.1 hadoop
$ sudo chown -R hduser:hadoop hadoop

2. Set the environment in /home/hduser/.bashrc

export JAVA_HOME=/opt/jdk1.8.0_66
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

After editing .bashrc, log out and log back in (or run source ~/.bashrc), then type:

$ hadoop version

3. Configure Hadoop -

After setting up the prerequisites, you need to configure the Hadoop environment. There is a set of configuration files to edit; the following is the minimum configuration needed to get a single Hadoop instance up and running with HDFS.

- $HADOOP_HOME/etc/hadoop/

- $HADOOP_HOME/etc/hadoop/core-site.xml

- $HADOOP_HOME/etc/hadoop/mapred-site.xml

- $HADOOP_HOME/etc/hadoop/hdfs-site.xml


Required to set the JAVA_HOME here

# The java implementation to use.
export JAVA_HOME=/opt/jdk1.8.0_66

(ii) core-site.xml

It is required to set the HDFS temporary folder (hadoop.tmp.dir) here in this configuration. This should be positioned within the <configuration>.. </configuration> tags.

  <description>A base for other temporary directories.</description>

  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>

Then you are required to create the temporary directory as mentioned in the parameters.

$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp
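Hadoop's *-site.xml files share one layout: a <configuration> root element containing <property> entries, each with a <name>, a <value> and an optional <description>. As a sketch, the two helpers below (hypothetical names) print that layout. The only value filled in is hadoop.tmp.dir = /app/hadoop/tmp, which matches the directory created above; the values of the other properties are not reproduced here, and the helpers do not escape XML special characters such as & or <.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one Hadoop configuration <property> element.
# Usage: hadoop_property NAME VALUE [DESCRIPTION]
hadoop_property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n' "$1" "$2"
  if [ -n "${3:-}" ]; then
    printf '    <description>%s</description>\n' "$3"
  fi
  printf '  </property>\n'
}

# Wrap the <property> elements read from stdin in the <configuration> root.
hadoop_config() {
  printf '<?xml version="1.0"?>\n<configuration>\n'
  cat
  printf '</configuration>\n'
}
```

For example: hadoop_property hadoop.tmp.dir /app/hadoop/tmp "A base for other temporary directories." | hadoop_config. Further properties can be grouped in { ...; } before piping into hadoop_config.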

(iii). mapred-site.xml

The following should be inserted within the <configuration>.. </configuration> tags.

The mapred-site.xml is originally not in the folder. You have to rename/ copy the mapred-site.xml.template to mapred-site.xml before inserting.

  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.

(iv). hdfs-site.xml

The following should be inserted within the <configuration>.. </configuration> tags.

  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.

After editing the above files, create two directories, which will be used by the NameNode and the DataNode on this host:

$ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
$ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
$ sudo chown -R hduser:hadoop /usr/local/hadoop_store

4. Formatting HDFS -

When you first set up Hadoop with HDFS, you need to format the HDFS file system, much like formatting a normal file system on a new disk. However, you should not format it again once HDFS is in use, because formatting erases all the data on it.

$ hdfs namenode -format

5. Start the Single Node Hadoop Cluster

$ /usr/local/hadoop/sbin/


$ /usr/local/hadoop/sbin/
$ /usr/local/hadoop/sbin/

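This will start a NameNode, a DataNode and a SecondaryNameNode (HDFS), plus a ResourceManager and a NodeManager (YARN), on your machine. In Hadoop 2.x, YARN replaces the JobTracker and TaskTracker of Hadoop 1.x.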

Once you execute the above, if all OK, you will see the following output on the console.

This script is Deprecated. Instead use and
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-crishantha-HP-ProBook-6470b.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-crishantha-HP-ProBook-6470b.out
Starting secondary namenodes [] starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-crishantha-HP-ProBook-6470b.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hduser-resourcemanager-crishantha-HP-ProBook-6470b.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-crishantha-HP-ProBook-6470b.out

Execute the following to see the active ports after starting the Hadoop cluster

$ netstat -plten | grep java

tcp        0      0*               LISTEN      1002       48587       5439/java
tcp        0      0 *               LISTEN      1002       49408       5756/java
tcp        0      0 *               LISTEN      1002       46080       5439/java
tcp        0      0 *               LISTEN      1002       51317       5579/java
tcp        0      0 *               LISTEN      1002       51323       5579/java
tcp        0      0 *               LISTEN      1002       51328       5579/java
tcp6       0      0 :::8040                 :::*                    LISTEN      1002       56335       6028/java
tcp6       0      0 :::8042                 :::*                    LISTEN      1002       54502       6028/java
tcp6       0      0 :::8088                 :::*                    LISTEN      1002       49681       5909/java
tcp6       0      0 :::39673                :::*                    LISTEN      1002       56327       6028/java
tcp6       0      0 :::8030                 :::*                    LISTEN      1002       49678       5909/java
tcp6       0      0 :::8031                 :::*                    LISTEN      1002       49671       5909/java
tcp6       0      0 :::8032                 :::*                    LISTEN      1002       52457       5909/java
tcp6       0      0 :::8033                 :::*                    LISTEN      1002       55528       5909/java
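To tie each listening port to a process, the `netstat -plten` lines can be parsed. The Python sketch below is illustrative only: the function name `listening_ports` is hypothetical, and it assumes the nine-column layout shown above, skipping any line that does not have all nine columns.

```python
def listening_ports(netstat_output: str) -> dict[int, str]:
    """Map each listening TCP port to the PID/Program column of `netstat -plten` output."""
    ports = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local-Address Foreign-Address
        #                   State User Inode PID/Program
        if len(fields) != 9 or fields[5] != "LISTEN":
            continue  # skip headers and lines with missing columns
        # rsplit handles both "0.0.0.0:50070" (IPv4) and ":::8088" (IPv6)
        port = fields[3].rsplit(":", 1)[-1]
        if port.isdigit():
            ports[int(port)] = fields[8]
    return ports
```

Grouping the result by PID shows which daemon owns which ports, and the PIDs can be matched against the `jps` listing in the next step. For example, 8088 is the default port of the YARN ResourceManager web UI.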

6. Verify the Hadoop Cluster – You can check that the above daemons have started by running the following command.

$ jps

3744 NameNode
4050 SecondaryNameNode
4310 NodeManager
3879 DataNode
4200 ResourceManager
4606 Jps
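The `jps` check can also be scripted. Below is a minimal Python sketch, assuming a Hadoop 2.x setup where the HDFS and YARN daemons all run on the same machine. The helper names (`missing_daemons`, `run_jps`) and the `EXPECTED_DAEMONS` set are illustrative, not part of Hadoop.

```python
import subprocess

# Daemons a single-node Hadoop 2.x cluster with HDFS and YARN is expected to run.
EXPECTED_DAEMONS = {"NameNode", "SecondaryNameNode", "DataNode",
                    "ResourceManager", "NodeManager"}


def running_java_processes(jps_output: str) -> set[str]:
    """Collect the main-class names from `jps` output ("<pid> <name>" per line)."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            names.add(parts[1])
    return names


def missing_daemons(jps_output: str) -> set[str]:
    """Return the expected daemons that do not appear in the `jps` output."""
    return EXPECTED_DAEMONS - running_java_processes(jps_output)


def run_jps() -> str:
    """Run `jps` (shipped with the JDK) and return its standard output."""
    return subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
```

On the cluster machine, `missing_daemons(run_jps())` should return an empty set. Any name it returns identifies the daemon whose log under `/usr/local/hadoop/logs/` is worth inspecting.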

You can also use the web interfaces to check the running nodes:

http://localhost:50070 - NameNode web UI (also lists the live DataNodes)

http://localhost:50090 - SecondaryNameNode web UI

7. Stopping the Hadoop Cluster

Run the following command.

$ /usr/local/hadoop/sbin/

If you were able to complete all of the above steps, you have successfully set up a single-node Hadoop cluster!


A Collection of IS Theories

I recently came across this wiki on IS theories. As a researcher, I find it difficult to find the right mix of theories for my research area. If you are an IS researcher, I am sure this wiki will give you a concise view of most of the IS theories around. It may not give you everything, but it will give you a kick start. Kudos to those who initiated this collaborative effort.

Here is the link:


Building SaaS based Cloud Applications


According to NIST, “cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction” (NIST, 2011).

The traditional approach incurs a large upfront capital expenditure and often leaves too much excess capacity, because capacity cannot be planned in line with market demand. The cloud computing approach can provision computing capabilities without requiring human interaction with the service provider. In addition, characteristics such as broad network access, resource pooling, elasticity and measured service give it a clear advantage over the traditional hardware approach. As a result, it can drastically cut procurement lead time, improve scalability and deliver substantial cost savings (no capital cost; pay for what you use), with fewer management headaches in terms of operational costs.

Cloud Service Models

There are three basic cloud service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS).

SaaS Service Model

In the SaaS cloud service model, the consumer does not manage or control the underlying cloud infrastructure, including network, servers, operating systems, storage or even individual application capabilities, with the possible exception of limited user-specific application configuration settings. A typical SaaS application primarily provides multi-tenancy, scalability, customization and resource pooling features to the client.


Multi-tenancy is the ability of multiple customers (tenants) to share the same applications and/or compute resources. A single application can be used and customized by different organizations as if each had a separate instance, yet a single shared software or hardware stack is used. Multi-tenancy also ensures that each tenant's data and customizations remain secure and isolated from the activity of all other tenants.

Multi-tenancy Models

There are three basic multi-tenancy models.

1. Separate applications and separate databases

2. One shared application and separate databases

3. One shared application and one shared database

Figure 1 – Multi-tenancy Models

Multi-tenancy Data Architectures

According to Figure 1, there are three types of multi-tenancy data architectures, depending on how the data is stored.

1. Separate Databases

In this approach, each tenant's data is stored in a separate database, ensuring that a tenant can access only its own database. A separate connection pool is set up for each database, and the application selects the pool based on the tenant ID associated with the logged-in user.
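Selecting a connection pool by tenant ID can be sketched as follows. This is a minimal Python illustration, not a production pool: `ConnectionPool` and `TenantRouter` are hypothetical classes, and SQLite shared in-memory databases stand in for the per-tenant database servers a real deployment would use.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """A minimal fixed-size pool of sqlite3 connections to a single database."""

    def __init__(self, database_uri: str, size: int = 2):
        self._idle: queue.Queue = queue.Queue()
        for _ in range(size):
            self._idle.put(sqlite3.connect(database_uri, uri=True))

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks while every connection is checked out
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return the connection to the pool


class TenantRouter:
    """Separate databases: one pool per tenant, chosen by the user's tenant ID."""

    def __init__(self, tenant_databases: dict[str, str]):
        self._pools = {tenant_id: ConnectionPool(uri)
                       for tenant_id, uri in tenant_databases.items()}

    def pool_for(self, tenant_id: str) -> ConnectionPool:
        try:
            return self._pools[tenant_id]
        except KeyError:
            # An unknown tenant gets no connection rather than a default database.
            raise PermissionError(f"unknown tenant: {tenant_id}") from None
```

For example, `router.pool_for("acme").connection()` always hands out a connection to the `acme` database only. Raising an error for an unknown tenant ID, instead of falling back to a default database, means a misrouted request cannot reach another tenant's data.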

2. Shared Database – Separate Schemas

In this approach, each tenant's data is stored in its own schema within a single shared database. As in the first approach, a separate connection pool can be created for each schema. Alternatively, a single connection pool can be used, with the relevant schema selected per connection based on the tenant ID (e.g. using the SET SCHEMA SQL command).
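With a single shared pool, each connection has to be pointed at the tenant's schema every time it is checked out, because a pooled connection keeps whatever schema its previous user selected. A minimal Python sketch of building that statement follows. The `tenant_<id>` naming convention and the tenant-ID rule are assumptions, and the sketch assumes a database whose `SET SCHEMA` accepts a quoted identifier; other databases use different syntax (e.g. `SET search_path` in PostgreSQL or `USE` in MySQL).

```python
import re

# Hypothetical tenant-ID rule: starts with a lowercase letter, then
# lowercase letters, digits or underscores (at most 30 characters).
_VALID_TENANT_ID = re.compile(r"[a-z][a-z0-9_]{0,29}")


def schema_for(tenant_id: str) -> str:
    """Map a tenant ID to its schema name (hypothetical convention: tenant_<id>)."""
    if not _VALID_TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant ID: {tenant_id!r}")
    return f"tenant_{tenant_id}"


def set_schema_statement(tenant_id: str) -> str:
    """SQL to run on a pooled connection before it serves this tenant's request."""
    # Schema names are identifiers and cannot be bound as query parameters,
    # so the strict validation in schema_for() is what keeps this string safe.
    return f'SET SCHEMA "{schema_for(tenant_id)}"'
```

In an application, `cursor.execute(set_schema_statement(tenant_id))` would run on the DB-API cursor immediately after a connection is taken from the pool.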

3. Shared Database – Shared Schema (Horizontally Partitioned)

In this approach, all tenants' data is stored in a single database schema. Tenants are distinguished by a tenant ID, stored in a separate column of each table in the schema. Only one connection pool is configured at the application level. To improve performance, the tables should be partitioned (horizontally) or indexed on the tenant ID.
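The shared-schema approach can be sketched with Python's built-in `sqlite3` module. The `orders` table and the helper functions are hypothetical; the sketch shows the `tenant_id` discriminator column, an index on it, and queries that always filter by tenant. Horizontal partitioning is database-specific and is not shown, since SQLite does not support it.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE orders ("
        "  id INTEGER PRIMARY KEY,"
        "  tenant_id TEXT NOT NULL,"  # discriminator column present in every table
        "  amount REAL NOT NULL)"
    )
    # Index on the tenant ID so per-tenant queries avoid a full table scan.
    conn.execute("CREATE INDEX idx_orders_tenant ON orders (tenant_id)")


def add_order(conn: sqlite3.Connection, tenant_id: str, amount: float) -> None:
    conn.execute("INSERT INTO orders (tenant_id, amount) VALUES (?, ?)",
                 (tenant_id, amount))


def orders_for(conn: sqlite3.Connection, tenant_id: str) -> list[tuple[int, float]]:
    # Every query is scoped by tenant_id so one tenant never sees another's rows.
    rows = conn.execute(
        "SELECT id, amount FROM orders WHERE tenant_id = ? ORDER BY id",
        (tenant_id,))
    return rows.fetchall()
```

The key design rule is that every query carries the tenant ID. In practice this is usually enforced in a shared data-access layer, or with database features such as row-level security, rather than left to each individual query.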

Figure 2 – Multi-tenancy Data Architecture


1. NIST (2011), NIST Definition,


Robotic Process Automation (RPA)


Robotic Process Automation (RPA) refers to the use of software “robots” that are trained to mimic human actions on the user interface of applications, and then reenact these actions automatically.

RPA technology can be applied in a wide range of areas:

  • Process Automation
    • Mimics the steps of a rules-based process without compromising the existing IT architecture; robots carry out “prescribed” functions and can easily scale up or down based on demand
    • Back-office processes (Finance, Procurement, SCM, Accounting, Customer Service, HR) can be expedited by automating data entry, purchase order issuing, creation of online access credentials, and complex, connected business processes.
  • IT Support and Management
    • Processes in IT infrastructure management, such as Service Desk Management and Network Monitoring
  • Automated Assistant
    • Call center software


RPA can bring several benefits:

1. Cost and Speed – On average, an RPA robot costs about a third as much as an FTE (Full Time Equivalent)

2. Scalability – Additional robots can be deployed quickly with minimal expenditure, and tens, hundreds or thousands of robots can be trained at the same time through workflow creation.

3. Accuracy – Eliminates human error

4. Analytics – RPA tools can easily provide analytics on automated processes

Steps Involved

1. Identify the business processes involved in your organization

2. Continuously map how employees interact with the current business processes (no process re-engineering is involved)

3. Start with the business processes that require simple data transfers, then move on to complex ones (data manipulations and data transfers)

RPA Solutions

1. No proper “Open Source” solutions so far.

2. UiPath – Community Edition (free for non-commercial use)

3. Python based Automation –

4. Robot Framework – Open Source, still immature

5. Pega OpenSpan – Commercial version
