Bitnami Hadoop Virtual Machine


Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.

First steps with the Bitnami Hadoop Stack

Welcome to your new Bitnami application! This guide includes some basic information you will need to get started with your application.

How to import a Bitnami Virtual Machine?

Check the following instructions to import a Bitnami Virtual Machine:

Importing a Bitnami Virtual Machine in VirtualBox
  • Select the "File -> Import Appliance" menu option and select the .ova file downloaded from the Bitnami website. Then click "Continue".
  • Once it is imported, click the "Start" button in the VirtualBox toolbar.

For a detailed walkthrough, check our VirtualBox tutorial.

Importing a Bitnami Virtual Machine in a VMware product
  • Select the "File -> Import" menu option and select the .ova file downloaded from the Bitnami website. Then click "Continue".
  • Once the import is complete, click "Finish" to start the virtual machine.

For a detailed walkthrough, check our VMware tutorial, which uses VMware Fusion as an example. To learn how to use our virtual machines with other VMware products, refer to the VMware Workstation documentation or the VMware vSphere documentation.

What credentials do I need?

You need two sets of credentials:

  • The application credentials, consisting of a username and password. These credentials allow you to log in to your new Bitnami application.

  • The server credentials, consisting of an SSH username and password. These credentials allow you to log in to your virtual machine's server using an SSH client and execute commands on the server using the command line.

What is the administrator username to log in to the application for the first time?

Username: user

What is the administrator password?

Password: The administrator password to log in to your application is randomly generated during the first boot. Check the FAQ to learn how to retrieve it.

What SSH username should I use for secure shell access to my application?

SSH username: bitnami

What is my server IP address?

The IP address is displayed on screen at the end of the boot process, but you can check it at any time by running the following command:

  $ sudo ifconfig


How do I get my SSH key or password?

You can obtain the SSH password from the virtual machine console when it starts up. Refer to the FAQ for more information.

How to access your application?

Once you have imported your Bitnami Virtual Machine, the IP address for your application is displayed on the virtual machine's login screen. Access the application via your browser by entering this IP address.

Check these instructions to learn how to remotely access the Bitnami application.

What are the default ports?

A port is an endpoint of communication in an operating system that identifies a specific process or a type of service. Bitnami stacks include several services or servers that require a port.

Remember that if you need to open some ports, you can follow the instructions given in the FAQ to learn how to open the server ports for remote access.

Port 22 is the default port for SSH connections.

Bitnami opens some ports for the main servers. These are the ports opened by default: 80, 443.

What is the default configuration?

The stack provides a web panel to check the status of Hadoop. To access it, browse to http://SERVER-IP/cluster/ (see the "How to access the administration panel?" section below for details).


Hadoop configuration files

The Hadoop configuration files are located at /opt/bitnami/hadoop/etc/hadoop, and the most relevant ones are:

  • /opt/bitnami/hadoop/etc/hadoop/hadoop-env.sh and /opt/bitnami/hadoop/etc/hadoop/yarn-env.sh: Configuration options for the scripts found in /opt/bitnami/hadoop/bin.
  • /opt/bitnami/hadoop/etc/hadoop/*-site.xml (such as core-site.xml, hdfs-site.xml, yarn-site.xml and mapred-site.xml): Site-specific configuration for each Hadoop service.

Find more details about how to configure Hadoop settings in Hadoop's official documentation.
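
For example, to check the effective value of a single configuration property from the command line, you can use the hdfs getconf tool (fs.defaultFS, the default filesystem URI, is used here purely as an illustrative property name):

$ hdfs getconf -confKey fs.defaultFS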

Hadoop log files

The Hadoop log files for the specific services are created in the following directories:

  • /opt/bitnami/hadoop/logs: NameNode, Secondary NameNode, DataNode, Timeline Server, History Server, NodeManager and ResourceManager.
  • /opt/bitnami/hadoop/hive/logs: Derby, HiveServer2, Metastore and WebHCat.
  • /opt/bitnami/hadoop/hive/hcatalog/var/log: HCatalog.
  • /opt/bitnami/hadoop/pig/logs: Pig.

Hadoop ports

Each daemon in Hadoop listens to a different port. The most relevant ones are:

  • ResourceManager:
    • Scheduler: 8030.
    • Resource Tracker: 8031.
    • Service: 8032.
    • Web UI: 8088.
  • NodeManager:
    • Localizer: 8040.
    • Web UI: 8042.
  • Timeline Server:
    • Service: 10200.
    • Web UI: 8188.
  • History Server:
    • Service: 10020.
    • Admin: 10033.
    • Web UI: 19888.
  • NameNode:
    • Service: 8020.
    • Web UI: 9870.
  • Secondary NameNode:
    • Web UI: 9868.
  • DataNode:
    • Data Transfer: 9866.
    • Service: 9867.
    • Web UI: 9864.
  • Hive:
    • Derby DB: 1527.
    • HCat/Metastore: 9083.
    • Hiveserver2 Thrift: 10000.
    • Hiveserver2 Web UI: 10002.
    • WebHCat: 50111.

All ports are closed by default. In order to access any of them, you have two options:

  • (Recommended) Create an SSH tunnel for accessing the port, as shown in the example below (refer to the FAQ for more information about SSH tunnels).
  • Open the port for remote access (refer to the FAQ for more information about opening ports).
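
For example, a minimal SSH tunnel that forwards the ResourceManager Web UI (port 8088) to your local machine might look like this (SERVER-IP is a placeholder; replace it with your server's IP address):

$ ssh -N -L 8088:127.0.0.1:8088 bitnami@SERVER-IP

While the tunnel is open, browsing to http://localhost:8088 on your local machine reaches the remote Web UI.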

How to start or stop the services?

Each Bitnami stack includes a control script that lets you easily stop, start and restart services. The script is located at /opt/bitnami/ctlscript.sh. Call it without any service name arguments to start all services:

$ sudo /opt/bitnami/ctlscript.sh start

Or use it to restart a single service, such as Apache only, by passing the service name as an argument:

$ sudo /opt/bitnami/ctlscript.sh restart apache

Use this script to stop all services:

$ sudo /opt/bitnami/ctlscript.sh stop

Restart all services:

$ sudo /opt/bitnami/ctlscript.sh restart

Obtain a list of available services and operations by running the script without any arguments:

$ sudo /opt/bitnami/ctlscript.sh

How to access the administration panel?

Access the administration panel by browsing to http://SERVER-IP/cluster/.
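
To quickly check from the command line that the panel is reachable, you can send a request with curl (SERVER-IP is a placeholder; replace it with your server's IP address):

$ curl -I http://SERVER-IP/cluster/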

How to create a full backup of Hadoop?


The Bitnami Hadoop Stack is self-contained and the simplest option for performing a backup is to copy or compress the Bitnami stack installation directory. To do so in a safe manner, you will need to stop all servers, so this method may not be appropriate if you have people accessing the application continuously.

Follow these steps:

  • Change to the directory in which you wish to save your backup:

      $ cd /your/directory
  • Stop all servers:

      $ sudo /opt/bitnami/ctlscript.sh stop
  • Create a compressed file with the stack contents:

      $ sudo tar -pczvf application-backup.tar.gz /opt/bitnami
  • Restart all servers:

      $ sudo /opt/bitnami/ctlscript.sh start

You should now download or transfer the application-backup.tar.gz file to a safe location.
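
For example, assuming you saved the backup in /your/directory, you could copy it to your local machine with scp, using the bitnami SSH user (SERVER-IP is a placeholder):

$ scp bitnami@SERVER-IP:/your/directory/application-backup.tar.gz .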


How to restore a full backup of Hadoop?

Follow these steps:

  • Change to the directory containing your backup:

      $ cd /your/directory
  • Stop all servers:

      $ sudo /opt/bitnami/ctlscript.sh stop
  • Move the current stack to a different location:

      $ sudo mv /opt/bitnami /tmp/bitnami-backup
  • Uncompress the backup file to the original directory:

      $ sudo tar -pxzvf application-backup.tar.gz -C /
  • Start all servers:

      $ sudo /opt/bitnami/ctlscript.sh start

If you want to create only a database backup, refer to these instructions for MySQL and PostgreSQL.

How to enable HTTPS support with SSL certificates?

NOTE: The steps below assume that you are using a custom domain name and that you have already configured the custom domain name to point to your cloud server.

Bitnami images come with SSL support already pre-configured and with a dummy certificate in place. Although this dummy certificate is fine for testing and development purposes, you will usually want to use a valid SSL certificate for production use. You can either generate one yourself (as explained in the "How to create an SSL certificate?" section below) or purchase one from a commercial certificate authority.

Once you obtain the certificate and certificate key files, you will need to update your server to use them. Follow these steps to activate SSL support:

  • Use the table below to identify the correct locations for your certificate and configuration files.

    Variable                                  Value
    Current application URL                   https://[custom-domain]/
    Apache configuration file                 /opt/bitnami/apache2/conf/bitnami/bitnami.conf
    Certificate file                          /opt/bitnami/apache2/conf/server.crt
    Certificate key file                      /opt/bitnami/apache2/conf/server.key
    CA certificate bundle file (if present)   /opt/bitnami/apache2/conf/server-ca.crt
  • Copy your SSL certificate and certificate key file to the specified locations.

    NOTE: If you use different names for your certificate and key files, you should reconfigure the SSLCertificateFile and SSLCertificateKeyFile directives in the corresponding Apache configuration file to reflect the correct file names.
  • If your certificate authority has also provided you with a PEM-encoded Certificate Authority (CA) bundle, you must copy it to the correct location in the previous table. Then, modify the Apache configuration file to include the following line below the SSLCertificateKeyFile directive. Choose the correct directive based on your scenario and Apache version:

    Variable                                  Value
    Apache configuration file                 /opt/bitnami/apache2/conf/bitnami/bitnami.conf
    Directive to include (Apache v2.4.8+)     SSLCACertificateFile "/opt/bitnami/apache2/conf/server-ca.crt"
    Directive to include (Apache < v2.4.8)    SSLCertificateChainFile "/opt/bitnami/apache2/conf/server-ca.crt"

    NOTE: If you use a different name for your CA certificate bundle, you should reconfigure the SSLCertificateChainFile or SSLCACertificateFile directives in the corresponding Apache configuration file to reflect the correct file name.
  • Once you have copied all the server certificate files, you may make them readable by the root user only with the following commands:

     $ sudo chown root:root /opt/bitnami/apache2/conf/server*
     $ sudo chmod 600 /opt/bitnami/apache2/conf/server*
  • Open port 443 in the server firewall. Refer to the FAQ for more information.

  • Restart the Apache server.
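
    For example, using the stack control script described earlier:

     $ sudo /opt/bitnami/ctlscript.sh restart apache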

You should now be able to access your application using an HTTPS URL.

How to create an SSL certificate?

OpenSSL is required to create an SSL certificate. A certificate signing request (CSR) can then be sent to a certificate authority (CA) to be signed, turning it into a certificate. If you have your own certificate authority, you can sign it yourself, or you can use a self-signed certificate (for example, because you only need a test certificate or because you are setting up your own CA).

Follow the steps below:

  • Generate a new private key:

     $ sudo openssl genrsa -out /opt/bitnami/apache2/conf/server.key 2048
  • Create a certificate:

     $ sudo openssl req -new -key /opt/bitnami/apache2/conf/server.key -out /opt/bitnami/apache2/conf/cert.csr
    IMPORTANT: Enter the server domain name when the above command asks for the "Common Name".
  • Send cert.csr to the certificate authority. Once the certificate authority has completed its checks (and usually received payment), it will issue your new certificate.

  • Until the certificate is received, create a temporary self-signed certificate:

     $ sudo openssl x509 -in /opt/bitnami/apache2/conf/cert.csr -out /opt/bitnami/apache2/conf/server.crt -req -signkey /opt/bitnami/apache2/conf/server.key -days 365
  • Back up your private key in a safe location after generating a password-protected version as follows:

     $ sudo openssl rsa -des3 -in /opt/bitnami/apache2/conf/server.key -out privkey.pem

    Note that if you use this encrypted key in the Apache configuration file, it will be necessary to enter the password manually every time Apache starts. Regenerate the key without password protection from this file as follows:

     $ sudo openssl rsa -in privkey.pem -out /opt/bitnami/apache2/conf/server.key

Find more information about certificates in the official OpenSSL documentation.

How to force HTTPS redirection?

Add the following to the top of the /opt/bitnami/hadoop/conf/httpd-prefix.conf file:

RewriteEngine On
RewriteCond %{HTTPS} !=on
RewriteRule ^/(.*) https://%{SERVER_NAME}/$1 [R,L]

After modifying the Apache configuration files, restart Apache to apply the changes.

How to debug Apache errors?

Once Apache starts, it will create two log files: /opt/bitnami/apache2/logs/access_log and /opt/bitnami/apache2/logs/error_log.

  • The access_log file is used to track client requests. When a client requests a document from the server, Apache records several parameters associated with the request in this file, such as: the IP address of the client, the document requested, the HTTP status code, and the current time.

  • The error_log file is used to record important events. This file includes error messages, startup messages, and any other significant events in the life cycle of the server. This is the first place to look when you run into a problem when using Apache.
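
For example, to watch the most recent entries in the error log while reproducing a problem, you can run:

$ tail -f /opt/bitnami/apache2/logs/error_log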

You can also check the Apache configuration files for syntax errors. If the check finds no errors, it prints a message similar to:
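
A minimal way to run this check, assuming the standard Bitnami Apache layout, is:

$ sudo /opt/bitnami/apache2/bin/apachectl configtest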

Syntax OK

How to connect to Hadoop from a different machine?

For security reasons, ports used by Hadoop cannot be accessed over a public IP address. To connect to Hadoop from a different machine, you must open the port of the service you want to access remotely. Refer to the FAQ for more information on this.

Check the Hadoop ports section to see the complete list of the most relevant ports in Hadoop.

IMPORTANT: Making this application's network ports public is a significant security risk. You are strongly advised to only allow access to those ports from trusted networks. If, for development purposes, you need to access from outside of a trusted network, please do not allow access to those ports via a public IP address. Instead, use a secure channel such as a VPN or an SSH tunnel. Follow these instructions to remotely connect safely and reliably.

How to run a test job in Hadoop?

You can run jobs in Hadoop from the same computer where it is installed.

Hadoop bundles many examples that you can try. For instance, there is an example that estimates the value of Pi. You can try it by running the following command:

$ hadoop jar /opt/bitnami/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 10 100

When the job finishes, you will see an output similar to:

Estimated value of Pi is 3.14800000000000000000

To see a MapReduce example involving HDFS, try running the following commands, which count how many words beginning with the letter "c" appear in a simple tongue twister:

$ echo "can you can a can as a canner can can a can" | hadoop fs -put - /tmp/hdfs-example-input
$ hadoop jar /opt/bitnami/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep /tmp/hdfs-example-input /tmp/hdfs-example-output 'c[a-z]+'
$ hadoop fs -cat /tmp/hdfs-example-output/part-r-00000

The job produces the following output:

6       can
1       canner
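
If you want to run the example again, note that the job will fail if the output directory already exists. A minimal cleanup sketch that removes both the input file and the output directory:

$ hadoop fs -rm -r /tmp/hdfs-example-input /tmp/hdfs-example-output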

How to connect to Hive?

The Bitnami Hadoop Stack includes Hive, Pig and Spark, and starts HiveServer2, Metastore and WebHCat by default.

How to connect to HiveServer2?

HiveServer2 is a server interface that enables remote clients to execute queries against Hive and retrieve the results. It listens on port 10000 by default.

In order to connect to HiveServer2, you have two options:

  • (Recommended): Connect to the HiveServer2 Thrift server (running on port 10000) through an SSH tunnel (refer to the FAQ for more information about SSH tunnels).
  • Open the HiveServer2 Thrift server's port 10000 for remote access (refer to the FAQ for more information about opening ports).

Once you have connected to the server through an SSH tunnel or opened the port for remote access, you can use the Beeline command-line utility. To connect to HiveServer2 using Beeline, run one of the following:

  • Connecting to HiveServer2 through an SSH tunnel:

    $ beeline -u jdbc:hive2://localhost:10000 -n hadoop
  • Connecting to HiveServer2 by opening port 10000 (SERVER-IP is a placeholder; replace it with the right value):

    $ beeline -u jdbc:hive2://SERVER-IP:10000

After some seconds, you will be able to access the prompt:

0: jdbc:hive2://localhost:10000>
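
From this prompt you can run HiveQL statements interactively. You can also pass a single statement with Beeline's -e option; for example, a minimal sketch using the same connection settings as above:

$ beeline -u jdbc:hive2://localhost:10000 -n hadoop -e 'SHOW DATABASES;'
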
How to access the HiveServer2 Web UI?

HiveServer2 has a Web UI which provides different features, such as logging, metrics and configuration information. It listens on port 10002. In order to access it, you have two options:

  • (Recommended) Access the HiveServer2 Web UI (running on port 10002) through an SSH tunnel (refer to the FAQ for more information about SSH tunnels).
  • Open the HiveServer2 port 10002 for remote access (refer to the FAQ for more information about opening ports).

How to access WebHCat?

HCatalog is a table and storage management layer for Hadoop. HCatalog is built on top of Metastore, another component of Hadoop. WebHCat is the REST API for HCatalog and listens on port 50111 by default.

In order to access HCatalog, you have two options:

  • (Recommended): Access the WebHCat server (running on port 50111) through an SSH tunnel (refer to the FAQ for more information about SSH tunnels).
  • Open the WebHCat port 50111 for remote access (refer to the FAQ for more information about opening ports).

You can access WebHCat with the following commands:

  • Connecting to HCatalog through an SSH tunnel:

    $ curl -s 'http://localhost:50111/templeton/v1/status?'
  • Connecting to HCatalog by opening port 50111 (SERVER-IP is a placeholder; replace it with the right value):

    $ curl -s 'http://SERVER-IP:50111/templeton/v1/status?'

If the service is running, you should see a short JSON response reporting its status.


How to connect to Pig?

The Bitnami Hadoop Stack includes Pig, a platform for analyzing large data sets that consist of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.

To use Pig, simply run:

$ pig

After a few moments, you will see the Pig prompt:

grunt>

How to run Pig tutorial scripts?

In order to run the Pig tutorial scripts, you will first need to upload a file to HDFS:

$ hadoop fs -copyFromLocal /opt/bitnami/hadoop/pig/tutorial/data/excite.log.bz2 .
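
You can verify that the file is now available in your HDFS home directory:

$ hadoop fs -ls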

In this case, we will run script1-hadoop.pig:

$ cd /opt/bitnami/hadoop/pig/tutorial
$ pig ./scripts/script1-hadoop.pig

The process takes some minutes, but once it finishes, you will see output similar to the following, indicating success:

Successfully read 944954 records (10409092 bytes) from: "hdfs://localhost:8020/user/hadoop/excite.log.bz2"

Successfully stored 13530 records (659954 bytes) in: "hdfs://localhost:8020/user/hadoop/script1-hadoop-results"


2018-02-20 09:11:36,947 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2018-02-20 09:11:36,976 [main] INFO  org.apache.pig.Main - Pig script completed in 3 minutes, 21 seconds and 433 milliseconds (201433 ms)

How to connect to Spark shell?

Spark provides a shell, which allows you to run Spark interactively. You can access the Spark shell with the following command:

$ spark-shell

After some seconds, you will see the Spark shell prompt:

scala>

How to run Spark examples?

The Bitnami Hadoop Stack includes Spark, a fast and general-purpose cluster computing system. Spark includes several example programs.

To run the Spark examples, use the run-example program. For instance, to estimate the value of Pi, run the following test:

$ run-example SparkPi 10

After some seconds, Spark will output the result:

2018-02-15 12:37:29,410 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 1.697757 s
Pi is roughly 3.1426351426351427