Big Data: Issues and Challenges

Introduction

To begin a discussion of Big Data, it is worthwhile to first define the term. According to [1], Big Data is defined as “technologies and initiatives that involve data that is too diverse, fast-changing or massive for conventional technologies, skills and infrastructure to address efficiently”. This is a good definition because it enumerates the key issues in dealing with Big Data: the data is fast-changing and/or massive, it does not fit into conventional data storage systems (e.g. relational database systems), and it is generated and captured rapidly [1]. While I believe this definition accurately reflects how the scientific and business communities perceive Big Data, it is worth noting that Big Data as a discipline is still in its infancy and thus open to different interpretations. A brief look at the etymology of the term offers insight into this (see [2]). To further our understanding, we will look at each of the main characteristics of Big Data as defined above and discuss some of the primary issues that each introduces.

Introduction to Bitmap Indexes

To avoid confusion, I will use the term bitmap index throughout. In my research I have seen bit-indexing and bit-mapped indexing used interchangeably to describe the same concept.

What is a bitmap index and how does it work?

Bitmap indexes are a mechanism, largely employed by Oracle databases, to increase search performance on large data sets. They are most effective when applied to columns that exhibit low cardinality. Here, cardinality means the number of distinct values that a column may contain. For example, a column called “Active” on a user account table would likely contain only two values: true or false (or active and disabled). Regardless of the total number of tuples, this column holds only two possible values, so it exhibits low cardinality. A column that exhibits high cardinality benefits less from a bitmap index; if each tuple contains a unique value, the column is instead a candidate for a primary key.
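To make the idea concrete, here is a minimal sketch of a bitmap index in Python. It is an illustration only, not how Oracle or any other database implements the feature. The table, the column names (`active`, `role`) and the helper functions `build_bitmap_index` and `matching_rows` are all hypothetical. Each distinct value in a column gets one bitmap, stored here as a plain integer in which bit i is set when row i holds that value.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return dict(index)


def matching_rows(bitmap):
    """Return the row ids whose bits are set in the bitmap."""
    rows = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows


# Hypothetical user account table with two low-cardinality columns.
active = [True, False, True, True, False]
role = ["admin", "user", "user", "admin", "user"]

active_index = build_bitmap_index(active)
role_index = build_bitmap_index(role)

# A filter such as "active = true AND role = 'user'" becomes one bitwise AND.
result = active_index[True] & role_index["user"]
print(matching_rows(result))  # prints [2]
```

The number of bitmaps equals the number of distinct values in the column, which is why this approach suits low-cardinality columns: the “Active” column needs only two bitmaps no matter how many rows the table has, whereas a column of unique values would need one bitmap per row.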