Friday, April 12, 2013

Future of Big Data in 3 Prezis



Every few years a revolutionary piece of technology comes along that massively disrupts the existing landscape: it caters to an unfulfilled need in a remarkable way at the right time, and carves out a new market in the process. In my “relatively” short IT career I have been fortunate enough to be involved up close with several of these game-changing technologies, including MPP databases, Linux, cloud computing, Hadoop, and something I am really excited about these days: Big SQL!

If you are already familiar with Big SQL, that’s great … you can skip ahead in this post. If not, all I’m going to say is that Big SQL is a natural extension to Hadoop and big data analytics, which Leon Katsnelson explains very succinctly in this first prezi:

(Click on Start Prezi, wait for it to load and use arrow buttons below the prezi to navigate forward ... better viewed fullscreen - use button on bottom right)




If your interest is piqued, you might as well get to know a bit more about Big SQL by going through this second prezi, titled Big SQL Overview:




And finally, this third prezi will tell you how to go about getting some hands-on experience with Big SQL:



Useful links:
Take the free online course on SQL for Hadoop

Wednesday, December 7, 2011

small data, BIG data on clouds

I have been talking about databases on the cloud for quite some time now. In earlier blog posts I've mentioned how cloud-o-nomics can accelerate testing and development of database solutions, how to get databases up and running quickly on public clouds, and how to use private clouds for deploying enterprise-class database workloads in-house.

In a recent Chat with the Lab webinar, experts in data management and cloud technologies - Leon Katsnelson (@katsnelson) from IBM and Uri Budnik (@uribudnik) from RightScale - reviewed several options for running databases (specifically DB2) in both public and private clouds, and even mentioned no-cost options for doing so. They also presented an option for test-driving next-generation database technology in the cloud with free credits thrown in by Amazon.

But the thrust of this webinar was around "Big Data". It's not surprising that analysts like IDC rate cloud computing and big data amongst the hottest technologies for 2012. Sure, they are hot on their own, but combine the two and even more magic happens. When IBM talks about big data you hear about the 3Vs that characterize this space - Volume, Variety, and Velocity (learn more about these when you watch the webcast recording below). The presenters in this webcast went further and talked about the 4th V, i.e. the Value that is unleashed for big data by utilizing cloud economics.

The logic behind it is actually quite simple. We know big data sets involve very large volumes of data that require dozens, sometimes hundreds or even thousands, of servers to process in parallel, using paradigms like MapReduce to derive insights. Traditional data center computing would require a substantial capital investment to purchase and set up a cluster for processing big data, which may be hard to justify if the hardware utilization is low, e.g. running a Hadoop job for only a few hours a day. If you use cloud computing instead, you can start up 100 servers costing 30 cents each per hour, so an hour-long big data job would only cost about $30, and after an hour you could shut down the servers without incurring further costs.
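The arithmetic above can be written out as a small sketch. The function name and parameters are made up for illustration, and the 30 cents per server-hour is simply the figure used in this post, not a quoted price from any cloud provider.

```python
def cluster_cost(servers: int, rate_per_hour: float, hours: float) -> float:
    """Return the on-demand cost in dollars of a cluster that is shut down when the job ends."""
    # Every server is billed at the same hourly rate for the duration of the job.
    return servers * rate_per_hour * hours


# The example from the text: 100 servers, 30 cents each per hour, a one-hour job.
cost = cluster_cost(servers=100, rate_per_hour=0.30, hours=1)
print(f"${cost:.2f}")
```

Changing `servers` or `hours` shows how the bill scales with the size and length of the job, with nothing owed once the servers are shut down.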

The speakers also talked about BigInsights (IBM's Hadoop-powered solution for big data) and setting up a Hadoop cluster in minutes on the IBM or Amazon cloud using pre-built, pre-configured BigInsights cloud images and server templates. And if you are interested in big data but don't have the skills, or want to learn how to quickly run Hadoop clusters on the cloud, you can take free courses online on Big Data University.

One question that came up during this webcast was how to get your big data sets loaded into the cloud. So when you watch the recording, be sure to listen to the Questions and Answers part towards the end.

Below is the recording of this webinar titled Leveraging Clouds for Small and Big Data...



If you want to be informed about new big data webinars from IBM, do sign up for the Big Data Insights Newsletter.

Monday, June 20, 2011

Quickly deploying Private Clouds for Database workloads

While the IT industry at large is flocking towards cloud computing (as is evident from triple digit growth in cloud usage) it is a rarity (at least so far) to see Enterprise IT departments touting cloud usage for workloads such as databases. And there are good reasons for it.

The situation is not much different from the early 2000s, when I was championing the use of Linux for running database servers. While a large number of people I talked to understood the benefits of Linux, and many were starting to use Linux for file and print servers, they were invariably hesitant to use Linux for database workloads.

After all database systems are a backbone of virtually every enterprise application and hold sensitive data that needs to be secured, protected, backed-up reliably, and be available for recovery in the event of a disaster. Almost no enterprise IT decision maker wants to put their job on the line and start deploying leading edge technologies that have not yet been proven for enterprise-class workloads.

This attitude is quite prevalent when I talk to IT managers and DBAs at my company's clients, which tend to be some of the largest companies in the banking, insurance, and retail sectors. Most of these folks get Cloud Computing and understand its advantages, but shy away from being the trailblazers. So public clouds are typically not an option (other than for skills building and experimentation).

But what if the data was on the cloud and yet within the confines of your company's IT security perimeter? I'm talking about Private Clouds residing in your company's datacenter, within your firewall, where you can enforce your company's IT policies such as access and authentication protocols. The reaction I tend to get to this question is that building Clouds from scratch is not an easy proposition. "We don't have the skills or the budget to hire cloud consulting services" or "do you know how hard it is to get any budget approved for new hardware these days?"

So what if you could cobble together a private cloud without any special skills and within a couple of days or even hours? And what if it could be done by mostly repurposing your existing hardware and only adding one small server that does the cloud automation, virtualization, and provisioning magic right out of the box? By this time in the conversation I can see the interest building up, and that's no surprise, because the recently launched Workload Deployer from IBM offers pretty compelling value pretty quickly. This is a follow-on offering to what was previously known as the WebSphere Cloudburst Appliance.

This 2U form-factor appliance comes pre-loaded with enterprise-class middleware packaged as virtual machine images and patterns that can be deployed onto virtualized servers in a private cloud with just a few mouse clicks. Examples of pre-loaded middleware include WebSphere Application Server and DB2 database server.

These pre-bundled DB2 images and patterns are well suited for rapidly provisioning standardized virtual database servers that can be used for development, test, and a variety of web and other workloads. Databases can also be deployed in pairs in a high availability (HADR) configuration.

If you want further details about how to build and rapidly deploy databases in a private cloud, be sure to attend this free webinar on June 29.

Thursday, April 28, 2011

Cloud turns software release cycle on its head

I recall when the iPad first came out it was only available in the US and those of us closer to the North Pole (i.e. Canada) had to wait a few months to get one in the stores here. In fact it is not uncommon to have products launched for core markets first and then released for secondary markets.

So I was quite surprised and impressed when a colleague pinged me to say that he had just published the DB2 Express-C 9.7.4 database in the Cloud, and that it was available in all of Amazon's regions, i.e. US, Europe and AP. This is probably the first time in the history of IBM that a product has been released on the Cloud before being available through traditional channels.

At first I wondered how that could be ... the DB2 Express-C 9.7.4 install images aren't expected to be generally available for download for at least a few more days. So how could there be server templates for this newly updated version already available for running in the cloud? After all, the Cloud is not where we get most of the usage of DB2, i.e. it's not our primary market (at least not yet).

Well, I suppose Cloud-o-nomics changes everything. Before new install images of a product are made available for download they typically go through several stages of testing and then some final sanity checking. In the case of DB2 Express-C, which is the free community edition of the enterprise-class DB2 database from IBM, the final stages of testing involve installing it on several free/low-cost operating systems that would typically not get coverage during test cycles of the paid editions of DB2. In the normal test cycles, the paid editions of DB2 are exercised on operating systems that IBM officially supports and that are used in a business environment. However, since DB2 Express-C is used by the community at large, and many members of this community prefer to use Windows Home Editions and free community versions of Linux distributions, IBM does try to test it on some of those systems.

If you're wondering what the Cloud has to do with this: in the case of DB2 Express-C it helped us cut down a significant amount of effort testing on operating systems that are not standard in an enterprise setting. While our test labs have lots of servers with Enterprise-class operating systems and we can grab any of those to do testing, for the lower-cost/free OSes we need to spend time setting up systems/VMs with these OSes and their newer versions. The Cloud, however, saves us the effort of setting up such systems from scratch. We can just grab an existing OS image or server template on the cloud, launch it instantaneously, install our code, and make sure everything is okay - all in a matter of minutes. And it only costs us tens of cents to do this, as opposed to scrounging for free machines or putting in budget requests many months in advance to get new systems (which these days typically get rejected because these systems would only be utilized for a very short duration).

Another reason we could get the DB2 Express-C 9.7.4 images out on the Cloud earlier than the regular images for download was the use of the RightScale cloud management platform and its concept of Server Templates. We already had older versions of DB2 Express-C for running on Amazon EC2 using RightScale Server Templates. DB2 was just an attachment to this template, and at launch time the server template would use an existing OS image to install and configure DB2. So for this new release, all we had to do was replace the attachment with a newer version of the DB2 code, and the template would then automatically install and configure this new DB2 release on the underlying operating system. So once this updated server template was launched, not only could we test the new version rather easily on this OS, but without any additional effort we also got a newer version of the server template that could be made public (with just a few clicks) so others could also run this new DB2 Express-C code in the Cloud.
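The template-plus-attachment idea can be pictured with a small sketch. This is not RightScale's API: the `ServerTemplate` class, its fields, and the attachment names are all hypothetical, and pairing the older template with Ubuntu 10.04 LTS is an assumption. It only models the point that a new release swaps the attachment while the OS image and launch steps stay the same.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ServerTemplate:
    """Hypothetical model of a server template: an OS image plus an attached software package."""
    name: str
    os_image: str    # existing OS image the template boots from
    attachment: str  # software package installed and configured at launch time

    def launch_steps(self) -> list:
        # Mirrors the description above: boot the OS image,
        # then install and configure whatever is attached.
        return [
            f"boot {self.os_image}",
            f"install {self.attachment}",
            f"configure {self.attachment}",
        ]


# Placeholder names; only the 9.7.4 version and Ubuntu 10.04 LTS appear in the post.
old_template = ServerTemplate(
    name="DB2 Express-C (older release)",
    os_image="Ubuntu 10.04 LTS",
    attachment="db2-express-c-older",
)

# A new release only replaces the attachment; everything else is reused.
new_template = replace(
    old_template,
    name="DB2 Express-C 9.7.4",
    attachment="db2-express-c-9.7.4",
)

for step in new_template.launch_steps():
    print(step)
```

Because the template is immutable in this sketch, the older version stays intact next to the new one, much like the older server templates remained available.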

In conclusion, if you want to try out the latest release of DB2 Express-C, you can do so on the Amazon Cloud by running this DB2 Express-C 9.7.4 Database Server Template for Ubuntu 10.04** LTS, or you can wait a few days for the regular install images to become available for download. BTW, if you want to go the Cloud route you will need to have Amazon and (free) RightScale accounts and this video on ChannelDB2.com from Bradley Steinfeld walks you through this simple process.

BTW, did I mention that there is no charge for using the DB2 Express-C database, Ubuntu Linux, or RightScale (if you have a free account)? You would still pay for the use of Amazon's cloud services, starting at 8.5 cents an hour.

** Yes, we would have liked to go out with Ubuntu 11.04, but the underlying OS image was not yet available on RightScale, so we fell back on an OS image that we had previously worked with.

Monday, March 21, 2011

Hypervisor Edition for Database on your Private Cloud

While Cloud Computing is all the rage these days, most enterprises are wary of putting their sensitive or confidential data on an internet accessible cloud. So how do you take advantage of Cloud Computing and reap its benefits while still keeping your valuable data protected in a secure vault-like environment that you've built over the years in your enterprise data-centers?

That's where Private Clouds come in, enabling your Enterprise IT department to take the efficiency of your in-house data-center resources to new levels and deliver IT infrastructure (server, networking and storage), and possibly even enterprise middleware, as on-demand services.

A key component of a private cloud is a workload deployer that can provision these resources as virtual machines on-the-fly with custom configurations as and when required by project leads in the various lines of business.

As far as private cloud workload deployers are concerned, IBM WebSphere Cloudburst Appliance (WCA) deserves a special mention because it comes preloaded with a variety of middleware for provisioning enterprise-class applications. The software comes bundled in what are called Hypervisor Editions, which are essentially virtual images based on VMware ESX (for Linux and Windows) or PowerVM (AIX). These images can be launched with custom parameters and options, and can be configured to connect with other middleware to create complete enterprise application environments with web, app, and database servers.

Preloaded middleware includes WebSphere Application Server and DB2 database server. DB2 is a highly available and secure database management system that leading enterprises all over the world are using for their most mission critical needs. And DB2 pureScale is a highly scalable and continuously available edition of DB2 that can cluster dozens of servers for extreme demand workloads.

I will likely do future posts to delve into greater details about various items introduced in this post, however in the interim you can learn more about Hypervisor Editions, and WebSphere Cloudburst Appliance, and watch this demo to see how you use WCA to do pattern based workload deployments to provision application and database servers in your private cloud:

Saturday, October 2, 2010

Cloudify your Databases Faster

Up and running with your databases on Amazon is now even faster. IBM recently published AMIs for DB2 Express-C 9.7.2. For those not in the know, DB2 Express-C is the free edition of IBM's enterprise class relational and XML database - DB2 for Linux and Windows (and Mac OS). And although DB2 Express-C AMIs have been available for quite some time, the newer AMIs include a web-based interface for initial setup and config, which makes the process a lot quicker.



The DB2 Express-C 9.7.2 AMIs are available for both 32-bit and 64-bit platforms and are based on SUSE Linux Enterprise Server (SLES) 10. There are no license fees for running these development-use AMIs, so you only pay for hourly EC2 and any other AWS charges. But since these are available through Dev Pay, you first need to subscribe (which in Dev Pay speak implies clicking on the "Purchase" button) to use these AMIs. Here are the links to Dev Pay entries to subscribe to DB2 Express-C 32-bit and 64-bit AMIs.

Once subscribed you'll need to know the AMI IDs to launch them. Here are the details:

US Region:
32-bit ami-1b30db72 ec2-dev-ibm-images/ibm-db2-express-c-9.7.2-32-bit.manifest.xml
64-bit ami-1930db70 ec2-dev-ibm-images/ibm-db2-express-c-9.7.2-64-bit.manifest.xml

AP Region:
32-bit ami-97c7b8c5 ec2-dev-ibm-images-ap/ibm-db2-express-c-9.7.2-32-bit.manifest.xml
64-bit ami-7bc6b929 ec2-dev-ibm-images-ap/ibm-db2-express-c-9.7.2-64-bit.manifest.xml
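
If you script your launches, the listing above can be kept as a small lookup table. This is only a sketch: the AMI IDs are copied from the listing, while the dictionary layout, the `ami_for` function, and the short region keys ("us", "ap") are illustrative choices, not Amazon's own region names. It makes no calls to AWS.

```python
# AMI IDs copied from the listing above, keyed by (region, bits).
DB2_EXPRESS_C_972_AMIS = {
    ("us", 32): "ami-1b30db72",
    ("us", 64): "ami-1930db70",
    ("ap", 32): "ami-97c7b8c5",
    ("ap", 64): "ami-7bc6b929",
}


def ami_for(region: str, bits: int) -> str:
    """Return the DB2 Express-C 9.7.2 AMI ID for a region ("us" or "ap") and word size (32 or 64)."""
    key = (region.lower(), bits)
    if key not in DB2_EXPRESS_C_972_AMIS:
        raise KeyError(f"no DB2 Express-C 9.7.2 AMI listed for {region!r}, {bits}-bit")
    return DB2_EXPRESS_C_972_AMIS[key]


print(ami_for("US", 64))
```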

There is also a 5-min video walking through the process of launching DB2 AMIs using the Amazon console but it has yet to be updated for the new web-based DB2 AMI initial config.

There are also Ubuntu 10.04 based DB2 Express-C AMIs if you prefer. And if you want to run DB2 on a different OS, well, since DB2 Express-C is free, you can simply download DB2 Express-C and create an AMI on the Linux or Windows OS of your choice.

Thursday, August 19, 2010

Which database image to use on cloud?

I often get asked: I want to run an Enterprise-class database in the Cloud, what are my options and which image / service should I use? In this post I'll focus on the IBM DB2 database for Linux (as well as Unix and Windows) and cover several Cloud options for it. I won't go into too much detail for now ... each option merits its own post. So I'll keep this posting brief, include some relevant links and resources, and as I write a detailed post for each option I'll start linking them from here.


PUBLIC CLOUDS
1. IBM Dev/Test Cloud
Demo | Learn More

2. Amazon Elastic Compute Cloud (EC2)
Quickstart Video | Tutorial | Webcast | Free AMIs | Paid AMIs | FAQ | DB2 on Ubuntu AMI | RightScale Templates



PRIVATE CLOUDS
3. IBM WebSphere Cloudburst Appliance
Demo | Learn More

4. VMware Virtual Appliances
DB2 Express-C Appliance from IBM | DB2 Express-C on Ubuntu Appliance from 3rd party | DB2 Enterprise Evaluation Appliance

The forecast for databases is partly cloudy.
