Cloud SOA Semantics and Data Science Conference

This is a guest post prepared by the SOA, Semantics, & Data Science conference for the Data Community DC Blog to introduce, and provide context for, the technologies the conference focuses on and their applications.

Featuring the 15th SOA for E-Gov Conference:

Conference Title: Cloud: SOA, Semantics, & Data Science
Theme: The Changing Landscape of Federal Information Technology
Dates: 9/10/2013 to 9/11/2013
Location: The Waterford, Springfield, VA
Contact: Tammy Kicker (
Web Site:

Federal organizations are racing to capitalize on social, mobile and cloud computing trends to provide solutions for their agency mission needs.  At the same time, there is great pressure to spend less while improving capability, service, cost and flexibility.

This event is an open knowledge exchange forum for communities of practice in Cloud, SOA, Semantics, and Data Science.  It brings together thought leaders and experts from the federal and business communities to continue the conversation around best practices in advancing SOA, semantic technology and data science within the Cloud construct.

The event builds on the successes of two previous conference series: the Service-Oriented Architecture (SOA) e-Government Conferences and the Department of Defense SOA and Semantic Technology Conferences.

This event focuses on SOA, Cloud Computing, Semantic Technology, and Data Analytics.

SOA uses data as a service, which in turn requires dealing effectively with semantics. Data science is used to process and analyze the data, and its semantics, to extract information. Given the recent pronouncement by Dominic Sale of OMB (an invited keynote speaker) that "all content is data," this conference is especially timely and focused.

Presenters and panelists will examine the benefits of the governance frameworks and approaches that Federal agencies are pursuing to increase the maturity and efficiency of their SOA, Cloud, Semantic Technology, and Data Science efforts.

The technologies the conference focuses on and their applications are summarized in the table below.


Speaker | Technology | Applications | Comments
Brand Niemann | Data Science: Data Visualization Tools (12 Leaders and Challengers) | OMB Analytic Data Sets and Public Data Sets for DC Data Science Community | Director and Senior Data Scientist, Semantic Community and Founder of Federal SOA Community of Practice
Dominic Sale (invited) | TBD | TBD | OMB Chief of Data Analytics & Reporting
Steve Woodward | Cloud Computing in Canada | New Agency | CEO, Cloud Perspectives
David S. Linthicum | Cloud and SOA Convergence | Your Enterprise | Cloud Computing Thought Leader, Executive, Consultant, Author, and Speaker
Denzil Wasson | Semantics | Cloud and SOA for Government | Chief Technology Officer, Everware-CBDI
Vendor Showcase | Multiple | Multiple | Always a Favorite at These Conferences
Use Cases and Pilots | Cray Graph Computer | Semantic Medline - National Library of Medicine & White House OSTP’s NITRD Federal Big Data Senior Steering WG | Discovery of Disease Cause and Effect
Geoffrey Charles Fox | Cyberinfrastructure | Enabling e-Government, e-Business and e-More Or Less Anything | Associate Dean for Graduate Studies & Research, Distinguished Professor of Computer Science and Informatics, Indiana University
Dennis Wisnosky | Semantics | Mainstream | Former DoD Business Mission Area CTO, member of the Enterprise Data Management Council
Michaela Iorga | Cloud Computing and Security | Government and Industry | Senior Security Technical Lead for Cloud Computing and Chair, NIST Cloud Computing Security WG
Use Cases and Pilots | Semantics & Other | Mission/Business Transformation Needs | Four Applications for Department of Veterans Affairs and Other Organizations

We welcome your participation and suggested contributions as we continue to build and sustain this unique community of communities of practice, with the goal of improving the delivery of government services in support of the US Federal Digital Government Strategy and Open Government Data Initiatives.


Brand Niemann, former Senior Enterprise Architect and Data Scientist with the US EPA, completed 30 years of federal service in 2010. Since then he has worked as a data scientist for a number of organizations, produced data science products for a large number of data sets, and published data stories for Federal Computer Week, Semantic Community and AOL/Breaking Government.

A Revolution in Cloud Pricing: Minute By Minute Cloud Billing for Everyone

Google I/O wrapped up last week with a tremendous number of data-related announcements. Today's post focuses on Google Compute Engine (GCE), Google's answer to Amazon's Elastic Compute Cloud (EC2), which lets you create and run virtual compute instances within Google's cloud. We have spent a good amount of time talking about GCE in the past, in particular benchmarking it against EC2 here, here, here, and here. The main GCE announcement at I/O was, of course, the fact that now **anyone** and **everyone** can try out and use GCE. GCE instances also now support up to 10 terabytes per disk volume, which is a BIG deal. However, the move to minute-by-minute pricing, which might not seem incredibly significant on the surface, is an absolute game changer.

Let's say that I have a job that will take a thousand instances just over an hour each to finish (a total of just over a thousand "instance hours"). I launch my thousand instances, run the job, and shut down my cloud 61 minutes later. Let's also assume that Amazon and Google both charge about the same amount, say $0.50 per instance per hour (a relatively safe assumption), and that Amazon's and Google's instances have the same computational horsepower (this is not true; see my benchmark results). Because Amazon charges by the hour, it would bill me for two hours per instance, or $1000.00 total (1000 instances x $0.50 per instance per hour x 2 hours per instance), whereas Google would charge me only $508.33 (1000 instances x $0.50 per instance per hour x 61/60 hours per instance). In this case, Amazon's hourly billing has almost doubled my costs, and the impact can get far worse.
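To make the arithmetic above easy to re-run with other numbers, here is a minimal Python sketch of the two billing schemes as this post describes them: round each instance's runtime up to whole hours (Amazon, as described here) versus charge for the exact minutes used (GCE, as described here). The flat $0.50 per instance-hour rate is the post's illustrative assumption, the function names are hypothetical, and the per-minute model deliberately ignores GCE's 10-minute minimum. Neither function reflects either provider's actual billing API.

```python
import math

RATE_PER_HOUR = 0.50  # assumed flat price per instance-hour, as in the example


def hourly_bill(instances: int, minutes: int, rate: float = RATE_PER_HOUR) -> float:
    """Hourly billing: every started hour is charged in full for each instance."""
    billed_hours = math.ceil(minutes / 60)  # 61 minutes -> 2 billed hours
    return round(instances * billed_hours * rate, 2)


def per_minute_bill(instances: int, minutes: int, rate: float = RATE_PER_HOUR) -> float:
    """Per-minute billing: each instance pays only for the minutes it ran.

    Simplification: no minimum charge window is applied here.
    """
    return round(instances * (minutes / 60) * rate, 2)


print(hourly_bill(1000, 61))      # 1000.0
print(per_minute_bill(1000, 61))  # 508.33
```

Changing `minutes` to 60 makes both bills equal ($500), which is a quick way to see that the entire gap comes from the single extra minute being rounded up to a full hour.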

If I want to mitigate the overcharge, I can run the job with fewer instances for a longer time. One option would be to run 100 instances for just over 10 hours each. This setup would cost me $550 (100 instances x 11 hours per instance x $0.50 per instance per hour). If I am exceedingly price sensitive, I could run a single instance for just under 1,017 hours and complete the same job for a total of $508.50. At that point I am overcharged only about 17 cents but, if you are willing to wait over 1,000 hours for your results, why use the cloud at all?
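The same total amount of work can be spread across any number of instances. A minimal sketch, assuming the job is 61,000 instance-minutes (1,000 instances x 61 minutes) and that it splits perfectly across instances with no coordination overhead (an idealization, not a claim about real workloads), shows how hourly billing trades cost against wall-clock time. The constant and function names are hypothetical.

```python
import math

RATE_PER_HOUR = 0.50                # assumed flat price per instance-hour
TOTAL_INSTANCE_MINUTES = 1000 * 61  # the job: 1,000 instances x 61 minutes


def runtime_minutes(instances: int) -> int:
    """Wall-clock minutes, assuming the work divides perfectly across instances."""
    return math.ceil(TOTAL_INSTANCE_MINUTES / instances)


def hourly_bill(instances: int) -> float:
    """Hourly billing: each instance's runtime is rounded up to whole hours."""
    hours = math.ceil(runtime_minutes(instances) / 60)
    return round(instances * hours * RATE_PER_HOUR, 2)


for n in (1000, 100, 1):
    m = runtime_minutes(n)
    print(f"{n:>5} instances: {m / 60:8.2f} h wall clock, hourly bill ${hourly_bill(n):,.2f}")
```

For 1,000 instances this reproduces the $1,000 bill, and for 100 instances (610 minutes, billed as 11 hours each) the $550 bill: cutting the instance count shrinks the rounding waste but stretches the wait from about an hour to over ten.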

Ok, now let's say completing the task is incredibly important to you and time is of the essence. In this case, let's throw 5,000 instances at the problem, which now takes just over 12 minutes to solve (let's call it 13 minutes). Running these 5,000 instances in GCE would cost $541.67 (5000 instances x 13/60 hours per instance x $0.50 per hour per instance), whereas the same run on Amazon would cost $2500 (5000 instances x 1 hour per instance x $0.50 per hour per instance)!!!!
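The 5,000-instance comparison can be checked the same way. This sketch keeps the same assumptions (61,000 instance-minutes of perfectly divisible work, a flat $0.50 per instance-hour, each instance's runtime rounded up to a whole minute); the `bills` helper is hypothetical and simply returns the wall-clock minutes alongside the per-minute and hourly bills for a given instance count.

```python
import math

RATE_PER_HOUR = 0.50                # assumed flat price per instance-hour
TOTAL_INSTANCE_MINUTES = 1000 * 61  # same job as before


def bills(instances: int) -> tuple[int, float, float]:
    """Return (wall-clock minutes, per-minute bill, hourly bill) for one run."""
    # 61,000 / 5,000 = 12.2 minutes, rounded up to 13 whole minutes
    minutes = math.ceil(TOTAL_INSTANCE_MINUTES / instances)
    per_minute = round(instances * minutes / 60 * RATE_PER_HOUR, 2)
    hourly = round(instances * math.ceil(minutes / 60) * RATE_PER_HOUR, 2)
    return minutes, per_minute, hourly


minutes, per_minute_total, hourly_total = bills(5000)
print(minutes, per_minute_total, hourly_total)  # 13 541.67 2500.0
```

Under hourly billing every one of the 5,000 instances pays for a full hour while working only 13 minutes, which is where the nearly fivefold difference comes from.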

With GCE, I don't have to worry about this overcharge until each instance's runtime drops below the 10-minute minimum charge window. Thus, whenever I use GCE, I can simply throw as many instances at the problem as will keep each one busy for at least 10 minutes, since the price will wind up about the same either way. Put another way, compare the best case above for GCE (the job done in 13 minutes for about $540) with what the same money buys on Amazon: for $550, Amazon completes the task in just over 10 hours. Which one would you choose?
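The 10-minute minimum is what bounds the "throw more instances at it" strategy. A minimal sketch, assuming the minimum applies per instance and that billing is otherwise per whole minute (the helper names are hypothetical, and this is not GCE's actual billing logic), shows that as long as each instance runs at least 10 minutes, the bill stays close to the $508.33 cost of the raw work; the only extra is rounding each runtime up to the next minute. Beyond that point, paid-but-idle minutes start to add up.

```python
import math

RATE_PER_HOUR = 0.50                # assumed flat price per instance-hour
MINIMUM_MINUTES = 10                # per-instance minimum charge window from the post
TOTAL_INSTANCE_MINUTES = 1000 * 61  # same job as before


def per_minute_bill(instances: int) -> float:
    """Per-minute billing with a per-instance minimum charge."""
    runtime = math.ceil(TOTAL_INSTANCE_MINUTES / instances)
    billed = max(runtime, MINIMUM_MINUTES)  # short runs are billed as 10 minutes
    return round(instances * billed / 60 * RATE_PER_HOUR, 2)


def max_instances_within_minimum() -> int:
    """Largest instance count that still keeps every instance busy >= 10 minutes."""
    return TOTAL_INSTANCE_MINUTES // MINIMUM_MINUTES


for n in (1000, 5000, max_instances_within_minimum(), 12200):
    print(n, per_minute_bill(n))
```

For this job the break-even point is 6,100 instances (exactly 10 minutes each, still $508.33); doubling to 12,200 instances finishes in 5 minutes but pays for 10, doubling the bill to $1,016.67.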

This is the true beauty of the cloud. GCE's pricing scheme incentivizes users to take full advantage of the cloud (massive parallelization for bursty computation), whereas Amazon's does not. When using GCE, I will spin up as many instances as it takes to get the job done as fast as possible. With Amazon, I won't, because of the billing overcharges. All other things being equal, GCE wins every time. Once users get used to getting immediate results, they won't go back.

My guess is that Amazon will change its hourly billing practices sooner rather than later.