-->

Microsoft prepares a new SQL language for Big Data

It's refreshing to see Microsoft shed the last bits of its not-invented-here mentality and embrace new industry standards without conditions, unlike its handling of Java 20 years ago. The shift shows clearly in its support for Hadoop and Big Data.

Earlier this year, Microsoft announced plans for Azure Data Lake Store, a data store compatible with the Hadoop Distributed File System (HDFS) that could run large analytics workloads. "Data lake" is a term coined by the Big Data industry for massive data stores that are to be acted on at a later time. While some Big Data is meant for real-time or immediate processing, data lakes are more a matter of "set it aside and we'll get to it later."

That is how Microsoft describes Azure Data Lake Store. In a blog post, T. K. "Ranga" Rengarajan, Microsoft's corporate vice president for data platform, laid out the three parts of Azure Data Lake, of which Store is one.

The Store is a single repository that lets users capture data of any size, type, or format without requiring changes to the application as the data scales. Data can be securely stored and shared, and can be processed and queried from HDFS-based applications and tools.

Rengarajan also announced Azure Data Lake Analytics, an Apache YARN-based service designed to scale dynamically to handle large Big Data workloads. The service will be based on U-SQL, a language that will "unify the benefits of SQL with the power of expressive code," as Rengarajan put it.

U-SQL's scalable distributed query capability enables you to efficiently analyze data in the store and across SQL Servers in Azure, Azure SQL Database, and Azure SQL Data Warehouse.

Finally, there is Azure HDInsight, a fully managed Apache Hadoop cluster service with a broad range of open source analytics engines, including Hive, Spark, HBase, and Storm. Microsoft announced the general availability of managed clusters on Linux with an industry-leading 99.9% uptime SLA.

"Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and do all types of processing and analytics across platforms and languages," Rengarajan wrote.

(Network World)
