
Big Data Technology for Manufacturing

To consider big data solutions for manufacturing holistically, it helps to divide big data into four primary components: analytics, data integration, data management, and infrastructure. Dell provides state-of-the-art big data solutions across these layers, working with strategic partners such as Cloudera, Intel, Microsoft, Oracle, and SAP, and backing deployments with its professional services arm to address enterprise data processing requirements at scale.

Analytics Layer
With analytics, the strategy is to turn enterprise data assets into faster, more actionable insights by managing, processing, storing, and ultimately analyzing data.

For analytics, Dell’s Statistica for Big Data Analytics platform is a strong option. It extends the Statistica portfolio with advanced natural language processing (NLP), entity extraction, interactive visualizations and dashboards, and distributed advanced analytic models that run across Hadoop, databases, and database appliances. What sets Statistica apart is its library of over 4,000 prebuilt models for different vertical markets, including several for manufacturing. Because these models already exist, manufacturers do not have to start from scratch to analyze their data, whether they need a model for manufacturing efficiency or one that identifies failure points within a product.
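Statistica’s prebuilt models are proprietary, but the kind of analysis they package — for example, finding failure points — can be illustrated with a minimal sketch. The record shape, function name, and threshold below are hypothetical, not Statistica’s API:

```python
from collections import defaultdict

def flag_failure_outliers(records, threshold=1.5):
    """Flag machines whose failure rate exceeds `threshold` times the
    fleet-wide average. Each record is a (machine_id, failed) pair."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for machine_id, failed in records:
        totals[machine_id] += 1
        if failed:
            failures[machine_id] += 1
    overall = sum(failures.values()) / sum(totals.values())
    return sorted(
        m for m in totals
        if failures[m] / totals[m] > threshold * overall
    )

# Hypothetical test data: M1 fails 2% of the time, M2 fails 20%.
records = [("M1", False)] * 98 + [("M1", True)] * 2 \
        + [("M2", False)] * 80 + [("M2", True)] * 20
print(flag_failure_outliers(records))  # → ['M2']
```

A real model would of course weigh many more variables; the point is that the manufacturer starts from a packaged analysis rather than writing one from scratch.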
Data Integration Layer
In building a solution focused on data analytics, a manufacturer must first implement the end-to-end pipeline, and a key early question is how to integrate data into a suite of big data tools so it can be managed, processed, stored, and analyzed. For data integration, Dell solutions such as Boomi and Shareplex are particularly relevant. Boomi is a data integration platform that can connect virtually any data source (e.g. log data, social media data, sensor data, machine data) to an application, whether that source resides on premises or in the cloud. The Shareplex Connector for Hadoop matters because the first step enterprises typically take with big data is integrating existing data assets from a relational database or enterprise data warehouse. Shareplex lets you load an Oracle database into a Hadoop cluster and then continuously replicate changes to it. Rather than performing one big download, Shareplex replicates the Oracle data directly into Hive or HBase and HDFS environments, streamlining an otherwise cumbersome task.
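Shareplex’s internals are proprietary, but the change-replication pattern it implements — replaying a continuous stream of row-level changes rather than taking one bulk snapshot — can be sketched in a few lines. The record format here is purely illustrative:

```python
def apply_changes(target, change_log):
    """Replay a stream of (op, key, value) change records onto a target
    store, mimicking continuous change replication rather than a
    one-time bulk download."""
    for op, key, value in change_log:
        if op in ("insert", "update"):
            target[key] = value
        elif op == "delete":
            target.pop(key, None)
    return target

# A hypothetical stream of row-level changes captured at the source:
changes = [
    ("insert", 101, {"part": "bearing", "qty": 40}),
    ("insert", 102, {"part": "rotor", "qty": 12}),
    ("update", 101, {"part": "bearing", "qty": 35}),
    ("delete", 102, None),
]
replica = apply_changes({}, changes)
print(replica)  # → {101: {'part': 'bearing', 'qty': 35}}
```

Because only the deltas move across the wire after the initial load, the replica stays current without repeatedly exporting the whole source database.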

Data Management Layer
When building your big data solution, you must decide early in the journey where the data will live so you can manage the varied types of data that will feed your eventual analysis. For diverse data types, one option for the management layer is to deploy Hadoop, which lets you collect, manage, analyze, and store data in a scalable, flexible, and cost-effective way. The key reason Hadoop is recommended is that it stores data in its native format. With a relational database, before you can store data you must clean it, parse it, and make it fit a table, row, and field. With Hadoop you skip that up-front work: you land the data as-is, and the cleaning process begins only when you are ready to analyze it. That is where organizations find Hadoop’s value — whether data is structured, semi-structured, or unstructured, you can put it in Hadoop today. Hadoop is the tool of choice for the data management layer.
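The load-first, clean-later pattern described above (often called schema-on-read) can be sketched briefly; the records and field names below are invented for illustration:

```python
import json

# Schema-on-read: land heterogeneous records untouched, no up-front schema.
raw_store = [
    '{"sensor": "temp-7", "value": 71.2}',
    '{"sensor": "temp-7", "value": "N/A"}',  # dirty record, kept as-is
    'free-text maintenance note, not JSON',  # unstructured, also kept
]

def temperatures(store):
    """Parse and clean only at analysis time, for this one query."""
    out = []
    for line in store:
        try:
            rec = json.loads(line)
        except ValueError:
            continue  # skip unstructured lines for this particular query
        if isinstance(rec.get("value"), (int, float)):
            out.append(rec["value"])
    return out

print(temperatures(raw_store))  # → [71.2]
```

A relational database would have rejected two of the three records at load time; here nothing is lost, and a different future query can still mine the maintenance note.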

Infrastructure Layer
Successful big data deployments depend on reliable hardware infrastructure. The Dell PowerEdge R730xd, based on Intel® Xeon® processor technology, is an exceptionally flexible and scalable two-socket 2U rack server that delivers high-performance processing and a broad range of workload-optimized local storage options, including hybrid tiering. It is well suited to running the Hadoop distributed computing platform for big data workloads.
Additionally, Dell, together with Cloudera and Intel, provides a turnkey, purpose-built in-memory advanced analytics data platform. The Dell In-Memory Appliance for Cloudera Enterprise represents a unique collaboration of partners within the big data ecosystem. Together, Dell, Cloudera, and Intel deliver both the platform and the software to help manufacturers capitalize on high-performance data analysis, leveraging the Cloudera Enterprise in-memory features (Apache Spark) for interactive analytics across multiple types of workloads. Cloudera Enterprise also features Impala for fast queries and Cloudera Search for interactive search. The result is one tool for both processing and analytics.
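Spark’s real API is much richer, but the in-memory processing style it enables — a map step followed by a reduce-by-key step, with intermediate results held in memory rather than written to disk — can be mimicked in plain Python. This is an illustration of the pattern, not PySpark code:

```python
from functools import reduce
from itertools import groupby

def map_reduce_in_memory(records, mapper, reducer):
    """Toy version of the map -> reduce-by-key pipeline that Spark runs
    across a cluster, here entirely in one process's memory."""
    pairs = sorted(mapper(r) for r in records)  # (key, value) pairs
    return {
        key: reduce(reducer, (v for _, v in group))
        for key, group in groupby(pairs, key=lambda kv: kv[0])
    }

# Hypothetical defect events per production line:
events = [("lineA", 1), ("lineB", 1), ("lineA", 1), ("lineA", 1)]
counts = map_reduce_in_memory(events, lambda r: r, lambda a, b: a + b)
print(counts)  # → {'lineA': 3, 'lineB': 1}
```

In Spark the same shape of computation is distributed across the appliance’s nodes, which is what makes iterative, interactive analytics fast on large data sets.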

Professional Services
If a manufacturing firm needs professional services to build the entire solution from the ground up, including deployment and integration, Dell has the big data services to deliver it. In addition, the Dell Solution Centers let organizations that are new to this technology run a proof of concept without a large investment, to see how the technology behaves in their environment; Dell offers this as a free service to its customers. Finally, Dell Financial Services can package creative ways to finance these solutions so that the cost becomes an operating expense rather than a capital expenditure.

Big Data Technology Stack for Manufacturers
The available technology stack for applying big data methodologies to manufacturing applications is growing rapidly. The top-level components of a big data initiative are:
  • Big Data Software – The early focus is on end-to-end big data solutions, starting with integral software applications like Statistica for Big Data Analytics. This content mining and analytics solution turns the complex, time-consuming manipulation of web-scale data resources into a fast, intuitive process. You can harvest sentiment from Twitter feeds, blogs, news reports, CRM systems, and other sources, and combine it with demographic and regional data to better understand market traction and opportunities.
  • Hadoop – To extract value from an ever-growing onslaught of data, a manufacturing firm needs next-generation data management, integration, storage, and processing systems that allow it to collect, manage, store, and analyze data quickly, efficiently, and cost-effectively. The Hadoop distributed computing platform provides end-to-end scalable infrastructure, built on open source technologies, for simultaneously storing and processing large structured and unstructured data sets in a distributed environment for data mining and analysis. Hadoop is scalable, fault-tolerant, and distributed. The open source software was originally developed by the world’s largest Internet companies to capture and analyze the massive amounts of data they generate; now manufacturing companies are adopting Hadoop as their architecture of choice. Unlike earlier platforms, Hadoop can store any kind of data in its native format and perform a wide variety of analyses and transformations on that data. With Dell™ | Cloudera® Apache™ Hadoop® solutions for big data, Dell offers three ways to start the journey: deployment of the Dell QuickStart for Cloudera Hadoop packaged solution, exploration of Hadoop software via a Dell Solution Center, and on-premises work with a fully functioning Hadoop environment via the Dell Hadoop Pod Loaner Program.
  • Security for Big Data – Security is a key component of every big data project. All solution designs must encompass performance, access, compliance, and security, with security defined at every level of the system implementation and covering data both at rest and in flight. Big data systems introduce new security challenges that must be accounted for, including protection of the data itself, data policies, and the handling of documents that may contain multiple levels of security. Each big data project is unique and should be carefully designed, beginning with a use case definition and then letting teams work with low-risk data first, so the organization can become comfortable with the new technologies and determine how best to implement the solution in conformance with corporate security policies. Compliance is an imperative part of data security: strong tools must be deployed as part of any big data solution so that all data access and use can be reported on, and alerts generated for inappropriate data access. As data sets become more complex and more disparate sources are integrated, ensuring compliance grows more difficult, but it remains manageable if data is integrated in steps rather than all at once. One good option for securing a manufacturer’s big data solution is Dell SecureWorks, an information security services organization that helps organizations worldwide protect their IT assets, comply with regulations, and reduce security costs. Its managed security services clients range from small local manufacturers to global industry leaders.
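The access-reporting and alerting requirement described above can be sketched as a simple audit check. The users, data set names, and thresholds below are hypothetical, not part of any Dell product:

```python
from datetime import datetime, timedelta

def audit_alerts(access_log, allowed, rate_limit=3, window=timedelta(minutes=5)):
    """Scan an access log and raise alerts for (a) users touching data
    sets they are not cleared for and (b) unusually rapid access."""
    alerts = []
    seen = {}
    for ts, user, dataset in access_log:
        if dataset not in allowed.get(user, set()):
            alerts.append(f"UNAUTHORIZED: {user} -> {dataset}")
        recent = [t for t in seen.get(user, []) if ts - t <= window]
        recent.append(ts)
        seen[user] = recent
        if len(recent) > rate_limit:
            alerts.append(f"RATE: {user} made {len(recent)} accesses in window")
    return alerts

t0 = datetime(2015, 6, 1, 9, 0)
log = [
    (t0, "alice", "quality_metrics"),
    (t0 + timedelta(minutes=1), "bob", "payroll"),  # bob is not cleared
]
allowed = {"alice": {"quality_metrics"}, "bob": {"inventory"}}
print(audit_alerts(log, allowed))  # → ['UNAUTHORIZED: bob -> payroll']
```

A production system would feed such alerts into a managed monitoring service rather than printing them, but the principle — every access recorded, every anomaly reportable — is the same compliance requirement the text describes.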

If you prefer, the complete insideBIGDATA Guide to Manufacturing is available for download in PDF from the insideBIGDATA White Paper Library, courtesy of Dell and Intel.
