High-performance Data Cloud Platform
Application background of the high-performance data cloud platform
• High performance: the existing database holds a large volume of data, and queries and analysis are slow.
• Flexibility: the existing database architecture is inflexible and must be shut down for maintenance and capacity expansion.
• Low budget: the initial investment is much smaller than with the traditional power+storage database architecture.
• The user plans to build a data warehouse and data marts.
Key points of the solution
The high-performance ZC data platform (ICCS-DW) is developed with the Greenplum engine as its core and adopts a shared-nothing/MPP architecture and column-oriented storage.
It provides in-database compression, MapReduce, capacity expansion without shutdown, and multilevel fault tolerance. It is an OLAP product specialized in fast, complex queries and analysis over massive data sets. It is:
• A data platform that integrates software and hardware;
• Certified through EMC testing;
• Equipped with Greenplum, a parallel computing engine;
• Provided with standard database functions and customizable ETL capability;
• Capable of high-speed parallel cloud data processing and loading;
• Equipped with a unique online capacity expansion and fault handling mechanism.
It helps clients build a virtualized computing environment for data warehousing, create autonomous virtualized data warehouses for different computing models and tasks, and centrally manage structured and unstructured data of any volume. The parallel architecture of the product also gives the virtualized data warehouse a very high processing speed, which greatly improves the processing efficiency and analysis quality of each analysis model and task.
Characteristics of the high-performance cloud data platform (ICCS-DW):
• Shared-nothing/MPP core architecture
The data engine distributes all data evenly across all node servers of the system, so each node stores a portion of the rows of each table or table partition. All data loads and queries run automatically in parallel on every node server, and the architecture supports expansion to tens of thousands of nodes.
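The even distribution of rows and the parallel execution described here can be illustrated with a minimal Python sketch. It assumes rows are assigned to nodes by hashing a distribution key; the MD5-based hash, the names `segment_for`, `distribute` and `parallel_count`, and the sample table are hypothetical and are not the engine's actual implementation. The per-node scans are run one after another for simplicity.

```python
import hashlib


def segment_for(key: str, num_segments: int) -> int:
    """Map a distribution key to a node index with a stable hash (assumed scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def distribute(rows, key_column, num_segments):
    """Scatter rows across nodes so that each node holds part of the table."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        index = segment_for(str(row[key_column]), num_segments)
        segments[index].append(row)
    return segments


def parallel_count(segments, predicate):
    """Each node counts matches in its own rows; a coordinator sums the partial counts."""
    # A real MPP engine runs these scans concurrently; here they run in sequence.
    partial_counts = [sum(1 for row in seg if predicate(row)) for seg in segments]
    return sum(partial_counts)


# Hypothetical sample table with 10,000 rows spread over 4 nodes.
rows = [{"order_id": i, "amount": i % 100} for i in range(10_000)]
segments = distribute(rows, "order_id", 4)
sizes = [len(seg) for seg in segments]
total = parallel_count(segments, lambda row: row["amount"] >= 50)
```

Because no node shares data with another, adding nodes shrinks the slice each one has to scan, which is the basis of the scaling claim.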
• Mixed storage and execution (row- or column-oriented)
Mixed row- and column-oriented data storage is supported. According to application requirements, the administrator can designate the storage and compression method of each table or table partition, so for any table or table partition the user can choose to store and process data by row or by column.
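The difference between the two layouts can be sketched in Python as follows. This is only an illustration of the general idea: the column names, the sample records, and the use of run-length encoding as the compression method are assumptions, not a description of the product's storage format or codecs.

```python
from itertools import groupby

COLUMNS = ("id", "region", "amount")  # hypothetical table schema

records = [
    {"id": 1, "region": "north", "amount": 10.0},
    {"id": 2, "region": "north", "amount": 20.0},
    {"id": 3, "region": "north", "amount": 30.0},
    {"id": 4, "region": "south", "amount": 40.0},
    {"id": 5, "region": "south", "amount": 50.0},
]


def to_row_store(recs):
    """Row layout: all fields of one record are kept together."""
    return [tuple(rec[col] for col in COLUMNS) for rec in recs]


def to_column_store(recs):
    """Column layout: all values of one column are kept together."""
    return {col: [rec[col] for rec in recs] for col in COLUMNS}


def sum_amount_row_store(store):
    """An aggregate over a row layout has to walk every full record."""
    index = COLUMNS.index("amount")
    return sum(row[index] for row in store)


def sum_amount_column_store(store):
    """The same aggregate over a column layout reads a single column."""
    return sum(store["amount"])


def run_length_encode(values):
    """Collapse runs of repeated values into (value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


row_store = to_row_store(records)
column_store = to_column_store(records)
encoded_region = run_length_encode(column_store["region"])
```

Row layout tends to suit lookups that need whole records, while column layout tends to suit analytic scans over a few columns and compresses well when a column holds repeated values.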
• PB-level loading capacity
High-performance parallel loading is based on MPP Scatter/Gather streaming technology. Loading speed increases linearly with the number of nodes and in practice exceeds 4 TB/hour.
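The linear-scaling claim can be turned into a rough planning calculation. The sketch below assumes an idealized model in which every node contributes the same loading rate; the per-node figure of 0.25 TB/hour is a hypothetical value chosen for illustration and does not come from the product's specifications.

```python
import math


def load_rate_tb_per_hour(num_nodes: int, per_node_tb_per_hour: float) -> float:
    """Aggregate loading rate under the idealized linear-scaling model."""
    return num_nodes * per_node_tb_per_hour


def nodes_for_target(target_tb_per_hour: float, per_node_tb_per_hour: float) -> int:
    """Smallest node count whose aggregate rate reaches the target rate."""
    return math.ceil(target_tb_per_hour / per_node_tb_per_hour)


def load_hours(data_tb: float, num_nodes: int, per_node_tb_per_hour: float) -> float:
    """Time needed to load a data set at the aggregate rate."""
    return data_tb / load_rate_tb_per_hour(num_nodes, per_node_tb_per_hour)


PER_NODE_RATE = 0.25  # TB/hour per node, hypothetical value for illustration

nodes_needed = nodes_for_target(4.0, PER_NODE_RATE)
hours_for_one_pb = load_hours(1024.0, nodes_needed, PER_NODE_RATE)
```

Under this model, doubling the node count halves the loading time; real deployments would also be limited by network and source-system throughput.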
• Index function
Database indexing is supported for both row-oriented and column-oriented tables.
• Client access and third-party tool support
• Multi-level fault tolerance
• Online system capacity expansion (no shutdown)
• Workload management
• Flexible external data access
• Full compliance with the latest SQL standard