Data Analytics Acceleration Library

Developer(s): Intel
Initial release: August 25, 2015
Stable release: 2018 Update 1 / November 17, 2017[1]
Written in: C++, Java, Python[2]
Operating system: Microsoft Windows, Linux, macOS[2]
Platform: Intel Atom, Intel Core, Intel Xeon, Intel Xeon Phi[2]
Type: Library or framework
License: Apache License 2.0[3]
Website: software.intel.com/intel-daal

Intel Data Analytics Acceleration Library (Intel DAAL) is a library of optimized algorithmic building blocks for data analysis stages most commonly associated with solving Big Data problems.[4][5][6][7]

The library supports Intel processors and is available for Windows, Linux and macOS operating systems.[2] It is designed for use with popular data platforms including Hadoop, Spark, R, and MATLAB.[4][8]

History

Intel launched the Data Analytics Acceleration Library on August 25, 2015, under the name Intel Data Analytics Acceleration Library 2016 (Intel DAAL 2016).[9] DAAL is bundled with Intel Parallel Studio XE as a commercial product. A standalone version is available either commercially or free of charge;[3][10] the two differ only in support and maintenance.

License

Intel DAAL is distributed under the Apache License 2.0.[3]

Details

Functional categories

Intel DAAL provides the following algorithms:[11][4][12]

  • Analysis
    • Low order moments: computing the minimum, maximum, mean, standard deviation, variance, and similar statistics for a dataset.
    • Quantiles: splitting observations into equal-sized groups defined by quantile orders.
    • Correlation matrix and variance-covariance matrix: a basic tool for understanding statistical dependence among variables. The degree of correlation indicates how strongly a change in one variable tends to accompany a change in another.
    • Cosine distance matrix: measuring pairwise distance between items using cosine distance.
    • Correlation distance matrix: measuring pairwise distance between items using correlation distance.
    • Clustering: grouping data into unlabeled groups. This is a typical technique in “unsupervised learning,” where there is no established model to rely on. Intel DAAL provides two algorithms for clustering: K-Means and “EM for GMM.”
    • Principal Component Analysis (PCA): a widely used algorithm for dimensionality reduction.
    • Association rules mining: detecting co-occurrence patterns. Commonly known as “shopping basket mining.”
    • Data transformation through matrix decomposition: DAAL provides Cholesky, QR, and SVD decomposition algorithms.
    • Outlier detection: identifying observations that are abnormally distant from the typical distribution of the other observations.
  • Training and Prediction
    • Regression
      • Linear regression: the simplest regression method. It fits a linear equation to model the relationship between dependent variables (the values to be predicted) and explanatory variables (the values that are known).
    • Classification: building a model that assigns items to different labeled groups. DAAL provides multiple algorithms in this area, including the Naïve Bayes classifier, Support Vector Machine, and multi-class classifiers.
    • Recommendation systems
    • Neural networks
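
As an illustration of two of the analysis functions listed above, the following minimal sketch computes a cosine distance matrix and a correlation distance matrix in plain Python. It does not use the Intel DAAL API; the function names (cosine_distance, correlation_distance, pairwise) and the sample data are hypothetical and chosen only for this example. It assumes every observation is a non-zero, non-constant vector, since otherwise the denominators would be zero.

```python
import math


def cosine_distance(x, y):
    """Return 1 - cos(angle between x and y). Assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def correlation_distance(x, y):
    """Return 1 - Pearson correlation, i.e. the cosine distance of the
    mean-centered vectors. Assumes non-constant vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    return cosine_distance(centered_x, centered_y)


def pairwise(rows, dist):
    """Build the full symmetric matrix of distances between all rows."""
    return [[dist(r, s) for s in rows] for r in rows]


# Hypothetical sample: three observations with three features each.
observations = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],  # a scaled copy of the first row
    [3.0, 2.0, 1.0],  # the first row reversed
]

cosine_matrix = pairwise(observations, cosine_distance)
correlation_matrix = pairwise(observations, correlation_distance)

for row in cosine_matrix:
    print(["%.4f" % value for value in row])
```

In this sample the first two rows point in the same direction, so their cosine distance is approximately zero, while the first and third rows are perfectly anti-correlated, so their correlation distance is approximately 2.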

Intel DAAL supports three processing modes:

  • Batch processing: used when all the data fits in memory. A function is called to process the data all at once.
  • Online processing (also called streaming): used when the data does not all fit in memory. Intel DAAL processes data chunks individually and combines all partial results at the finalizing stage.
  • Distributed processing: DAAL supports a model similar to MapReduce. Consumers in a cluster process local data (map stage), and then the Producer process collects and combines the partial results from the Consumers (reduce stage). Intel DAAL offers flexibility in this mode by leaving the communication functions entirely to the developer. Developers can use the data movement provided by a framework such as Hadoop or Spark, or code the communication explicitly, most likely with MPI.
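
The relationship between the three modes can be sketched with low order moments as the example statistic. The code below is plain Python and not the Intel DAAL API; PartialMoments, compute_partial, merge, and finalize are hypothetical names, and the merge step uses the standard pairwise update for mean and sum of squared deviations, which is an assumption about how partial results might be combined rather than a description of DAAL's internals.

```python
import math
from dataclasses import dataclass
from functools import reduce


@dataclass
class PartialMoments:
    """Partial result for one chunk: enough to recover min, max, mean, variance."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean
    minimum: float = float("inf")
    maximum: float = float("-inf")


def compute_partial(chunk):
    """Process one chunk of data that fits in memory."""
    n = len(chunk)
    if n == 0:
        return PartialMoments()
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return PartialMoments(n, mean, m2, min(chunk), max(chunk))


def merge(a, b):
    """Combine two partial results without revisiting the raw data."""
    n = a.count + b.count
    if n == 0:
        return PartialMoments()
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / n
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n
    return PartialMoments(n, mean, m2,
                          min(a.minimum, b.minimum),
                          max(a.maximum, b.maximum))


def finalize(p):
    """Finalizing stage: turn the accumulated partial result into statistics."""
    variance = p.m2 / (p.count - 1) if p.count > 1 else 0.0
    return {"min": p.minimum, "max": p.maximum, "mean": p.mean,
            "variance": variance, "std": math.sqrt(variance)}


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
chunks = [data[i:i + 3] for i in range(0, len(data), 3)]

# Batch: all data is in memory and is processed in one call.
batch = finalize(compute_partial(data))

# Online: chunks arrive one at a time and update a running partial result.
running = PartialMoments()
for chunk in chunks:
    running = merge(running, compute_partial(chunk))
online = finalize(running)

# Distributed: each node computes a partial result on its local chunk (map),
# and one process combines them (reduce). How the partial results travel
# between nodes is left out here, just as the text leaves it to the developer.
node_partials = [compute_partial(chunk) for chunk in chunks]
distributed = finalize(reduce(merge, node_partials))

print(batch)
print(online)
print(distributed)
```

All three modes should agree up to floating-point rounding, because the partial results carry everything the finalizing stage needs.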

References

  1. "Intel® Data Analytics Acceleration Library 2018 Release Notes".
  2. "Intel® Data Analytics Acceleration Library (Intel® DAAL) | Intel® Software".
  3. "Open Source Project: Intel Data Analytics Acceleration Library (DAAL)".
  4. "DAAL github".
  5. "Intel Updates Developer Toolkit with Data Analytics Acceleration Library".
  6. "Intel adds big data functions to math libraries".
  7. "Intel Leverages HPC Core for Analytics Tooling Push". nextplatform.com. 2015-08-25.
  8. "Try Out Intel DAAL to Process Big Data".
  9. "Intel Data Analytics Acceleration Library".
  10. "Community Licensing of Intel Performance Libraries".
  11. "Developer Guide and Reference for Intel(R) Data Analytics Acceleration Library 2017".
  12. "Introduction to Intel DAAL, Part 1: Polynomial Regression with Batch Mode Computation".

External links

  • daal on GitHub
  • DAAL Official Product Website
  • DAAL Support
  • DAAL User Forum
  • DAAL Support Channel
This content was retrieved from Wikipedia : http://en.wikipedia.org/wiki/Data_Analytics_Acceleration_Library
This page is based on the copyrighted Wikipedia article "Data Analytics Acceleration Library"; it is used under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA). You may redistribute it, verbatim or modified, providing that you comply with the terms of the CC-BY-SA