DOI: 10.1145/2909437.2909443
Extended abstract

OpenCL caffe: Accelerating and enabling a cross platform machine learning framework

Published: 19 April 2016

ABSTRACT

Deep neural networks (DNNs) achieved a significant breakthrough in vision recognition in 2012 and quickly became the leading machine learning algorithm in big-data-based, large-scale object recognition applications. The successful deployment of DNN-based applications poses challenges for a cross-platform software framework that enables multiple user scenarios, including offline model training on HPC clusters and online recognition in embedded environments. Existing DNN frameworks mostly focus on closed-format CUDA implementations, which limits the breadth of hardware systems on which DNNs can be deployed.

This paper presents OpenCL™ caffe, which transforms the popular CUDA-based framework caffe [1] to an open-standard OpenCL backend. The goal is to enable a DNN framework compatible with heterogeneous platforms and to achieve competitive performance on the OpenCL tool chain. Because of DNN models' high complexity, we use a two-phase strategy. First, we introduce OpenCL porting strategies that guarantee algorithm convergence; then we analyze OpenCL's performance bottlenecks in the DNN domain and propose several optimization techniques, including a batched data layout and multiple command queues, to better map the problem size onto the existing BLAS library, improve hardware resource utilization, and boost OpenCL runtime efficiency.
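The batched data layout described above can be sketched in NumPy as a CPU stand-in for the OpenCL/BLAS path (all layer sizes and variable names here are hypothetical, not taken from the paper): rather than issuing one small GEMM per image, the im2col columns of the whole batch are concatenated so that a single large GEMM covers every image at once.

```python
import numpy as np

# Hypothetical sizes for one convolution layer, expressed as a GEMM:
# k_dim = C_in * kH * kW, spatial = output H * W per image.
batch, out_ch, k_dim, spatial = 16, 32, 27, 196

rng = np.random.default_rng(0)
weights = rng.standard_normal((out_ch, k_dim))
# im2col output for each image in the batch: (k_dim, spatial)
cols = rng.standard_normal((batch, k_dim, spatial))

# Unbatched layout: one small GEMM per image -- poor BLAS utilization,
# since each call presents only a (out_ch x k_dim) x (k_dim x spatial) problem.
per_image = np.stack([weights @ cols[i] for i in range(batch)])

# Batched layout: concatenate the im2col columns of the whole batch so a
# single large GEMM sees a batch-times-wider problem size.
fused = weights @ cols.transpose(1, 0, 2).reshape(k_dim, batch * spatial)
batched = fused.reshape(out_ch, batch, spatial).transpose(1, 0, 2)

# Both layouts compute the same result; only the problem shape changes.
assert np.allclose(per_image, batched)
```

The larger problem size is what lets an unmodified BLAS library reach higher utilization; the reshapes only change how the same data is presented to the GEMM call.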

We verify OpenCL caffe's offline training and online recognition on both server-end and consumer-end GPUs. Experimental results show that the phase-two optimized OpenCL caffe achieves a 4.5x speedup without modifying the BLAS library. Users can directly run mainstream DNN models and achieve the best performance on a specific processor by choosing the optimal batch number based on hardware properties and input data size.
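Choosing the batch number per device, as described above, can be sketched as a simple throughput sweep. The NumPy timing loop below is only a CPU stand-in for profiling the OpenCL backend, and every size, name, and candidate value in it is hypothetical:

```python
import time
import numpy as np

def gemm_throughput(batch, out_ch=64, k_dim=576, spatial=196, trials=3):
    """Images per second for one batched conv-as-GEMM at this batch size."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal((out_ch, k_dim))
    cols = rng.standard_normal((k_dim, batch * spatial))
    best = float("inf")
    for _ in range(trials):
        t0 = time.perf_counter()
        w @ cols  # the batched GEMM whose cost we are measuring
        best = min(best, time.perf_counter() - t0)
    return batch / best

# Sweep candidate batch sizes and keep the one with the highest throughput.
candidates = [1, 2, 4, 8, 16, 32]
best_batch = max(candidates, key=gemm_throughput)
print(f"best batch size on this machine: {best_batch}")
```

On a real deployment the measured quantity would be end-to-end recognition throughput on the target GPU; the sweep structure is the same.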

References

  1. Caffe. http://caffe.berkeleyvision.org.
  2. CIFAR. http://www.cs.toronto.edu/~kriz/cifar.html.
  3. ConvNet. https://github.com/sdemyanov/ConvNet.
  4. MXNet. https://github.com/dmlc/mxnet.
  5. TensorFlow. http://www.tensorflow.org/tutorials.
  6. Threefry random generator. https://www.deshawresearch.com/resources_random123.html.
  7. K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. arXiv:1512.03385, 2015.
  8. A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. Pages 1097-1105, December 2012.
  9. O. Russakovsky, J. Deng, et al. ImageNet large scale visual recognition challenge. arXiv:1409.0575, 2014.

  • Published in

    IWOCL '16: Proceedings of the 4th International Workshop on OpenCL
    April 2016, 131 pages
    ISBN: 9781450343381
    DOI: 10.1145/2909437

    Copyright © 2016 Owner/Author

    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Qualifiers

    • extended-abstract
    • Research
    • Refereed limited

    Acceptance Rates

    Overall acceptance rate: 84 of 152 submissions, 55%
