TensorFlow-with-dynamic-scaling/

Repository

github.com/alibaba/GPU-scheduler-for-deep-learning

README

This repository contains our customized version of TensorFlow, which implements the dynamic scaling mechanism described in the AntMan paper (OSDI '20).

The modifications to TensorFlow are concentrated in three components: the memory allocator, the executor, and the interfaces. To enable dynamic universal memory, BFCAllocator is modified to support an adjustable upper limit on memory. The allocator tracks the total number of bytes allocated and raises an out-of-memory error when that total exceeds the upper limit. A new universal memory allocator, GPUVMemAllocator, wraps the GPU memory allocator and the host memory allocator (i.e., one that allocates host memory through CUDA, e.g. with cudaMallocHost). When a tensor requests memory, GPUVMemAllocator first tries the GPU memory allocator and falls back to the host memory allocator if too little GPU memory is left. Note that GPUVMemAllocator maintains a set that records the pointers of memory regions allocated on the GPU; on de-allocation it uses this set to decide which underlying allocator owns a pointer.
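The allocation policy described above can be illustrated with a small Python model. This is only a sketch of the idea, not the C++ code in this repository: the class names `CappedAllocator` and `UniversalAllocator`, the `set_limit` method, and the use of opaque handles in place of real pointers are all assumptions made for illustration.

```python
class OutOfMemory(Exception):
    """Raised when an allocation would push usage past the limit."""


class CappedAllocator:
    """Toy allocator that tracks total bytes against an adjustable limit."""

    def __init__(self, name, limit_bytes):
        self.name = name
        self.limit_bytes = limit_bytes
        self.allocated_bytes = 0
        self._sizes = {}        # handle -> size in bytes
        self._next_id = 0

    def set_limit(self, limit_bytes):
        # The upper limit can be raised or lowered while the job runs.
        self.limit_bytes = limit_bytes

    def allocate(self, num_bytes):
        if self.allocated_bytes + num_bytes > self.limit_bytes:
            raise OutOfMemory(f"{self.name}: limit of {self.limit_bytes} bytes reached")
        handle = (self.name, self._next_id)   # stands in for a raw pointer
        self._next_id += 1
        self._sizes[handle] = num_bytes
        self.allocated_bytes += num_bytes
        return handle

    def deallocate(self, handle):
        self.allocated_bytes -= self._sizes.pop(handle)


class UniversalAllocator:
    """Tries the GPU allocator first and uses the host allocator as a backup."""

    def __init__(self, gpu, host):
        self.gpu = gpu
        self.host = host
        self._gpu_handles = set()   # records which handles live on the GPU

    def allocate(self, num_bytes):
        try:
            handle = self.gpu.allocate(num_bytes)
        except OutOfMemory:
            return self.host.allocate(num_bytes)
        self._gpu_handles.add(handle)
        return handle

    def deallocate(self, handle):
        # The set classifies the handle so it returns to the right allocator.
        if handle in self._gpu_handles:
            self._gpu_handles.remove(handle)
            self.gpu.deallocate(handle)
        else:
            self.host.deallocate(handle)
```

In this model, lowering the GPU limit with `set_limit` makes later requests spill to the host allocator without any change to the caller.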

To enable dynamic scaling of computation units, TensorFlow gains a GpuOpManager that owns an operator processing queue and runs in a standalone thread. The TensorFlow operator executor is modified to insert GPU operators into the GpuOpManager queue in order, delegating their execution to it. GpuOpManager may delay the actual execution of GPU operators so that the job uses only a limited percentage of the computation capacity.
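The queueing scheme can be sketched in Python as a single worker thread that runs submitted operators in order and idles between them. The throttling rule used here (after an operator that ran for `busy` seconds, idle for `busy * (1/share - 1)` seconds) is an assumption chosen to make the example concrete; the text above does not state how the delay is computed. The `share`, `sleep`, and `clock` parameters are likewise hypothetical, with `sleep` and `clock` injectable only to keep the sketch testable.

```python
import queue
import threading
import time


class GpuOpManager:
    """Runs submitted operators in FIFO order on one dedicated thread."""

    def __init__(self, share=1.0, sleep=time.sleep, clock=time.perf_counter):
        self._sleep = sleep
        self._clock = clock
        self._ops = queue.Queue()
        self.set_share(share)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def set_share(self, share):
        # Fraction of time the worker is allowed to spend running operators.
        if not 0.0 < share <= 1.0:
            raise ValueError("share must be in (0, 1]")
        self.share = share

    def submit(self, op):
        # The executor hands over a callable instead of running it directly.
        self._ops.put(op)

    def _run(self):
        while True:
            op = self._ops.get()
            if op is None:          # sentinel queued by close()
                break
            start = self._clock()
            op()
            busy = self._clock() - start
            # Assumed policy: idle long enough that busy time stays near `share`.
            idle = busy * (1.0 / self.share - 1.0)
            if idle > 0:
                self._sleep(idle)

    def close(self):
        # Operators queued before the sentinel still run, in order.
        self._ops.put(None)
        self._thread.join()
```

With `share=1.0` the computed idle time is zero, so operators run back to back; smaller values stretch the gaps between them.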

Statistics on memory usage patterns and execution information are aggregated for the local coordinator. The DL framework and the local coordinator communicate through the file system: each runs a monitor thread that checks a file for incoming job statistics or control signals. To minimize the overhead of memory management, dynamic memory scaling is triggered at mini-batch boundaries (the end of session.run()).
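A minimal Python sketch of the framework side of this protocol follows. The file format is an assumption: the text above does not describe it, so the example uses a JSON file with a hypothetical `memory_limit_bytes` key. The names `ControlFileMonitor`, `take_pending_limit`, and `train` are also illustrative. The monitor thread only records the most recent signal; the training loop applies it after a step finishes, which mirrors the mini-batch boundary rule.

```python
import json
import threading


class ControlFileMonitor:
    """Polls a control file and holds the newest limit until it is taken."""

    def __init__(self, path, poll_interval=0.05):
        self._path = path
        self._poll_interval = poll_interval
        self._lock = threading.Lock()
        self._last_seen = None
        self._pending_limit = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._watch, daemon=True)

    def start(self):
        self._thread.start()

    def poll_once(self):
        try:
            with open(self._path) as f:
                signal = json.load(f)
        except (OSError, ValueError):
            return                      # file missing or only partly written
        if not isinstance(signal, dict):
            return
        limit = signal.get("memory_limit_bytes")   # hypothetical key name
        with self._lock:
            if limit is not None and limit != self._last_seen:
                self._last_seen = limit
                self._pending_limit = limit

    def _watch(self):
        while not self._stop.is_set():
            self.poll_once()
            self._stop.wait(self._poll_interval)

    def take_pending_limit(self):
        # Returns a new limit at most once, then None until the file changes.
        with self._lock:
            limit, self._pending_limit = self._pending_limit, None
        return limit

    def stop(self):
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()


def train(num_steps, monitor, run_step, apply_limit):
    """Runs steps and applies control signals only between mini-batches."""
    for _ in range(num_steps):
        run_step()
        limit = monitor.take_pending_limit()
        if limit is not None:
            apply_limit(limit)
```

Deferring the change to the end of a step means the allocator limit never moves while a mini-batch is in flight.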

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications.

TensorFlow was originally developed by researchers and engineers working on the Google Brain team within Google's Machine Intelligence Research organization for the purposes of conducting machine learning and deep neural networks research. The system is general enough to be applicable in a wide variety of other domains, as well.

TensorFlow provides stable Python and C++ APIs, as well as APIs for other languages that carry no backward-compatibility guarantee.

Keep up-to-date with release announcements and security updates by subscribing to announce@tensorflow.org. See all the mailing lists.

Install

See the TensorFlow install guide for how to install the pip package, enable GPU support, use a Docker container, or build from source.

To install the current release:

$ pip install tensorflow

The tensorflow package also includes GPU support on Linux and Windows.

If package size is a concern, CPU-only packages can be installed with:

$ pip install tensorflow-cpu

Nightly binaries are available for testing using the tf-nightly and tf-nightly-gpu packages on PyPI.

Try your first TensorFlow program

$ python

>>> import tensorflow as tf
>>> tf.enable_eager_execution()
>>> tf.add(1, 2).numpy()
3
>>> hello = tf.constant('Hello, TensorFlow!')
>>> hello.numpy()
'Hello, TensorFlow!'

For more examples, see the TensorFlow tutorials.

Contribution guidelines

If you want to contribute to TensorFlow, be sure to review the contribution guidelines. This project adheres to TensorFlow's code of conduct. By participating, you are expected to uphold this code.

We use GitHub issues for tracking requests and bugs. Please see TensorFlow Discuss for general questions and discussion, and direct specific questions to Stack Overflow.

The TensorFlow project strives to abide by generally accepted best practices in open-source software development.

Continuous build status

Official Builds

Build Type	Status	Artifacts
Linux CPU		pypi
Linux GPU		pypi
Linux XLA		TBA
MacOS		pypi
Windows CPU		pypi
Windows GPU		pypi
Android
Raspberry Pi 0 and 1		Py2 Py3
Raspberry Pi 2 and 3		Py2 Py3

Community Supported Builds

Build Type	Status	Artifacts
Linux AMD ROCm GPU Nightly		Nightly
Linux AMD ROCm GPU Stable Release		Release
Linux s390x Nightly		Nightly
Linux s390x CPU Stable Release		Release
Linux ppc64le CPU Nightly		Nightly
Linux ppc64le CPU Stable Release		Release
Linux ppc64le GPU Nightly		Nightly
Linux ppc64le GPU Stable Release		Release
Linux CPU with Intel® MKL-DNN Nightly		Nightly
Linux CPU with Intel® MKL-DNN (supports Python 2.7, 3.4, 3.5, and 3.6)		1.13.1 pypi
Red Hat® Enterprise Linux® 7.6 CPU & GPU (Python 2.7, 3.6)		1.13.1 pypi

Resources

Learn more about the TensorFlow community and how to contribute.

License

Apache License 2.0

Directories

Path	Synopsis
tensorflow
go	Package tensorflow is a Go binding to TensorFlow.
go/genop	Command genop generates a Go source file with functions for TensorFlow ops.
go/genop/internal	Package internal generates Go source code with functions for TensorFlow operations.
go/op	Package op defines functions for adding TensorFlow operations to a Graph.
