
data partition within group #1

Merged
asfgit merged 3 commits into apache:master from nudles:master
May 17, 2015

Conversation

@nudles nudles (Member) commented May 16, 2015

This training scheme partitions one batch of data into sub-batches, where each worker in the group processes one sub-batch. It is implemented by partitioning the layers of the original neural network (except the data and parser layers) into sub-layers, where each sub-layer holds a sub-batch of the features. These sub-layers share the same set of parameter objects, and each parameter object has one worker as its owner. Each worker thus owns a partition of the neural network and computes the parameter gradients over it.
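The partitioning of a batch into per-worker sub-batches can be sketched as follows. This is a hypothetical illustration, not SINGA's actual API; `partition_batch` and the array shapes are assumptions for the example.

```python
import numpy as np

def partition_batch(batch, num_workers):
    """Split a (batch_size, feature_dim) array into num_workers equal
    sub-batches along the batch axis; one sub-batch per worker."""
    assert batch.shape[0] % num_workers == 0, "batch size must divide evenly"
    return np.split(batch, num_workers, axis=0)

# A batch of 6 examples with 2 features each, split across 3 workers.
batch = np.arange(12, dtype=np.float32).reshape(6, 2)
sub_batches = partition_batch(batch, num_workers=3)
# Each worker's sub-layer processes 2 examples; only the features are
# partitioned -- the parameter objects themselves are shared.
```

Note that only the feature data is split; the parameters stay shared, which is why the stub must later aggregate the gradients computed on each sub-batch.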

The workflow is:

  • Each parameter object is initialized by its owner worker, which then sends a put request to the server.
  • Each worker waits for fresh parameters and runs the back-propagation algorithm over its own partition (i.e., its layers). Once it has the gradients, it sends an update message to the main thread (i.e., the stub).
  • The main thread averages the gradients from all workers for each shared parameter and sends the update request to the server.
  • The main thread handles the responses to the update requests and updates the parameter's version and data field.
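The stub's aggregation step above can be sketched as a simple gradient average. This is an illustrative sketch only; `aggregate_gradients` and the dict-of-arrays representation are assumptions, not SINGA's implementation.

```python
import numpy as np

def aggregate_gradients(worker_grads):
    """Average per-parameter gradients reported by all workers.

    worker_grads: list of {param_id: gradient array}, one dict per
    worker. Returns one averaged gradient per parameter, which the
    stub would send to the server in a single update request."""
    averaged = {}
    for pid in worker_grads[0]:
        grads = [g[pid] for g in worker_grads]
        averaged[pid] = sum(grads) / len(grads)
    return averaged

# Two workers report gradients for the same shared parameter "w0".
grads = [{"w0": np.array([1.0, 2.0])}, {"w0": np.array([3.0, 4.0])}]
avg = aggregate_gradients(grads)  # avg["w0"] is [2.0, 3.0]
```

Averaging before sending means the server sees one update per parameter per batch, rather than one per worker.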

TODO:

  • Optimize the single-node case, where memory copies can be avoided by sharing memory between servers and workers.
  • Consider the multi-node case.
  • Update the autoconf files to remove files that have been merged into other files.

nudles and others added 3 commits May 12, 2015 09:50
TODO
1. update the performance collection by reporting performance to the stub.
2. let workers pass requests to the stub without copying data (passing an address or param id). Messages to servers are then generated by the stub, which can aggregate gradients of shared parameters from all workers and collect the updated parameters for them.
…implify the logic. Workers now send simple messages to the stub thread, which constructs the real update/get/put requests.

The stub thread also handles the responses from servers; e.g., the get/update responses are now handled by the stub. The workers then wait in the collect function until their param's version is updated.
avoid deadlocks for param_dealer_ and layer_dealer_
2. tested data partition in a single group in one process.
3. generate a json file under workspace/visualization representing the neural net structure. Users can create an image using the python script (script/graph.py), which reads the json file.
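The wait-on-version behavior described in the commits above (workers block in collect until the stub applies the server's update response) can be sketched with a condition variable. The `Param` class, `collect`, and `apply_update` names here are hypothetical, chosen for the example.

```python
import threading

class Param:
    """Minimal sketch of a parameter object with a version counter."""

    def __init__(self):
        self.version = 0
        self._cv = threading.Condition()

    def collect(self, expected_version):
        # Worker side: block until the parameter reaches the
        # expected version (i.e., the stub has applied the update).
        with self._cv:
            self._cv.wait_for(lambda: self.version >= expected_version)

    def apply_update(self):
        # Stub side: after handling the server's update response,
        # bump the version and wake any waiting workers.
        with self._cv:
            self.version += 1
            self._cv.notify_all()

p = Param()
stub = threading.Thread(target=p.apply_update)
stub.start()
p.collect(1)  # returns once the stub has applied the update
stub.join()
```

This also illustrates why care is needed to avoid deadlocks (as in the param_dealer_ / layer_dealer_ commit): a worker waiting on a version that the stub never bumps would block forever.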
@asfgit asfgit merged commit 0d47ec5 into apache:master May 17, 2015
asfgit pushed a commit that referenced this pull request Aug 30, 2016
check build python package in mac
nudles pushed a commit that referenced this pull request Aug 9, 2019
nudles pushed a commit that referenced this pull request Nov 27, 2019
changes EXPECT_EQ to EXPECT_NEAR
joddiy referenced this pull request in joddiy/incubator-singa Jan 14, 2020
nudles pushed a commit that referenced this pull request Aug 12, 2020
nudles pushed a commit that referenced this pull request Aug 24, 2020
Update from apache:dev branch
