assignment operators: http://en.cppreference.com/w/cpp/language/operator_assignment
Used in:
a = b
a += b
a -= b
a *= b
a /= b
a &= b
a |= b
a ^= b
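As a rough illustration of how such operators could be applied when merging a received value `b` into a stored value `a`, here is a minimal Python sketch. The table `AS_OPS` and the function `apply_op` are hypothetical names for this example, not part of the original schema, and the sketch covers only some of the operators listed above.

```python
import operator

# Hypothetical lookup table keyed by the operator symbol.
# Each entry returns the value `a` would hold after `a <op> b`.
AS_OPS = {
    "=": lambda a, b: b,
    "+=": operator.add,
    "-=": operator.sub,
    "*=": operator.mul,
    "&=": operator.and_,
    "|=": operator.or_,
    "^=": operator.xor,
}


def apply_op(symbol, a, b):
    """Return the new value of `a` after applying `a <symbol> b`."""
    return AS_OPS[symbol](a, b)


if __name__ == "__main__":
    print(apply_op("+=", 2, 3))
    print(apply_op("^=", 6, 3))
```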
Used in:
optional int32 feature_group_id = 3;
Used in:
save w
SAVE_AS_DENSE = 7; // save X * w in a given key range
Used in:
Divide a feature group into feature_block_ratio x nnz_feature_per_example blocks
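One possible reading of this rule, sketched in Python: the number of blocks is the product of the two settings, and the key range of the feature group is cut into that many nearly equal pieces. The function name `split_key_range`, the half-open `[begin, end)` convention, and the even split are assumptions made for illustration only.

```python
def split_key_range(begin, end, feature_block_ratio, nnz_feature_per_example):
    """Split the half-open key range [begin, end) into blocks.

    Assumption: the block count is
    feature_block_ratio * nnz_feature_per_example, truncated to an
    integer and never smaller than 1.
    """
    num_blocks = max(1, int(feature_block_ratio * nnz_feature_per_example))
    size = end - begin
    return [
        (begin + size * i // num_blocks, begin + size * (i + 1) // num_blocks)
        for i in range(num_blocks)
    ]


if __name__ == "__main__":
    for block in split_key_range(0, 100, 2, 2):
        print(block)
```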
Update the feature blocks in a random order. Turn it off to eliminate the randomness (often for debugging), but doing so may affect the convergence rate.
Update important feature groups at the beginning to get a good initial starting point.
Bounded-delay consistency
maximal number of passes over the training data
convergence criterion: stop if the relative objective <= epsilon
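The two stopping rules above could be combined roughly as follows. This is only a sketch: it assumes that "relative objective" means the relative change in the objective between two consecutive passes, and the function and argument names are hypothetical.

```python
def should_stop(prev_objective, objective, epsilon, pass_index, max_pass):
    """Return True when training should stop after pass `pass_index` (0-based).

    Assumption: the relative objective is
    |prev_objective - objective| / |objective|.
    """
    if pass_index + 1 >= max_pass:
        return True  # the pass budget is used up
    denom = max(abs(objective), 1e-12)  # guard against division by zero
    return abs(prev_objective - objective) / denom <= epsilon


if __name__ == "__main__":
    print(should_stop(100.0, 99.0, 0.02, pass_index=0, max_pass=10))
    print(should_stop(100.0, 50.0, 0.02, pass_index=0, max_pass=10))
```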
features which occur <= *tail_feature_freq* times will be filtered out before training
A count-min sketch is used to filter the tail features. It has two parameters, k and n.
n = the_first_arrive_key_length * num_workers * countmin_n_ratio
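To make the tail-feature filter concrete, here is a small count-min sketch in Python. It assumes that k is the number of hash functions and n is the number of counters per hash function, which the text above does not state explicitly. The class `CountMin`, the helpers `sketch_width` and `keep_frequent`, and the hashing scheme are illustrative choices, not the project's implementation.

```python
import random


def sketch_width(first_arrive_key_length, num_workers, countmin_n_ratio):
    """n = the_first_arrive_key_length * num_workers * countmin_n_ratio."""
    return max(1, int(first_arrive_key_length * num_workers * countmin_n_ratio))


class CountMin:
    """Count-min sketch over integer keys: k rows of n counters each."""

    def __init__(self, n, k, seed=0):
        rng = random.Random(seed)
        self.n = n
        self.p = (1 << 61) - 1  # a large prime for the hash family
        self.coeffs = [
            (rng.randrange(1, self.p), rng.randrange(self.p)) for _ in range(k)
        ]
        self.rows = [[0] * n for _ in range(k)]

    def _buckets(self, key):
        return [((a * key + b) % self.p) % self.n for a, b in self.coeffs]

    def add(self, key, count=1):
        for row, j in zip(self.rows, self._buckets(key)):
            row[j] += count

    def query(self, key):
        # The estimate never undercounts; collisions can only inflate it.
        return min(row[j] for row, j in zip(self.rows, self._buckets(key)))


def keep_frequent(keys, sketch, tail_feature_freq):
    """Keep keys whose estimated frequency exceeds tail_feature_freq."""
    return [key for key in keys if sketch.query(key) > tail_feature_freq]
```

Because the sketch can overestimate but never underestimate, a frequent feature is never dropped by mistake, while a few tail features may survive the filter.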
performance
Used in:
a node => the scheduler
the scheduler => a node
Used in:
filenames, supports regular expressions
files stored in hdfs
ignore the feature group information
the maximal number of files that will be assigned to a worker; -1 means no limit
the maximal number of lines that will be read from a file; -1 means no limit
randomly shuffle the file order
only valid for the binary format
duplicate the file several times
Used in:
see https://github.com/mli/parameter_server/wiki/Data
Used in:
total number of instances
Used in:
-- key caching --
if the task is done, then clear the cache (to save memory)
-- fixing float filter --
-- noise --
-- runtime parameters used by the system --
Used in:
cache the keys at both sender and receiver
compress data by snappy
convert a float/double into a fixed-point integer
add noise to data
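The fixed-point conversion in the filter list above can be pictured with a short Python sketch. The value range `[lo, hi]`, the `num_bytes` width, and the round-to-nearest rule are all assumptions made for this example; the original comments do not say how the conversion is parameterised.

```python
def to_fixed(x, lo, hi, num_bytes):
    """Map a float in [lo, hi] to an unsigned integer of num_bytes bytes."""
    levels = (1 << (8 * num_bytes)) - 1
    x = min(max(x, lo), hi)  # clamp values outside the range
    return round((x - lo) / (hi - lo) * levels)


def from_fixed(q, lo, hi, num_bytes):
    """Approximate inverse of to_fixed."""
    levels = (1 << (8 * num_bytes)) - 1
    return lo + q / levels * (hi - lo)


if __name__ == "__main__":
    q = to_fixed(0.3, -1.0, 1.0, num_bytes=2)
    print(q, from_fixed(q, -1.0, 1.0, num_bytes=2))
```

Fewer bytes mean less traffic but a coarser grid, so the recovered value only approximates the original.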
Used in:
HADOOP_HOME
hadoop.job.ugi, format: user,passwd
fs.default.name
timestamp of the latest heartbeat report the scheduler has received from a given worker/server
recv/sent bytes via zmq
user+sys (percentage)
host's network in/out bandwidth usage (MB/s)
e.g. feature group id
size
number of non-zero entries
Used in:
TODO gpus
Used in:
network address
Used in:
each running application has a single scheduler
a virtual node, representing a group of nodes
a backup node, which can turn into another node
Used in:
push or pull
merge operator
it's a replica request
Used in:
gaussian random
Used in:
TODO may change it to string
Used in:
Used in:
total number of non-zero elements
total number of non-zero instances (non-empty rows)
Used in:
true: a system control task, typically *ctrl* should be set
false: a task for a customer, and *customer_id* should be set
true: a request task
false: the response task to the request task with the same *time*
the unique id of a customer
the timestamp of this task
the tasks this one depends on. that is, this task is executed only after all tasks from the same node whose time is contained in *wait_time* have finished. only valid if *request*=true
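The *wait_time* rule can be expressed as a simple check. In this sketch, `wait_time` is assumed to be a list of timestamps and `finished_times` the set of timestamps of tasks from the same node that have already finished; both representations and the function name are hypothetical.

```python
def can_execute(wait_time, finished_times):
    """Return True if every timestamp in wait_time has already finished."""
    return all(t in finished_times for t in wait_time)


if __name__ == "__main__":
    print(can_execute([3, 4], {1, 2, 3, 4}))
    print(can_execute([3, 5], {3}))
```

A task with an empty *wait_time* has no dependencies, so it can run immediately.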
the key range of this task
namespace of keys
true: the message sent with this task will contain a list of keys
data type
filters applied to the data
if true, the tasks will be packed into a single one during sending
the place to store a small amount of data
system control signals
parameters
Used in:
the workload id
the data associated with this workload
randomly shuffle the file order
duplicate the data several times
all finished workload ids
all workloads have been done