package speech.sparrowhawk

message Abbreviation

semiotic_classes.proto:256

Abbreviation which will be expanded using morphosyntactic features.

Used in: Token

message Cardinal

semiotic_classes.proto:29

A number that should be read as a cardinal, e.g. "one". The input is specified by a string, and verbalization is controlled via optional morphosyntactic features for gender, case, etc.

Used in: Measure, Token

message Connector

semiotic_classes.proto:245

A token that connects two other tokens, such as the : in 1:1 or the x in 4x3.

Used in: Token

message Date

semiotic_classes.proto:156

A date, given as day, month, and year. 'style' could be used to control, e.g., whether day=4, month=5, year=1997 is said "fourth of may nineteen ninety seven" or "may the fourth nineteen ninety seven". 'era' is for era indicators such as "CE", "BCE", "AD", etc.

Used in: Token

message Decimal

semiotic_classes.proto:107

A number read out with a decimal point. For example:

"34.45" is integer_part: "34" fraction: "45"
"-23.64" is integer_part: "23" fraction: "64" negative: true

An optional exponent field handles cases like -3.24E29. The quantity field is intended for cases where exponent is not used, to represent quantities expressed as words, e.g. 2.3 billion. This is by analogy with what is done in Money, except that it is more general than Money and should be available to other classes that use Decimal.

Used in: Measure, Money, Token
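
As an illustration of the Decimal description, the following Python sketch splits a written number into the parts named there. The `DecimalFields` class and `split_decimal` function are hypothetical stand-ins, not the generated protobuf classes, and the field names and string types are taken from the description, so the actual definitions in semiotic_classes.proto may differ.

```python
import re
from dataclasses import dataclass


@dataclass
class DecimalFields:
    # Hypothetical container; names follow the Decimal description,
    # not necessarily the real proto field names or types.
    integer_part: str
    fraction: str
    negative: bool = False
    exponent: str = ""


# Optional sign, digits, a decimal point, digits, and an optional exponent.
_DECIMAL = re.compile(r"^(-?)(\d+)\.(\d+)(?:[eE]([+-]?\d+))?$")


def split_decimal(text: str) -> DecimalFields:
    """Split a written decimal such as "-23.64" into its parts."""
    match = _DECIMAL.match(text)
    if match is None:
        raise ValueError(f"not a decimal: {text!r}")
    sign, integer_part, fraction, exponent = match.groups()
    return DecimalFields(integer_part, fraction, sign == "-", exponent or "")


print(split_decimal("34.45"))
print(split_decimal("-23.64"))
print(split_decimal("-3.24E29"))
```

The sign is kept out of integer_part and recorded in negative, matching the "-23.64" example in the description.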

message Electronic

semiotic_classes.proto:224

Electronic items such as URLs, email addresses, etc. The full schema for URLs, of which email addresses can effectively be seen as a subset, is:

protocol://username:password@domain:port/path?query_string#fragment_id

Hence populating just username and domain will read as an email address.

Used in: Token
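
The URL schema in the Electronic description can be sketched in Python as follows. `ElectronicFields` and `to_text` are hypothetical names used only for illustration; the field names come from the schema, and treating every field as an optional string is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ElectronicFields:
    # Hypothetical container; one string per component of the schema
    # protocol://username:password@domain:port/path?query_string#fragment_id
    protocol: str = ""
    username: str = ""
    password: str = ""
    domain: str = ""
    port: str = ""
    path: str = ""
    query_string: str = ""
    fragment_id: str = ""


def to_text(e: ElectronicFields) -> str:
    """Assemble the populated fields in schema order, skipping empty ones."""
    out = ""
    if e.protocol:
        out += f"{e.protocol}://"
    if e.username:
        out += e.username
        if e.password:
            out += f":{e.password}"
        out += "@"
    out += e.domain
    if e.port:
        out += f":{e.port}"
    if e.path:
        out += f"/{e.path}"
    if e.query_string:
        out += f"?{e.query_string}"
    if e.fragment_id:
        out += f"#{e.fragment_id}"
    return out


print(to_text(ElectronicFields(username="alice", domain="example.com")))
```

With only username and domain populated, the same assembly rule yields an email-style string, which is the sense in which email addresses are a subset of the URL schema.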

message Fraction

semiotic_classes.proto:59

A number that should be read as a fraction, e.g. "three quarters". The input is specified as a separate numerator and denominator.

Used in: Measure, Token

message Grammar

rule_order.proto:44

message LinguisticStructure

items.proto:181

A single utterance's linguistic structure

Used in: Utterance

Used in: Token, Word

message Measure

semiotic_classes.proto:136

A measure, e.g. 6 feet, 9 meters, etc. 'units' are the units of the measure, e.g. "miles"; definitions of all legal units are in a fixed list in the text norm params. Cardinal makes it easier to incorporate East Asian counter words as measures. The vast majority of the time these appear after an integer, so this avoids the excess baggage of using decimal or fraction markup. More generally, this could be useful for real measures in other languages for similar reasons. The motivation for treating counter words as measures is that, in languages that have them, more familiar measures are treated as a subset of counter words, in that one never gets a counter word and a measure together.

Used in: Token

message Money

semiotic_classes.proto:180

An amount of money, e.g. $12.50, £15, etc. 'style' could be used to control how it is read. For the example $12.50:

1: "twelve dollars and fifty cents"
2: "twelve dollars fifty"
3: "twelve united states dollars and fifty cents"

Used in: Token
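
The three Money styles can be illustrated with a small Python sketch. `read_usd` and `words_under_100` are hypothetical helpers, and the mapping from style numbers to phrasings simply mirrors the $12.50 example; actual styles are defined by the verbalizer grammars, not by code like this.

```python
_ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]


def words_under_100(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if not 0 <= n < 100:
        raise ValueError("only 0..99 is supported in this sketch")
    if n < 20:
        return _ONES[n]
    tens, ones = divmod(n, 10)
    return _TENS[tens] if ones == 0 else f"{_TENS[tens]} {_ONES[ones]}"


def read_usd(dollars: int, cents: int, style: int) -> str:
    """Read a US dollar amount in one of the three example styles."""
    d, c = words_under_100(dollars), words_under_100(cents)
    if style == 1:
        return f"{d} dollars and {c} cents"
    if style == 2:
        return f"{d} dollars {c}"
    if style == 3:
        return f"{d} united states dollars and {c} cents"
    raise ValueError(f"unknown style: {style}")


for style in (1, 2, 3):
    print(style, read_usd(12, 50, style))
```

The sketch ignores singular forms such as "one dollar" and amounts of 100 or more; it only shows how a single style value selects among readings of the same amount.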

message Ordinal

semiotic_classes.proto:45

A number that should be read as an ordinal, e.g. "first". The input is specified by a string and optional morphosyntactic features for gender, case etc.

Used in: Token

message Rule

rule_order.proto:37

Used in: Grammar

message SparrowhawkConfiguration

sparrowhawk_configuration.proto:21

message Telephone

semiotic_classes.proto:199

A telephone number. NB. There should always be at least one number_part.

Used in: Token
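
The requirement that a Telephone always has at least one number_part can be checked as in the Python sketch below. `split_number_parts` is a hypothetical helper, and splitting the input on non-digit characters is an assumption made for illustration, not something the proto specifies.

```python
import re
from typing import List


def split_number_parts(text: str) -> List[str]:
    """Return the digit groups of a written telephone number.

    Raises ValueError when no digits are found, since a Telephone
    should always carry at least one number_part.
    """
    parts = re.findall(r"\d+", text)
    if not parts:
        raise ValueError(f"no number_part found in {text!r}")
    return parts


print(split_number_parts("555-123-4567"))
```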

message Time

semiotic_classes.proto:80

A time, given as hours, minutes and seconds. 'style' controls how the time should be spoken. For example, "hours=13, minutes=15" could be "one fifteen pm", "thirteen fifteen", "a quarter past one", etc. The styles are defined in the language-specific verbalizer. 'zone' contains an optional time zone which is verbalized letter by letter, e.g. PST, GMT, etc.

Used in: Token

message Token

items.proto:25

Message containing the contents for a single token as determined by the tokenizer. Roughly speaking, a token corresponds to a single verbalizable entity, such as a single word, or single semiotic object such as "$15.60".

Used in: LinguisticStructure

enum Token.PauseLength

items.proto:43

General pause duration lengths.

Used in: Token

enum Token.Type

items.proto:27

Describes the kind of entity this token represents.

Used in: Token

message Utterance

items.proto:195

An utterance

message Word

items.proto:146

A single word

Used in: LinguisticStructure