Off-the-shelf Speech and Language Products

Appen has a number of speech technology resources available for license. Brief descriptions or samples of many of the currently available Appen databases, lexicons, and grammars are listed below. Please contact us for more information about the listed offerings, or to find out about other speech and language resources that are in development or planned for the near future.

Appen
Downloadable Appen brochure.
Appen Brochure (462KB / pdf)

Australian English (1) - Telephony ASR Database
This is a 500 speaker/82,500 utterance telephony database (mobile and fixed line).
AUS_ASR001.txt (0KB / txt)

Australian English (2) - Telephony ASR Database
This is a 1,000 speaker/75,000 utterance telephony database (mixture of mobile and fixed).
AUS_ASR002.txt (1KB / txt)

Italian - In-Car Microphone recorded ASR database
This is a 103 speaker/35,875 utterance In-Car database.
ITA_ASR002.txt (0KB / txt)

Canadian English - Telephony ASR Database
This is a 1,000 speaker/100,000 utterance Canadian English telephony database (100% mobile).
ENC_ASR001.txt (1KB / txt)

Canadian French - Telephony ASR Database
This is a 1,000 speaker/100,000 utterance Canadian French telephony database (100% mobile).
FRC_ASR001.txt (1KB / txt)

Colloquial Gulf Arabic - Speech Database
150 Speakers (75 UAE, 75 Saudi Arabia), 35 minutes of audio per speaker per channel, 4 channels (1 headset, 3 mid-distance including 1 array mic), 280 utterances per speaker. See CGA_ASR001.zip below for further information.
CGA_ASR001.zip (0KB / zip)

Italian - Speech Database
200 Speakers, 4 channels (1 headset, 3 mid-distance including 1 array mic), 200 utterances per speaker. See ITA_ASR001.txt below for further information.
ITA_ASR001.txt (0KB / txt)

Spanish - Speech Database
200 Speakers, 4 channels (1 headset, 3 mid-distance including 1 array mic), 200 utterances per speaker. See ESP_ASR001.txt below for further information.
ESP_ASR001.txt (0KB / txt)

Lexicons
Lexicons are available for Arabic, Bahasa Indonesia, Dutch, French, German, Italian, Korean, Portuguese, Spanish, and a number of varieties of English. Australian and British English lexicons include extensive place name coverage. For further information, see AppenOffTheShelfLexicons.pdf below.
AppenOffTheShelfLexicons.pdf (0KB / pdf)

Spanish - TTS Speech Database
Male speaker, studio recording, 1787 unique phonetically rich sentences, legal triphone coverage.
ESP_TTS001.txt (0KB / txt)

Italian - TTS Speech Database
Male speaker, studio recording, 3300 unique phonetically rich sentences, legal triphone coverage.
ITA_TTS001.txt (0KB / txt)