# Sign-Speak API

Improve your accessibility like never before with the Sign-Speak API

## Sections

• [Welcome](https://app.theneo.io/sign-speak/sign-speak-api/offerings.md): Welcome to Sign-Speak's API and SDK docs! Sign-Speak provides Sign Language Recognition and Avatar technology with just one API call. Our models score highly on quantitative benchmarks (BLEU=55.1), and our avatars have likewise been rated by the Deaf community as understandable and accurate. In addition to our Sign Language offerings, we provide a suite of other services (such as text-to-speech and speech-to-text) at similarly high levels of accuracy. Sign-Speak is your one-stop shop for building services for anyone in the Deaf and Hard of Hearing space! If you are interested, you can sign up at our development portal. If you have any questions, reach out to us at management@sign-speak.com.

  Offerings

  Sign-Speak's services are available in four levels of granularity:

  • Little Customization Requirements - Sign-Speak's Automated Interpreting System (AIS): Access our all-inclusive platform that leverages AI and human interpreters to provide 24/7 access to your customers. Provide more access for less.
  • Moderate Customization Requirements - SDK and Sign-Speak Component Library: Easily request and surface results via the Sign-Speak Component Library. You can customize the elements via Tailwind. This is currently only provided in React, but we are looking at expanding.
  • High Customization Requirements - SDK Access: Access the Sign-Speak API via the SDK in select languages. This allows you to submit translation requests and get back the results easily.
  • Highest Customization Requirements - API Access: Access the Sign-Speak API from any programming language via a RESTful and WebSockets interface. This provides you with the most flexibility, though it may require more code.

  Note that "customization requirements" refers to how much customization you need beyond an off-the-shelf solution.
  We recommend starting with our AIS to see our API in action.

  Request Types

  Sign-Speak's API is set up to be flexible to your use case. There are various endpoints for interacting with the system in a real-time or batch manner. Requests are either inference-based requests (e.g. sign language recognition, sign language avatar), in which Sign-Speak's AI is used, or administrative requests, in which no AI is engaged (such as key management). All administrative requests operate via a RESTful API. Inference requests, however, fall into three types:

  • BLOCKING (RESTful): BLOCKING requests attempt to execute a request synchronously. These requests keep the connection open and blocked for up to 30 seconds. If our AI models finish processing your request within that time, the result is returned synchronously. This is what happens in the vast majority of cases. However, if your request takes longer than 30 seconds, it undergoes a batch-downgrade: the request returns a request_id (with an HTTP status code of 202) and becomes a BATCH request, which can be queried as such.
  • BATCH (RESTful): BATCH requests are useful for fire-and-forget jobs, or jobs in batch pipelines. BATCH requests return a request_id that identifies the request. You can then query via a GET to see whether the request has completed (and if so, what the result is). BATCH requests may be long-running.
  • WebSocket: WebSocket requests are the preferred mechanism for interacting with our system when integrating into real-time applications. WebSocket requests process the request while a video is being recorded. This allows us to front-load some of the processing before the user presses the stop recording button (allowing for lower latency).

  Payment Mechanism

  Inference API requests are billed at a per-second rate (per second of input). You can view this rate by logging into your portal.
  Note that per-second charges do not accumulate until after your trial period. Requests are additionally rate-limited and quota-limited.

  • Account-wide quota: Your account has a monthly quota which dictates the maximum number of minutes processed. We set this automatically behind the scenes and adjust it to ensure flexibility and prevent abuse. You can view it via the administrative RESTful API. Reach out to us if you'd like it increased.
  • Key-level quota: You can set quotas per key to allow for individual key management. You can manage this via the administrative RESTful API.
  • Account-wide rate limits: We automatically impose reasonable rate limits on each account to prevent abuse. These rate limits restrict the number of requests you can issue per minute. Reach out to us if you'd like them increased.

  Keys

  Our API access is built around API keys. You should be able to see your API keys by logging into your development portal. Our API keys come in three flavors:

  • PRIVATE keys: These keys can manage your entire account. For example, they can mint new keys, set quotas on keys, delete and promote keys, and view usage. Private keys can additionally view requests and results from all keys, as well as use the inference APIs. Only use these keys on the server. Never deploy a private key to your front-end code.
  • ROOT key: This is a special type of private key which cannot be deleted. Only use this key to mint other keys. This key exists to ensure that your account always has an API key.
  • PUBLIC keys: Public keys can only execute inference-based requests and query their own requests (in BATCH mode). Each PUBLIC key is sandboxed from the others.

  Limited offerings

  Sign-Speak has several features which it only offers to partners and clients on a case-by-case basis.

  • Personalized Avatar: Our avatar has the ability to mimic a person's likeness dynamically.
  • Streaming Mode: When streaming mode is enabled and you are using WebSockets, our model will produce interim results.
  • Zero-Input Mode: When using zero-input mode, our models do not require the user to press a “start” or “stop” button. Rather, users can sign continuously into the model and it will predict.
  • Fine-Tuning: Fine-tune our general models to match your use case. Note that this is unnecessary in most everyday use cases, but can be helpful if you have unique terminology.
  • Avatar Signing Style: Allow your users to set the avatar's signing style between SEE, PSE, ASL, Classifiers, and more!

  Need help testing? We can help.

  We understand that you may need to test your integration without prior ASL knowledge. To help you get started quickly, we've provided some sample videos in the “Sample Sentences” section below.
• [API Specifications](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications.md): This section documents our RESTful and WebSockets API. Note that there are some undocumented features (in particular surrounding the “Limited offerings” listed above).
• [RESTful ASL Recognition](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/asl-recognition.md): The American Sign Language (ASL) Recognition section of the Sign-Speak API allows users to submit videos for ASL recognition.

  For best results

  • Use conversational ASL: Learning (or ASL instructional) videos frequently employ slow signing or constructions that are rarely used in conversation. We recommend using conversational ASL signed at a normal speed. Our models have been trained on thousands of Deaf signers, and thus work best for those whose primary language is ASL. Our models operate best when individuals use their normal signing style rather than trying to sign slowly or “code switch” (in fact, signing slowly or trying to “over-enunciate” signs decreases accuracy).
  • Proper positioning: Try to position your users as shown below. For best accuracy, have users sign at about this distance, and use a landscape video. Our models can ingest all types of videos (including one-handed signing on vertical videos!), but these do the best.
  • Start Small: Start with one to two sentences before graduating to more complex samples. This will ensure that your setup is good before moving to more complex input.
  • Use Standard Signs: Our model does not currently work well with personal sign names.
    Our model has, however, been trained on a wide variety of dialects and regional signs across America.

  This guidance extends to all sign language recognition use.

  Available Models: We currently provide models of varying complexity. Model names follow SLR.[model version].[model size]. These models include SLR.2.xs, SLR.2.sm, SLR.2.md, and SLR.3.lg. Note that SLR.2.md and SLR.3.lg are currently only offered to select customers.

  Additional pointers and notes:

  • For the RESTful API, the video MUST be base64-encoded. We do not currently support uploading links or similar. Our WebSocket API accepts raw binary.
  • Our model can be configured to output caption regions (similar to a single subtitle within an SRT file). Each region corresponds to an individual sentence the user signed, and presents a start and stop timecode, a prediction, and a log-confidence.
  • Our models output confidence scores as natural (base-e) logarithms. To recover the confidence on a 0-1 scale, simply compute EXP(confidence).
  • You can provide feedback to our models to improve them (or alternatively to personalize your models, depending on which feature set you have enabled).
• [Retrieve Batch Result](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/asl-recognition/retrieve-batch-result.md): Retrieves the result of a batch recognition request using the provided ID.
• [Delete Batch Result](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/asl-recognition/delete-batch-result.md): The Delete Batch Result API section allows users to remove the result of a BATCH recognition request by providing the corresponding ID.
• [RESTful English Speech Recognition](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/english-speech-recognition.md): English Speech Recognition is a section of the Sign-Speak API that allows users to perform speech recognition on English audio files. Note that our speech recognition models have been tuned for use in real-world scenarios.
  Currently, our models filter specifically for near-field voices (so that they do not capture background noise).
• [Retrieve Batch Result](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/english-speech-recognition/retrieve-batch-result.md): Retrieves the result of a batch recognition request for English speech using the provided ID.
• [Delete Batch Result](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/english-speech-recognition/delete-batch-result.md): The Delete Batch Result section allows users to remove the result of a batch recognition request for English speech using a provided ID.
• [RESTful ASL Production](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/asl-production.md): Our ASL avatars look just like a human (see below). You can use these avatars to sign a message in ASL. These models have been widely rated by the Deaf community as accurate. We recommend you start with our MALE avatar, and primarily use grammatically correct sentences. Note that, in addition to raw text, you can input time-coded text (e.g. from an SRT) if you'd like to align the signing with an external source (such as a video). Provide either the English or the timestamped English, not both. Additionally, note that we have different avatar models providing different sign quality. Model names follow SLP.[model version].[model size]. These models include SLP.2.xs, SLP.2.sm, SLP.2.md, and SLP.3.lg. Note that SLP.2.md and SLP.3.lg are currently only available to select partners due to capacity.
• [Retrieve Batch Result](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/asl-production/retrieve-batch-result.md): Retrieves the result of a batch production request in ASL using the provided ID.
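The batch result endpoints tie back to the request types described in the Welcome section: a BLOCKING request either returns its result synchronously or is downgraded and returns a request_id under HTTP 202. The sketch below shows one way a client might tell the two outcomes apart. It is not Sign-Speak's client code. The `BlockingOutcome` type and `classifyBlockingResponse` function are hypothetical names, and two things are assumptions rather than documented behavior: that any other 2xx status carries the synchronous result, and that request_id is a string.

```typescript
// Hypothetical shapes for illustration. Only `request_id` and the 202 status
// come from the documentation; everything else here is an assumption.
type BlockingOutcome =
  | { kind: "completed"; result: unknown }
  | { kind: "downgraded"; requestId: string };

function classifyBlockingResponse(status: number, body: unknown): BlockingOutcome {
  if (status === 202) {
    // Batch-downgrade: the body should carry the id to query later as a BATCH request.
    const requestId = (body as { request_id?: unknown } | null)?.request_id;
    if (typeof requestId !== "string") {
      throw new Error("202 response did not include a string request_id");
    }
    return { kind: "downgraded", requestId };
  }
  if (status >= 200 && status < 300) {
    // Assumption: any other 2xx status means the result came back synchronously.
    return { kind: "completed", result: body };
  }
  throw new Error(`Unexpected HTTP status ${status}`);
}
```

A caller would branch on `kind`: use `result` directly when the request completed, or keep `requestId` and query the matching Retrieve Batch Result endpoint until the result is available.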
• [Delete Batch Result](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/asl-production/delete-batch-result.md): The Delete Batch Result section of the Sign-Speak API allows users to remove the result of a batch production request in ASL using the provided ID. This lets users manage and maintain their batch production requests by deleting specific results as needed.
• [RESTful English Speech Generation](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/english-speech-generation.md): The English Speech Generation section of the Sign-Speak API enables users to convert text into spoken English. With this functionality, users can generate MP3 audio files of the spoken text. We use neural voices to sound lifelike. Provide either the English or the timestamped English, not both.
• [Retrieve Batch Result](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/english-speech-generation/retrieve-batch-result.md): Retrieves the result of a batch production request for English speech using the provided ID.
• [Delete Batch Result](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/english-speech-generation/delete-batch-result.md): The Delete Batch Result section of the Sign-Speak API allows users to delete the result of a batch production request for English speech using a provided ID.
• [WebSockets Sign Language Recognition](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/websockets-sign-language-recognition.md): Performing WebSockets-based Sign Language Recognition involves several steps. We recommend using WebSockets within a JavaScript context as illustrated below:

  1. Start a video stream.
  ```typescript
  // Prefer VP9-in-WebM on Chrome; other browsers use their default codec.
  // The user agent is lowercased so the substring checks below can match.
  const userAgent = navigator.userAgent.toLowerCase();
  const isChromeBrowser =
    userAgent.includes("chrome") &&
    !userAgent.includes("edge") &&
    !userAgent.includes("opr") &&
    !userAgent.includes("crios");

  let options: MediaRecorderOptions;
  if (isChromeBrowser) {
    options = { bitsPerSecond: 1000000, mimeType: "video/webm;codecs=VP9" };
  } else {
    options = { bitsPerSecond: 1000000 };
  }
  // mediaStream is the MediaStream being recorded.
  const mediaRecorder = new MediaRecorder(mediaStream, options);
  ```

  2. Initiate a socket connection.

  ```typescript
  const configPacket = {
    api_key: "API_KEY",
    slice_length: 500, // milliseconds
    single_recognition_mode: false,
    // … all other configuration details you would like to specify
  };
  const socket = new WebSocket("wss://api.sign-speak.com/stream-recognize-sign");
  // Send the configuration packet as soon as the connection opens.
  await new Promise((resolve, reject) => {
    socket.addEventListener("open", () => {
      socket.send(JSON.stringify(configPacket));
      resolve(null);
    });
    socket.addEventListener("error", reject);
  });
  ```

  3. Set up the socket listener.

  ```typescript
  socket.addEventListener("message", (event) => {
    // Parse JSON
    let data;
    try {
      data = JSON.parse(event.data);
    } catch (err) {
      console.warn("Received non-JSON from recognition socket:", event.data);
      return;
    }
    if (data === "OUT_OF_QUOTA") {
      // this means you have no more quota
      return;
    }
    for (const pred of data.prediction || []) {
      const { finished, prediction, confidence, feedback_id } = pred;
    }
  });
  ```

  4. Send video data (ideally as it is being captured).

  ```typescript
  mediaRecorder.ondataavailable = (e) => {
    const blob = new Blob([e.data], { type: "video/mp4" });
    const reader = new FileReader();
    // Register the handler before starting the read.
    reader.onloadend = () => {
      socket.send(reader.result as ArrayBuffer);
    };
    reader.readAsArrayBuffer(blob);
  };
  mediaRecorder.start(500); // this 500 must match the slice_length
  ```

  5. When the user presses the stop button, send the done signal.

  ```typescript
  socket.send("NEXT"); // or socket.send("DONE")
  ```

  You can send DONE if you'd like to close the WebSocket after receiving the output.
  If you send NEXT, you will be able to start sending another set of inputs, corresponding to another session, on the same WebSocket.
• [Feedback](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/feedback.md): The Feedback section of the Sign-Speak API allows users to provide feedback on a prediction made by the API. Various feedback mechanisms are available, so users can give feedback on specific parts of a recognition or on the whole of it. This feedback helps improve the accuracy and performance of the Sign-Speak API.
• [Private API Requests](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/private-api-requests.md): The Private API Requests section of the Sign-Speak API enables users to securely authenticate their requests and access protected resources. By generating a private API key, users can ensure the confidentiality and integrity of their data while interacting with the API.
• [Retrieve Usage](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/private-api-requests/retrieve-usage.md): The Retrieve Usage section of the Sign-Speak API enables users to access detailed information about their API usage, including per-key usage and global quota limits. This data allows users to monitor and manage their API usage, providing insights into the maximum allowed requests for sign language recognition, sign language production, English recognition, and English production, as well as the total usage within the current quota period. Note that reported and set limits are in seconds, and that the quota period is a rolling 30-day window. Usage includes all keys (including deactivated keys).
• [Key Management](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/private-api-requests/key-1.md): The Key section of the Sign-Speak API allows users to manage and access authentication keys for interacting with the API.
  By using this section, users can generate, retrieve, update, and delete keys, enabling secure and authorized access to the Sign-Speak API's functionality.
• [Create a new key](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/private-api-requests/key-1/create-a-new-key.md): This section enables users to create a new key for accessing the Sign-Speak API. By configuring the type of key and setting quota limits, users can control the maximum number of sign language recognition and production requests, as well as English recognition and production requests, that can be made using the key. Users also have the option to define a custom quota for their keys.
• [List Keys](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/private-api-requests/key-1/list-keys.md): The "List Keys" section of the Sign-Speak API provides users with access to information about their keys. It lists only keys that are in use.
• [Retrieve a specific key](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/private-api-requests/key-1/retrieve-a-specific-key.md): The "Retrieve a specific key" section of the Sign-Speak API enables users to retrieve detailed information about a specific API key. Users can view the type of key and the maximum allowed limits for sign language recognition, sign language production, English recognition, and English production requests. Additionally, users can access any custom quotas associated with the key.
• [Update an existing key](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/private-api-requests/key-1/update-an-existing-key.md): This section enables users to update an existing key in the Sign-Speak API by modifying various parameters. By adjusting the limits for sign language recognition and production requests, as well as English recognition and production requests, users can customize the usage restrictions for the API key.
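Key limits cover four request categories (sign language recognition, sign language production, English recognition, and English production), and reported and set limits are in seconds. The sketch below shows how a client might compare a key's limits against its usage before submitting a request. The category identifiers and both function names are hypothetical. They are not the API's field names, which are documented on the linked pages.

```typescript
// The four request categories named in the documentation. These identifiers
// are hypothetical and are not the API's actual field names.
type Category =
  | "signRecognition"
  | "signProduction"
  | "englishRecognition"
  | "englishProduction";

// Seconds per category, matching the documented unit for limits and usage.
type SecondsByCategory = Record<Category, number>;

const CATEGORIES: Category[] = [
  "signRecognition",
  "signProduction",
  "englishRecognition",
  "englishProduction",
];

// Remaining quota per category, clamped at zero when usage exceeds the limit.
function remainingSeconds(limits: SecondsByCategory, used: SecondsByCategory): SecondsByCategory {
  const remaining = {} as SecondsByCategory;
  for (const category of CATEGORIES) {
    remaining[category] = Math.max(0, limits[category] - used[category]);
  }
  return remaining;
}

// True when an input of the given length still fits in the remaining quota.
function fitsQuota(remaining: SecondsByCategory, category: Category, inputSeconds: number): boolean {
  return inputSeconds <= remaining[category];
}
```

Because billing is per second of input, checking the input's duration against the remaining seconds is a cheap client-side guard. The server-side quota remains the source of truth.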
• [Delete a specific key](https://app.theneo.io/sign-speak/sign-speak-api/api-specifications/private-api-requests/key-1/delete-a-specific-key.md): The "Delete a specific key" section of the Sign-Speak API allows users to securely remove a specific API key from their Sign-Speak projects. With this functionality, users can manage their API keys and ensure access control. Simply provide the unique identifier associated with the key to delete it. Note that the key's usage data will be retained (the key is merely deactivated). Deletion is permanent.
• [SDK Specifications](https://app.theneo.io/sign-speak/sign-speak-api/sdk-specifications.md): The Sign-Speak JavaScript SDK enables seamless integration of advanced American Sign Language (ASL) and speech recognition technologies into your applications, promoting inclusive user experiences.

  Install the SDK using NPM or Yarn:

  ```shell
  npm install @sign-speak/react-sdk
  # -or-
  yarn add @sign-speak/react-sdk
  ```

  Then set your API key, either by setting the environment variable SIGN_SPEAK_API_KEY or via JavaScript:

  ```typescript
  import { setKey } from 'sign-speak-sdk/network/key';

  // Set the API Key
  setKey('YOUR_API_KEY');
  ```

  You can view more information on the SDK at NPM, so here we will only briefly go through the various aspects of the SDK. Our SDK exposes three levels of access:

  Network Calls - You can use the network calls component of the SDK to directly invoke various functionality of the API. Use this when you want the most control over the functionality. Example:

  ```typescript
  import { produceSign } from 'sign-speak-sdk/network/rest';

  const response = await produceSign(
    { english: "Welcome to Sign-Speak!" },
    { model: "MALE", request_class: "BATCH" }
  );
  ```

  React Hooks - You can use the React hooks in a large number of use cases, when you want to manage the display of the components but delegate the actual state management to the SDK.
  ```tsx
  import React, { useEffect } from 'react';
  import { useSignLanguageRecognition } from 'sign-speak-sdk';

  function ASLHookExample() {
    const { prediction, startRecognition, stopRecognition } = useSignLanguageRecognition();

    // Start recognition on mount and stop it on unmount.
    useEffect(() => {
      startRecognition();
      return () => stopRecognition();
    }, []);

    return <div>ASL Output: {prediction}</div>;
  }
  ```

  Component Library - The component library allows for one-line integration. Our component library follows the model of ShadCN, in that we recommend copying the actual component source code into your project and using it as a starting point. You also have limited customization options within the component library itself if you wish.

  ```tsx
  import { SignRecognition } from 'sign-speak-sdk';

  const MyASLRecognition = () => (
    <div>
      <h1>ASL Recognition</h1>
      <SignRecognition />
    </div>
  );
  ```

  You can easily mix and match these to get your desired level of customization. As a quick example, below is how to use the Sign Language Recognition and Sign Language Production components:

  ```tsx
  import { useEffect, useState } from "react";
  import { RecognitionResult, setKey, SignProduction, SignRecognition, SpeechProduction } from "@sign-speak/react-sdk";

  export default function Demo({ apiKey }: { apiKey: string }) {
    const [directionToASL, setDirectionToASL] = useState(false);
    const [engText, setEngText] = useState("Nice to meet you.");
    const [submittedEngText, setSubmittedEngText] = useState("Nice to meet you.");
    const [recognizedASL, setRecognizedASL] = useState("");
    const [show, setShow] = useState(false);

    // Register the API key before rendering any SDK components.
    useEffect(() => {
      setKey(apiKey);
      setShow(true);
    }, [apiKey]);

    // Keep only predictions whose confidence exceeds 0.5 (confidences are natural logs).
    const parseSLRResult = (result: RecognitionResult | null) => {
      if (result) {
        setRecognizedASL(
          result.prediction
            .filter(x => x.confidence > Math.log(0.5))
            .map(x => x.prediction)
            .join(" ")
        );
      }
    };

    if (!show) {
      return null;
    }

    return (
      <div>
        <h1>Translation Demo</h1>
        <button onClick={() => setDirectionToASL(!directionToASL)} className="btn bg-base-100">
          {directionToASL ? <>English to ASL</> : <>ASL to English</>}
        </button>
        {directionToASL ? (
          <>
            <label>English</label>
            <textarea
              onChange={(e) => setEngText(e.target.value)}
              placeholder="Enter Text in English"
              className="bg-base-200 rounded-lg mt-4 p-2 textarea resize-none"
            />
            <button onClick={() => setSubmittedEngText(engText)} className="btn btn-primary mt-2">
              Translate
            </button>
            <label>ASL Translation</label>
            {/* Render SignProduction for submittedEngText here (props as documented on NPM). */}
          </>
        ) : (
          <>
            <label>ASL</label>
            {/* Render SignRecognition here, passing results to parseSLRResult (props as documented on NPM). */}
            <label>English Translation</label>
            <p>{recognizedASL}</p>
          </>
        )}
      </div>
    );
  }
  ```
• [Sample Sentences](https://app.theneo.io/sign-speak/sign-speak-api/sample-sentences.md): We've provided sample sentences, including easy, medium, and advanced examples, for your reference: https://drive.google.com/drive/folders/17_FPHjlYEye-0pmKA9HebckynAaXhCyj?usp=sharing
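When checking recognition output for the sample sentences, it helps to remember that confidences are reported as natural logarithms, which is why the demo compares against `Math.log(0.5)`. The sketch below performs the same filtering on the 0-1 scale instead. The `CaptionRegion` interface mirrors only the `prediction` and `confidence` fields seen in the examples above. The interface name, the function names, and the 0.5 default threshold are illustrative assumptions, not part of the SDK.

```typescript
// Mirrors the two fields used in the examples above; real results may carry more.
interface CaptionRegion {
  prediction: string;
  confidence: number; // natural-log confidence, so always <= 0
}

// Convert a log-confidence back to the 0-1 scale, i.e. EXP(confidence).
function toConfidence(logConfidence: number): number {
  return Math.exp(logConfidence);
}

// Join the predictions whose 0-1 confidence exceeds the threshold.
// The 0.5 default matches the demo's Math.log(0.5) cut-off.
function joinConfident(regions: CaptionRegion[], minConfidence = 0.5): string {
  return regions
    .filter((region) => toConfidence(region.confidence) > minConfidence)
    .map((region) => region.prediction)
    .join(" ");
}
```

Comparing `Math.exp(confidence)` to `0.5` selects the same regions as comparing `confidence` to `Math.log(0.5)`, because the exponential is monotonic. The 0-1 form is simply easier to read when tuning the threshold.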