25th Conference on Text, Speech and Dialogue (TSD)

Brno, Hotel Continental, AE520907

Demo :: Digital Primer v1 :: One Goal, Two Prototypes

The ultimate aim of the “Digital Primer” (DP) project is the development, optimization and deployment of a digital educational instrument (Bildunginstrument) that fosters the acquisition of basic literacy in primary school pupils. DP has two sub-projects. The “physical” Personal Primer (π2) branch focuses on the design of a post-smartphone open hardware artefact based on “Raspberry Pi Zero” technology. The “Web Primer” sub-project provides extended functionality in the browser. Both sub-projects provide audiotext support, implement human-machine peer learning curricula and use Mozilla’s DeepSpeech acoustic models embellished with our own exercise-specific language models.

One Goal


Two Prototypes

Personal Primer

Personal Primer is a physical, do-it-yourself (DIY), book-like (embooked) Bildunginstrument for fostering reading skills in younger pupils and informatics skills in older pupils. The idea is simple:

older and/or more expert students strengthen their informatics competences by making the device and fine-tuning acoustic models

younger pupils (9-12 yrs.) strengthen their media competence by producing and curating (audiotext) content

the youngest pupils (6-8 yrs.) use the device to strengthen their basic literacy (e.g. reading) competence

Web Primer :: https://fibel.digital

Web Primer allows the wider public to benefit from our own collection of audiotext open educational resources (OERs) without having to build their own π2. The coupling of auditory, graphemic and haptic sensory modalities, which makes the creation of audiotexts possible, imitates the “finger-pointing” technique used by parents when reading to their pre-school children. Creating subtitle-like audio-text couplings at the sentence, lexical or even sub-lexical (i.e. syllabic) level is as simple as moving one’s finger across a touchscreen.
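One way to picture such a coupling is as a list of timed text segments, much like subtitle cues. The sketch below is purely illustrative: the Segment type, its field names and the example timings are assumptions, not the data format actually used by fibel.digital.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One audio-text coupling: a span of audio tied to a piece of text."""
    start: float  # seconds from the beginning of the recording
    end: float    # seconds, exclusive
    text: str     # sentence, word or syllable shown on screen


def segment_at(segments, t):
    """Return the segment being spoken at time t, or None during silence.

    Assumes the segments are sorted by start time and do not overlap.
    """
    starts = [s.start for s in segments]
    i = bisect_right(starts, t) - 1
    if i >= 0 and segments[i].start <= t < segments[i].end:
        return segments[i]
    return None


# Hypothetical syllable-level coupling for the word "Fibel".
syllables = [Segment(0.00, 0.35, "Fi"), Segment(0.35, 0.80, "bel")]
current = segment_at(syllables, 0.5)
print(current.text if current else "(silence)")
```

The same structure would work at sentence or word level; only the granularity of the text field changes.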

Tutorial 0 :: Speech recognition on the edge :: From Empowerment to voice-shell halt in less than 10 steps

During this tutorial, participants will be introduced to diverse ways in which speech-to-text (STT) inference can be realized on non-cloud, local (i.e. edge-computing) architectures. Participants will learn the intricacies of running two different types of ASR systems (DeepSpeech and Random Forests) on three different hardware architectures: Raspberry Pi Zero (armv6), Raspberry Pi 4 (armv7, without CUDA) and NVIDIA Jetson Xavier (armv8 / aarch64, with CUDA). Thus, in a 90-minute hands-on tutorial, participants will acquire practical know-how for turning each of the three hardware platforms into a low-cost local STT inference engine.
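A single local inference can look roughly like the following. This is a minimal sketch, assuming the Coqui STT Python package (stt, the successor of Mozilla's DeepSpeech) and NumPy are installed on the board and that a model and a scorer have already been downloaded; all file names are placeholders.

```python
import wave


def read_pcm16(path):
    """Read a mono 16-bit PCM WAV file; return (sample_rate, raw_bytes)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        return wav.getframerate(), wav.readframes(wav.getnframes())


def transcribe(wav_path, model_path, scorer_path=None):
    """Run one local speech-to-text inference and return the transcript."""
    # Imported lazily so the WAV helper above works without these packages.
    import numpy as np
    from stt import Model

    model = Model(model_path)
    if scorer_path is not None:
        model.enableExternalScorer(scorer_path)

    rate, raw = read_pcm16(wav_path)
    if rate != model.sampleRate():
        raise ValueError(f"model expects {model.sampleRate()} Hz, got {rate} Hz")
    return model.stt(np.frombuffer(raw, dtype=np.int16))


# Usage (placeholder file names, substitute your own):
#   print(transcribe("command.wav", "model.tflite", "commands.scorer"))
```

Nothing in this sketch touches the network: model, scorer and audio are all read from local storage, which is the point of running inference on the edge.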

One Goal


Edge Computing

Edge computing is a computing paradigm that brings computation and data storage closer to the sources of data. Edge computing concentrates on servers "in proximity to the last mile network".

Speech recognition

Dramatis personae:


https://github.com/coqui-ai/STT https://coqui.ai/models

https://gitlab.com/Jaco-Assistant/Scribosermo Quartznet model

Connectionist Temporal Classification (CTC) beam search

TensorFlow & TensorFlow Lite

Random forests (treelite)
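To make the CTC entry in the list above more tangible, here is a textbook-style sketch of CTC decoding in pure Python. It contrasts greedy (best-path) decoding with a prefix beam search that has no language model. It is not the decoder shipped with DeepSpeech or Coqui STT, which is an optimized native implementation that also consults a scorer. The convention that the blank symbol occupies the last column is an assumption of this sketch.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_greedy(log_probs, alphabet):
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    blank = len(alphabet)  # convention here: blank is the last column
    out, prev = [], blank
    for frame in log_probs:
        s = max(range(len(frame)), key=frame.__getitem__)
        if s != blank and s != prev:
            out.append(alphabet[s])
        prev = s
    return "".join(out)


def ctc_beam_search(log_probs, alphabet, beam_width=8):
    """Prefix beam search without a language model.

    Each prefix keeps two scores: the log-probability of all paths that
    end in blank, and of all paths that end in a non-blank symbol.
    """
    blank = len(alphabet)
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for s, lp in enumerate(frame):
                if s == blank:
                    b, nb = nxt[prefix]
                    nxt[prefix] = (logsumexp(b, p_b + lp, p_nb + lp), nb)
                elif prefix and prefix[-1] == s:
                    # Repeated symbol: it either merges into the same prefix
                    # or, after a blank, starts a genuinely new character.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (b, logsumexp(nb, p_nb + lp))
                    b, nb = nxt[prefix + (s,)]
                    nxt[prefix + (s,)] = (b, logsumexp(nb, p_b + lp))
                else:
                    b, nb = nxt[prefix + (s,)]
                    nxt[prefix + (s,)] = (b, logsumexp(nb, p_b + lp, p_nb + lp))
        ranked = sorted(nxt.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    best = max(beams, key=lambda p: logsumexp(*beams[p]))
    return "".join(alphabet[s] for s in best)


# Two frames, one letter plus blank: P(a) = 0.4, P(blank) = 0.6 in each frame.
frames = [[math.log(0.4), math.log(0.6)]] * 2
print(repr(ctc_greedy(frames, ["a"])))       # '' (blank wins every frame)
print(repr(ctc_beam_search(frames, ["a"])))  # 'a' (three paths sum to 0.64)
```

The toy input shows why beam search matters: the single most likely path is all blanks (0.36), yet the paths that collapse to "a" together carry more probability mass (0.24 + 0.24 + 0.16 = 0.64).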

Raspberry Pi 4

full-fledged universal Turing machine :: 1.5 GHz 64-bit quad-core ARM Cortex-A72 processor, on-board 802.11ac Wi-Fi, Bluetooth 5, full gigabit Ethernet, two USB 2.0 ports, two USB 3.0 ports, 1–8 GB of RAM

5 V only; power consumption between 3 and 10 W

during this tutorial, you will interact with a Raspberry Pi 4 running Raspbian 10 (buster), 32-bit armv7l

TeacherNet example

NVIDIA Jetson Xavier

8-core ARM v8.2 64-bit CPU (aarch64) ; 32 GB RAM; 512-core Volta GPU with Tensor Cores


every now and then, NVIDIA releases a Debian-based system (i.e. Linux4Tegra, L4T) with all the packages you need, bundled in the so-called "JetPack" suite

lesen.digital example

Raspberry Pi Zero

the least power-hungry (1-3 W...) universal Turing machine out there


1 GHz; 512 MB RAM

Time for a little nerdy exercise...

Please connect to Hotspot FibelNet

password: cirrostratus

then try to connect to one of the two Pi 4s we made available for you

ssh demo-tutorial@tutorial0.local


ssh demo-tutorial@tutorial1.local

password: takarthbr

...exercise for non-nerds

Please create a list of 10-50 tokens relevant to your domain of interest and mail it to daniel@udk-berlin.de and hjk@udk-berlin.de; we will create a microscopic language model (scorer) out of it for you.

Help extend the "VoiceShell" dataset by making recordings here: https://fibel.digital/22354 (again, log in with l: demo-tutorial p: takarthbr)

Feel free not to do anything or leave the room

scorers, parameters, commands

Let's load some new scorers and see how systems perform.
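Why a scorer changes what the system hears can be sketched in a few lines. In DeepSpeech-style decoding, candidate transcripts are ranked by the acoustic log-probability plus a weighted language-model log-probability (weight alpha) plus a word-count bonus (weight beta). The sketch below uses a unigram dictionary as a stand-in for the language model, which is a simplification: real scorers package a KenLM n-gram model. The candidates, probabilities and weights are made-up numbers for illustration, not measurements.

```python
import math


def combined_score(acoustic_logp, lm_logp, n_words, alpha, beta):
    """Acoustic score + alpha * language-model score + beta * word count."""
    return acoustic_logp + alpha * lm_logp + beta * n_words


def rescore(candidates, lm, alpha=0.9, beta=1.2, oov_logp=math.log(1e-6)):
    """Pick the best transcript from (text, acoustic_logp) pairs.

    lm maps a word to its log-probability; unknown words get oov_logp.
    """
    def score(item):
        text, acoustic_logp = item
        words = text.split()
        lm_logp = sum(lm.get(w, oov_logp) for w in words)
        return combined_score(acoustic_logp, lm_logp, len(words), alpha, beta)

    return max(candidates, key=score)[0]


# Made-up example: the acoustic model slightly prefers "hold",
# but the tiny command vocabulary only knows "halt" and "reboot".
candidates = [("hold", math.log(0.5)), ("halt", math.log(0.4))]
tiny_lm = {"halt": math.log(0.5), "reboot": math.log(0.5)}

print(rescore(candidates, tiny_lm, alpha=0.0, beta=0.0))  # hold (no scorer)
print(rescore(candidates, tiny_lm))                       # halt (with scorer)
```

A microscopic, domain-specific vocabulary therefore acts as a strong prior: acoustically similar words outside the vocabulary are penalized heavily.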


please turn off the tutorial Raspberry Pis with the voiceshell command "halt"
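How a recognized transcript might be turned into a shell command can be sketched as follows. This is a hypothetical illustration, not the actual VoiceShell implementation: the whitelist, the phrase-to-command mapping and the resolve function are all assumptions. Only whitelisted phrases are mapped, and the sketch merely returns the argument list instead of executing it.

```python
import shlex

# Hypothetical whitelist: spoken phrase -> command line.
COMMANDS = {
    "halt": "sudo halt",
    "list files": "ls -l",
}


def resolve(transcript, commands=COMMANDS):
    """Map a recognized transcript to an argv list, or None if not whitelisted."""
    phrase = " ".join(transcript.lower().split())  # normalize case and spacing
    command = commands.get(phrase)
    return shlex.split(command) if command else None


print(resolve("  Halt "))       # ['sudo', 'halt']
print(resolve("rm everything")) # None
```

Returning an argv list rather than a string lets a caller hand it to subprocess.run without involving a shell, so a misrecognized transcript cannot inject extra commands.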

discussion & ...

Hope you liked it, and let's stay in touch

Daniel & Hyungjoong 

daniel@udk-berlin.de hjk@udk-berlin.de