TSD2022

25th conference on Text Speech Dialogue (TSD)

Brno, Hotel Continental, AE520907

Tutorial 0 :: Speech recognition on the edge :: From Empowerment to voice-shell halt in fewer than 10 steps

During this tutorial, participants will be introduced to different ways in which speech-to-text (STT) inference can be realized on non-cloud, local (i.e. edge-computing) architectures. Participants will acquire knowledge of, and competence in, the intricacies and nuances of running two different types of ASR systems (DeepSpeech and Random Forests) on three different hardware architectures: Raspberry Pi Zero (armv6), Raspberry Pi 4 (armv7, without CUDA) and NVIDIA Jetson Xavier (armv8 / aarch64, with CUDA). Thus, in 90 minutes of hands-on tutorial, participants will acquire practical know-how about transforming each of the three hardware platforms into a low-cost, local STT inference engine.

discussion & ...

We hope you liked it, and let's stay in touch

Daniel & Hyungjoong 

daniel@udk-berlin.de hjk@udk-berlin.de

@DigiEduBerlin

https://github.com/hromi/lesen-mikroserver

halt

Please turn off the tutorial Raspberry Pis with the voice-shell command "halt".
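The voice shell itself lives in the lesen-mikroserver repository and is not reproduced here. To illustrate the general idea, here is a minimal Python sketch that assumes the STT engine has already returned a transcript string. The whitelist `COMMANDS` and the functions `normalize`, `resolve_command` and `run_voice_command` are hypothetical names, not the repository's API. Only an exact, normalized match triggers a command, and the command runs without a shell.

```python
import subprocess

# Hypothetical whitelist: normalized transcript -> command as an argv list.
# This is an illustration, not the lesen-mikroserver implementation.
COMMANDS = {
    "halt": ["sudo", "halt"],
}


def normalize(transcript: str) -> str:
    """Lower-case a raw STT transcript and collapse its whitespace."""
    return " ".join(transcript.lower().split())


def resolve_command(transcript: str):
    """Return the argv for a whitelisted phrase, or None if it is unknown."""
    argv = COMMANDS.get(normalize(transcript))
    return list(argv) if argv is not None else None


def run_voice_command(transcript: str, dry_run: bool = True):
    """Resolve a transcript and, unless dry_run is set, execute it without a shell."""
    argv = resolve_command(transcript)
    if argv is not None and not dry_run:
        subprocess.run(argv, check=True)  # no shell=True, so no shell injection
    return argv
```

With these definitions, `run_voice_command("Halt")` only returns `['sudo', 'halt']`. Passing `dry_run=False` would actually shut the machine down. Because the whitelist requires an exact match, a misrecognized utterance cannot run an unexpected command.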

scorers, parameters, commands

Let's load some new scorers and see how systems perform.

...exercise for non-nerds

Please create a list of 10-50 tokens relevant to your domain of interest and mail it to daniel@udk-berlin.de and hjk@udk-berlin.de; we will build a microscopic language model (scorer) from it for you.
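DeepSpeech's documentation builds a scorer from a plain text corpus: a KenLM language model is trained and then packaged with the `generate_scorer_package` tool. Before that step, the mailed-in list has to be turned into a clean corpus with one entry per line. The following is a minimal sketch of that preparation only. The function names, the lower-casing and the file layout are assumptions for illustration, not the organizers' actual pipeline.

```python
from pathlib import Path


def prepare_corpus(tokens):
    """Normalize and de-duplicate domain tokens/phrases, keeping their order.

    Lower-casing is an assumption (it matches lower-case acoustic-model alphabets).
    """
    seen = set()
    lines = []
    for token in tokens:
        norm = " ".join(token.lower().split())  # lower-case, collapse whitespace
        if norm and norm not in seen:
            seen.add(norm)
            lines.append(norm)
    return lines


def write_corpus(tokens, path):
    """Write one phrase per line (UTF-8) and return the number of lines written."""
    lines = prepare_corpus(tokens)
    if not 10 <= len(lines) <= 50:
        print(f"note: {len(lines)} unique entries (the tutorial asks for 10-50)")
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

The resulting text file is the kind of input that a KenLM-based scorer build expects. The scorer build itself is not shown here.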

Help extend the "VoiceShell" dataset by making recordings here: https://fibel.digital/22354 (again, log in with login: demo-tutorial, password: takarthbr)

Or feel free to do nothing at all, or to leave the room

Time for a little nerdy exercise...

Please connect to Hotspot FibelNet

password: cirrostratus

then connect to one of the two Pi 4s we made available for you:

ssh demo-tutorial@tutorial0.local

or 

ssh demo-tutorial@tutorial1.local

password: takarthbr

Raspberry Pi Zero

the least efficient (1-3 W...) universal Turing machine out there

armv6

1 GHz single-core CPU; 512 MB RAM

NVIDIA Jetson Xavier

8-core ARM v8.2 64-bit CPU (aarch64) ; 32 GB RAM; 512-core Volta GPU with Tensor Cores

CUDA support

Every now and then, NVIDIA releases a Debian-based (Ubuntu) Linux distribution, Linux for Tegra (L4T), bundled with all the packages you need in the so-called "JetPack" suite

lesen.digital example

Raspberry Pi 4

full-fledged universal turing machine :: 1.5 GHz 64-bit quad core ARM Cortex-A72 processor, on-board 802.11ac Wi-Fi, Bluetooth 5, full gigabit Ethernet, two USB 2.0 ports, two USB 3.0 ports, 1–8 GB of RAM

5 V only; power consumption between 3 and 10 W

during this tutorial, you will interact with a Raspberry Pi 4 running Raspbian 10 (buster), 32-bit armv7l

TeacherNet example

Speech recognition

Dramatis personae:

https://github.com/mozilla/DeepSpeech

https://github.com/coqui-ai/STT https://coqui.ai/models

https://gitlab.com/Jaco-Assistant/Scribosermo (QuartzNet models)

Connectionist Temporal Classification (CTC) beam search

TensorFlow & TensorFlow Lite

Random forests (treelite)
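CTC beam search turns the acoustic model's per-frame character probabilities into text. The following is a minimal sketch of textbook CTC prefix beam search without a language model. It assumes a probability matrix of shape frames × alphabet, with the blank label at index 0. DeepSpeech's own decoder is implemented natively and additionally rescores beams with the external scorer. Real decoders also work in log space to avoid underflow on long utterances.

```python
from collections import defaultdict


def ctc_prefix_beam_search(probs, alphabet, beam_width=8, blank=0):
    """Decode per-frame CTC probabilities (frames x labels) into a string.

    Each beam is a label prefix with two scores: the total probability of
    alignments ending in blank (p_b) and ending in a non-blank label (p_nb).
    """
    beams = {(): (1.0, 0.0)}  # the empty prefix, counted as "ending in blank"
    for frame in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for label, p in enumerate(frame):
                if label == blank:
                    # A blank leaves the prefix unchanged.
                    next_beams[prefix][0] += (p_b + p_nb) * p
                    continue
                extended = prefix + (label,)
                if prefix and label == prefix[-1]:
                    # A repeated label collapses into the same prefix...
                    next_beams[prefix][1] += p_nb * p
                    # ...unless a blank separated the two occurrences.
                    next_beams[extended][1] += p_b * p
                else:
                    next_beams[extended][1] += (p_b + p_nb) * p
        # Keep only the beam_width most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    best = max(beams, key=lambda prefix: sum(beams[prefix]))
    return "".join(alphabet[i] for i in best)
```

Summing over alignments is what distinguishes this decoder from greedy best-path decoding. Consider two frames that each give blank 0.6 and "a" 0.4. Greedy decoding picks blank twice and outputs an empty string (probability 0.36). The beam search instead adds up the three alignments of "a" (0.24 + 0.24 + 0.16 = 0.64) and returns "a".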

Edge Computing

Edge computing is a computing paradigm that brings computation and data storage closer to the sources of data. Edge computing concentrates on servers "in proximity to the last mile network".

One Goal



Mündigkeit (maturity in the sense of intellectual self-determination)

Web primer :: https://fibel.digital

The Web Primer allows the wider public to benefit from our growing collection of open educational resources (OERs) without the need to build a physical Primer. The frontend is a Progressive Web App; the backend is a fairly sophisticated "knowledge graph". Artificial intelligence, speech technologies (notably automatic speech recognition, ASR) and so-called audiotext play an important role. Current modules:

👄 lesen (reading; ASR)

👂 hören (listening; multi-voice)

👩🏼‍💻 trainieren (training) and 💯 testen (testing): human-machine peer learning

🎴 memory spielen (playing memory, single-player)

Personal Primer

Personal Primer is a physical, do-it-yourself (DIY), book-like (embooked) Bildungsinstrument (educational instrument) for fostering reading skills in younger pupils and informatics skills in older pupils. The idea is simple:

older and/or more expert students strengthen their informatics competences by building the device and fine-tuning acoustic models

younger pupils (9-12 yrs.) strengthen their media competence by producing and curating (audiotext) content

the youngest pupils (6-8 yrs.) use the device to strengthen their basic literacy (e.g. reading) competence

Digital Primer / fibel.digital

The ultimate aim of the “Digital Primer” (DP) project is the development, optimization and deployment of a digital educational instrument (Bildungsinstrument) that fosters the acquisition of basic literacy in primary-school pupils. DP has two sub-projects:

the “physical” Personal Primer (π2) branch focuses on the design of a post-smartphone open-hardware artefact based on “Raspberry Pi Zero” technology.

the “Web Primer” sub-project provides extended functionality in the browser

Both sub-projects provide audiotext support, implement human-machine peer learning curricula and use Mozilla’s DeepSpeech acoustic models embellished with our own exercise-specific language models.

Palope

Palope is a Fibel (primer) developed by Prof. Christa Röber (German studies / education, Uni Freiburg) and her team

the essence of a Fibel is its scaffolding sequence: from the simplest syllables to ever more complex structures

in Palope, the trochaic structure of the German language is exploited to the maximum to ease the entry (Einstieg) into the world of written language

good design choices (syllable-driven rather than phoneme-driven), many interesting innovations (e.g. color coding of different syllable types) and cognitively powerful methods (e.g. "Silbenteppiche", syllable carpets)

and it is a community project! (e.V., i.e. a registered association; all OERs under Creative Commons, etc.)

Time for a little demo

Artificial Intelligence in Education (AIED)

the Primer should not replace the human teacher but assist her (e.g. by keeping track of what individual children do and do not know)

adapt the Primer to the child and not child to the Primer (all our speech recognition models run on our own servers / local hardware and can easily adapt to a concrete pupil or group of pupils)

as the child learns, so does the Primer (we call this "Human Machine Peer Learning")

!!! Achtung, Gefahr (beware): by focusing too much on the technical, AI-related side of things, one may easily fall into the trap of poor pedagogical practices !!!
