Smart Home Application Control Via Speech

Chapter 1: Introduction

1.1 Introduction of project

Technology has improved the world's economy, transportation, entertainment and household applications.[3] Imagine a person working on a PC at home who wants a hot coffee, but there is no hot water available to make it. Normally, the person has to walk to the kitchen, boil the water by manually pressing the “On” button, and walk back to the PC. And what if physically disabled people, young children or elderly people would like to switch on the lights and fan in a room? They must use the same old method of physically pressing the “On” button on the switch panel, which is quite a challenge for these groups.

With today's technology, there are a few ways to operate a switch without even needing to walk away from the PC. One example is the remote control, which can control a device from a distance, provided that no obstacle blocks its line of sight, because of the limited operating angle of IrDA.[4]

Another way of controlling devices and switches is speech recognition technology. The “Smart Home Application Control Via Speech” gives the user “hands-free” control of home applications. This project saves the time the user would take to move from point A to point B to operate the application, so the user gains time and is spared the effort of moving to and fro. In this way, speech recognition technology directly improves the quality of human lifestyle.

The “Smart Home Application Control Via Speech” project leans mainly towards programming. With the SAPI SDK 5.1[2] developed by Microsoft© incorporated into Visual Basic 8 to perform the speech recognition, the user can control a device merely by calling out its preset name and giving a simple command such as “Switch ON”, “Light UP” or “Boil Water”.

1.2 Aim

* To develop a PC-based, speech-dependent home application device that reacts precisely to the user's commands.

1.3 Objective

For this project, there are several objectives to be completed:

* To research the features, theory, usage and operation of speech recognition.

* To research the process of writing programs in Visual C++ 6.0 and of making the PC communicate with the hardware.

* To research how to apply the Speech Application Programming Interface (SAPI) 5.1 in Visual C++ 6.0.

* To be able to use Standard Parallel Port interfacing between hardware and software.

* To be able to understand the basic elements of using Radio Frequency (RF) modules.

* To be able to use the PIC16F877A with proper programming to fully utilize the functions of the microcontroller.

* To gain knowledge of the theoretical and practical procedures for the hardware.

* To develop a user-friendly project.

1.4 Product Specification


* Windows-based PC program.

* Wireless control of home applications via voice command.

* Operating range from PC to home application of up to 100 meters.

* High-accuracy voice recognition.


Transmitter and receiver box = 6(L) X 4(W) X 3(H).

Cable = 2 feet long.

What is in the package

1. Software CD.

2. Transmitter Box.

3. Receiver Box.

4. Cable.

5. Power adapter.

6. User manual.

Chapter 2: Literature Review

2.1 Similar product comparison

2.1.1 Dragon Naturally Speaking™ [5]

Program Specification

* Highly accurate speech reading.

* Able to integrate to Microsoft® Office.

* Capable of working with portable voice recorders.

* Windows based application.

* Requires licensing.

* Allows users to customise programming after purchase.

* Web searching function.

Dragon Naturally Speaking™ is software developed by Nuance, a company based in the United States of America. The product is available on the market and is used for speech-to-text applications such as writing a document by dictation. The product has been reviewed as having 97% speech recognition accuracy. Dragon Naturally Speaking™ is priced at USD199, which converts to roughly RM670.

2.1.2 MacSpeech Dictate v1.5.2™ [6]

Program Specification

* Highly accurate speech reading.

* Macintosh based application.

* Using part of Dragon Naturally Speaking™'s speech engine.

* Menu commanding program.

* Capable of working with Words™, Adobe Photoshop™, iPhoto, iChat and many more.

* Requires licensing.

MacSpeech Dictate v1.5.2™ is software developed by MacSpeech Inc. It is only for Macintosh users, as it is built as a Mac-based program. The product has been reviewed as having 98% speech recognition accuracy. The program is priced at USD249, which converts to roughly RM840.

2.1.3 BlueAnt™ Voice Interface Bluetooth™ headset [7]

Program Specification

* BlueGenie™ Lite voice interface [8]

* Continuous Listening voice trigger.

* Able to let the user answer or reject a call with a simple voice command.

* Able to relay SMS messages from the phone to the user and check the headset battery level.

* A “Fully Hands Free” car kit.

* Requires licensing.

The BlueAnt™ voice interface Bluetooth™ headset uses a voice interface from Sensory Inc. It is a fully hands-free Bluetooth headset capable of receiving voice commands from the user. Through speech recognition it can relay incoming messages from the phone, search for phone numbers and call somebody when the user calls out the name. The headset is priced at USD77, which converts to roughly RM260.


Feature | Dragon Naturally Speaking | MacSpeech Dictate | BlueAnt Headset | Smart Home Application Control Via Speech
Accuracy of speech reading | | | |
Able to interface with devices | | | |
Operation base | | | |
Speech recognition engine | Dragon Naturally Speaking | Dragon Naturally Speaking | |
Open source code | | | |
Price for licensing/product | | | |





Table 1: Comparison of available products in the market with the proposed project

2.1.4 Critical Analysis and critics

The table above compares the different voice recognition products available on the market with the proposed “Smart Home Application Control Via Speech” project. Comparing the specifications and features of each product, Dragon Naturally Speaking is one of the best value-for-money products currently on the market, and it can be reprogrammed to control numerous devices after purchase. The “Smart Home Application Control Via Speech” will be based on the SAPI 5.1 SDK to perform the speech recognition, while the other products use different platforms developed by Nuance Inc and Sensory Inc. With cost under consideration, the SAPI 5.1 SDK from Microsoft™ is available as a free download.[31]

2.2 Component Level Comparison

2.2.1 PIC Microcontroller [9]

Microcontrollers are very common these days because they can control general-purpose input/output on a circuit directly, instead of the whole project being hooked up to a PC. Each microcontroller model has limited memory space. There are several 8-bit PIC models on the market, each with different specifications. For this project, models from the 12F, 16F and 18F families are compared as shown below.





Feature | PIC12F683 | PIC16F877A | PIC18F448
ADC converter | | |
Operation speed | | |
Flash memory | | |
Input/output pins | | |
Pin count | | |








Table 2: Comparison of the 8-bit family models.

For the “Smart Home Application Control Via Speech”, the PIC16F877A is chosen because the 12F683 has too few I/O pins for the project and the 18F448 costs more than the 16F877A. The 16F and 18F specifications are almost the same except for operating speed: the 18F runs at double the speed of the 16F but costs an additional RM10. With the rest of the specifications identical, it is more appropriate to choose the PIC16F877A to cut the cost of the project.

2.3 Tool Level Research

2.3.1 SAPI SDK developed by Microsoft ©

SAPI was originally developed by Microsoft© as an extension for the Windows 95 operating system. It is part of the Windows Open Services Architecture (WOSA) model and provides speech recognition and text-to-speech services to Windows-based programming languages such as VB, C++ or C#. For this project, the speech recognition (SR) engine of the Microsoft Speech SDK 5.1 will be used to perform all the speech analysis. Together with Visual Basic®, the Win32 Speech API (SAPI) can be programmed to analyse a speech command and convert it into data that is used to activate the home application.

Speech Recognition Engine [13]

To recognize human speech, the recognition engine relies on four different elements:

Word separation – This processes human speech by detecting its length and dividing it into discrete portions.

Word matching – This is the process in which the recognition engine compares the separated speech against the preset speech in the system's vocabulary.

Vocabulary – The list of preset speeches that the engine can identify.

Speaker dependency – The speech engine can only recognize the vocal tones and grammar of the trained speaker, because it depends on the word matching process to compare against the entire stored speech pattern.

Advantages of SAPI:

* The SAPI 5.1 SDK is free software distributed by Microsoft©. It can be downloaded from their MSDN download page. [31]

* Able to interface with devices by using Visual Basic®.

* Complete tutorials are available on the web and in reference books and journals.

* Can be programmed in C++, which is widely used.

* High-accuracy speech recognition after “training”.

Disadvantages of SAPI [14]:

* The software requires training prior to use, as every individual's voice is different.

* The noise level in the environment affects the accuracy.

* If the user has a cold, the voice recognition engine may not be able to recognize the user, which causes frustration.

* It takes time to correct errors manually and to train the engine at the beginning stage.
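The word separation and word matching steps described in this section can be illustrated with a small sketch. This is not how the SAPI engine is implemented internally; it is a toy model under stated assumptions. The feature vectors, the `VOCABULARY` table, the silence level and the distance threshold are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical vocabulary: each preset word is stored as a made-up
# feature vector (a real engine stores far richer acoustic models).
VOCABULARY = {
    "light": [0.9, 0.1, 0.3],
    "fan": [0.2, 0.8, 0.5],
    "boil": [0.4, 0.4, 0.9],
}


def separate_words(amplitudes, silence_level=0.1):
    """Word separation: split a run of amplitudes into discrete portions,
    using quiet samples as the boundaries between words."""
    segments, current = [], []
    for value in amplitudes:
        if abs(value) > silence_level:
            current.append(value)
        elif current:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def match_word(features, vocabulary=VOCABULARY, max_distance=0.5):
    """Word matching: return the closest preset word, or None when no
    stored pattern is close enough (assumed rejection threshold)."""
    best_word, best_distance = None, float("inf")
    for word, template in vocabulary.items():
        distance = math.dist(features, template)
        if distance < best_distance:
            best_word, best_distance = word, distance
    return best_word if best_distance <= max_distance else None
```

The rejection threshold reflects the speaker dependency noted above: a voice that differs too much from the stored patterns matches nothing rather than the wrong word.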

2.3.2 A comparison of high level language and low level language.

There are 2 levels of programming language that could be used for the project: high level language and low level language. The table below shows the advantages and disadvantages of each. [15][16][17]



High Level Language

Example: C language, Pascal, Visual Basic®, Java

Advantages:

* Easy to understand, modify and troubleshoot.

* The compiler converts the high level instructions to low level language.

* User friendly interface.

Disadvantages:

* Requires a compiler to optimize.

* Slower when executed.

* Limited by the operating system's memory space.

* Unable to fully utilize the processor's architecture.

Low Level Language

Example: Assembly language, machine code

Advantages:

* Fully utilizes the memory space.

* Does not require a compiler to operate.

* Faster when executing.

* Higher performance.

Disadvantages:

* The code is difficult to understand.

* Difficult to modify and troubleshoot.

* Detailed knowledge of the computer's architecture is required.

Table 3: The advantages and disadvantages of High level and Low level language

Comparing the advantages and disadvantages of both levels, high level language is chosen because the code is easier to understand, it is user friendly and it requires less time to troubleshoot errors. The “Smart Home Application Control Via Speech” will use high level languages such as Visual Basic® 6.0 and Visual C++. Using Visual Basic 6.0, the computer can send a standard set of communication commands that link the PC to the device application via the communication ports.

2.4 Technology Research

2.4.1 Introduction to Biometrics

Biometric technology is used to identify people by their unique features. It is starting to replace existing security features such as ID cards, tag tokens and “smart cards”. [18]

There are two (2) types of biometrics: [19]

* Behavioural biometrics

* Physical biometrics

Behavioural biometrics

This type of biometric measures characteristics that a person develops over a period of time. It is generally used only for verification purposes. Examples of behavioural biometrics are:

Speaker Recognition – analyzes vocal behaviour.

Signature – analyzes the signature dynamics.

Keystroke – analyzes the time spacing of typed words.

Physical biometrics [23]

This type of biometric measures the physical characteristics of each individual and is used mostly for identification purposes, as a person's physical features remain the same from birth. Examples of physical biometrics are:

Fingerprint identification – analyzes the fingertip pattern of each individual, as the fingerprint ridges are formed at the fetal development stage.

Facial Recognition – measures the size and shape of facial characteristics, using the relative distances between common landmarks to generate a unique “faceprint”.

Iris Scan – analyzes the rings, furrows and freckles in the coloured ring of the eye.

2.4.2 Comparison of Biometrics[26]


Feature | Iris Scan | Facial Recognition | Key Stroke | Speaker Recognition
No sensor contact | | | |
High accuracy reading | | | |
Cannot be lost or forgotten | | | |
Hard to forge | | | |
Build up cost | | | |







Table 4: Comparison of biometrics


For the “Smart Home Application Control Via Speech” project, speaker recognition, a behavioural biometric, will be used because the project is based on the speech recognition process. It will serve only as a verification feature that detects the speaker's speech and enables the SAPI SDK speech recognition engine to process the speech received from the user.

2.4.3 Clapping sound recognition

Clapping sound recognition is based on the change in amplitude over the whole clapping sequence. For example, the device detects 2 claps when 2 high-amplitude sounds are detected. This technology is mostly used in interactive games for children.
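As a rough illustration of amplitude-based clap detection, the sketch below counts how many times a sampled signal rises above a loudness threshold. The threshold value and the sample format (amplitudes between -1.0 and 1.0) are assumptions made for the example, not details taken from any particular product.

```python
def count_claps(samples, threshold=0.6):
    """Count claps as the number of times the amplitude rises from below
    the (assumed) threshold to at or above it."""
    claps = 0
    above = False  # whether the previous sample was already loud
    for sample in samples:
        loud = abs(sample) >= threshold
        if loud and not above:
            claps += 1  # a new loud burst has started
        above = loud
    return claps


# Two separate loud bursts in the sequence are counted as two claps.
print(count_claps([0.0, 0.1, 0.9, 0.8, 0.1, 0.0, 0.95, 0.2]))
```

Counting only the rising edge means that one long clap spanning several samples is not counted more than once.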

2.5 The applications and usefulness of speech recognition

Today there are many sound/speech recognition products available on the market. Below are a few examples that show which sectors benefit from speech recognition.

Dictation – This is the most widely used application of speech recognition. The most common dictation product is the language translator dictionary. [27]

Automotive – With the current trend of using GPS (Global Positioning System) devices, voice recognition has become one of the must-have features of a GPS device. [28]

Telephony – Most mobile phones today have a simple voice recognition feature readily available upon purchase. This feature can be used, for example, for speed dialing by voicing the name of the person the user intends to call. [29]

Medical/Disabilities – Wheelchair-bound people with physical disabilities benefit from voice recognition technology, as the wheelchair can be operated with a single word such as “forward” or “left”. [30]

Chapter 3: Methodology

3.1 Project Overview

When the user speaks a command into the microphone, the sound waveform is converted into a digital waveform. The SAPI SDK software incorporated with Visual Basic analyzes the spoken command and compares it with the database stored in the system earlier during “training”. Once the sound is recognized, the program sends data to the microcontroller to indicate which electrical water boiler is to be activated. The microcontroller processes the signal and triggers the relay switch, which switches on the electrical water boiler without the user having to operate it manually.

3.2 Block Diagram

This project leans more towards programming than hardware. When the user speaks into the microphone, the sound waveform is converted into a digital waveform. The digital waveform is then analyzed and compared by the speech recognition engine, which produces instruction data for the microcontroller to react to according to the program set by the user. The PC and the microcontroller are connected via the parallel port. The microcontroller encodes the data and transmits it through the RF module, after which it is decoded on the receiving side. The signal is then processed by the microcontroller and passed to the relay switch. When the microcontroller triggers the relay switch, it activates the application to boil water.
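The encode, transmit and decode steps in the block diagram can be sketched as follows. The frame layout here is purely an assumption for illustration: one payload byte holding a device number in the upper four bits and an action code in the lower four bits, followed by its bitwise complement as a simple check against RF noise. The project's actual encoding is not specified in this report.

```python
# Hypothetical action codes; the report does not define the real ones.
ACTIONS = {"off": 0, "on": 1}


def encode_frame(device_id, action):
    """Pack a device number (0-15) and an action into a 2-byte frame."""
    if not 0 <= device_id <= 15:
        raise ValueError("device_id must fit in 4 bits")
    payload = (device_id << 4) | ACTIONS[action]
    check = payload ^ 0xFF  # complement byte used as a simple check
    return bytes([payload, check])


def decode_frame(frame):
    """Return (device_id, action), or None if the frame is corrupted."""
    if len(frame) != 2 or frame[0] ^ frame[1] != 0xFF:
        return None
    device_id = frame[0] >> 4
    code = frame[0] & 0x0F
    for name, value in ACTIONS.items():
        if value == code:
            return device_id, name
    return None  # unknown action code
```

On the receiving side, a frame that fails the check is simply dropped, so a corrupted transmission never switches a relay.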

3.3 Flow Chart

Flow Chart description:

The voice recognition system works when the microphone detects speech from the user and sends the waveform to the computer for speech processing. This enables the software on the PC to decide which home application is being called and to send pulses through the transmission media to activate the device. For example, a user says “water boiler number 2”, pauses for 2 seconds, and then says “boil”. A few seconds later, the electronic water boiler starts boiling water without the user touching the device.
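The two-step interaction in this example (device name first, command second) can be modelled as a small state machine. The sketch below assumes the recognition engine has already turned speech into text phrases. The phrase tables, the class name `CommandListener` and the 5-second timeout are hypothetical choices for illustration.

```python
# Hypothetical phrase tables; the real preset names are set by the user.
DEVICES = {"water boiler number 1": 1, "water boiler number 2": 2}
COMMANDS = {"boil": "on", "stop": "off"}


class CommandListener:
    """Remembers the last device named and pairs it with the next command."""

    def __init__(self, timeout_seconds=5.0):
        self.timeout_seconds = timeout_seconds  # assumed limit on the pause
        self.selected_device = None
        self.selected_at = None

    def hear(self, phrase, now):
        """Process one recognized phrase heard at time `now` (in seconds).
        Returns (device_id, action) when a full command is complete."""
        phrase = phrase.strip().lower()
        if phrase in DEVICES:
            self.selected_device = DEVICES[phrase]
            self.selected_at = now
            return None
        if phrase in COMMANDS and self.selected_device is not None:
            expired = now - self.selected_at > self.timeout_seconds
            device = self.selected_device
            self.selected_device = None  # each selection is used once
            self.selected_at = None
            if not expired:
                return device, COMMANDS[phrase]
        return None


listener = CommandListener()
listener.hear("water boiler number 2", now=0.0)
print(listener.hear("boil", now=2.0))
```

Requiring a device name before every command keeps a stray “boil” picked up from conversation from activating anything.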

3.4 Project Initial Design

3.5 Process sequence of project

Chapter 4: Project management and planning

4.1 Risk Assessment and Management

Problems Encountered | Effect | How to troubleshoot
PC breakdown | System failure | Check and maintain PC regularly to avoid downtime.
Software unable to detect voice | Speech recognition does not work | Check microphone connection.
RF transmitter does not respond | Unable to send data to control devices | Check parallel cable connected to PC.
Home application does not respond when all systems are working | Unable to control home application | Check home application microcontroller.
Software unresponsive | Unable to use application | Restart software.

Table 5: Risk Assessment and Management

For this project, a simple risk assessment has been constructed to enable a smooth project flow, since even minor problems might affect the progress of the project. Because the deadline for project submission has been clearly stated, project development has to stay in sync with the proposed timeframe. The solutions stated above are used as a guideline whenever a problem occurs.

4.2 Gantt Chart

4.3 Project Costing

Item Description | Unit Price | Total Price (RM)
PIC microcontroller PIC16F877A | |
RF Transmitter | |
RF Receiver | |
High Quality Microphone | |
Parallel Port + cable | |
Electrical Water Boiler | |
PCB Board | |
Total Cost | |


Table 6: Project Costing

With the cost of constructing a single unit calculated, the cost of workmanship/manpower is added to obtain the total cost of a fully functional product. Workmanship is costed at a fixed RM4 per hour for 6 hours per day. The estimated time needed to build a product is 6 weeks (42 days).

Cost of workmanship/manpower

RM4 X 6 hours X 42 days

= RM 1008

So, in order to complete a fully functional product, we need to add RM1008 to the cost of components and accessories:-

Raw Material + Workmanship

= RM470 + RM1008

= RM1478
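The costing arithmetic above can be checked with a few lines of code. The figures are taken from this section; the function names are only illustrative.

```python
def workmanship_cost(rate_rm_per_hour, hours_per_day, days):
    """Labour cost in RM: hourly rate x hours per day x number of days."""
    return rate_rm_per_hour * hours_per_day * days


def total_cost(raw_material_rm, workmanship_rm):
    """Total product cost in RM: raw material plus workmanship."""
    return raw_material_rm + workmanship_rm


workmanship = workmanship_cost(4, 6, 42)  # RM4 x 6 hours x 42 days
print("Workmanship: RM", workmanship)
print("Total: RM", total_cost(470, workmanship))
```

Keeping the rate, hours and days as separate parameters makes it easy to see how the total changes if the build time is shortened.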

Chapter 5: Conclusion and references

5.1 Conclusion

The Smart Home Application Control Via Speech could be used in every household, as it is a smart way of controlling a home application with a “hands free” feature. This speech technology is currently used in various products such as mobile phones, interactive toys, GPS navigators and speech-to-text software.

This project carries out extensive research on the voice recognition and speech synthesis processes available for use. The learning process will improve basic knowledge of speech recognition programs.

This project aims to improve the quality of human lifestyle. The finished product will reduce the time taken to walk from the user's position to the home application and reduce the risk of handling home applications for physically disabled people, young children and elderly people.

5.2 Bibliography

Title: The Catalogue

RS Components SDN BHD (387404-M)

Place of publication: Shah Alam, Selangor Darul Ehsan.

Published Date: April 2007.

Title: Cytron Technologies Catalogue

Cytron Technologies SDN BHD (755563-V)

Available from the official web

Title: Programming and customizing the PIC Microcontroller

Written by Myke Predko.

Published in 1998 by The McGraw-Hill Companies Inc.

Title: Advanced PIC Microcontroller Projects in C

Written by Dogan Ibrahim.

Published in 2008 by Elsevier LTD.

5.3 References

[1] Bill Gate's dream

(Accessed on 19 Jan 2010)

[2] SAPI 5.1 SDK

(Accessed on 19 Jan 2010)

[3] Associated Content Article

(Accessed on 23 Jan 2010)

[4] Papyrus Computer Technologies Ltd. How sensitive is infrared to distance and reception angle

(Accessed on 20 Jan 2010)

[5] Nuance – Dragon Naturally Speaking

(Accessed on 21 Jan 2010)

[6] MacSpeech Dictate V1.5.2

(Accessed on 21 Jan 2010)

[7] BlueAnt Bluetooth headset.

(Accessed on 21 Jan 2010)

[8] Sensory Inc.

(Accessed on 24 Jan 2010)

[9] Microchip Technology Inc.

(Accessed on 29 Jan 2010)

[10] Microchip Technology Inc. - PIC 16F877A

(Accessed on 29 Jan 2010)

[11] Microchip Technology Inc. - PIC 12F683

(Accessed on 29 Jan 2010)

[12] Microchip Technology Inc. - PIC 18F448

(Accessed on 29 Jan 2010)

[13] SAPI Architecture

(Accessed on 31 Jan 2010)

[14] Modec Instruments Inc.

(Accessed on 27 Jan 2010)

[15] Free Computer Tips Info – high level language

(Accessed on 28 Jan 2010)

[16] Free Computer Tips Info – low level language

(Accessed on 28 Jan 2010)

[17] Blurt It

(Accessed on 28 Jan 2010)

[18] Eye Controls – Iris technology overview.

(Accessed on 3 Feb 2010)

[19] Quest Biometrics - Biometric Definition: The basics about biometrics

(Accessed on 3 Feb 2010)

[20] English Corner – speaking icon.

(Accessed on 5 Feb 2010)

[21] Magical Florida Wedding – Signature icon.

(Accessed on 5 Feb 2010)

[22] East riding – keystroke icon.

(Accessed on 5 Feb 2010)

[23] Technovelgy – Biometric identification systems

(Accessed on 3 Feb 2010)

[24] The Fingerprint

(Accessed on 5 Feb 2010)

[25] Quest Biometrics – Using facial traits to identify people

(Accessed on 3 Feb 2010)

[26] Eye Controls – Technology review chart

(Accessed on 6 Feb 2010)

[27] Universal Translator language dictionary

(Accessed on 9 Feb 2010)

[28] GPS technology review – Speech recognition GPS Navigation

(Accessed on 9 Feb 2010)

[29] Ericsson – Speech recognition technology for mobile phones

(Accessed on 9 Feb 2010)

[30] Katalavox – Voice activated power wheelchair

(Accessed on 9 Feb 2010)

[31] Microsoft Corp – SAPI download

(Accessed on 28 Jan 2010)
