
Re: [pygame] Voice Text To Speech and Wav File Make and Save



Hi Ian

    I could try it, but I would be looking at the key-up. You will note that
in my old key char event handler, the OnKey function, I was looking ahead.
That was OK and it worked with the elif list, but the textctrl edit field can
have pointers that do not point to the actual text position.
    After reading on the subject and studying what methods textctrl has, I
chose to use them instead, for an easier check with less code. That approach
also needs the key-up event: the pointer moves after the key is released, so
by the time I catch the event the pointer is where it should be, I just get
the pointers, and I am done.

    So, I added the onkey_up function to do just that. The function is much
smaller and less time consuming, except for the check for word forward and
back. There I needed to go and fetch everything within a word in order to say
the word.
    By doing this I use the key code to decide what to read, and in all cases
I only need the point in the line. Reading the entire line for up and down
requires only knowing the position and translating it with the built-in
function that converts it to line and col; then the entire line is loaded
with another built-in function and I get my line.
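
A minimal sketch of the position-to-line lookup described above, written as
plain Python so it runs without wxPython. The function names are hypothetical
stand-ins for the textctrl built-ins (presumably PositionToXY and GetLineText),
and the sketch assumes lines are separated by "\n".

```python
def position_to_col_line(text, pos):
    """Return (col, line), both zero-based, for a character offset in text."""
    if not 0 <= pos <= len(text):
        raise ValueError("position out of range")
    before = text[:pos]
    line = before.count("\n")                # newlines before pos = line index
    col = pos - (before.rfind("\n") + 1)     # distance from start of that line
    return col, line


def line_text(text, line):
    """Return the given zero-based line without its trailing newline."""
    return text.split("\n")[line]


def line_at_position(text, pos):
    """Return the whole line that contains the given character offset."""
    col, line = position_to_col_line(text, pos)
    return line_text(text, line)
```

For up and down arrows, a key-up handler would pass the current insertion
point to line_at_position and hand the result to the speech engine.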

    The nice part about this is that I do check for end-of-line chars, which
may vary with platform. In other words, a single cr, 13, on some, and on all
others lf, or 10, for line feed.
    Even when going word forward and back it actually announces the end of
line, because it stops there, which makes it all nicer.
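
A rough sketch of the word lookup with the end-of-line check, again in plain
Python rather than with the textctrl methods. The name word_at and the spoken
strings "end of line" and "space" are assumptions made for illustration.

```python
EOL_CHARS = "\r\n"  # cr (13) and lf (10)


def word_at(text, pos):
    """Return what to speak for the character offset pos.

    Gives "end of line" on a cr/lf or at the end of the text, "space" on
    other whitespace, and otherwise the whole word around pos.
    """
    if pos < 0:
        raise ValueError("position out of range")
    if pos >= len(text) or text[pos] in EOL_CHARS:
        return "end of line"
    if text[pos].isspace():
        return "space"
    start = pos
    while start > 0 and not text[start - 1].isspace():
        start -= 1                    # walk back to the start of the word
    end = pos
    while end < len(text) and not text[end].isspace():
        end += 1                      # walk forward to the end of the word
    return text[start:end]
```

Checking for both 13 and 10 means the same code announces the end of a line
whichever line ending the platform uses.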

    Anyway, thanks for the input. Trying to understand all the built-in
functions can be a little time consuming, but the more I write the easier it
gets.

    I am using wxPython for the buttons, and the frames are standard format,
so my screen reader reads them as if they were a regular window. Games are a
different thing: for the people I write for, the screen images are not that
important.
    The Pygame window is seen by screen readers as just a single line just
below the title bar.

    I have added another engine to my speech module. It uses the Sapi 4
voices and will load up to 28 voices. For your Vista to read it I think you
have to fool Vista, and that has to be done before loading the speech
engines. Sapi 4 and 5 do work together, but only in XP, not Vista.

        Bruce

Ian wrote:

I use pygame.key.get_pressed().  I usually use the state of the input
in question rather than deal with events.  My generic input function
looks like:

import sys
import pygame
from pygame.locals import QUIT, KEYDOWN, K_ESCAPE

def GetInput():
    key = pygame.key.get_pressed()
    for event in pygame.event.get():
        if event.type == QUIT or (event.type == KEYDOWN and
                                  event.key == K_ESCAPE):
            pygame.quit(); sys.exit()

For your purposes, I now add:

        if event.type == KEYDOWN:
            # event.key is an integer key code; pygame.key.name() gives its name.
            text.append(pygame.key.name(event.key))
    mpress = pygame.mouse.get_pressed()
    mpos = pygame.mouse.get_pos()

All untested (on vacation without a laptop); hope it works!

Good luck,

Ian
#DRIVERS FOR SAPI 5 AND VOICES!
#NOTE THE CONSTANTS AND IN THE SPEAK FUNCTION AND THE ADDING/OR OF THE VALUES.
from comtypes.client import CreateObject
import _winreg

class constants4tts:
    Wait = -1
    Sync = 0
    Async = 1
    Purge = 2
    Is_filename = 4
    XML = 8
    Not_XML = 16
    Persist = 32
    Punc = 64

class SynthDriver():
    name="sapi5"
    description="Microsoft Speech API version 5 (sapi.SPVoice)"
    _voice = 0
    _pitch = 0
    _voices = []
    _wait = -1 #WAIT INDEFINITELY
    _sync = 0 #WAIT UNTIL SPEECH IS DONE.
    _async = 1 #DO NOT WAIT FOR SPEECH
    _purge = 2 #CLEAR SPEAKING BUFFER
    _is_filename = 4 #OPEN WAV FILE TO SPEAK OR SAVE TO WAV FILE
    _xml = 8 #XML COMMANDS, PRONUNCIATION AND GRAMMAR.
    _not_xml = 16 #NO XML COMMANDS
    _persist_xml = 32 #Changes made in one speak command persist to other calls to Speak.
    _punc = 64 #PRONOUNCE ALL PUNCTUATION!
    def check(self):
        try:
            r=_winreg.OpenKey(_winreg.HKEY_CLASSES_ROOT,"SAPI.SPVoice")
            r.Close()
            return True
        except:
            return False
#INITIALIZE ENGINE!
    def init(self):
        try:
            self.tts = CreateObject( 'sapi.SPVoice')
            self._voice=0
            voices = self.tts.GetVoices()
            self._voiceCount = len(voices)
            #PER-INSTANCE LIST SO INSTANCES DO NOT SHARE THE CLASS-LEVEL _voices!
            self._voices = [voices[v] for v in range(self._voiceCount)]
            return True
        except:
            return False
#TERMINATE INSTANCE OF ENGINE!
    def terminate(self):
        del self.tts
#NUMBER OF VOICES FOR ENGINE!
    def getVoiceCount(self):
        return len(self.tts.GetVoices())
#NAME OF A VOICE BY NUMBER!
    def getVoiceNameByNum(self, num):
        return self.tts.GetVoices()[ num].GetDescription()
#NAME OF A VOICE!
    def getVoiceName(self):
        return self.tts.GetVoices()[ self._voice].GetDescription()
#WHAT IS VOICE RATE?
    def getRate(self):
        "MICROSOFT SAPI 5 RATE IS -10 TO 10"
        return (self.tts.rate)
#WHAT IS THE VOICE PITCH?
    def getPitch(self):
        "PITCH FOR MICROSOFT SAPI 5 IS AN XML COMMAND!"
        return self._pitch
#GET THE ENGINE VOLUME!
    def getVolume(self):
        "MICROSOFT SAPI 5 VOLUME IS 1% TO 100%"
        return self.tts.volume
#GET THE VOICE NUMBER!
    def getVoiceNum(self):
        return self._voice
#SET A VOICE BY NAME!
    def setVoiceByName(self, name=""):
        "VOICE IS SET BY NAME!"
        for i in range( self._voiceCount):
            if name and self.tts.GetVoices()[ i].GetDescription().find( name) >= 0:
                self.tts.Voice = self._voices[i]
#                self.tts.Speak( "%s Set!" % name)
                self._voice=i
                break
        else:
            #THE LOOP ENDED WITHOUT A BREAK, SO NO VOICE MATCHED!
            self.tts.Speak( "%s Name Not Found!" % name)
#USED FOR BOOKMARKING AND USE LATER!
    def _get_lastIndex(self):
        bookmark=self.tts.status.LastBookmark
        if bookmark!="" and bookmark is not None:
            return int(bookmark)
        else:
            return -1
#NOW SET ENGINE PARMS!
#SET THE VOICE RATE!
    def setRate(self, rate):
        "MICROSOFT SAPI 5 RATE IS -10 TO 10"
        if rate > 10: rate = 10
        if rate < -10: rate = -10
        self.tts.Rate = rate
#SET PITCH OF THE VOICE!
    def setPitch(self, value):
        "MICROSOFT SAPI 5 PITCH IS REALLY CONTROLLED WITH XML AROUND SPEECH TEXT AND IS -10 TO 10"
        if value > 10: value = 10
        if value < -10: value = -10
        self._pitch=value
#SET THE VOICE VOLUME!
    def setVolume(self, value):
        "MICROSOFT SAPI 5 VOLUME IS 1% TO 100%"
        self.tts.Volume = value
#CREATE ANOTHER INSTANCE OF A VOICE!
    def createVoice(self, name=""):
        num = self._voice
        for i in range( self._voiceCount):
            if name and self.tts.GetVoices()[ i].GetDescription().find( name) >= 0:
                num=i
                break
        new_tts = CreateObject( 'sapi.SPVoice')
        new_tts.Voice = self._voices[ num]
        return (new_tts)
#SPEAKING TEXT!
#SPEAK TEXT USING BOOKMARKS AND PITCH!
    def SpeakText(self, text, wait=False, index=None):
        "SPEAK TEXT AND XML FOR PITCH MUST REPLACE ANY <> SYMBOLS BEFORE USING XML BRACKETED TEXT"
        text = text.replace( "<", "&lt;")
        pitch = self._pitch #ALREADY CLAMPED TO -10..10 BY setPitch
        if isinstance(index, int):
            bookmarkXML = "<Bookmark Mark = \"%d\" />" % index #NOTE \" FOR XML FORMAT CONVERSION!
        else:
            bookmarkXML = ""
        flags = constants4tts.XML
        if wait is False:
            flags |= constants4tts.Async #OR THE FLAG IN RATHER THAN ADDING IT
        self.tts.Speak( "<pitch absmiddle = \"%s\">%s%s</pitch>" % (pitch, bookmarkXML, text), flags)
#CANCEL SPEAK IN PROGRESS!
    def cancel(self):
        #if self.tts.Status.RunningState == 2:
        self.tts.Speak(None, 1|constants4tts.Purge)
#SET AUDIO STREAM FOR OUTPUT TO A FILE!
    def SpeakToWav(self, filename, text, voice=""):
        """THIS METHOD ASSUMES THE IMPORT OF COMTYPES.CLIENT createObject SO
            A VOICE AND FILE STREAM OBJECT ARE CREATED WITH THE PASSING IN OF 3 STRINGS:
            THE FILE NAME TO SAVE THE VOICE INTO, THE TEXT, AND THE VOICE SPOKEN IN.
            ONCE THE TEXT IS SPOKEN INTO THE FILE, IT IS CLOSED AND THE OBJECTS DESTROYED!"""
        num = self._voice
        for i in range( self._voiceCount):
            if voice and self.tts.GetVoices()[ i].GetDescription().find( voice) >= 0:
                num=i
                break
        stream = CreateObject("SAPI.SpFileStream")
        tts4file = CreateObject( 'sapi.SPVoice')
        tts4file.Voice = self._voices[ num]
        from comtypes.gen import SpeechLib
        stream.Open( filename, SpeechLib.SSFMCreateForWrite)
        tts4file.AudioOutputStream = stream
        tts4file.Speak( text, 0)
        stream.Close()
        del tts4file
        del stream
#NOW SPEAK THE WAV FILE SAVED!
    def SpeakFromWav(self, filename, sync=0, async=0, purge=0):
        "SPEAKING A WAV FILE ONLY!"
        self.tts.Speak( filename, sync |async |purge |self._is_filename)
#SPEAK TEXT!
    def Speak(self, text, wait=0, sync=0, async=0, purge=0, isfile=0, xml=0, not_xml=0, persist=0, punc=0):
        "SAPI 5 HAS NO PITCH SO HAS TO BE IN TEXT SPOKEN!"
        pitch=self._pitch
        self.tts.Speak( "<pitch absmiddle='%s'/>%s" % (pitch, text), wait |sync |async |purge |isfile |xml |not_xml |persist |punc)
#SPEAK TEXT WITHOUT PITCH!
    def Speaking(self, text, wait=0, sync=0, async=0, purge=0, isfile=0, xml=0, not_xml=0, persist=0, punc=0):
        "SPEAKING A FILE WITHOUT PITCH"
        self.tts.Speak( text, wait |sync |async |purge |isfile |xml |not_xml |persist |punc)
#SET THE VOICE BY VALUE!
    def setVoice(self, value):
        """SET VOICE BY NUMBER OR VALUE!"""
        if value >= self._voiceCount:
            value = self._voiceCount-1
        if value < 1:
            value=0
        self.tts.Voice = self._voices[ value]
#        vd = self.tts.GetVoices()[ value].GetDescription()
#        self.tts.Speak( vd[ vd.find(" ")+1:])
        self._voice=value
#READ ALL THE VOICES IN THE ENGINE!
    def read_Voices(self):
        self.tts.Speak( "Voices are:")
        for i in range( self._voiceCount):
            print "%d) %s" % (i, self.getVoiceNameByNum(i))
            self.tts.Voice = self.tts.GetVoices()[i]
            vd = self.tts.GetVoices()[ i].GetDescription()
            self.tts.Speak( "%d) %s" % (i, vd[ vd.find(" ")+1:]))

def Create( vs={}):
    "CREATE A SAPI VOICE INSTANCE!"
    vp = {"name":"Sam", "volume":100, "rate":0, "pitch":0}
    for i in vs: vp[i] = vs[i]
    newVoice = SynthDriver()
    if newVoice.check():
        newVoice.init()
        newVoice.setVoiceByName( vp["name"])
        newVoice.setVolume( vp["volume"])
        newVoice.setRate( vp["rate"])
        newVoice.setPitch( vp["pitch"])
        return newVoice
    else:
        print "SAPI Engine Is Not Installed On This Computer!"
        return None
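
The note at the top of the SAPI 5 driver points at the constants and the
OR-ing of their values in the Speak functions. A small standalone sketch of
that idea follows. The values mirror constants4tts above; the names
SpeakFlags, combine and has_flag are hypothetical helpers, not part of the
driver.

```python
class SpeakFlags(object):
    """Bit values copied from constants4tts in the driver above."""
    ASYNC = 1
    PURGE = 2
    IS_FILENAME = 4
    XML = 8
    NOT_XML = 16
    PERSIST_XML = 32
    PUNC = 64


def combine(*flags):
    """OR any number of flags into one value to pass to Speak."""
    value = 0
    for flag in flags:
        value |= flag
    return value


def has_flag(value, flag):
    """Return True if every bit of flag is set in value."""
    return (value & flag) == flag
```

OR-ing is safer than adding: combine(SpeakFlags.XML, SpeakFlags.XML) is still
8, while XML + XML would be 16, which is the NOT_XML flag.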

#LAST MODIFIED: SATURDAY, JULY 12 2008
#CHANGING ACTIVE VOICE TO MY MODULE FOR SAPI 4 SPEECH.
import os
import time
import pythoncom
#import comtypesClient
import win32com.client
from comtypes.client import CreateObject
import _winreg
#import debug

COM_CLASS = "ActiveVoice.ActiveVoice"

class sapi4Driver():

	name="sapi4activeVoice"
	description="Microsoft Speech API 4 (ActiveVoice.ActiveVoice)"

	def _registerDll(self):
		try:
			ret = os.system(r"regsvr32 /s %SystemRoot%\speech\xvoice.dll")
			return ret == 0
		except:
			pass
			return False

	def check(self):
		try:
			r=_winreg.OpenKey(_winreg.HKEY_CLASSES_ROOT,COM_CLASS)
			r.Close()
			return True
		except:
			pass
		return self._registerDll()

	def init(self):
		try:
			self.check()
			self.tts = CreateObject( COM_CLASS)
#			self.tts = CreateObject( 'sapi.SPVoice') #FOR SAPI 5
			self.tts.CallBacksEnabled=1
			self.tts.Tagged=1
			self.tts.initialized=1
			self._lastIndex=None
			return True
		except:
			return False

	def _get_voiceCount(self):
		return self.tts.CountEngines

	def getVoiceName(self,num):
		return self.tts.modeName(num)

	def terminate(self):
		del self.tts

	def _paramToPercent(self, current, min, max):
		return int(round(float(current - min) / (max - min) * 100))

	def _percentToParam(self, percent, min, max):
		return int(round(float(percent) / 100 * (max - min) + min))

	#Events

	def BookMark(self,x,y,z,markNum):
		self._lastIndex=markNum-1

	def _get_rate(self):
		return self._paramToPercent(self.tts.speed,self.tts.minSpeed,self.tts.maxSpeed)

	def _get_pitch(self):
		return self._paramToPercent(self.tts.pitch,self.tts.minPitch,self.tts.maxPitch)

	def _get_volume(self):
		return self._paramToPercent(self.tts.volumeLeft,self.tts.minVolumeLeft,self.tts.maxVolumeLeft)

	def _get_voice(self):
		return self.tts.currentMode

	def _get_lastIndex(self):
		return self._lastIndex

	def _set_rate(self,rate):
		# ViaVoice doesn't seem to like the speed being set to maximum.
		self.tts.speed=min(self._percentToParam(rate, self.tts.minSpeed, self.tts.maxSpeed), self.tts.maxSpeed - 1)
		self.tts.speak("")

	def _set_pitch(self,value):
		self.tts.pitch=self._percentToParam(value, self.tts.minPitch, self.tts.maxPitch)

	def _set_volume(self,value):
		self.tts.volumeLeft = self.tts.VolumeRight = self._percentToParam(value, self.tts.minVolumeLeft, self.tts.maxVolumeLeft)
		self.tts.speak("")

	def _set_voice(self,value):
		self.tts.initialized=0
		try:
			self.tts.select(value)
		except:
			pass
		self.tts.initialized=1
		try:
			self.tts.select(value)
		except:
			pass

	def speakText(self,text,wait=False,index=None):
		text=text.replace("\\","\\\\")
		if isinstance(index,int) and index>=0:
			text="".join(["\\mrk=%d\\"%(index+1),text])
		self.tts.speak(text)
		if wait:
			while self.tts.speaking:
				pythoncom.PumpWaitingMessages()
				time.sleep(0.01)

	def Speak(self, text, wait=True):
		"ADDED SPEAK ONLY!"
		text=text.replace("\\","\\\\")
		self.tts.speak( text)
		if wait:
			while self.tts.speaking:
				pythoncom.PumpWaitingMessages()
				time.sleep(0.01)

	def cancel(self):
		self.tts.audioReset()


nV = sapi4Driver()
if nV.check():
    print "Sapi 4 is here!"
    if nV.init():
        print nV._get_voiceCount()
#        nV._set_voice(18)
#        print " Voice is: (%d) %s " % (18, nV.getVoiceName( 18))
#        nV._set_volume( 100)
        for i in range( nV._get_voiceCount()):
            print " -- (%d) %s " % (i+1, nV.getVoiceName( i+1)),
        nV.Speak( "Hello")
    else:
        print "No Init!"
c=999
while c != 0:
    print nV._get_volume()
    a = raw_input()
    try:
        if a != "": c=int(a)
        else:
            c+=1
    except: pass
    if c>28: c=1
    nV._set_voice(c)
    nV.Speak(" %s " % ("Follow the yellow brick road")) #, c)) #, nV.getVoiceName( c)))
    print " Voice is: (%d) %s " % (c, nV.getVoiceName( c))
print "OK"
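
The SAPI 4 driver above exposes rate, pitch and volume as percentages and
converts them to and from each engine's own range. A standalone sketch of that
conversion follows, with the same arithmetic as _paramToPercent and
_percentToParam. The 50 to 450 range in the usage note is a made-up example,
not a value read from a real engine.

```python
def param_to_percent(current, minimum, maximum):
    """Map an engine value in [minimum, maximum] to a 0-100 percentage.

    Assumes maximum > minimum.
    """
    return int(round(float(current - minimum) / (maximum - minimum) * 100))


def percent_to_param(percent, minimum, maximum):
    """Map a 0-100 percentage back to an engine value in [minimum, maximum]."""
    return int(round(float(percent) / 100 * (maximum - minimum) + minimum))


def clamped_rate(percent, minimum, maximum):
    """Like _set_rate above: never quite reach the engine's maximum speed."""
    return min(percent_to_param(percent, minimum, maximum), maximum - 1)
```

With a hypothetical speed range of 50 to 450, percent_to_param(50, 50, 450)
gives 250 and param_to_percent(250, 50, 450) gives 50 again, while
clamped_rate(100, 50, 450) stops at 449.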