Data Scientist: YouTube Audio Transcribing Implementation

The sections that explain how Mandy – the Intelligent Companion works are presented in the order in which the application executes them. These actions start after a question or comment is received from the user:

The YouTube Audio Transcribing Implementation consists of two parts: full audio transcription and text summary. First, it is important to understand that handling chemicals at home or anywhere else should be taken seriously, so you should rely on trustworthy and reliable information from the internet. PubChem provides reliable information that you can trust; however, theory does not always match practice. This application aims to capture applied knowledge from experienced people who have recorded audio or video for training purposes.

The purpose of this section is to obtain practical information from a YouTube video, producing both the full transcript and a short summary. You may not have time to watch the whole video, so you can read the summary first and check the full transcript if it interests you. Ultimately, you may end up watching the video for further information and a more accurate understanding of a specific chemical compound.

The Python libraries that this application uses are:

  • youtube_search
  • youtube_dl
  • SpeechRecognition (imported as speech_recognition)
  • pydub
  • bs4
  • nltk

Modules such as heapq, re, os, and time ship with the Python standard library, and unicode_literals comes from the built-in __future__ module, so none of them need installation. You can install the third-party libraries with pip install [library's name] in your terminal console. nltk additionally needs its punkt and stopwords data packages (python -m nltk.downloader punkt stopwords), and pydub's WAV extraction relies on FFmpeg being installed on your system.

Full Video Context Transcribe

The function get_large_audio_transcription is the one you can modify if you want to improve the transcription of the whole audio of a video found on YouTube. It splits the downloaded audio on silences with pydub and sends each chunk to Google's speech recognition service through the SpeechRecognition library. Thus, you may leave this function as it is or modify it as you wish.
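The per-chunk results are stitched into one running text, with each successfully recognized chunk capitalized and appended as its own sentence. A minimal sketch of that stitching step follows; the helper name join_chunks is hypothetical, not part of the original code:

```python
def join_chunks(chunk_texts):
    """Stitch per-chunk recognition results the way
    get_large_audio_transcription does: each successfully recognized
    chunk is capitalized and appended as its own sentence."""
    whole_text = ""
    for text in chunk_texts:
        if text:  # chunks that failed recognition yield nothing
            whole_text += f"{text.capitalize()}. "
    return whole_text

print(join_chunks(["hello world", "", "second chunk"]))
# → "Hello world. Second chunk. "
```

Chunks that raise a recognition error simply contribute nothing, which is why an unrecognizable stretch of audio leaves a silent gap in the transcript rather than an error marker.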

Text Summary

The function summarizing_video is the main function of the YouTube Audio Transcription Implementation, as it relies on the get_large_audio_transcription function to build the text summary of the chosen video. You can change the number of sentences that the summary will contain (10 by default); sentences are selected according to the frequency of the words they contain in the audio transcript.
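The scoring idea can be sketched without nltk. The summarize helper below is a simplified stand-in, assuming whitespace tokenization and no stopword removal, not the original implementation:

```python
import heapq

def summarize(sentences, num_sentences=2):
    # Count how often each word appears across all sentences
    word_frequencies = {}
    for sent in sentences:
        for word in sent.lower().split():
            word_frequencies[word] = word_frequencies.get(word, 0) + 1

    # Normalize counts by the most frequent word
    maximum_frequency = max(word_frequencies.values())
    for word in word_frequencies:
        word_frequencies[word] /= maximum_frequency

    # Score each sentence as the sum of its words' normalized frequencies
    sentence_scores = {}
    for sent in sentences:
        for word in sent.lower().split():
            sentence_scores[sent] = sentence_scores.get(sent, 0) + word_frequencies[word]

    # Keep the highest-scoring sentences as the summary
    top = heapq.nlargest(num_sentences, sentence_scores, key=sentence_scores.get)
    return ' '.join(top)

sents = ["benzene is an aromatic compound", "benzene is flammable", "water boils"]
print(summarize(sents, 2))
# → "benzene is an aromatic compound benzene is flammable"
```

Sentences built from frequent words outrank sentences built from rare ones, which is the same criterion summarizing_video applies after nltk tokenization and stopword filtering.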

The programming code that you can modify for this implementation is:

import os
import re
import heapq
import nltk
import youtube_dl
import speech_recognition as sr
from pydub import AudioSegment
from pydub.silence import split_on_silence
from youtube_search import YoutubeSearch

def summarizing_video(chemical_compound):
    confirmation_video = ""
    summary = ""
    formatted_article_text = ""
    max_elements = 1

    # results = YoutubeSearch('Benzene', max_results=5).to_json()
    results = YoutubeSearch(chemical_compound, max_results=max_elements).to_dict()

    def validate_reply(confirmation_video):
        # Keep asking until the user answers yes or no
        if confirmation_video == 'YES' or confirmation_video == 'NO':
            return confirmation_video
        print('Please confirm that you want me to transcribe it.')
        confirmation_video = input('(yes/no): ').upper()
        return validate_reply(confirmation_video)

    for i in range(max_elements):
        url = "https://www.youtube.com/watch?v=" + results[i]['id']
        title_video = results[i]['title']
        duration = results[i]['duration']
        views = results[i]['views']
        print('I found this video, do you want me to transcribe it?\n')
        print('****************')
        print("Title:", title_video)
        print("Duration:", duration)
        print("Url:", url)
        print("Views:", views)
        print('****************')
        confirmation_video = input('Transcribing video? (yes/no): ').upper()
        confirmation_verified = validate_reply(confirmation_video)
        if confirmation_verified == 'YES':
            ydl_opts = {
                'format': 'bestaudio/best',
                'postprocessors': [{
                    'key': 'FFmpegExtractAudio',
                    'preferredcodec': 'wav',
                    'preferredquality': '192',
                }],
            }
            with youtube_dl.YoutubeDL(ydl_opts) as ydl:
                # download=True fetches the audio and returns its metadata in one call
                info_dict = ydl.extract_info(url, download=True)
                fn = ydl.prepare_filename(info_dict)
                path = os.path.splitext(fn)[0] + ".wav"

            r = sr.Recognizer()
            print("started..")

            def get_large_audio_transcription(path):
                # Split the audio on silences and transcribe each chunk separately
                sound = AudioSegment.from_wav(path)
                chunks = split_on_silence(sound,
                                          min_silence_len=500,
                                          silence_thresh=sound.dBFS - 14,
                                          keep_silence=500)
                folder_name = "audio-chunks"
                if not os.path.isdir(folder_name):
                    os.mkdir(folder_name)
                whole_text = ""
                for i, audio_chunk in enumerate(chunks, start=1):
                    chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
                    audio_chunk.export(chunk_filename, format="wav")
                    with sr.AudioFile(chunk_filename) as source:
                        audio_listened = r.record(source)
                        try:
                            text = r.recognize_google(audio_listened, language="en-US")
                        except sr.UnknownValueError:
                            pass  # chunk could not be recognized; skip it
                        else:
                            whole_text += f"{text.capitalize()}. "
                return whole_text

            article_text = get_large_audio_transcription(path)

            # Remove bracketed reference numbers and collapse whitespace
            article_text = re.sub(r'\[[0-9]*\]', ' ', article_text)
            article_text = re.sub(r'\s+', ' ', article_text)

            # Letters-only copy used for the word-frequency counts
            formatted_article_text = re.sub(r'[^a-zA-Z]', ' ', article_text)
            formatted_article_text = re.sub(r'\s+', ' ', formatted_article_text)

            print('*********************')
            print("Summarizing..")
            # Tokenization
            sentence_list = nltk.sent_tokenize(article_text)

            stopwords = nltk.corpus.stopwords.words('english')
            word_frequencies = {}
            for word in nltk.word_tokenize(formatted_article_text):
                if word not in stopwords:
                    if word not in word_frequencies:
                        word_frequencies[word] = 1
                    else:
                        word_frequencies[word] += 1

            # Normalize counts by the most frequent word
            maximum_frequency = max(word_frequencies.values())
            for word in word_frequencies:
                word_frequencies[word] = word_frequencies[word] / maximum_frequency

            # Score each sentence (under 50 words) by its words' frequencies
            sentence_scores = {}
            for sent in sentence_list:
                for word in nltk.word_tokenize(sent.lower()):
                    if word in word_frequencies:
                        if len(sent.split(' ')) < 50:
                            if sent not in sentence_scores:
                                sentence_scores[sent] = word_frequencies[word]
                            else:
                                sentence_scores[sent] += word_frequencies[word]

            # The 10 highest-scoring sentences form the summary
            summary_sentences = heapq.nlargest(10, sentence_scores, key=sentence_scores.get)
            summary = ' '.join(summary_sentences)

    return (summary, formatted_article_text)
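The two regex cleaning passes can be checked on a small string. Note that the character class must be bracketed ([^a-zA-Z]) to strip non-letters; the sample text here is made up for illustration:

```python
import re

article_text = "Benzene [1] is  an organic   compound."

# Drop bracketed reference numbers, then collapse runs of whitespace
article_text = re.sub(r'\[[0-9]*\]', ' ', article_text)
article_text = re.sub(r'\s+', ' ', article_text)
print(article_text)
# → "Benzene is an organic compound."

# Letters-only copy used for the word-frequency counts
formatted_article_text = re.sub(r'[^a-zA-Z]', ' ', article_text)
formatted_article_text = re.sub(r'\s+', ' ', formatted_article_text)
print(formatted_article_text.strip())
# → "Benzene is an organic compound"
```

The sentence list for the summary is built from the punctuated article_text, while the stripped formatted_article_text only feeds the word counts, which is why both versions are kept.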

Now that you know how to integrate this section, you may want to review the other sections so that you can build your own software or application in a more sophisticated way. Here are the links to the remaining sections of Mandy – The Intelligent Companion.
