I am looking for an AI tool, or a combination of AI tools, that can take a video I provide and produce a text description of what is happening in it (video-to-text). I then want to use that description to generate sounds that match the video.
Is this possible with any combination of AI tools?
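
To make the idea concrete, here is a rough sketch of the two-stage pipeline I have in mind. Everything in it is a placeholder: `video_to_sound`, `fake_describe`, and `fake_synthesize` are hypothetical names I made up, not functions from any real tool. A real video-captioning model and a real text-to-audio model would take the place of the two stub functions.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces: any video-captioning model and any text-to-audio
# model could be wrapped to fit these two signatures.
VideoToText = Callable[[str], str]    # video file path -> text description
TextToAudio = Callable[[str], bytes]  # text description -> raw audio bytes


@dataclass
class PipelineResult:
    description: str
    audio: bytes


def video_to_sound(
    video_path: str, describe: VideoToText, synthesize: TextToAudio
) -> PipelineResult:
    """Chain the two stages: describe the video, then turn the description into sound."""
    description = describe(video_path)
    audio = synthesize(description)
    return PipelineResult(description=description, audio=audio)


# Placeholder stages so the sketch runs; real AI tools would replace these.
def fake_describe(video_path: str) -> str:
    return f"placeholder description of {video_path}"


def fake_synthesize(description: str) -> bytes:
    # Stands in for generated audio; it only encodes the text as bytes.
    return description.encode("utf-8")


result = video_to_sound("clip.mp4", fake_describe, fake_synthesize)
print(result.description)
```

The point of the sketch is only that the two stages are independent, so the video-description tool and the sound-generation tool would not have to come from the same product.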