Video-to-text and video ambience sound creator

I am looking for an AI tool, or a combination of AI tools, that can take a video I provide and produce a text description of what is happening in it. I then want to use that description to generate audio, such as ambient sounds, that matches what happens in the video.

Is this possible with any combination of AI tools?