Screengrabs of the video analysed by the DAU

Updated on May 15, 2025: The Deepfakes Analysis Unit updated the report with additional expert inputs.

The Deepfakes Analysis Unit (DAU) analysed a video that apparently shows Shehbaz Sharif, prime minister of Pakistan, making a public statement in which he concedes his country’s defeat against India and admits that other nations have politically isolated it. After putting the video through A.I. detection tools and getting our expert partners to weigh in, we were able to conclude that synthetic audio was used to fabricate the video.

A Facebook link to the 52-second video in Urdu was sent to the DAU tipline for assessment. The video purportedly shows Mr. Sharif delivering a speech, filmed from an off-centre angle. Multiple microphones and a file, which he fiddles with from time to time, are placed on a podium before him.

A person in blue attire is visible in the backdrop; however, his face is hidden by the bright text graphics in Hindi that cover the top third of the video frame. The text is presented in a red-and-white colour scheme and translates to: Pakistan is accepting defeat, listen to their prime minister.

A male voice recorded over Sharif’s video track apparently addresses Pakistani citizens, declaring that Pakistan is retreating from the ongoing “war” with India. It adds that despite the best efforts of the Pakistani soldiers, the lack of resources, political isolation, and the military might of the “enemy” have put them in this position.

The same voice laments that they only have Turkey’s support and that the Arab world, China, and other supposed allies have abandoned them. It further adds that if the same situation continues, Pakistan will soon be captured by the “enemy”. The video ends on a desperate note, urging the people of Pakistan to “save” their nation.

There are multiple jump cuts in the video, and it seems that at least one clip has been used in a loop. In some frames, Sharif’s head and upper body appear to move abruptly. His hand movements, and those of the person seen in the backdrop, resemble a puppet’s. In one frame, a ring is visible on Sharif’s ring finger; it disappears within seconds, not to be seen again. His eyeballs appear to move unnaturally in some frames.

Sharif’s lip movements are not perfectly aligned with the audio track. As his mouth opens and closes, his teeth are faintly visible in some frames and disappear in a few others; the inside of his mouth appears unnaturally dark.

The right side of his face, which is mostly visible in the video, appears unnaturally smooth and shiny; the region stretching from the tip of his nose to his chin looks blurred in comparison, and his upper lip appears stiff. In some frames, as the direction of his face changes, his moustache disappears and then reappears.

On comparing the voice attributed to Sharif with the voice heard in his recorded speeches and interviews available online, some similarities can be drawn, especially the husky voice quality. However, the accent is different, the pacing of the delivery sounds slower, and it lacks his characteristic intonation.
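Intonation and pacing can be examined somewhat more objectively by extracting pitch contours from the two voices. The sketch below is purely our illustration, not part of the DAU’s workflow; it uses librosa’s pYIN tracker on two hypothetical clips:

```python
# Minimal sketch (illustrative only): compare pitch contours of two voice
# samples with librosa's pYIN tracker. Differences in intonation show up
# as differently shaped f0 curves.
import numpy as np
import librosa

def f0_contour(path):
    y, sr = librosa.load(path, sr=16000)
    f0, voiced, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C5"), sr=sr
    )
    return f0[voiced]  # keep pitch values for voiced frames only

suspect = f0_contour("suspect_clip.wav")      # hypothetical file names
reference = f0_contour("reference_clip.wav")
print(f"median f0: suspect {np.nanmedian(suspect):.0f} Hz, "
      f"reference {np.nanmedian(reference):.0f} Hz")
```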

We undertook a reverse image search using screenshots from the video being analysed through this report. Sharif’s clips were traced to this video, also in Urdu, published on May 7, 2025, from his official Facebook account.
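For readers who want to reproduce this step, extracting query frames for a reverse image search can be as simple as the following sketch with OpenCV (the file name is hypothetical):

```python
# Minimal sketch: save roughly one frame per second from a video,
# to use as query images for a reverse image search.
import cv2

cap = cv2.VideoCapture("suspect_video.mp4")   # hypothetical file name
fps = cap.get(cv2.CAP_PROP_FPS) or 25         # fall back to 25 if metadata is missing
frame_idx = saved = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % int(round(fps)) == 0:
        cv2.imwrite(f"frame_{saved:03d}.png", frame)
        saved += 1
    frame_idx += 1

cap.release()
print(f"saved {saved} screenshots")
```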

Sharif’s clothing is identical in the video we reviewed and the one we were able to trace. However, the background and foreground look slightly different in the two videos because the frames used in the manipulated version are more zoomed in, with portions of the background and foreground cropped out. The audio tracks of the two videos are entirely different; Sharif does not admit defeat in the original video.

The source video features other subjects as well, but none of them appear in the doctored video, which seems to have been created by lifting a few clips featuring Sharif and stitching them together, sometimes in a loop. The hand movements of Sharif and of the person seen behind him, which appear puppet-like in the manipulated video, look natural in the original.

Text graphics in Urdu appear at the bottom and top of the video frame in the original video, but they aren’t static like the Hindi text graphics in the manipulated version. Animated logos are also visible in the original video. The logo in the upper right corner alternates between a clenched fist and the Pakistani national flag, with text in Urdu below it. The official logo of PTV, Pakistan’s public broadcaster, is visible in the lower right corner.

To discern the extent of A.I. manipulation in the video under review, we put it through A.I. detection tools.

The voice tool of Hiya, a company that specialises in artificial intelligence solutions for voice safety, indicated that there is a 95 percent probability of the audio track in the video having been generated or modified using A.I.

Screenshot of the analysis from Hiya’s audio detection tool

Hive AI’s deepfake video detection tool highlighted several markers of A.I. manipulation in the video. Their audio detection tool indicated A.I. manipulation in the entire audio track of the video.

Screenshot of the analysis from Hive AI’s deepfake video detection tool

For further analysis of the audio track, we put it through the A.I. speech classifier of ElevenLabs, a company specialising in voice A.I. research and deployment. The classifier returned a result of “very likely”, indicating a high probability that the audio track in the video was generated using their software.

We reached out to ElevenLabs for comment on the analysis. They told us that, based on the technical signals they analysed, they were able to confirm that the audio track in the video is A.I.-generated. They added that, to hold them accountable, they have taken action against the individuals who misused their tools.

We also ran the audio track from the video through Deepfake-O-Meter, an open platform developed by the Media Forensics Lab (MDFL) at the University at Buffalo for the detection of A.I.-generated images, video, and audio. The tool provides a selection of classifiers that can be used to analyse media files.

We chose seven audio detectors, of which three gave strong indications of A.I. manipulation in the audio. AASIST (2021) and RawNet2 (2021) are designed to detect audio impersonations, voice clones, replay attacks, and other forms of audio spoofs. The Linear Frequency Cepstral Coefficient (LFCC) - Light Convolutional Neural Network (LCNN) 2021 model classifies genuine versus synthetic speech to detect audio deepfakes.

RawNet3 (2023) allows for nuanced detection of synthetic audio while RawNet2-Vocoder (2023) and RawNet2-Vocoder-V1 (2023) are useful in identifying synthesised speech. Whisper (2023) is designed to analyse synthetic human voices.
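To give a sense of the features such detectors build on, the sketch below computes linear frequency cepstral coefficients, the front end of the LFCC-LCNN model described above: a short-time power spectrum is passed through a linearly spaced triangular filterbank, log-compressed, and decorrelated with a DCT. The file name and parameter values are our assumptions, not Deepfake-O-Meter’s:

```python
# Minimal LFCC front end: STFT power spectrum -> linearly spaced
# triangular filterbank (unlike the mel filterbank used for MFCCs) ->
# log compression -> DCT.
import numpy as np
import librosa
from scipy.fftpack import dct

def lfcc(path, n_fft=512, hop=160, n_filters=20, n_coeffs=20):
    y, sr = librosa.load(path, sr=16000)
    power = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2

    edges = np.linspace(0, n_fft // 2, n_filters + 2).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = edges[i], edges[i + 1], edges[i + 2]
        if c > l:
            fb[i, l:c] = np.linspace(0, 1, c - l, endpoint=False)  # rising edge
        if r > c:
            fb[i, c:r] = np.linspace(1, 0, r - c)                  # falling edge

    logspec = np.log(fb @ power + 1e-10)
    return dct(logspec, axis=0, norm="ortho")[:n_coeffs]  # (n_coeffs, frames)

feats = lfcc("audio_track.wav")  # hypothetical file name
print(feats.shape)
```

A classifier such as LCNN would then be trained on these coefficient matrices to separate genuine from synthetic speech.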

Screenshot of the analysis from Deepfake-O-Meter’s audio detectors

For expert analysis, we escalated the video to our detection partner ConTrailsAI, a Bangalore-based startup with its own A.I. tools for detection of audio and video spoofs. The team ran the video through audio and video detection models and concluded that there was manipulation in both the audio and video.

In their report, the team stated that the voice is A.I.-generated and has been applied to the video using an advanced lip-sync technique. They further added that their video model predicted the frames in the video to be fake. 

Screenshot of ConTrailsAI’s audio analysis
Screenshot of ConTrailsAI’s video analysis

To get another expert to weigh in on the video, we reached out to our partners at RIT’s DeFake Project. Akib Shahriyar from the team highlighted several inconsistencies in the audio and video that are indicative of A.I. manipulation, and assessed that the video was likely created using a puppeteering deepfake algorithm focused on mouth and lip-syncing rather than full-face or head replacement.

Mr. Shahriyar pointed to the low resolution of the video, stressing that it significantly helps mask typical deepfake artefacts. He further elaborated that the lower resolution gives adversarial models room to hide mismatches in pixel blending and motion, which would otherwise be more visible in high-definition (HD) content.
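His point can be illustrated crudely, though this is our sketch and not his method: a downscale/upscale round trip strips the high-frequency detail in which blending seams live, and Laplacian variance is a rough proxy for that detail (the file name is hypothetical):

```python
# Minimal sketch: simulate a low-resolution upload and measure how much
# high-frequency detail (where deepfake seams tend to live) survives.
import cv2

frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file name
h, w = frame.shape
small = cv2.resize(frame, (w // 4, h // 4))   # simulate a low-res upload
restored = cv2.resize(small, (w, h))

orig_detail = cv2.Laplacian(frame, cv2.CV_64F).var()
lowres_detail = cv2.Laplacian(restored, cv2.CV_64F).var()
print(f"high-frequency energy: original {orig_detail:.1f}, "
      f"after round trip {lowres_detail:.1f}")
```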

Using screenshots from the video, Shahriyar highlighted square-like blocks or quadrants that segment Sharif’s mouth region; the blocks were visible only when that region was zoomed in. He explained that this is a common artefact of lip-sync-only deepfake methods, in which only the mouth is modified while the rest of the face is kept static. He noted that within these boxes, sharp pixel transitions between regions indicate algorithmic patching, suggesting the lip area was artificially overlaid.
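One simple way to surface such seams, again our illustration rather than the DeFake Project’s tooling, is to sum gradient magnitudes along the rows and columns of a mouth crop; straight, grid-aligned patch boundaries show up as spikes (the file name and crop coordinates are hypothetical):

```python
# Minimal sketch: gradient profiles over a mouth crop. Algorithmic patching
# tends to leave straight, axis-aligned edges that natural skin texture lacks.
import cv2
import numpy as np

frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file name
mouth = frame[220:300, 180:300]                        # hypothetical mouth box

gx = cv2.Sobel(mouth, cv2.CV_64F, 1, 0)  # horizontal gradients
gy = cv2.Sobel(mouth, cv2.CV_64F, 0, 1)  # vertical gradients
mag = np.sqrt(gx ** 2 + gy ** 2)

# A vertical/horizontal seam running through the crop spikes these sums.
print("suspect columns:", np.argsort(mag.sum(axis=0))[-3:])
print("suspect rows:", np.argsort(mag.sum(axis=1))[-3:])
```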

He also pointed to shading discrepancies across different parts of the lips as further evidence of tampering, stating that the subtle inconsistencies in colour and lighting across the mouth zone are hard to generate perfectly.

Corroborating our observations above, he pointed to multiple timestamps in the video where the audio does not align with the lip movement. He also highlighted a specific moment in the video where Sharif’s mouth region shifts unnaturally, creating a wavy distortion effect. He explained that this results from the deepfake algorithm struggling to track the natural head movement and failing to smoothly anchor the synthetic lips, which causes localised warping.

We also reached out to our partner GetReal Security, a company co-founded by Dr. Hany Farid that specialises in digital forensics and A.I. detection. They stated that there is evidence suggesting that the video contains synthetic, A.I.-generated material and is likely a lip-sync deepfake.

They ran multiple digital analysis models, including spectrogram analysis and human voice analysis, on the video’s audio track, and all of them pointed to synthetic generation. They suggested that the voice in the video was very likely created using ElevenLabs.
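A basic version of the spectrogram step, our sketch only, since GetReal’s models are proprietary, can be written in a few lines (the file name is hypothetical):

```python
# Minimal sketch: render a log-frequency spectrogram of the audio track
# for visual inspection of harmonics, splices, and reverberation tails.
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load("audio_track.wav", sr=None)  # hypothetical file name
S = librosa.amplitude_to_db(np.abs(librosa.stft(y, n_fft=1024)), ref=np.max)

plt.figure(figsize=(10, 4))
librosa.display.specshow(S, sr=sr, x_axis="time", y_axis="log")
plt.colorbar(format="%+2.0f dB")
plt.title("Log-frequency spectrogram of the audio track")
plt.tight_layout()
plt.show()
```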

The team found the “truncated reverberation” highly unusual. They noted that the reverberation during speech and non-speech segments is inconsistent: a reverb signal from previously spoken words can be heard as long as speech is present, but it cuts off awkwardly after a sentence ends, resembling what they term a “broken echo effect”. On analysing a clip from the source video, they noticed that it did not have the same effect.
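A crude way to hunt for such cut-offs, our hypothetical sketch and not GetReal’s method, is to track short-time energy and flag frames where the level collapses faster than a natural reverb tail would decay:

```python
# Minimal sketch: flag near-instant energy drops after speech ends.
# Natural room reverberation decays smoothly over tens of milliseconds;
# a sudden collapse to the noise floor is a "broken echo" candidate.
import numpy as np
import librosa

y, sr = librosa.load("audio_track.wav", sr=16000)  # hypothetical file name
frame, hop = 400, 160                              # 25 ms frames, 10 ms hop
rms = librosa.feature.rms(y=y, frame_length=frame, hop_length=hop)[0]
rms_db = librosa.amplitude_to_db(rms)

# The 20 dB/frame threshold is an assumption, not a calibrated value.
for i in np.where(np.diff(rms_db) < -20)[0]:
    print(f"abrupt cut-off near {i * hop / sr:.2f}s "
          f"({rms_db[i]:.0f} dB -> {rms_db[i + 1]:.0f} dB)")
```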

They added that several temporal inconsistencies in the video between frames are also signs of manipulation. 

Based on our findings and analyses from experts, we can conclude that original footage featuring Sharif was manipulated with synthetic audio to create the fake video. 

(Written by Debraj Sarkar and Rahul Adhikari, edited by Pamposh Raina.)

Kindly Note: The manipulated audio/video files that we receive on our tipline are not embedded in our assessment reports because we do not intend to contribute to their virality.

You can read below the fact-checks related to this piece published by our partners:

Viral Video Of Pakistan PM Shehbaz Sharif Admitting Defeat Is A Deepfake

India-Pakistan Conflict: Viral Video Of Shehbaz Sharif Admitting Defeat Is Deepfake

Fact-Check: Did Pakistan PM Shahbaz Sharif Admit Defeat? No, It’s a Deepfake

Video showing Pakistan PM Shehbaz Sharif admitting defeat is a deepfake