MIC (Ours-SFT-GRPO) — Error Analysis · 30 cases

All 30 cases are genuine MIC errors (no schema-only disagreements). Cases were sampled from test_id_edit and test_ood_edit (canonical prompt). Cases with weak or ambiguous ground truth (architectural-style swaps, universal gestures, vague "X-style" descriptors) have been filtered out.

How to read each card. MIC was given only the edited (right) image and the caption; the original (left) image is shown for human comparison only. The ground-truth verdict for every case is INCONSISTENT, because every image is edited. The right-hand panel shows MIC's <verdict> / <type> / <grounding> / <knowledge> output on the edited image. The error is whichever part of that output diverges from the ground truth on the left.
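For readers who want to process the cards programmatically, the following is a minimal sketch that parses the "MIC prediction" lines of one card as they are rendered in this report. The function and constant names (`parse_prediction`, `verdict_missed`, `GT_VERDICT`) are illustrative and not part of the MIC codebase, and the line layout is assumed to match the cards below.

```python
import re

# Stated in this report: the ground-truth verdict for every case is INCONSISTENT.
GT_VERDICT = "INCONSISTENT"

def parse_prediction(lines):
    """Parse the 'MIC prediction' lines of one card into a dict.

    Assumes the plain-text layout used in this report, not a raw model output.
    """
    pred = {}
    for line in lines:
        line = line.strip()
        # e.g. "verdict: CONSISTENT   type: None"
        m = re.match(r"verdict:\s*(\S+)\s+type:\s*(\S+)", line)
        if m:
            pred["verdict"], pred["type"] = m.group(1), m.group(2)
            continue
        # e.g. "<grounding>: None" or "<knowledge>: The caption ..."
        m = re.match(r"<(grounding|knowledge)>:\s*(.*)", line)
        if m:
            pred[m.group(1)] = m.group(2)
            continue
        # e.g. "knowledge_score = 0.0  |  grounding_score = 0.0"
        m = re.match(
            r"knowledge_score\s*=\s*([\d.]+)\s*\|\s*grounding_score\s*=\s*([\d.]+)",
            line,
        )
        if m:
            pred["knowledge_score"] = float(m.group(1))
            pred["grounding_score"] = float(m.group(2))
    return pred

def verdict_missed(pred):
    """True when the predicted verdict differs from the ground-truth verdict."""
    return pred.get("verdict") != GT_VERDICT
```

Passing the four prediction lines of a card to `parse_prediction` yields a dict with `verdict`, `type`, `grounding`, `knowledge`, and the two scores; `verdict_missed` then flags cards where MIC answered CONSISTENT.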

Mode A — Perceptually subtle / locally-plausible edits (verdict miss) 15 cases

Mode A   Case #1

test_id_edit · gt_type=environmental · difficulty=hard
Headline: Japan’s Virus Success Has Puzzled the World. Is Its Luck Running Out?
Caption: People were out and about in Osaka, Japan, on Wednesday. Japan has seemed to contain the coronavirus without broad lockdowns or widespread testing.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Right-side cherry-blossom-style ornamental display → Mediterranean Italian cypress with dry golden grass and olive leaves
why_contradicts: A Mediterranean cypress-and-dry-grass planting suggests a southern European dry-climate landscape cue rather than the springtime urban Japanese setting described in Osaka.
MIC prediction
verdict: CONSISTENT   type: None
<grounding>: None
<knowledge>: None
knowledge_score = 0.0  |  grounding_score = 0.0

Mode A   Case #2

test_id_edit · gt_type=clothing · difficulty=medium
Headline: Denial and Defiance: Trump and His Base Downplay the Virus Ahead of the Election
Caption: President Trump defied the governor of Nevada by holding an indoor rally near Las Vegas last week. The state has been devastated by the pandemic and its economic toll.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Trump-style dark suit, white shirt, and red tie → formal Filipino ivory barong Tagalog
why_contradicts: A U.S. presidential campaign rally in Nevada would not naturally feature the candidate dressed in formal Filipino national attire, which shifts the political and cultural context away from the caption’s American event.
MIC prediction
verdict: CONSISTENT   type: None
<grounding>: None
<knowledge>: None
knowledge_score = 0.0  |  grounding_score = 0.0

Mode A   Case #3

test_id_edit · gt_type=clothing · difficulty=hard
Headline: Deadly Explosion Hits Luxury Hotel in Pakistan
Caption: A bomb exploded outside the Serena Hotel in the city of Quetta on Wednesday, killing at least four people. A Chinese delegation, including the country’s ambassador, was staying at the hotel but wasn’t present at the time.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Central man’s light Pakistani shalwar kameez → Uzbek blue striped chapan with black doppa skullcap
why_contradicts: The caption describes a bombing scene in Quetta, Pakistan, where the visible local civilian clothing would plausibly be Pakistani attire rather than distinctly Uzbek traditional dress.
MIC prediction
verdict: CONSISTENT   type: None
<grounding>: None
<knowledge>: None
knowledge_score = 0.0  |  grounding_score = 0.0

Mode A   Case #4

test_ood_edit · gt_type=clothing · difficulty=hard
Headline: Paramedics, Strained in the Hot Zone, Pull Back From CPR
Caption: Emergency workers in Newark last month transported a patient with Covid-19 symptoms.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Blue U.S. medical PPE gowns and gloves → French SAMU/SMUR-style navy emergency uniforms with reflective bands and SAMU insignia
why_contradicts: The caption describes emergency workers transporting a Covid-19 patient in Newark, New Jersey, so showing French SAMU personnel implies the responders are from France rather than a Newark-area U.S. service.
MIC prediction
verdict: INCONSISTENT   type: clothing
<grounding>: Brazilian SAMU-style navy emergency uniforms with 'SAMU' patches and reflective yellow bands
<knowledge>: The caption identifies the scene as emergency workers in Newark, New Jersey, but Brazilian SAMU uniforms would indicate emergency personnel from Brazil instead.
knowledge_score = 0.0  |  grounding_score = 0.0

Mode A   Case #5

test_id_edit · gt_type=technology · difficulty=medium
Headline: Trump Travels to Iowa to Energize Supporters for Caucuses Next Week
Caption: President Trump on Thursday at a rally in Des Moines.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Audience members’ smartphones → German Leica M6 35mm film cameras
why_contradicts: A 2020 Trump rally in Des Moines would naturally be documented by spectators using smartphones, so replacing them with German analog film cameras creates a subtle but incorrect time-and-context cue.
MIC prediction
verdict: CONSISTENT   type: None
<grounding>: None
<knowledge>: None
knowledge_score = 0.0  |  grounding_score = 0.0

Mode A   Case #6

test_id_edit · gt_type=text_language · difficulty=medium
Headline: Myanmar Soldiers, Aiming to Silence Protests, Target Journalists
Caption: Covering a protest battle in Yangon, Myanmar, on Sunday. Three photojournalists have been shot and wounded while taking photographs of the anti-coup demonstrations.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: English cement-bag branding ('CROWN cement' / 'CEMENT') → Thai cement-bag branding ('ปูนซีเมนต์ตรามงกุฎ' / 'ปูนซีเมนต์')
why_contradicts: Thai-language cement packaging suggests the barricade materials are from Thailand, which conflicts with the caption identifying the protest scene as taking place in Yangon, Myanmar.
MIC prediction
verdict: INCONSISTENT   type: ads_anachronism
<grounding>: black Adidas backpack with large white 'adidas' logo
<knowledge>: The caption describes journalists covering anti-coup protests in Myanmar, so replacing their press gear with a prominent contemporary Adidas-branded backpack subtly shifts the scene toward consumer sportswear branding rather than authentic press equipment.
knowledge_score = 0.0  |  grounding_score = 0.0

Mode A   Case #7

test_id_edit · gt_type=clothing · difficulty=medium
Headline: Deadly Explosion Hits Kabul Tutoring Center
Caption: Residents in Kabul gathered near the site of an attack on an education center on Saturday.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Gray T-shirt and dark pants on central young man → Pakistani cream shalwar kameez with charcoal waistcoat and Pakistani flag patch
why_contradicts: The caption describes residents gathered after an attack in Kabul, Afghanistan, so dressing a prominent civilian in explicitly Pakistani national-style clothing with a Pakistan flag patch introduces the wrong national context.
MIC prediction
verdict: CONSISTENT   type: None
<grounding>: None
<knowledge>: None
knowledge_score = 0.0  |  grounding_score = 0.0

Mode A   Case #8

test_ood_edit · gt_type=environmental · difficulty=medium
Headline: In Wisconsin Poll, Unrest Concerns Don’t Translate Into Surge for Trump
Caption: Residents of Kenosha, Wis., watched President Trump’s motorcade this month. A Times poll found that Mr. Trump has failed so far in his attempt to capitalize politically on his inflammatory remarks about the unrest there.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Sidewalk crack grass and weeds → shallow muddy monsoon water with rice paddy seedlings
why_contradicts: Flooded rice-seedling ground cover suggests a South Asian monsoon agricultural environment, which conflicts with the caption’s setting in Kenosha, Wisconsin during a presidential motorcade.
MIC prediction
verdict: CONSISTENT   type: None
<grounding>: None
<knowledge>: None
knowledge_score = 0.0  |  grounding_score = 0.0

Mode A   Case #9

test_ood_edit · gt_type=clothing · difficulty=medium
Headline: Migrant Boat Sinks Near Greek Island, Killing at Least 12
Caption: Medical staff transport a survivor at a hospital in the northwestern Greek port town of Preveza on Saturday.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Greek EKAB responder jacket marking → Italian 118 emergency medical service marking
why_contradicts: The caption places the scene at a hospital in Preveza, Greece, so showing an Italian emergency-service uniform identifier instead of Greece’s EKAB contradicts the stated location and responders.
MIC prediction
verdict: CONSISTENT   type: None
<grounding>: None
<knowledge>: None
knowledge_score = 0.0  |  grounding_score = 0.0

Mode A   Case #10

test_ood_edit · gt_type=social_behavior · difficulty=medium
Headline: South Dakota Attorney General Faces Impeachment and Calls to Resign Over Crash
Caption: Attorney General Jason Ravnsborg of South Dakota in 2019 in front of the U.S. Supreme Court.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Man speaking at microphones → man performing a Chinese baoquan fist-palm salute
why_contradicts: A South Dakota attorney general making a public statement outside the U.S. Supreme Court would be expected to speak or gesture in standard American political press-conference style, not use a formal Chinese martial/cultural greeting.
MIC prediction
verdict: INCONSISTENT   type: social_behavior
<grounding>: West African prostration-style greeting with palms pressed together
<knowledge>: The caption describes South Dakota Attorney General Jason Ravnsborg making a public statement before the U.S. Supreme Court, but the edited gesture makes him appear to be performing a devotional West African greeting instead of addressing reporters.
knowledge_score = 0.0  |  grounding_score = 0.0

Mode A   Case #11

test_ood_edit · gt_type=text_language · difficulty=hard
Headline: Germany’s Far-Right Party Wins Suit Against Interior Minister
Caption: Tino Chrupalla, left, and Jörg Meuthen, leaders of the far-right Alternative for Germany party, addressing the media on Tuesday after the court decision was released.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Blue microphone text 'SWR3' → Dutch broadcaster text 'NOS'
why_contradicts: The caption describes German far-right party leaders speaking to the media in Germany, but a Dutch-language broadcaster identifier subtly suggests the press scene is tied to the Netherlands instead.
MIC prediction
verdict: INCONSISTENT   type: technology
<grounding>: Dutch NOS-branded broadcast microphones and camera monitor
<knowledge>: The caption identifies the scene as a German political media event, so prominently Dutch NOS broadcast equipment implies the media infrastructure belongs to the Netherlands rather than Germany.
knowledge_score = 0.0  |  grounding_score = 0.0

Mode A   Case #12

test_ood_edit · gt_type=social_behavior · difficulty=medium
Headline: From Hiding, Kyrgyzstan’s Leader Declares State of Emergency
Caption: Protesters from rival political groups rallied in Bishkek, Kyrgyzstan’s capital, on Friday.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Raised protest fists and rallying arms → formal German-style right-handed handshakes between adjacent men
why_contradicts: A crowd at a rival political protest in Bishkek would be expected to display confrontational or solidarity protest gestures, not orderly mutual formal greetings that suggest cordial reception.
MIC prediction
verdict: CONSISTENT   type: None
<grounding>: None
<knowledge>: None
knowledge_score = 0.0  |  grounding_score = 0.0

Mode A   Case #13

test_ood_edit · gt_type=clothing · difficulty=hard
Headline: In Xi’s Homage to Korean War, a Jab at the U.S.
Caption: An event marking the 70th anniversary of China’s participation in the Korean War at the Great Hall of the People in Beijing on Friday.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Chinese officials’ dark business suits → French Army dark navy dress uniforms with gold insignia and small French tricolor sleeve patches
why_contradicts: The caption describes a Chinese state commemoration of the Korean War in Beijing, so visible French military dress uniforms among the principal attendees would be the wrong national military identity.
MIC prediction
verdict: INCONSISTENT   type: clothing
<grounding>: formal Chinese PLA officer uniforms with red collar tabs
<knowledge>: The caption describes a 70th-anniversary commemorative event in Beijing, so replacing attendees with PLA officers changes the scene from a civilian political ceremony into a military-political one.
knowledge_score = 0.0  |  grounding_score = 0.0

Mode A   Case #14

test_ood_edit · gt_type=social_behavior · difficulty=medium
Headline: A Few Thousand Protest Stay-at-Home Order at Wisconsin State Capitol
Caption: People protested Gov. Tony Evers’s extended stay-at-home order at the Capitol in Madison, Wis., on Friday.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Raised-hands protest gesture → Islamic qiyam prayer posture with arms folded over the torso
why_contradicts: The caption describes people protesting a Wisconsin stay-at-home order, so depicting a participant in formal Islamic prayer changes the social meaning of the gathering away from a political demonstration.
MIC prediction
verdict: CONSISTENT   type: None
<grounding>: None
<knowledge>: None
knowledge_score = 0.0  |  grounding_score = 0.0

Mode A   Case #15

test_ood_edit · gt_type=environmental · difficulty=hard
Headline: ‘We Can Bring Down the Regime’: Myanmar’s Protesting Workers Are Unbowed
Caption: Thousands of protesters gathered in Yangon, Myanmar, on Monday.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Generic Yangon street trees → mature South Asian monsoon banyan trees with aerial roots
why_contradicts: The caption places the protest in Yangon, but replacing the streetside vegetation with iconic banyan-lined South Asian monsoon streets subtly shifts the environmental setting toward the Indian subcontinent rather than urban Myanmar.
MIC prediction
verdict: CONSISTENT   type: None
<grounding>: None
<knowledge>: None
knowledge_score = 0.0  |  grounding_score = 0.0

Mode B — Hallucinated visual grounding (verdict right, evidence fabricated) 8 cases

Mode B   Case #16

test_ood_edit · gt_type=text_language · difficulty=medium
Headline: Turkey Court Sentences U.S. Consulate Employee to Over 8 Years in Prison
Caption: The courthouse in Istanbul last year during the trial of Metin Topuz, an American consulate employee accused of spying and attempting to overthrow the Turkish government.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Turkish courthouse and police signage ('ISTANBUL ADALET SARAYI', 'POLIS') → Georgian Mkhedruli signage ('თბილისის საქალაქო სასამართლო', 'პოლიცია')
why_contradicts: The caption identifies the scene as a courthouse in Istanbul, Turkey, but Georgian-language official signage implies the building is in Georgia instead.
MIC prediction
verdict: INCONSISTENT   type: text_language
<grounding>: Khmer-script courthouse signs ('ស្ថាប័នការងារ', 'រដ្ឋបាលទីកន្លែងការងារ')
<knowledge>: The caption identifies the scene as a courthouse in Istanbul, but Khmer administrative signage implies a Cambodian governmental setting instead.
knowledge_score = 0.2  |  grounding_score = 0.0

Mode B   Case #17

test_id_edit · gt_type=ads_anachronism · difficulty=medium
Headline: Mother in Custody After 3 Children Are Found Dead, Police Say
Caption: Firefighters pronounced the three children dead at an apartment complex in Los Angeles on Saturday, the police said.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Apartment leasing sign 'Royal Villa Apartments / Now Leasing' → 'OCCUPY WALL STREET / WE ARE THE 99%' social movement banner
why_contradicts: An Occupy Wall Street banner is tied to the 2011 protest movement, which is anachronistic for a Los Angeles apartment homicide scene from this later news event.
MIC prediction
verdict: INCONSISTENT   type: architecture
<grounding>: Spanish colonial-style arched stucco facade with terracotta tile roofline
<knowledge>: The caption describes a police response at an apartment complex in Los Angeles, but the edited architecture suggests a Latin American Spanish colonial setting rather than a typical California urban residential building.
knowledge_score = 0.2  |  grounding_score = 0.0

Mode B   Case #18

test_id_edit · gt_type=text_language · difficulty=medium
Headline: Palestinians, Slammed for Suppressing Dissent, Free Protest Organizers
Caption: Activists were arrested for planning to stage an anti-corruption demonstration in Manara Square in Ramallah this month.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Arabic storefront signs in Ramallah → Urdu Nastaliq storefront signs reading 'احتساب مخالف احتجاج' and 'راولپنڈی الیکٹرانکس'
why_contradicts: Urdu commercial signage implies a Pakistani urban setting rather than Ramallah in the West Bank, contradicting the caption's claim that the scene is in Manara Square.
MIC prediction
verdict: INCONSISTENT   type: ads_anachronism
<grounding>: Dell-branded storefront/logo
<knowledge>: A Dell storefront logo introduces a distinctly American consumer-tech brand into a captioned scene about a Palestinian anti-corruption protest in Ramallah, creating a subtle but significant cultural and temporal mismatch.
knowledge_score = 0.2  |  grounding_score = 0.0

Mode B   Case #19

test_id_edit · gt_type=ads_anachronism · difficulty=medium
Headline: Protests Upend Mayor’s Race, and Eyes Turn to Jumaane Williams
Caption: Jumaane Williams, center, has been one of the few elected officials in New York City to lead protests, as he did on Tuesday, across the Brooklyn Bridge.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: “Wilson” cap branding → white OpenAI knot + “ChatGPT” logo on the cap
why_contradicts: A ChatGPT-branded cap is anachronistic in a photo of the 2020 George Floyd protests in New York because ChatGPT did not exist yet.
MIC prediction
verdict: CONSISTENT   type: None
<grounding>: None
<knowledge>: None
knowledge_score = 0.2  |  grounding_score = 0.0

Mode B   Case #20

test_ood_edit · gt_type=flag · difficulty=easy
Headline: Antivaccination Activists Are Growing Force at Virus Protests
Caption: Heidi Muñoz Gleisner, left center, and Tara Thornton, right center, were removed from a demonstration in Sacramento on Friday.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: U.S. flag → French blue-white-red tricolour flag
why_contradicts: The caption describes a demonstration in Sacramento, California, so a French national flag shifts the scene's implied political setting away from a specifically American state protest.
MIC prediction
verdict: INCONSISTENT   type: flag
<grounding>: French tricolor flag (blue-white-red vertical bands)
<knowledge>: The caption describes a Sacramento antivaccination protest in the United States, so a French national flag subtly shifts the implied political setting away from a local American demonstration.
knowledge_score = 0.2  |  grounding_score = 0.0

Mode B   Case #21

test_ood_edit · gt_type=social_behavior · difficulty=medium
Headline: Phoenix Police Kill Man in Parked Car, Igniting New Protests
Caption: Protesters at a Black Lives Matter march in Phoenix last month.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Marching Black Lives Matter protest behavior → coordinated Islamic prayer postures (standing, bowing, prostrating)
why_contradicts: The caption describes protesters at a Black Lives Matter march in Phoenix, but the edited crowd is behaving like participants in a communal Muslim prayer gathering rather than a protest march.
MIC prediction
verdict: INCONSISTENT   type: social_behavior
<grounding>: participants performing Islamic sujud prostration
<knowledge>: The caption describes protesters at a Black Lives Matter march in Phoenix, but the edited behavior makes the gathering look like a Muslim prayer ritual instead of a civil-rights protest.
knowledge_score = 0.2  |  grounding_score = 0.0

Mode B   Case #22

test_id_edit · gt_type=technology · difficulty=hard
Headline: Myanmar Military Storms Universities and Hospitals and Revokes Press Licenses
Caption: Police used tear gas, sound grenades and rubber bullets to crack down on demonstrations in Yangon on Monday.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Improvised protest shield → Brazilian Engesa EE-11 Urutu armored-vehicle vision block in olive-drab frame
why_contradicts: The caption describes demonstrators in Yangon during Myanmar’s crackdown, but a conspicuous component from a Brazilian Engesa armored vehicle subtly implies unrelated Brazilian military hardware in the scene.
MIC prediction
verdict: INCONSISTENT   type: environmental
<grounding>: Banyan-style tropical monsoon foliage with banyan-like leaf clusters and a small sapling
<knowledge>: The caption places the scene in Yangon, Myanmar, but the edited vegetation suggests a distinctly South Asian tropical monsoon environment associated with Banyan trees rather than the more typical urban street trees seen there.
knowledge_score = 0.2  |  grounding_score = 0.0

Mode B   Case #23

test_id_edit · gt_type=flag · difficulty=hard
Headline: Divided Kingdom: Jordan Shaken by Split Between King and Ex-Crown Prince
Caption: Jordan’s capital, Amman, on Sunday, a day after the authorities moved against high-profile figures.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Small multicolored street banner/flag → Indonesian red-over-white flag
why_contradicts: The caption identifies the scene as Amman, Jordan, so displaying an Indonesian national flag in the street decorations subtly shifts the location/national context away from Jordan.
MIC prediction
verdict: INCONSISTENT   type: infrastructure
<grounding>: South African-style black-pole speed-limit sign and bollard system
<knowledge>: The caption identifies the scene as Amman, Jordan, so replacing key roadside infrastructure with distinctly South African street hardware implies the wrong national setting.
knowledge_score = 0.2  |  grounding_score = 0.0

Mode C — Misidentified entity origin (right object, wrong country/era) 7 cases

Mode C   Case #24

test_id_edit · gt_type=technology · difficulty=hard
Headline: At Least Eight Killed in Haiti Prison Break
Caption: Recaptured inmates are led by the police outside the Croix-des-Bouquets Civil Prison after an attempted breakout, in Port-au-Prince, Haiti, on Thursday.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Officer’s visible long gun → Brazilian IMBEL IA2 assault rifle
why_contradicts: The caption describes Haitian police escorting recaptured inmates, so a distinctly Brazilian service rifle subtly shifts the implied security force identity away from Haiti.
MIC prediction
verdict: INCONSISTENT   type: flag
<grounding>: Philippine flag patch
<knowledge>: The caption describes Haitian police recapturing inmates outside a prison in Port-au-Prince, so a Philippine flag on the officer would imply the wrong national security force.
knowledge_score = 0.2  |  grounding_score = 0.8

Mode C   Case #25

test_id_edit · gt_type=technology · difficulty=hard
Headline: Lone Wolves Connected Online: A History of Modern White Supremacy
Caption: In 1981, Louis Beam, then Grand Dragon of the Texas chapter of the Ku Klux Klan, inspecting armed members of the group’s security force.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: One visible rifle in the right-side line → South African Vektor R4 assault rifle
why_contradicts: A South African Vektor R4 was introduced later in the 1980s and would not fit a 1981 Texas Klan militia inspection scene described in the caption.
MIC prediction
verdict: INCONSISTENT   type: technology
<grounding>: Israeli IMI TAR-21 bullpup assault rifles
<knowledge>: The caption describes a 1981 Klan security force in the United States, so equipping them with Israeli-made TAR-21 rifles introduces a subtly wrong technological context tied to a later Middle Eastern military context.
knowledge_score = 0.2  |  grounding_score = 0.6

Mode C   Case #26

test_id_edit · gt_type=infrastructure · difficulty=hard
Headline: Italy Locks Down Much of the Country’s North Over the Coronavirus
Caption: The Duomo Square in Milan on Sunday, after the Italian government imposed a lockdown on much of the country’s north because of the coronavirus.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Unmarked Milan plaza infrastructure → Addis Ababa Light Rail-style station entrance with Amharic sign "ለገሀር / Leghar"
why_contradicts: The caption identifies the location as Duomo Square in Milan, but the added transit signage and station design indicate Ethiopian Addis Ababa rail infrastructure instead of Italian urban infrastructure.
MIC prediction
verdict: INCONSISTENT   type: infrastructure
<grounding>: Tokyo Metro-style sign reading 'レガーレ / Leghar'
<knowledge>: The caption places the scene in Milan during Italy's coronavirus lockdown, so a Japanese transit stop sign identifying a Tokyo Metro station would be geographically inconsistent.
knowledge_score = 0.2  |  grounding_score = 0.8

Mode C   Case #27

test_ood_edit · gt_type=architecture · difficulty=medium
Headline: Italy’s Vaccine Drive Runs Up Against a Sacred Institution: Summer Vacation
Caption: Dining in Florence, Italy, this week. Pressure has built on the government to be more flexible to save the tourism season and to allow Italians to get vaccinated in sun-and-surf regions far from home.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Florentine cathedral dome and upper facade details → Mughal white onion dome with jali-screened cusped-arch detailing
why_contradicts: The caption places the scene in Florence, Italy, but the edited landmark would display distinctive Mughal South Asian architecture rather than Florentine historic architecture.
MIC prediction
verdict: CONSISTENT   type: None
<grounding>: None
<knowledge>: None
knowledge_score = 0.2  |  grounding_score = 0.6

Mode C   Case #28

test_ood_edit · gt_type=ads_anachronism · difficulty=medium
Headline: Buffeted by Trump, China Has Little Hope for Warmer Relations With Biden
Caption: A state media broadcast in Beijing on Sunday showing President-elect Joseph R. Biden Jr. delivering his victory speech.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Lower-right fashion storefront sign → illuminated COVID-19 mask-and-QR-code entry notice in Chinese
why_contradicts: The caption describes a 2020 Biden victory-speech broadcast, but a mall sign requiring mask-wearing and QR-code registration evokes the later COVID-control period in China rather than that specific moment.
MIC prediction
verdict: INCONSISTENT   type: ads_anachronism
<grounding>: COVID-19 mask-and-QR-code public-health notice
<knowledge>: A COVID-era mask-and-QR-code notice is anachronistic for a 2016 election-night broadcast shown in Beijing.
knowledge_score = 0.2  |  grounding_score = 0.8

Mode C   Case #29

test_id_edit · gt_type=technology · difficulty=hard
Headline: A Lucky Country Says Goodbye to the World’s Longest Boom
Caption: A restaurant in Sydney last week.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Sydney skyline construction crane → South African Ratel 20 armored vehicle
why_contradicts: A South African military armored vehicle appearing as construction equipment in a restaurant scene identified as Sydney last week conflicts with the expected civilian urban infrastructure of contemporary Australia.
MIC prediction
verdict: INCONSISTENT   type: technology
<grounding>: Turkish Anka-style unmanned turret-equipped armored vehicle
<knowledge>: The caption describes a restaurant scene in Sydney, so suspending a Turkish military unmanned turret vehicle over the harbor conflicts with the expected civilian setting and changes the implied news context.
knowledge_score = 0.2  |  grounding_score = 0.6

Mode C   Case #30

test_ood_edit · gt_type=ads_anachronism · difficulty=hard
Headline: After Breonna Taylor’s Death, Black Engagement in Kentucky Politics Soared
Caption: Protesters gathered in downtown Louisville, Ky., on Saturday to commemorate the anniversary of the killing of Breonna Taylor in a botched raid by Louisville police officers.
Original (reference only)
Edited (what MIC saw)
Ground truth
what_changed: Camera 'Sony' branding and cap 'NewsShare' text → COVID-era 'SCAN FOR CONTACT TRACING' QR sticker and 'Mask Up Louisville' text
why_contradicts: The caption describes a Louisville protest commemorating Breonna Taylor with visible masks already placing it in the pandemic period, so adding a formal contact-tracing QR code campaign and 'Mask Up Louisville' branding injects a very specific later-pandemic public-health visual program that does not fit the event as depicted.
MIC prediction
verdict: INCONSISTENT   type: ads_anachronism
<grounding>: COVID-era QR-code 'SCAN FOR CONTACT TRACING' plate
<knowledge>: A COVID contact-tracing QR plate introduces a pandemic-era public-health cue that conflicts with the captioned 2020 protest commemorating Breonna Taylor.
knowledge_score = 0.2  |  grounding_score = 0.8

Generated from /tmp/mic_errors_30.json · source data in experiments/eval/results/scored/Ours-SFT-GRPO__*__canonical.jsonl
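The scored files named in the footer can be inspected with a short script. The sketch below assumes each file is JSON Lines (one JSON object per line, as the `.jsonl` extension suggests) and uses `verdict` as a hypothetical field name; the actual record schema is not shown in this report, so adjust the key to whatever the files contain.

```python
import glob
import json
from collections import Counter

# Glob pattern copied from the footer of this report.
SCORED_GLOB = "experiments/eval/results/scored/Ours-SFT-GRPO__*__canonical.jsonl"

def iter_records(pattern=SCORED_GLOB):
    """Yield one dict per non-empty line from every file matching the pattern."""
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:  # skip blank lines
                    yield json.loads(line)

def tally(records, key="verdict"):
    """Count records by the value of `key` ("verdict" is an assumed field name)."""
    return Counter(str(rec.get(key)) for rec in records)
```

For example, `tally(iter_records())` would return a `Counter` of verdict values across all matching files, with records lacking the key counted under `"None"`.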