StreamingMeme

The streaming technology industry news aggregator.

AI & Video | Technical Development | June 16, 2026

Google cuts Gemini AI costs by 90% via context caching

Google Cloud Documentation

Google's Gemini Enterprise Agent Platform has introduced implicit and explicit context caching for its Gemini models. This feature is designed to reduce costs and latency for AI requests containing repeated content, offering up to a 90% discount on cached tokens for certain models. This is particularly useful for scenarios such as chatbots and repetitive analysis of large video or document files.
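
To illustrate why repeated content matters, here is a minimal sketch of prefix sharing between two requests. It assumes that cache hits depend on requests starting with the same content, and it uses plain strings as stand-ins for model tokens; the function name `common_prefix_len` and the sample requests are hypothetical, and the real matching is handled by the service, not by client code.

```python
def common_prefix_len(a: list[str], b: list[str]) -> int:
    """Count how many leading items two token sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Stand-in for a large shared document or video transcript (hypothetical data).
shared = ["<transcript>"] + [f"line{i}" for i in range(1000)]

# Shared content first, varying question last: the long prefix is identical.
good_1 = shared + ["Q:", "summarize"]
good_2 = shared + ["Q:", "list", "speakers"]

# Varying question first: the sequences diverge almost immediately.
bad_1 = ["Q:", "summarize"] + shared
bad_2 = ["Q:", "list", "speakers"] + shared

good_overlap = common_prefix_len(good_1, good_2)
bad_overlap = common_prefix_len(bad_1, bad_2)
print(f"shared-first overlap: {good_overlap}, question-first overlap: {bad_overlap}")
```

Under this assumption, placing the large, unchanging content at the start of the prompt and the varying question at the end gives repeated requests the longest possible shared prefix.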

Key Takeaways

  • Gemini 2.5 and 3.5 models offer a 90% discount on cached tokens compared to standard input prices
  • Implicit caching is enabled by default for all Cloud projects with a minimum threshold of 2,048 tokens
  • Explicit caching through the API lets developers set the TTL manually and control which content is cached
  • The system supports analysis of video, audio, and large document blobs up to 10 MB per cached item
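
The arithmetic behind the takeaways above can be sketched as follows. The 90% discount and the 2,048-token minimum come from the article; the price of 1.00 USD per million input tokens, the token counts, and the function names are hypothetical placeholders chosen for illustration, not published figures.

```python
def input_cost(total_tokens: int, cached_tokens: int,
               price_per_million: float, cached_discount: float = 0.90) -> float:
    """Input cost of one request when part of the prompt is served from cache."""
    if not 0 <= cached_tokens <= total_tokens:
        raise ValueError("cached_tokens must be between 0 and total_tokens")
    uncached_tokens = total_tokens - cached_tokens
    cached_rate = price_per_million * (1.0 - cached_discount)
    cost = uncached_tokens * price_per_million + cached_tokens * cached_rate
    return cost / 1_000_000


def meets_implicit_minimum(prompt_tokens: int, minimum: int = 2048) -> bool:
    """Check a prompt against the minimum size the article cites for implicit caching."""
    return prompt_tokens >= minimum


PRICE = 1.00  # hypothetical USD per million input tokens

full = input_cost(500_000, 0, PRICE)        # nothing served from cache
warm = input_cost(500_000, 480_000, PRICE)  # most of the prompt is a cache hit
print(f"no cache: ${full:.4f}, cache hit: ${warm:.4f}")
```

Only the cached portion of the prompt is discounted, so the overall saving on a request depends on how much of it is repeated content.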

Why It Matters

The high cost of processing long-form video content remains a primary barrier for AI-driven metadata extraction and search. By discounting repeated context by up to 90%, Google is lowering the economic threshold for sophisticated video analysis workflows, such as frame-by-frame sports logging or legal review of raw footage. This move pressures competitors like AWS and OpenAI to offer similar architectural efficiency for multimodal workloads. For the streaming ecosystem, this facilitates deeper content discovery tools without the ballooning compute costs traditionally associated with high-token video inputs. Watch for whether Google introduces custom cache-sharing across different project IDs for enterprise media organizations.

Additional Context

The introduction of context caching addresses the 'lost in the middle' phenomenon and high overhead costs associated with the long-context windows that have become a competitive frontier for LLMs. Per CNBC in May 2026, Google has aggressively expanded Gemini’s context window to handle up to 2 million tokens, yet developers have voiced concerns regarding the linear cost scaling of processing consistent background data. This update follows a broader trend where infra-providers move from general model availability to operational cost-optimization. For example, per The Verge in late 2025, competitors have focused on 'prompt caching' to retain users who are moving away from brute-force token consumption.

Within the media and entertainment sector, the utility of this feature aligns with recent industry shifts toward AI-driven post-production. Per a June 2026 report from Variety, major studios are increasingly using multimodal models to automate the generation of descriptive metadata and localization scripts. Prior to caching, re-analyzing a 60-minute 4K video file for different targeted outputs (such as social media clips versus accessibility captions) required redundant and expensive token processing. Google’s 90% discount directly targets these repetitive workflows.

Furthermore, the technical implementation of implicit caching mirrors recent updates found in open-source frameworks. Per TechCrunch in April 2026, the demand for 'agentic' workflows, where an AI performs multiple sequential tasks on a single dataset, has skyrocketed. By making the cache hit savings automatic for projects using Gemini 3.5 Flash and Flash-Lite, Google is attempting to lock in developers who require low-latency responses for consumer-facing video chatbots and interactive streaming experiences.
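
The repeated-output workflow described above can be sketched as a simple tally of billed input tokens. This is a simplified model under stated assumptions: the video's token count and the per-task prompt sizes are hypothetical, every request after the first is assumed to hit the cache, and output tokens and any cache storage charges are ignored.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    prompt_tokens: int  # tokens unique to this task's instructions


def billed_input_tokens(shared_tokens: int, tasks: list[Task],
                        use_cache: bool, cached_discount: float = 0.90) -> float:
    """Full-price-equivalent input tokens billed across sequential tasks."""
    total = 0.0
    for index, task in enumerate(tasks):
        # Assumption: the first request populates the cache, later ones hit it.
        cache_hit = use_cache and index > 0
        shared_rate = (1.0 - cached_discount) if cache_hit else 1.0
        total += shared_tokens * shared_rate + task.prompt_tokens
    return total


VIDEO_TOKENS = 1_000_000  # hypothetical token count for one long video
tasks = [
    Task("social media clips", 500),
    Task("accessibility captions", 500),
    Task("descriptive metadata", 500),
]

without_cache = billed_input_tokens(VIDEO_TOKENS, tasks, use_cache=False)
with_cache = billed_input_tokens(VIDEO_TOKENS, tasks, use_cache=True)
savings = 1.0 - with_cache / without_cache
print(f"without cache: {without_cache:,.0f}, with cache: {with_cache:,.0f}, "
      f"saved: {savings:.1%}")
```

In this model the first request is always billed in full, so the overall saving approaches the 90% figure only as the number of tasks run against the same content grows.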


Read full article at docs.cloud.google.com

Related Articles

Substack: Google Gemma 4 12B Enables Fast Local Multimodal AI Inference
Bytebytego: AI inference engineering matures as open models drive 80% cost savings
BroadcastBridge: Telestream embeds 'Practical AI' across Vantage to automate broadcast bottlenecks

