Challenges
Sparse or delayed conversion signals reduce the effectiveness of predictions.
Manual or rule-based bidding strategies lead to inefficiencies.
Scaling retrieval and matching across large ad inventories is technically demanding.
Adapting to fast-changing user behavior and market trends is difficult.
Deep Learning Models for CTR and CVR Estimation – Enhancing predictive accuracy for Click-Through Rate (CTR) and Conversion Rate (CVR) models.
Challenge: Traditional CTR/CVR prediction models struggled to capture complex, non-linear patterns in user behavior and ad interactions, limiting their accuracy.
Solution: Leverage advanced deep learning architectures to learn richer representations from large-scale click and conversion data. By using deep neural networks, the model can better capture subtle relationships and patterns, improving its predictive capability for both clicks and conversions.
Outcome: Significantly improved predictive accuracy (more than +20% for CTR and more than +10% for CVR), leading to more effective ad ranking and targeting. This results in higher engagement and conversion rates, as the system more reliably surfaces ads users are likely to click and act upon.
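As an illustration of why a deep model helps here, the minimal pure-Python sketch below uses a synthetic click signal that depends on an XOR-style interaction between two features, a non-linear pattern a linear model cannot fit but a small one-hidden-layer network can. All data, sizes, and hyperparameters are toy assumptions, not the production model.

```python
import math
import random

random.seed(0)

# Synthetic "CTR" data: the click label is an XOR-style interaction between
# two binary features (e.g. user segment x ad category) -- a non-linear
# pattern that defeats a linear model but not a small neural network.
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

H = 8  # hidden units

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w1 = [[random.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
w2 = [random.uniform(-1.0, 1.0) for _ in range(H)]
b2 = 0.0

def forward(x):
    h = [math.tanh(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(H)]
    p = sigmoid(sum(w2[j] * h[j] for j in range(H)) + b2)
    return h, p

lr = 0.5
for _ in range(5000):
    for x, y in DATA:
        h, p = forward(x)
        d_out = p - y  # gradient of log-loss w.r.t. the output logit
        for j in range(H):
            d_hid = d_out * w2[j] * (1.0 - h[j] ** 2)  # back through tanh
            w2[j] -= lr * d_out * h[j]
            w1[j][0] -= lr * d_hid * x[0]
            w1[j][1] -= lr * d_hid * x[1]
            b1[j] -= lr * d_hid
        b2 -= lr * d_out

predicted_ctr = [forward(x)[1] for x, _ in DATA]
```

After training, the network's predicted CTRs round to the XOR labels, which a single linear layer could never achieve on this data.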
Bidding Strategy Optimization – Incorporating advanced bidding strategies such as Target CPA and Max Conversions to improve advertiser outcomes.
Challenge: Achieving advertisers’ performance goals (e.g. specific cost-per-action or maximum conversions within budget) is difficult with manual or simplistic bidding. Without optimization, ad spend may be inefficient, either overshooting budgets or missing potential conversions.
Solution: Implement automated bidding strategies like Target CPA (Cost Per Acquisition) and “Maximize Conversions.” These strategies use machine learning to adjust bids in real time based on the likelihood of conversion, allocating budget more effectively. The system continuously learns and updates bids to hit the desired CPA or get the most conversions for the budget.
Outcome: Improved advertiser outcomes and ROI. Offering multiple bidding strategies gives advertisers more options and ensures budgets are used efficiently to maximize valuable actions (conversions), helping advertisers hit target CPA goals and obtain more conversions without manual intervention.
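A hedged sketch of the core Target CPA logic, assuming a value-based bidder: bid the predicted conversion probability times the advertiser's target CPA, plus a simple pacing feedback loop that nudges bids when the realized CPA drifts from the target. Function names, clamps, and the learning rate are illustrative, not the production algorithm.

```python
def target_cpa_bid(p_conversion, target_cpa, min_bid=0.01, max_bid=10.0):
    """Value-based bid: if each conversion is worth target_cpa, a click with
    conversion probability p_conversion is worth target_cpa * p_conversion."""
    bid = target_cpa * p_conversion
    return max(min_bid, min(bid, max_bid))

def pace_multiplier(multiplier, observed_cpa, target_cpa, lr=0.2):
    """Feedback control: shrink bids when realized CPA overshoots the target,
    raise them when it undershoots, so spend converges toward the goal."""
    error = (observed_cpa - target_cpa) / target_cpa
    return max(0.1, multiplier * (1.0 - lr * error))
```

For example, with a $20 target CPA and a predicted 5% conversion rate, the bid is $1.00; a realized CPA of $30 against a $20 target pushes the multiplier below 1, lowering subsequent bids.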
Integration of the Vespa Search Framework – Adopting Vespa for ad candidate retrieval, enabling high-speed, scalable access to the ad inventory.
Challenge: Retrieving relevant ads quickly from a large inventory can be technically challenging, especially as the system scales. Traditional ad retrieval solutions might introduce high latency or struggle with the volume of ad data, hurting the user experience and system performance.
Solution: Integrate the Vespa search platform for ad candidate retrieval. Vespa, a high-performance search engine, is used to index ads and handle queries efficiently. By leveraging Vespa’s scalability and speed, the system can retrieve a set of relevant ad candidates in milliseconds, even as the catalog grows.
Outcome: High-speed, scalable retrieval of candidate ads. RPM increased by +3% to +12% depending on the supply, distinct campaign reach increased by +12% to +20% depending on the supply, and spend for new campaigns with competitive price and scale increased by +3%.
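For illustration, a sketch of how a candidate query against Vespa's /search/ HTTP API might be assembled. The document type `ad`, tensor field `embedding`, and rank profile `ad_relevance` are hypothetical names, not the production schema; only the request shape (YQL with a nearestNeighbor operator and a query tensor) follows Vespa's documented API.

```python
def build_vespa_request(user_vector, top_k=50, rank_profile="ad_relevance"):
    """Build a JSON body for Vespa's /search/ endpoint that retrieves the
    top_k nearest ad embeddings to the user's vector. Field and profile
    names here are illustrative placeholders."""
    return {
        "yql": (
            "select * from ad where "
            "{targetHits: %d}nearestNeighbor(embedding, q)" % top_k
        ),
        "input.query(q)": user_vector,  # query tensor bound to "q" in the YQL
        "hits": top_k,
        "ranking": rank_profile,
        "timeout": "50ms",  # keep candidate retrieval within the latency budget
    }

request_body = build_vespa_request([0.1, 0.2, 0.3, 0.4], top_k=10)
```

In deployment this body would be POSTed to the Vespa container (e.g. `http://<vespa-host>:8080/search/`); here we only construct it.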
Incremental Learning for Large-Scale CTR Prediction – Developing models that continuously update with new data without requiring full retraining.
Challenge: In a dynamic ad environment, user preferences and trends change rapidly. Traditional models require periodic full retraining on fresh data to stay accurate, which is time-consuming and computationally expensive at large scale. Delays in updating models can lead to outdated predictions and missed opportunities.
Solution: Employ incremental learning techniques that update the CTR prediction model continuously as new data (impressions, clicks, conversions) streams in. Instead of retraining from scratch, the model parameters are adjusted in small steps with each batch of new data. This might involve online learning algorithms or fine-tuning of an existing model on recent data, avoiding a full retrain.
Outcome: The CTR model remains up-to-date and accurate without the cost of full retraining cycles. It adapts quickly to changing user behavior or ad content, maintaining strong performance. This continuous learning approach ensures scalability for large-scale systems by reducing downtime and computational load associated with model retraining.
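A minimal sketch of the idea, assuming a logistic CTR model updated by SGD on each incoming mini-batch; the synthetic regime shift below shows the model tracking a change in user behavior without any full retrain. The class name and data are illustrative.

```python
import math

class OnlineCTRModel:
    """Logistic model updated incrementally with SGD on each mini-batch;
    no retraining from scratch is ever performed."""

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, batch):
        for x, y in batch:
            g = self.predict(x) - y  # gradient of log-loss w.r.t. the logit
            self.w = [wi - self.lr * g * xi for wi, xi in zip(self.w, x)]
            self.b -= self.lr * g

model = OnlineCTRModel(n_features=1)

# Regime 1: the feature is positively correlated with clicks.
for _ in range(100):
    model.partial_fit([([1.0], 1.0), ([0.0], 0.0)])
p_old = model.predict([1.0])

# Regime 2: behavior shifts -- the same feature now predicts no click.
# The model adapts via the same small-step updates, no full retrain.
for _ in range(200):
    model.partial_fit([([1.0], 0.0), ([0.0], 1.0)])
p_new = model.predict([1.0])
```

After the shift, the predicted CTR for the same input drops from well above 0.5 to well below it, purely through incremental updates.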
Exploration vs. Exploitation & Model Uncertainty Estimation – Balancing new opportunity discovery with performance optimization through uncertainty-aware modeling.
Challenge: The ad recommendation system must balance exploration (trying new or less-known ads/content to discover potentially high performers) with exploitation (prioritizing ads known to perform well). Pure exploitation can miss emerging opportunities, while too much exploration can reduce immediate performance. Additionally, the system often doesn’t know how confident it should be in its predictions (model uncertainty), making it hard to decide when to take risks on new content.
Solution: Introduce an uncertainty-aware approach to manage exploration vs. exploitation. The model estimates its confidence or uncertainty in CTR/CVR predictions for each ad. When uncertainty is high, the system can choose to explore—showing some new or unproven ads to gather more data. Conversely, when the model is confident (low uncertainty), it leans toward exploitation, showing historically high-performing ads. Techniques such as Bayesian neural networks or ensemble models can be used to quantify uncertainty, and multi-armed bandit algorithms or probabilistic decision policies balance the exploration-exploitation trade-off using that uncertainty information.
Outcome: A well-balanced advertising strategy that discovers new high-performing ads or targeting strategies (through controlled exploration) while still maintaining strong overall performance (through exploitation of known winners). The model’s ability to estimate uncertainty leads to smarter decisions about when to try something new versus when to rely on proven content, ultimately improving long-term engagement and conversion metrics without sacrificing short-term results.
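One standard way to implement such an uncertainty-driven trade-off is Beta-Bernoulli Thompson sampling, sketched below on synthetic CTRs. This is a toy stand-in: the source mentions Bayesian neural networks and ensembles for uncertainty, so treat the sampler, arm count, and click rates as assumptions for illustration.

```python
import random

random.seed(42)

class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling: posterior samples are wide (uncertain)
    for rarely shown ads, which drives exploration; well-measured ads get
    narrow posteriors and are exploited."""

    def __init__(self, n_ads):
        self.wins = [1.0] * n_ads    # Beta alpha (uniform prior)
        self.losses = [1.0] * n_ads  # Beta beta

    def select(self):
        samples = [random.betavariate(w, l)
                   for w, l in zip(self.wins, self.losses)]
        return samples.index(max(samples))

    def update(self, ad, clicked):
        if clicked:
            self.wins[ad] += 1
        else:
            self.losses[ad] += 1

true_ctr = [0.02, 0.05, 0.10]  # unknown to the sampler
sampler = ThompsonSampler(len(true_ctr))
shown = [0, 0, 0]
for _ in range(5000):
    ad = sampler.select()
    shown[ad] += 1
    sampler.update(ad, random.random() < true_ctr[ad])
```

Early on, all three ads are shown (exploration); as evidence accumulates, traffic concentrates on the genuinely best-performing ad (exploitation).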
Self-Supervised Learning for User-Context Modeling – Leveraging user interactions and contextual signals to improve ad targeting.
Challenge: Effectively capturing user context (behavior, preferences, current session context) is difficult, especially when labeled data on user intent is limited. Relying only on traditional supervised learning signals might not utilize the full richness of user interaction data, leaving potential understanding of user preferences untapped.
Solution: Utilize self-supervised learning to model user context. The system can create pretext tasks or objectives using abundantly available implicit user interaction data (such as click sequences, dwell times, or content consumption patterns) without the need for manual labels. For example, the model might predict the next content a user will engage with or reconstruct parts of a user’s interaction sequence, learning a latent representation of user context in the process. This learned user embedding or context representation can then be used by the ad targeting model to better match ads to users.
Outcome: Richer user-context models that significantly improve ad personalization. By leveraging self-supervised signals, the system gains a deeper understanding of user interests and context, leading to more relevant ad recommendations. This results in higher user engagement and click-through rates, as ads are served in a contextually appropriate and personalized manner.
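As a toy example of a pretext task, the sketch below learns item embeddings by predicting the next clicked item in unlabeled sequences (a skip-gram-style softmax objective). The sequence data, vocabulary size, and embedding dimension are synthetic assumptions; a real system would train on production click streams.

```python
import math
import random

random.seed(1)

# Unlabeled click sequences: the pretext task "predict the next clicked item"
# turns raw interaction logs into training signal with no manual labels.
# Items 0 and 1 belong to one interest cluster, items 2 and 3 to another.
sequences = [[0, 1, 0, 1], [2, 3, 2, 3]] * 50

V, D = 4, 8  # item vocabulary and embedding dimension
emb_in = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]
emb_out = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]

def next_item_probs(center):
    scores = [sum(emb_in[center][d] * emb_out[v][d] for d in range(D))
              for v in range(V)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def train_pair(center, target, lr=0.1):
    probs = next_item_probs(center)
    grad_in = [0.0] * D
    for v in range(V):
        g = probs[v] - (1.0 if v == target else 0.0)  # softmax cross-entropy grad
        for d in range(D):
            grad_in[d] += g * emb_out[v][d]
            emb_out[v][d] -= lr * g * emb_in[center][d]
    for d in range(D):
        emb_in[center][d] -= lr * grad_in[d]

for seq in sequences:
    for a, b in zip(seq, seq[1:]):
        train_pair(a, b)

probs_from_0 = next_item_probs(0)
probs_from_2 = next_item_probs(2)
```

The learned embeddings could then feed the ad targeting model, e.g. a session context vector formed as the mean of recently clicked items' embeddings.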
Mixture of Experts for CTR & CVR Model Estimation – Implementing Mixture of Experts (MoE) models to improve prediction accuracy across different ad categories and extract model uncertainty.
Challenge: A single monolithic CTR/CVR model may not perform optimally across all ad categories or audience segments, since different categories can exhibit very different user response patterns. Additionally, it’s hard for one model to know when it’s likely to be wrong (uncertainty) across diverse scenarios.
Solution: Adopt a Mixture of Experts architecture for CTR and CVR prediction. In an MoE model, multiple expert sub-models are trained, each specializing in particular data subsets (for example, an expert for each ad category or user segment), and a gating network learns to weight their contributions for each prediction. This specialization means each expert can capture patterns specific to its domain. The diversity of expert outputs also provides a signal for uncertainty — if the experts disagree on a prediction, it indicates higher uncertainty.
Outcome: Improved prediction accuracy across heterogeneous ad categories and better insight into prediction confidence. The MoE approach yields more tailored predictions (since each category is handled by an expert tuned to its patterns), which boosts overall CTR/CVR performance. It also offers the ability to gauge model uncertainty by observing the agreement among experts, aiding in decision-making (such as triggering exploration when uncertainty is high).
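A minimal forward-pass sketch of the MoE idea with hand-set illustrative weights: a gating softmax mixes per-expert CTR predictions, and the variance across experts serves as the uncertainty proxy described above. Trained parameters are replaced here by toy numbers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def moe_predict(x, experts, gate_weights):
    """experts: list of (weights, bias) linear CTR experts (e.g. one per ad
    category). gate_weights: one score vector per expert. Returns the mixed
    prediction and expert disagreement as an uncertainty proxy."""
    preds = [sigmoid(sum(w * xi for w, xi in zip(we, x)) + b)
             for we, b in experts]
    gates = softmax([sum(w * xi for w, xi in zip(gw, x))
                     for gw in gate_weights])
    p = sum(g * pr for g, pr in zip(gates, preds))
    mean = sum(preds) / len(preds)
    disagreement = sum((pr - mean) ** 2 for pr in preds) / len(preds)
    return p, disagreement

gate = [[1.0, 0.0], [0.0, 1.0]]  # illustrative gating parameters

# Experts that agree on this input: low uncertainty, safe to exploit.
p_agree, u_agree = moe_predict([1.0, 1.0], [([2.0, 0.0], 0.0), ([2.1, 0.0], 0.0)], gate)
# Experts that disagree: high uncertainty, a cue to explore instead.
p_split, u_split = moe_predict([1.0, 1.0], [([4.0, 0.0], 0.0), ([-4.0, 0.0], 0.0)], gate)
```

When experts disagree, the variance term spikes, which is exactly the signal used to trigger exploration in the previous section.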
Knowledge Distillation for Transfer Learning – Applying knowledge distillation techniques to facilitate efficient transfer learning across multiple domains, reducing model complexity while maintaining high performance.
Challenge: Deploying separate complex models for multiple domains or platforms in advertising (e.g., different websites, apps, or ad formats) can be resource-intensive and hard to maintain. Training a large model from scratch for each new domain is inefficient, yet a smaller simple model might not achieve needed performance if trained alone on limited data.
Solution: Leverage knowledge distillation as a form of transfer learning. First, train a powerful teacher model (or use an existing high-performing model) on a large or combined dataset encompassing multiple domains. Then use this model to teach a smaller student model, transferring the learned knowledge. The student model is trained to replicate the teacher’s predictions (or internal representations), thus absorbing the expertise of the teacher but with far fewer parameters. This process can be applied when moving to a new domain: the student model for the new domain benefits from the generalized knowledge of the teacher, instead of learning from scratch.
Outcome: Efficient cross-domain model deployment with reduced complexity and sustained performance. The distilled student models are lightweight (faster inference, less memory) yet preserve high accuracy thanks to the transferred knowledge. This enables maintaining high ad prediction and recommendation performance across multiple domains or platforms without the overhead of training and serving large models for each one.
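A toy sketch of the distillation step, assuming a small logistic student trained on soft predictions from a stand-in teacher function. In this toy the student happens to share the teacher's functional form; in a real deployment the teacher would be a large trained network and the student a much smaller one, so treat every function and constant here as illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def teacher(x):
    """Stand-in for a large trained teacher model's predicted CTR; in practice
    this would be an expensive multi-domain network."""
    return sigmoid(3.0 * x[0] + 2.0 * x[1] - 2.0)

# Inputs (e.g. from a new domain) that the student can query the teacher on.
inputs = [[a / 4.0, b / 4.0] for a in range(5) for b in range(5)]
soft_targets = [teacher(x) for x in inputs]  # soft labels, not 0/1 clicks

# Small student trained to reproduce the teacher's soft predictions.
w, b = [0.0, 0.0], 0.0
lr = 0.5
for _ in range(2000):
    for x, t in zip(inputs, soft_targets):
        p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        g = p - t  # gradient of cross-entropy with a soft target
        w[0] -= lr * g * x[0]
        w[1] -= lr * g * x[1]
        b -= lr * g

max_gap = max(abs(sigmoid(w[0] * x[0] + w[1] * x[1] + b) - t)
              for x, t in zip(inputs, soft_targets))
```

The student ends up within a few percentage points of the teacher everywhere on the grid while using only a handful of parameters, which is the efficiency argument made above.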
Results
Increased campaign ROI through personalized, efficient ad delivery.
Real-time targeting with sub-100ms inference latency.
Broader reach across distinct user segments and ad types.
Scalable AI infrastructure without prohibitive compute cost.
Contextual & Behavioral Targeting with Multi-Modal AI: integrating text, images, and browsing behavior to predict ad relevance dynamically.