Amazon is a "search" platform. 50-70% of shoppers across categories are searchers, not browsers. Unlike "browse"-heavy platforms like Nykaa, Myntra, Cred and others, journeys start and end with a search or two. Being visible on searches is the game. The problem is that all top listings are advertisements you need to bid for. This performance marketing is addictive because, one, it gives quick returns and, two, reducing spends has a direct impact on revenue. But it is expensive if not done efficiently, or if done for vanity. Thousands of brands have tried to gain traction through AMS alone and ended up in the burial ground. It's a death spiral. The only way to survive selling on Amazon is if a significant portion of sales comes organically. And for that you need to rank higher organically. Amazon uses the A10 algorithm to rank products by relevance to a search. It's almost a black box, but some factors it appears to assign weights to are:

1. Search relevance: keywords in the front-end, back-end, descriptions and the rest of the listing, including richness of A+ content.
2. Consistency of sales velocity: going out of stock hurts it badly. Fluctuations hurt it badly. Grow steady and fast, preferably steady.
3. External signals: ratings, reviews and external traffic carry more weight in A10 than in A9. Not much else matters if your ratings are poor. Ratings also affect the factors that follow next. A double whammy!
4. Click-through rate: what % of people who saw your listing clicked on it. A function of the first listing card and delivery time, among others.
5. Conversion rate: what % of people who saw your listing went on to buy.
6. Seller authority: your karma matters. Keep doing the right things and the system rewards you. Fall into the trap of a quick buck and you go back a couple of steps.
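None of these weights are public. As a purely hypothetical illustration of how factors like these could combine into a single ranking score (the factor names and weights below are invented, not Amazon's actual A10 parameters):

```python
# Purely hypothetical illustration of a weighted relevance score.
# The factors mirror the list above; the weights are invented and
# are NOT Amazon's actual A10 parameters.

FACTOR_WEIGHTS = {
    "search_relevance": 0.30,
    "sales_velocity_consistency": 0.20,
    "external_signals": 0.20,   # ratings, reviews, external traffic
    "click_through_rate": 0.15,
    "conversion_rate": 0.10,
    "seller_authority": 0.05,
}

def rank_score(factors: dict[str, float]) -> float:
    """Combine factor scores (each normalized to 0..1) into one number."""
    return sum(FACTOR_WEIGHTS[name] * factors.get(name, 0.0)
               for name in FACTOR_WEIGHTS)

# Example: a listing strong on relevance but dragged down by weak ratings.
print(rank_score({
    "search_relevance": 0.9,
    "sales_velocity_consistency": 0.7,
    "external_signals": 0.3,
    "click_through_rate": 0.5,
    "conversion_rate": 0.4,
    "seller_authority": 0.8,
}))
```

The point of the toy model is the double whammy in factor 3: poor external signals lower their own term and, in reality, also depress CTR and conversion.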
Performance Optimization Techniques
-
Quantizing is not enough when fine-tuning a model! Even at the lowest precisions, most of the memory during training is taken by the optimizer state. One great strategy that emerged recently is QLoRA. The idea is to apply LoRA adapters to quantized models. The optimizer state is then computed only for the adapter parameters instead of the whole model, which saves a large amount of memory!

The parameters are converted from BFloat16 / Float16 to 4-bit NormalFloat. This quantization strategy comes from the observation that trained model weights tend to be normally distributed, and we can create quantization buckets using that fact. This allows the compression of the model parameters without too much information loss. When we quantize a model, we need to keep the quantization constants to be able to dequantize it. We usually store them in Float32 to avoid as much dequantization error as possible. To compress the model further, we perform a double quantization, quantizing the quantization constants themselves to Float8.

During the forward pass, because the input tensors are in BFloat16 / Float16, we need to dequantize the quantized parameters to perform the operations. During the backward pass, however, the original weights do not contribute to the computations, so they can remain quantized.
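As a minimal sketch of how this looks in practice, assuming the Hugging Face transformers, peft, and bitsandbytes libraries (the model name is just an example):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NormalFloat quantization with double quantization of the constants.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 buckets
    bnb_4bit_use_double_quant=True,        # quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16, # dequantize to BF16 for the forward pass
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # example model
    quantization_config=bnb_config,
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters: only these parameters get gradients and optimizer state.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a tiny fraction of the full model
```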
-
In the last three months alone, over ten papers outlining novel prompting techniques were published, boosting LLMs' performance by a substantial margin. Two weeks ago, a groundbreaking paper from Microsoft demonstrated how a well-prompted GPT-4 outperforms Google's Med-PaLM 2, a specialized medical model, solely through sophisticated prompting techniques. Yet, while our X and LinkedIn feeds buzz with "secret prompting tips", a definitive, research-backed guide aggregating these advanced prompting strategies is hard to come by. This gap prevents LLM developers and everyday users from harnessing these novel frameworks to enhance performance and achieve more accurate results. https://lnkd.in/g7_6eP6y

In this AI Tidbits Deep Dive, I outline six of the best and most recent prompting methods:

(1) EmotionPrompt - inspired by human psychology, this method utilizes emotional stimuli in prompts to gain performance enhancements
(2) Optimization by PROmpting (OPRO) - a DeepMind innovation that refines prompts automatically, surpassing human-crafted ones. This paper discovered the "Take a deep breath" instruction that improved LLMs' performance by 9%.
(3) Chain-of-Verification (CoVe) - Meta's novel four-step prompting process that drastically reduces hallucinations and improves factual accuracy
(4) System 2 Attention (S2A) - also from Meta, a prompting method that filters out irrelevant details prior to querying the LLM
(5) Step-Back Prompting - encouraging LLMs to abstract queries for enhanced reasoning
(6) Rephrase and Respond (RaR) - UCLA's method that lets LLMs rephrase queries for better comprehension and response accuracy

Understanding the spectrum of available prompting strategies and how to apply them in your app can mean the difference between a production-ready app and a nascent project with untapped potential. Full blog post: https://lnkd.in/g7_6eP6y
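As a rough sketch of one of these, Chain-of-Verification follows the four-step recipe from Meta's paper and can be wired to any LLM client; the `complete()` helper below is a hypothetical stand-in, not a real library call:

```python
# Hypothetical sketch of Chain-of-Verification (CoVe). `complete` stands
# in for whatever LLM completion call you use.

def complete(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def chain_of_verification(question: str) -> str:
    # 1. Draft a baseline answer.
    baseline = complete(f"Answer the question:\n{question}")
    # 2. Plan verification questions that probe the draft for errors.
    plan = complete(
        "List fact-checking questions, one per line, that would verify "
        f"this answer:\nQ: {question}\nA: {baseline}"
    )
    # 3. Answer each verification question independently (the draft is not
    #    shown, so the model cannot simply repeat its own hallucinations).
    checks = [(q, complete(q)) for q in plan.splitlines() if q.strip()]
    # 4. Revise the draft in light of the verification answers.
    evidence = "\n".join(f"{q} -> {a}" for q, a in checks)
    return complete(
        f"Question: {question}\nDraft answer: {baseline}\n"
        f"Verification results:\n{evidence}\n"
        "Write a final, corrected answer."
    )
```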
-
Excited to share our production guide for building RAG-based LLM applications where we bridge the gap between OSS and closed-source LLMs.

- Develop a retrieval augmented generation (RAG) LLM app from scratch.
- Scale the major workloads (load, chunk, embed, index, serve, etc.) across multiple workers.
- Evaluate different configurations of our application to optimize for both per-component (ex. retrieval_score) and overall performance (quality_score).
- Implement an LLM hybrid routing approach to bridge the gap between OSS and closed-source LLMs.
- Serve the application in a highly scalable and available manner.
- Share the 1st order and 2nd order impacts LLM applications have had on our products and org.

Links:
- Blog post (45 min. read): https://lnkd.in/g34a9Zwp
- GitHub repo: https://lnkd.in/g3zHFD5z
- Interactive notebook: https://lnkd.in/g8ghFWm9

Philipp Moritz and I had a blast developing and productionizing this with the Anyscale team and we're excited to share Part II soon (more details in the blog post).
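As a toy sketch of the core chunk-embed-retrieve-generate loop that guides like this build on (the `embed()` and `llm()` helpers are hypothetical placeholders, not the guide's actual code):

```python
import numpy as np

# Hypothetical placeholders: swap in a real embedding model and LLM client.
def embed(text: str) -> np.ndarray: ...
def llm(prompt: str) -> str: ...

def build_index(docs: list[str], chunk_size: int = 500):
    """Chunk documents and embed each chunk."""
    chunks = [d[i:i + chunk_size] for d in docs
              for i in range(0, len(d), chunk_size)]
    vectors = np.stack([embed(c) for c in chunks])
    return chunks, vectors

def answer(query: str, chunks, vectors, k: int = 3) -> str:
    """Retrieve the k most similar chunks, then generate with that context."""
    q = embed(query)
    sims = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    context = "\n---\n".join(chunks[i] for i in np.argsort(-sims)[:k])
    return llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
```

A per-component metric like retrieval_score evaluates `build_index`/retrieval in isolation, while quality_score evaluates the final `answer` output end to end.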
-
Introducing Insights in the Chrome DevTools Performance panel! Many web developers know the power of the Chrome DevTools Performance panel, but navigating its wealth of data to pinpoint issues can be daunting. While tools like Lighthouse provide great summaries, they often lack the context of when and where issues occur within a full performance trace. On the Chrome team we're bridging this gap with the new "Insights sidebar" directly within the Performance panel. Read all about it: https://lnkd.in/gGd3bkPw

This exciting feature integrates Lighthouse-style analysis right into your workflow. After recording a performance trace, the Insights sidebar appears, offering actionable recommendations. Crucially, it doesn't just list potential problems but highlights relevant events and overlays explanations directly on the performance timeline. Hover over an insight like "LCP by phase," "Render blocking requests" or "Layout shift culprits" to visually connect the suggestion to the specific moments in your trace.

The sidebar covers key areas like Largest Contentful Paint (LCP) optimization (including phase breakdowns and request discovery), Interaction to Next Paint (INP) analysis (like DOM size impact and forced reflows), Cumulative Layout Shift (CLS) culprits, and general page load issues such as third-party impact and image optimization. It's designed to make performance debugging more intuitive by linking high-level insights to the granular data, helping you improve Core Web Vitals and overall user experience more effectively.

Check out the Insights sidebar in the latest Chrome versions (it's been evolving since Chrome 131!). It's a fantastic step towards making complex performance analysis more accessible. Give it a try on your next performance audit! #softwareengineering #programming #ai
-
Thought you knew which #quantumcomputers were best for #quantum optimization? The latest results from Q-CTRL have reset expectations for what is possible on today's gate-model machines.

Q-CTRL today announced newly published results that demonstrate a boost of more than 4X in the size of an optimization problem that can be accurately solved, and show for the first time that a utility-scale IBM quantum computer can outperform competitive annealer and trapped-ion technologies. Full, correct solutions at 120+ qubit scale for classically nontrivial optimizations!

Quantum optimization is one of the most promising quantum computing applications, with the potential to deliver major enhancements to critical problems in transport, logistics, machine learning, and financial fraud detection. McKinsey suggests that quantum applications in logistics alone are worth over $200-500B/y by 2035, if the quantum sector can successfully solve them.

Previous third-party benchmark quantum optimization experiments have indicated that, despite their promise, gate-based quantum computers have struggled to live up to their potential because of hardware errors. In previous tests of optimization algorithms, the outputs of the gate-based quantum computers were little different from random outputs or provided modest benefits under limited circumstances. As a result, an alternative architecture known as a quantum annealer was believed, and shown in experiments, to be the preferred choice for exploring industrially relevant optimization problems. Today's quantum computers were thought to be far away from being able to solve quantum optimization problems that matter to industry.

Q-CTRL's recent results upend this broadly accepted industry narrative by addressing the error challenge. Our methods combine innovations in the problem's hardware execution with the company's performance-management infrastructure software run on IBM's utility-scale quantum computers. This combination delivered improved performance previously limited by errors, with no changes to the hardware. Direct tests showed that using Q-CTRL's novel technology, a quantum optimization problem run on a 127-qubit IBM quantum computer was up to 1,500 times more likely than an annealer to return the correct result, and over 9 times more likely to achieve the correct result than previously published work using trapped ions.

These results enable quantum optimization algorithms to more consistently find the correct solution to a range of challenging optimization problems at larger scales than ever before. Check out the technical manuscript! https://lnkd.in/gRYAFsRt
-
PRODUCTION PERFORMANCE ACTIVITIES:

1. Productivity Improvement:
- OEE Monitoring – Tracks machine availability, performance, and quality (see the toy calculation after this list).
- Line Balancing – Distributes tasks evenly to reduce idle time.
- Cycle Time Reduction – Minimizes time per unit.
- Kaizen – Ongoing small improvements by operators.
- Time & Motion Study – Removes wasted motion.
- Bottleneck Removal – Use VSM, Takt Time, TOC to fix constraints.

2. Quality Improvement:
- First Pass Yield – Measures products made right without rework.
- In-Process Checks – Ensures quality at every step.
- Root Cause Analysis – Identifies defect causes (5 Whys, Fishbone).
- Poka Yoke – Error-proofing devices or techniques.
- Defect Analysis – Tracks trends and types of defects.

3. Cost Reduction:
- Material Yield – Reduces scrap and wastage.
- Energy Monitoring – Cuts power cost per unit.
- Tool Life Management – Lowers tool costs and downtime.
- Inventory Control – Uses FIFO, Kanban to manage stock.
- Lean Waste Removal – Eliminates non-value-added work.

4. Delivery Improvement:
- OTD Tracking – Measures actual vs. planned delivery.
- Production Scheduling – Aligns with customer demand.
- SMED (Quick Changeover) – Reduces setup times.
- Logistics Optimization – Streamlines material flow.

5. Safety Enhancement:
- 5S Implementation – Clean, safe, and organized workplace.
- Safety Audits – Identify and reduce risks.
- Incident Tracking – Record and act on near-misses.
- Safety Kaizens – Employee-led safety improvements.

6. Morale & Engagement:
- Daily Meetings – Share targets and issues.
- Suggestion Scheme – Reward employee ideas.
- Skill Matrix – Enable cross-training and flexibility.
- Recognition Programs – Appreciate team achievements.

7. Environmental Improvement:
- Waste Segregation – Improve recycling.
- Utility Savings – Conserve water and energy.
- Emission Control – Reduce dust, noise, fumes.
- Green Practices – Use eco-friendly materials/processes.

Supporting Activities:
- Hourly Boards & Dashboards – Monitor daily performance.
- Tier Meetings – Escalate and solve issues.
- SOP Audits – Ensure process compliance.
- Gemba Walks – Management on the floor to guide teams.
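As a toy illustration of the OEE metric mentioned in item 1, using the standard formula OEE = Availability × Performance × Quality (all numbers below are made up):

```python
# Toy OEE (Overall Equipment Effectiveness) calculation.
# OEE = Availability x Performance x Quality. Inputs are made-up examples.

def oee(planned_time, run_time, ideal_cycle_time, total_count, good_count):
    availability = run_time / planned_time                      # share of planned time spent running
    performance = (ideal_cycle_time * total_count) / run_time   # actual speed vs. ideal speed
    quality = good_count / total_count                          # first-pass good units
    return availability * performance * quality

# Example shift: 480 min planned, 400 min actually running,
# ideal cycle 0.8 min/unit, 450 units made, 430 of them good.
print(f"OEE = {oee(480, 400, 0.8, 450, 430):.1%}")  # about 71.7%
```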
-
As quantum computers enter the utility era, with users executing circuits on 100 or more qubits, the performance of quantum computing software begins to play a prominent role. With this in mind, starting in 2020 Qiskit began the move from a mainly Python-based package to one utilizing the Rust programming language. What began with creating a highly optimized graph library in Rust (https://lnkd.in/eUdwqiMU) has now culminated in most of the circuit creation, manipulation, and transpilation code being fully ported over in the upcoming Qiskit 1.3. The fruits of this labor are easy to verify, with Qiskit outperforming competing SDKs in terms of runtime by an order of magnitude or more, as measured by rigorous benchmarks (https://lnkd.in/e98wniXY).

However, algorithmic improvements also play a critical role in Qiskit's continued success. The team recently released a paper highlighting 18 months of effort optimizing the routing of circuits to match the topology of a target quantum device. This new LightSABRE method (https://lnkd.in/eMgm3TMG) is 200x faster than previous implementations and reduces the number of two-qubit gates by nearly 20% compared to the original SABRE algorithm. In addition, LightSABRE supports complex quantum architectures, disjoint connectivity graphs, and classical flow control.

The work the team puts into optimizing and enhancing Qiskit is one of the primary reasons why nearly 70% of quantum developers select Qiskit as their go-to quantum computing SDK.
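For context, routing is what you exercise whenever you transpile a circuit against a device topology. A minimal sketch using Qiskit's standard transpile API (the linear coupling map here is a made-up example topology, and this runs whichever SABRE variant ships with your Qiskit version):

```python
from qiskit import QuantumCircuit, transpile
from qiskit.transpiler import CouplingMap

# A circuit whose two-qubit gates don't match the device connectivity.
qc = QuantumCircuit(5)
qc.h(0)
for target in range(1, 5):
    qc.cx(0, target)  # qubit 0 talks to every other qubit

# Made-up linear device topology: 0-1-2-3-4.
coupling = CouplingMap.from_line(5)

# SABRE-based layout/routing runs inside transpile; it inserts SWAPs so
# every CX acts on physically adjacent qubits. Fewer SWAPs means fewer
# error-prone two-qubit gates, which is exactly what LightSABRE targets.
routed = transpile(qc, coupling_map=coupling, optimization_level=3,
                   seed_transpiler=42)
print("two-qubit gates after routing:", routed.num_nonlocal_gates())
```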
-
Reinforcement Learning (RL) has quietly become one of the most important techniques shaping the evolution of LLM fine-tuning. For years, we optimized models through supervised learning, predicting the next token or minimizing cross-entropy loss. But as generative models scaled, we needed them to reason, align with intent, and adapt to human feedback in more complex ways. That's where Reinforcement Learning entered the picture.

At its core, RL is about interaction and feedback. An agent learns by interacting with an environment to maximize reward. In the context of large language models, the agent is the model itself. Each action is the next token it generates, and the reward is a signal derived from metrics or human preferences that measures how aligned the output is with the desired goal.

Here's a quick technical primer on the RL methods now powering GenAI fine-tuning:

1. RL Fine-Tuning (RLFT): We adapt a pre-trained model to new objectives like truthfulness, coherence, and safety using policy gradient algorithms such as PPO (Proximal Policy Optimization). Instead of minimizing loss, the model improves through iterative reward-driven optimization.

2. Reinforcement Learning from Human Feedback (RLHF): Human preference data trains a Reward Model (RM), which then guides fine-tuning through PPO. RLHF was key in aligning early LLMs, making outputs more helpful, factual, and instruction-following.

3. Direct Preference Optimization (DPO): A newer, more efficient approach. DPO skips the Reward Model and the full RL loop. It reframes alignment as a direct optimization task, teaching the model to prefer human-approved responses through a simplified objective function (see the sketch below). It's computationally stable, theoretically grounded in RL, and rapidly becoming a standard for GenAI alignment.

Reinforcement Learning is no longer just a research concept. It is the foundation of how large language models learn to reason, align, and self-improve.

Share this with your network to spread learning. Follow me for more data and AI insights.
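As a rough sketch of that simplified DPO objective, computed from per-sequence log-probabilities under the policy and a frozen reference model (the tensors below are stand-in numbers, not real model outputs):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss: push the policy to prefer the chosen response over the
    rejected one, relative to a frozen reference model.

    Each argument is a tensor of summed log-probs, one entry per sequence.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # -log sigmoid(margin): small when the chosen response outscores the
    # rejected one, large when the preference is violated.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Stand-in numbers for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -8.0]), torch.tensor([-15.0, -9.0]),
                torch.tensor([-13.0, -8.5]), torch.tensor([-14.0, -8.8]))
print(loss)
```

Note how the reference log-probs act as an implicit KL anchor: the policy is rewarded for shifting preference mass, not for drifting arbitrarily far from the reference model.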
-
What are some of the effective ways to optimize your services and, in turn, reduce your overall infra footprint?

- Benchmark throughput for your services. Profile CPU/memory resources to catch any major performance bottlenecks. Optimize your code as much as possible to maximize the throughput per instance. Adopt clean coding practices.

- Implement caching at every stage of the request journey. Review and revise your caching strategies to ensure that frequently accessed data is cached, right from the browser to the datastores (a minimal sketch follows below).

- Configure load balancers to evenly distribute traffic across all the servers for optimal performance, so that no single server is overloaded.

- Design your systems with asynchronous processing. With this approach servers can handle more concurrent requests, better utilize resources, and drastically reduce latencies.

- Optimizing databases plays a key role in reducing latencies and improving application performance. Ensure frequently accessed columns are indexed, slow-running queries are optimized, the right configs are used for connection pools, and high-volume queries are cached.

- Optimize service-to-service payloads. Transmit only required data and use the right formats for transmission. This reduces latency and improves your throughput.

- Keep all the client/server versions in your tech stack on the latest stable builds. You will be surprised: the performance of newer versions can be much better than that of previous ones.

Running your infrastructure optimally takes consistent effort and focus. It's important to continuously monitor performance and adopt best practices to operate at peak efficiency.

#tech #myntra #womenintech #leadership
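As a minimal sketch of the in-process layer of that caching advice, a simple TTL cache decorator (the expiry values are illustrative; production setups usually add a shared cache tier as well):

```python
import time
from functools import wraps

def ttl_cache(ttl_seconds=60):
    """Cache a function's results in-process for ttl_seconds.

    Illustrative only: real services typically also put a shared cache
    (e.g. a Redis tier) between the service and its datastores.
    """
    def decorator(fn):
        store = {}  # key -> (expires_at, value)

        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit and hit[0] > now:
                return hit[1]            # fresh cache hit
            value = fn(*args)            # miss or expired: recompute
            store[args] = (now + ttl_seconds, value)
            return value
        return wrapper
    return decorator

@ttl_cache(ttl_seconds=30)
def get_user_profile(user_id: int) -> dict:
    # Stand-in for a slow datastore call.
    time.sleep(0.1)
    return {"id": user_id, "name": f"user-{user_id}"}
```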