Worried about AI replacing your job as a developer? My worry is debugging the code written by AI.

Hear me out before you jump to conclusions! If AI starts spitting out more and more code, the real bottleneck won't be writing code, it'll be:
↳ Debugging code you didn't write
↳ Understanding why AI took a certain approach
↳ Maintaining code when requirements change
↳ Making it production-ready (secure, performant, compliant)

Here's the catch: AI-generated code often looks neat and confident but can hide subtle bugs, performance issues, or architectural mismatches. Debugging such code is harder because:
↳ No mental map: you didn't write it, so you don't know its design intent.
↳ Overconfidence bias: people assume "AI wrote it, so it's correct" (spoiler: it isn't).
↳ Hidden dependencies: AI may use patterns or libraries that don't fit your landscape.

This means the future developer's value shifts to:
↳ Reading and understanding unfamiliar code fast
↳ Designing test strategies to catch AI mistakes
↳ Building guardrails so AI output stays within company standards
↳ Acting as a human validator between "works in demo" and "works in production".

P.S.: I'm doing just fine myself.

In the SAP/ABAP world, for example, AI might churn out an OData service or CDS view in seconds, but a human still needs to debug the weird runtime error when it hits real data, check for performance drains on HANA, and ensure it passes security audits.

So... ironically, AI will make debugging an even more valuable skill than pure coding.

If you want, I can give you a playbook for debugging AI-generated code in SAP and general development, so you stay future-proof even if AI writes 80% of the code.

P.S.: The image is just to catch your attention.
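To make "building guardrails" a bit more concrete, here is a minimal, hypothetical Python sketch of an automated gate you could run on an AI-generated patch before human review even starts. The tool names (pytest, ruff, bandit) and the src/ path are only examples; substitute whatever checks your team actually enforces.

```python
# Hypothetical guardrail gate for AI-generated changes: run the project's own
# automated checks first, so humans only review patches that already pass them.
# Tools and paths below are placeholders, not a prescribed toolchain.
import subprocess

CHECKS = [
    ["pytest", "-q"],          # existing test suite still passes?
    ["ruff", "check", "."],    # team lint/style rules respected?
    ["bandit", "-r", "src/"],  # obvious security smells?
]

def gate_ai_patch() -> bool:
    """Return True only if every automated check passes."""
    for cmd in CHECKS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"FAILED: {' '.join(cmd)}\n{result.stdout[-500:]}")
            return False
    return True

if __name__ == "__main__":
    print("ready for human review" if gate_ai_patch() else "rejected before review")
```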
Software Development
Explore top LinkedIn content from expert professionals.
-
One of the most promising directions in software engineering is merging stateful architectures with LLMs to handle complex, multi-step workflows. While LLMs excel at one-step answers, they struggle with multi-hop questions requiring sequential logic and memory. Recent advancements, like O1 Preview's "chain-of-thought" reasoning, offer a structured approach to multi-step processes, reducing hallucination risks, yet scalability challenges persist. Configuring FSMs (finite state machines) to manage unique workflows remains labor-intensive, limiting scalability. Recent studies address this from various technical approaches:

1. StateFlow: This framework organizes multi-step tasks by defining each stage of a process as an FSM state, transitioning based on logical rules or model-driven decisions. For instance, in SQL-based benchmarks, StateFlow drives a linear progression through query parsing, optimization, and validation states. This configuration achieved success rates up to 28% higher on benchmarks like InterCode SQL and task-based datasets. Additionally, StateFlow's structure delivered substantial cost savings, lowering computation by 5x in SQL tasks and 3x in ALFWorld task workflows by reducing unnecessary iterations within states.

2. Guided Generation Frameworks: This method constrains LLM output using regular expressions and context-free grammars (CFGs), enabling strict adherence to syntax rules with minimal overhead. By creating a token-level index for the constrained vocabulary, the framework brings token selection to O(1) complexity, allowing rapid selection of context-appropriate outputs while maintaining structural accuracy. For outputs requiring precision, like Python code or JSON, the framework demonstrated a high retention of syntax accuracy without a drop in response speed.

3. LLM-SAP (Situational Awareness-Based Planning): This framework combines two LLM agents, LLMgen for FSM generation and LLMeval for iterative evaluation, to refine complex, safety-critical planning tasks. Each plan iteration incorporates feedback on situational awareness, allowing LLM-SAP to anticipate possible hazards and adjust plans accordingly. Tested across 24 hazardous scenarios (e.g., child safety scenarios around household hazards), LLM-SAP achieved an RBS score of 1.21, a notable improvement in handling real-world complexities where safety nuances and interaction dynamics are key.

These studies mark progress, but gaps remain. Manual FSM configurations limit scalability, and real-time performance can lag in high-variance environments. LLM-SAP's multi-agent cycles demand significant resources, limiting rapid adjustments. Yet the research focus on multi-step reasoning and context responsiveness provides a foundation for scalable LLM-driven architectures, if configuration and resource challenges are resolved.
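As a rough illustration of the FSM idea (not the actual StateFlow implementation from the paper), here is a toy Python sketch where each workflow stage is a state, the model is a stubbed call_llm function, and transitions follow simple rules:

```python
# Toy FSM-style workflow in the spirit of StateFlow: each state performs one
# step and a transition rule decides where to go next. call_llm is a stub;
# a real system would call an actual model and inspect execution results.
from typing import Callable

def call_llm(prompt: str) -> str:
    return f"<model output for: {prompt[:40]}...>"  # placeholder response

# state name -> (action, next state on success, next state on failure)
STATES: dict[str, tuple[Callable[[str], str], str, str]] = {
    "parse":    (lambda q: call_llm(f"Parse this request into SQL: {q}"), "validate", "error"),
    "validate": (lambda sql: call_llm(f"Check this SQL for errors: {sql}"), "done", "parse"),
}

def run_workflow(question: str, max_steps: int = 6) -> str:
    state, payload = "parse", question
    for _ in range(max_steps):
        if state in ("done", "error"):
            break
        action, ok_state, fail_state = STATES[state]
        payload = action(payload)
        # Trivial success check for the sketch; real transitions would use
        # execution feedback or model-driven decisions instead.
        state = ok_state if payload else fail_state
    return payload

print(run_workflow("total revenue per region last quarter"))
```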
-
Ideation generates significant value in design. Yet businesses frequently hesitate to invest time, money, and personnel in this phase, often overlooking it as it falls between the strategic planning and development stages for many companies.

In the double-diamond process, while many teams include it in the development phase, most do not continue refining their ideas from the problem-solving stage. Instead, they jump directly into coding, which is often the least adaptable testing method for validating the effectiveness of their design.

Here's why teams run into ideation roadblocks:
• Quantifying the value of creative ideas is difficult
• There is no assigned team to iterate on ideas
• Tracking time and resources in ideation is hard
• Ideation's long-term value complicates cost analysis
• Lack of metrics makes it hard to judge costs
• Balancing idea quantity and quality is challenging
• Cultural views may misjudge ideation costs

Design teams should emphasize that generating ideas can be cost-effective and valuable. Providing measurable results makes their argument stronger and encourages collaboration among different groups. We built Helio for this purpose.

I've seen a single checkbox lead to millions of dollars in lost revenue. The issue wasn't that it was coded incorrectly or a bug. Rather, it was because the development process overlooked the user's needs, failing to recognize why the checkbox was necessary in the first place.

What's your experience?

#productdesign #productdiscovery #innovation #uxresearch
-
C++ move semantics might be the most OVERHYPED "optimization" technique of the last decade.

After debugging move-related crashes for 3 days straight last week, I realized something: what we gain in micro-optimizations, we often lose 10x in debugging time.

Here's the uncomfortable truth about move semantics:
• They can turn a 1-line function call into a multi-dimensional nightmare of dangling pointers and resource leaks
• Understanding the full implications requires almost quantum-mechanical thinking - observing the code literally changes its behavior!
• That "massive performance boost" often amounts to fractions of milliseconds in real-world code

Last month, our team spent 26 engineering hours tracking down a mysterious crash in our embedded system. The culprit? An innocent std::move that interacted badly with a custom allocator. When we replaced it with a simple copy operation, not only did the system become rock-solid stable, but our benchmarks showed only a 0.3% performance difference.

The reality? In many cases, the cognitive overhead and debugging complexity of move semantics simply isn't worth the theoretical performance gain. Senior engineers understand this truth: maintainable code that works consistently beats clever optimizations almost every time.

What's your experience? Have you spent countless hours debugging move semantics, only to question if it was worth the effort?

#Cplusplus #EmbeddedSystems #MoveSemantics #SoftwareEngineering #PerformanceOptimization
-
I see many posts by SDEs making fun of how one small code change can break the order-delivery functionality for an entire day. I am not defending this, but let me share two use cases:

Use case 1:
------------
Suppose there are hundreds of microservices, all deploying independently. Sometimes the code you have written doesn't break your own functionality but impacts your clients (downstream or upstream). For example, you add a new ENUM value to one of the response fields, but the client is still on the old version and unaware of the new ENUM, so the client may hit an exception while everything looks fine at your layer. No UT, FT, or regression-testing failure on your end, and no spike in monitoring while the change sits on one machine in your component. So you proceed and push the code to all machines, assuming everything is fine. And if that client is in the critical path, it can break critical functionality like order delivery too.

But for the whole day? I don't think so, though it can take some time for your client to identify the issue (depending on how quickly they notice the spike in errors). Say they are also doing a push: the first thing they will do is revert their own build, but that isn't the cause, so next they will check which dependent team has pushed, and once they find the component that actually caused the issue, they will involve that team and ask them to roll back the change, which again takes time. So whether the impact lasts the whole day depends, company to company, on how quickly they identify the spike, correlate the error with the dependent component, and revert. And if the issue requires a DB correction, then at least for the impacted users it might well last the whole day.

Use case 2:
-------------
You add new functionality behind a feature flag that is OFF. Since this is new code and it isn't taking any traffic, FTs often aren't written for it. The code goes live, passing all UT, FT, one-machine monitoring, and then full deployment. Then the day comes when you turn the feature ON. For enabling the feature you generally don't rerun UT, FT, etc. One-machine monitoring does happen, but there's a catch: sometimes the spike isn't caught on one machine because traffic is low, or the traffic needed to surface the issue simply doesn't arrive during those 2-3 hours of monitoring. Say you proceed with the full rollout of the feature flag ON to all machines. It's possible the new functionality impacts an existing critical flow and breaks, say, the order-delivery functionality.

But for the whole day? I don't think so, though again it depends, company to company, on how quickly they identify the spike and revert the changes.

I might be wrong, but please do share your views.

#softwareengineer
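A tiny, made-up Python sketch of the ENUM situation in use case 1: the producer starts sending a status value the old consumer has never seen, a strict parser throws, and a tolerant parser degrades gracefully instead of breaking the flow. All names here are invented for illustration.

```python
# Illustrative only: an old consumer receiving a new enum value from a newer producer.
from enum import Enum
from typing import Optional

class OrderStatus(Enum):        # the consumer's (old) view of the contract
    PLACED = "PLACED"
    SHIPPED = "SHIPPED"
    DELIVERED = "DELIVERED"

def parse_strict(value: str) -> OrderStatus:
    return OrderStatus(value)   # raises ValueError on any unknown value

def parse_tolerant(value: str) -> Optional[OrderStatus]:
    try:
        return OrderStatus(value)
    except ValueError:
        # Unknown value from a newer producer: log and fall back instead of failing.
        print(f"unknown status {value!r}, treating as unhandled")
        return None

payload = {"order_id": 42, "status": "OUT_FOR_DELIVERY"}  # new value added upstream
print(parse_tolerant(payload["status"]))  # None, and the flow continues
parse_strict(payload["status"])           # ValueError: the "breaking" path
```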
-
Good summary paper on RL, the promised land to get apps into production.

The focus of RL is moving from behavioral alignment to cognitive improvement, transforming Large Language Models into Large Reasoning Models (LRMs). The new goal is not just to refine a model's final output but to fundamentally incentivize and improve its ability to perform long-form, step-by-step reasoning. This new approach, often called Reinforcement Learning with Verifiable Rewards (RLVR), is being applied in complex domains like advanced mathematics and coding. Milestone models such as OpenAI o1 and DeepSeek-R1 have demonstrated its power. By rewarding the model for successfully completing complex, multi-step problems, RL incentivizes the entire chain of thought (the planning, self-correction, and logical steps) required to reach the correct answer.

The major domains of application for RL in large reasoning models include:

1. Coding Tasks
RL is widely used in coding due to its inherent verifiability, making it essential for improving code reasoning and developing autonomous, closed-loop coding agents.
• Code Generation: The goal is to adjust the LLM generation distribution to meet specific coding requirements. This includes tackling challenging areas such as competitive programming, domain-specific code (like Text-to-SQL and formal proofs), electronic design automation (EDA), and chart-to-code generation.
• Software Engineering: RL assists in real-world software development scenarios by improving code quality and overall reliability. This includes enhancing automated code repair; optimizing code for efficiency, maintainability, and security; and facilitating repository-level code generation that addresses complex cross-file dependencies by incorporating reflection mechanisms.
• Agentic Coding: This advances code generation from single-step outputs to multi-round interactions, equipping LLMs with execution and verification abilities for continuous policy optimization on benchmarks simulating real software evolution.

2. Agentic Tasks
RL enables LLMs to master the use of external tools and manage multi-round interactions, leading to more adaptive and autonomous agent behaviors.
• Tool-Integrated Reasoning (TIR): This tight coupling of natural language reasoning with tool execution environments allows models to generate, execute, and verify intermediate code or program outputs, thereby improving verifiability and reducing errors.
• Search Agents: RL trains models to function as search agents by integrating structured prompting with search environments (simulated or online). This includes training deep research agents capable of gathering information from various sources to solve complex, long-horizon problems.
• Browser and GUI Agents: RL is applied to train agents for web-browsing and Graphical User Interface (GUI) tasks. These methods often use rule-based rewards tailored for accurate action selection and correct argument formulation.

https://lnkd.in/geSxtxUk
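As a loose illustration of what a "verifiable reward" looks like for coding tasks, here is a toy Python sketch: the candidate program is executed against known test cases and rewarded only for observable success. This is a stand-in for the idea, not how o1 or DeepSeek-R1 are actually trained; the solve function name and test format are assumptions for the sketch.

```python
# Toy verifiable reward for code: fraction of unit tests the candidate passes,
# with a negative reward for output that does not even define the function.
def verifiable_reward(candidate_src: str, tests: list) -> float:
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)      # define the candidate's solve()
        solve = namespace["solve"]
    except Exception:
        return -1.0                         # malformed / unusable output
    passed = 0
    for x, expected in tests:
        try:
            passed += int(solve(x) == expected)
        except Exception:
            pass                            # runtime error counts as a failure
    return passed / len(tests)

candidate = "def solve(x):\n    return x * x\n"
print(verifiable_reward(candidate, [(2, 4), (3, 9), (4, 15)]))  # 2 of 3 pass -> ~0.67
```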
-
SWE-RL: an approach to scale reinforcement learning based LLM reasoning for software engineering tasks.

This paper introduces SWE-RL, the first approach to scale RL-based LLM reasoning for real-world software engineering. Leveraging a lightweight rule-based reward (e.g., the similarity score between ground-truth and LLM-generated solutions), SWE-RL enables LLMs to autonomously recover a developer's reasoning processes and solutions by learning from extensive open-source software evolution data: the record of a software's entire lifecycle, including its code snapshots, code changes, and events such as issues and pull requests.

Overview
- create a seed RL dataset from GitHub PR data, including issue descriptions, code context, and oracle patches.
- the policy LLM generates code changes through reasoning.
- for correctly formatted responses, the reward is calculated based on the match between the predicted and the oracle patch; incorrectly formatted responses are assigned a negative reward. GRPO is used for policy optimization.

Raw pull request data curation
- collected git clones and GitHub events are transformed into self-contained PR instances via decontamination, aggregation, relevant-files prediction, and filtering.

Reward modeling
i) Dataset preparation
- based on specific heuristics, curated high-quality PR seeds
- for each seed, extracted issue descriptions and code context, including all changed files and some relevant but unchanged files, which are then converted to input prompts for the policy LLM
ii) Implementation
- bootstrap the policy LLM with a prompt template, where given an issue description and the corresponding code context, the policy LLM needs to generate search/replace edits to fix the issue through reasoning.
- the training approach conditions the model on the complete context of each file, implicitly forcing it to identify detailed fault locations before generating repair edits.

Evaluation
i) Setup
- Llama3-SWE-RL-70B is trained on top of Llama-3.3-70B-Instruct using SWE-RL for 1,600 steps with a 16k context window.
ii) Results
- Llama3-SWE-RL-70B achieves a 41.0% solve rate on SWE-bench Verified (a human-verified collection of real-world GitHub issues), the best performance among medium-sized language models (<100B) and even comparable to leading proprietary models like GPT-4o.
- the paper also shows that applying RL solely to real-world SE tasks, such as issue solving, can already enhance an LLM's general reasoning abilities, enabling it to improve on out-of-domain tasks like math, code generation, and general language understanding.

Blog: https://lnkd.in/eQ3__-gk
Paper: https://lnkd.in/e3kJwDbW
Code: https://lnkd.in/eDpTP6Pp
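A minimal Python sketch of the reward shape described above, assuming the patch text has already been extracted from the model's response; the paper's actual reward and patch format differ in detail.

```python
# Rule-based reward in the spirit of SWE-RL: similarity to the oracle patch
# for well-formed output, a fixed negative reward for malformed output.
import difflib
from typing import Optional

def patch_reward(predicted: Optional[str], oracle: str) -> float:
    if predicted is None:   # search/replace edits could not be parsed
        return -1.0
    return difflib.SequenceMatcher(None, predicted, oracle).ratio()  # in [0, 1]

oracle_patch = "-    if user == None:\n+    if user is None:\n"
good = "-    if user == None:\n+    if user is None:\n"
weak = "-    if user == None:\n+    if user:\n"

print(patch_reward(good, oracle_patch))  # 1.0
print(patch_reward(weak, oracle_patch))  # partial credit
print(patch_reward(None, oracle_patch))  # -1.0 for malformed output
```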
-
Last week I asked students to brainstorm which functions were involved at each phase of the Product Lifecycle. One group noted that "everyone" should be included in the Ideation phase. I normally don't love catch-all answers, but I truly believe having "everyone" involved in product discovery makes for better software development.

I love seeing people's eyes go wide when I tell them I've invited non-product-development roles to early brainstorm sessions: lawyers, risk team members, sales & marketing, etc. So often PMs, designers, and engineers get caught in their small bubble of product development and fail to bring unique or differing perspectives into the process (because lawyers are conservative, compliance teams don't use Figma, [insert countless other objections]), but the reality is that so much magic happens when you bring fresh eyes into a new problem space.

So my prompt to you: next time you hold a product brainstorm, bring in some new faces. Try bringing a smattering of everyone*

*When I say everyone, I don't literally mean everyone (I'm not crazy). I mean bring reps from a wide variety of functions.
-
Sometimes adopting a best practice for your organization means adopting the philosophy behind it, but not the practice itself. That's how I view Google's 20% time practice and its application at RightHand Robotics, Inc.

For me, the goal behind 20% time was creating space for good ideas to surface. It's based on the belief that good ideas come from anywhere and anyone. It gives people the latitude to develop their ideas, even if they have nothing to do with their assigned role. That concept has been a core philosophy I've brought with me to RightHand Robotics, Inc. But our approach is less about time allocation and more about making sure there are open channels for generating and developing great ideas.

There are a few different ways we do that. We're a Slack-heavy culture, which means we have many different channels for surfacing and documenting new ideas or observations. We also keep a "wish list" of ideas. These are ideas we think may be worth pursuing, but perhaps it's not the right time yet. They could be marketing ideas, product ideas, engineering ideas, or even process-improvement ideas, which we think are great but which we can't pursue at the time they arise. Keeping these repositories allows us to track and continually revisit good ideas, so that we're more likely to unearth the right idea at the right time.

Google taught me that it's important for an organization to leave room for idea generation. But the methodology matters less than the principle itself. Regardless of your approach, empowering every member of your team to be an idea generator will leave the door open for innovation.
-
The Evolution of Code: From Manual Craft to AI Collaboration

Software development is changing rapidly! We're seeing three main approaches emerge:

Manual Coding: The traditional way. Developers write code line-by-line, requiring deep understanding and control. Great for complex logic & robust systems, but can be slow. Debugging is efficient because you know the code inside out.

Vibe Coding: Using AI (LLMs) to generate code from natural language prompts. Fast for quick prototypes and accessible to non-coders. However, understanding of the generated code is limited, making debugging a significant challenge.

Hybrid Coding: The best of both worlds. Developers use AI tools for assistance (code completion, generation, suggestions) but remain actively involved in reviewing, understanding, and refining the code.

Why Hybrid is Gaining Traction:
Hybrid coding boosts productivity by automating repetitive tasks while keeping the human developer firmly in the driver's seat. Crucially, developer understanding of the logic is maintained. This is KEY for effective debugging. When issues arise, a developer who understands the (even AI-assisted) code can quickly identify root causes and implement robust fixes, unlike the "black box" problem of pure vibe coding.

Hybrid coding isn't just about speed; it's about smarter, more maintainable, and more debuggable code. It feels like the most practical path forward, blending AI power with essential human expertise. Comments?

#coding #softwaredevelopment #AI #LLMs #hybridcoding #vibe #manualcoding #programming #debugging #futureoftech #Aryaka