Natural language generation (NLG) lies at the core of applications ranging from conversational agents to content creation. Despite its advances, NLG systems often operate as "black boxes," leaving developers and users uncertain about their decision-making processes. Explainable AI (XAI) bridges this gap by making NLG models more interpretable and controllable. This article explores practical techniques and tools for enhancing the transparency of NLG systems, offering detailed code snippets and step-by-step explanations to guide developers in understanding and improving model behavior. Topics include attention visualization, controllable generation, feature attribution, and integrating explainability into workflows. By focusing on real-world examples, this article serves as an educational guide for building more interpretable NLG systems. Introduction to Explainable NLG Natural language generation (NLG) enables machines to produce coherent and contextually appropriate text, powering applications like chatbots, document summarization, and creative writing tools. While powerful models such as GPT, BERT, and T5 have transformed NLG, their opaque nature creates challenges for debugging, accountability, and user trust. Explainable AI (XAI) provides tools and techniques to uncover how these models make decisions, making them accessible and reliable for developers and end-users. Whether you're training an NLG model or fine-tuning a pre-trained system, XAI methods can enhance your workflow by providing insights into how and why certain outputs are generated. Techniques for Explainable NLG 1. Understanding Attention Mechanisms Transformers, which form the backbone of most modern NLG models, rely on attention mechanisms to focus on relevant parts of the input when generating text. Understanding these attention weights can help explain why a model emphasizes certain tokens over others. Example: Visualizing Attention in GPT-2 Python from transformers import GPT2Tokenizer, GPT2LMHeadModel from bertviz import head_view # Load GPT-2 model and tokenizer model = GPT2LMHeadModel.from_pretrained("gpt2", output_attentions=True) tokenizer = GPT2Tokenizer.from_pretrained("gpt2") # Input text text = "The role of explainability in AI is crucial for ethical decision-making." # Tokenize input inputs = tokenizer(text, return_tensors="pt") # Generate attentions outputs = model(**inputs) attentions = outputs.attentions # List of attention weights from all layers # Visualize attention head_view(attentions, tokenizer, text) Explanation The bertviz library provides a graphical interface for understanding how attention is distributed across input tokens. For instance, if the model generates a summary, you can analyze which words it deems most important. 2. Controllable Text Generation Controllability allows users to guide the model's output by specifying parameters like tone, style, or structure. Models like CTRL and fine-tuned versions of GPT enable this functionality. Example: Guiding Text Generation with Prompts Python from transformers import AutoModelForCausalLM, AutoTokenizer # Load GPT-Neo model model_name = "EleutherAI/gpt-neo-2.7B" model = AutoModelForCausalLM.from_pretrained(model_name) tokenizer = AutoTokenizer.from_pretrained(model_name) # Define a prompt for controlling output style prompt = ( "Write an inspiring conclusion to an academic paper: \n" "In conclusion, the field of Explainable AI has the potential to..." 
) # Tokenize and generate text inputs = tokenizer(prompt, return_tensors="pt") outputs = model.generate(inputs["input_ids"], max_length=100) # Decode and display output print(tokenizer.decode(outputs[0], skip_special_tokens=True)) Explanation By structuring prompts effectively, developers can control how the model generates text. In this example, the model adapts its output to fit an academic tone. 3. Feature Attribution With SHAP SHAP (SHapley Additive exPlanations) provides insights into which parts of the input contribute most to the generated output, helping developers debug issues like bias or irrelevance. Example: SHAP for Explaining Generated Text Python import shap from transformers import pipeline # Load a text generation pipeline generator = pipeline("text-generation", model="gpt2") # Define SHAP explainer explainer = shap.Explainer(generator) # Input text prompt = "Explainable AI improves trust in automated systems by" # Generate explanations shap_values = explainer([prompt]) # Visualize explanations shap.text_plot(shap_values) Explanation SHAP highlights the words or phrases that influence the generated text, offering a way to analyze model focus. For example, you might find that certain keywords disproportionately drive specific tones or styles. 4. Integrated Gradients for Text Attribution Integrated Gradients quantify the contribution of each input feature (e.g., words or tokens) by integrating gradients from a baseline to the input. Example: Integrated Gradients for a Classification Task Python from captum.attr import IntegratedGradients from transformers import AutoTokenizer, AutoModelForSequenceClassification import torch # Load model and tokenizer model_name = "textattack/bert-base-uncased-imdb" model = AutoModelForSequenceClassification.from_pretrained(model_name) tokenizer = AutoTokenizer.from_pretrained(model_name) # Input text text = "Explainable AI has transformed how developers interact with machine learning models." inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True) # Compute Integrated Gradients ig = IntegratedGradients(model) attributions = ig.attribute(inputs['input_ids'], target=1) # Visualize attributions print("Integrated Gradients Attributions:", attributions) Explanation Integrated Gradients are particularly useful in classification tasks where you want to understand which words influence the decision. This can also be extended to text generation tasks for token attribution. 5. Layer-Wise Attention Analysis Sometimes, understanding the individual layers of a transformer can provide deeper insights into the model's behavior. Example: Extracting Attention Weights Layer by Layer Python import torch from transformers import BertTokenizer, BertModel # Load BERT model and tokenizer tokenizer = BertTokenizer.from_pretrained("bert-base-uncased") model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True) # Input sentence text = "Natural Language Generation depends heavily on transformer architectures." inputs = tokenizer(text, return_tensors="pt") # Forward pass with attention outputs = model(**inputs) attention_weights = outputs.attentions # Attention weights for each layer # Analyze specific layer layer_3_attention = attention_weights[3].detach().numpy() print("Attention weights from layer 3:", layer_3_attention) Explanation Layer-wise analysis enables developers to track how attention evolves as it propagates through the network. This is particularly useful for debugging or fine-tuning pre-trained models. 
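As a follow-up to the layer-wise example, the sketch below condenses the raw attention tensors into a per-layer summary for a single token. It reuses the same bert-base-uncased setup; averaging over heads and the choice of target token are illustrative assumptions rather than the only sensible summary.

Python
import torch
from transformers import BertTokenizer, BertModel

# Same setup as the layer-wise example above
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

text = "Natural Language Generation depends heavily on transformer architectures."
inputs = tokenizer(text, return_tensors="pt")
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

with torch.no_grad():
    outputs = model(**inputs)

# Track how much attention flows into one token, layer by layer
target_index = tokens.index("generation") if "generation" in tokens else 1
for layer_num, layer_attention in enumerate(outputs.attentions):
    # layer_attention has shape (batch, heads, seq_len, seq_len); average over heads
    mean_over_heads = layer_attention[0].mean(dim=0)
    incoming = mean_over_heads[:, target_index].mean().item()
    print(f"Layer {layer_num}: mean attention into '{tokens[target_index]}' = {incoming:.4f}")

A summary like this makes it easy to spot, for example, whether a content word only starts attracting attention in the upper layers — a useful signal when deciding which layers to inspect more closely with bertviz.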
Integrating Explainable NLG in Workflows Debugging Model Outputs Explainability tools like SHAP and attention visualizations can help identify issues such as irrelevant focus or sensitivity to noise in the input. Improving Dataset Quality Attribution methods can reveal biases or over-reliance on specific phrases, guiding dataset augmentation, or curation. Building User Trust By showing how models arrive at their outputs, developers can foster trust among end-users, especially in high-stakes applications like legal or medical text generation. Ethical Considerations Mitigating Bias Explainability methods can expose biases in generated content, prompting developers to address these issues through improved training datasets or fairness constraints. Preventing Misinformation Transparency ensures that users understand the limitations of NLG systems, reducing the risk of misinterpretation or misuse. Conclusion Explainable NLG bridges the gap between powerful AI systems and user trust, enabling developers to debug, optimize, and refine their models with greater confidence. By incorporating techniques such as attention visualization, controllable generation, and feature attribution, we can create NLG systems that are not only effective but also interpretable and aligned with ethical standards. As this field continues to evolve, the integration of explainability will remain central to building reliable, human-centric AI.
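Tying these workflow ideas together, here is a minimal sketch of how a generation call could be wrapped so that every output ships with a basic explainability signal for later debugging. It reuses the GPT-2 setup from the earlier examples; the entropy-based diagnostic is an illustrative assumption, not a technique prescribed above.

Python
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

# Each generation is returned together with a simple attention-based diagnostic
# that can be logged and reviewed when debugging model outputs.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2", output_attentions=True)

def generate_with_diagnostics(prompt: str, max_length: int = 40) -> dict:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(inputs["input_ids"], max_length=max_length)

    # Re-run the full sequence to collect attention weights for it
    with torch.no_grad():
        attentions = model(output_ids).attentions

    last_layer = attentions[-1][0].mean(dim=0)  # average over heads: (seq, seq)
    # Entropy of the final token's attention distribution: low values mean the
    # model is focusing on a few tokens, high values mean attention is diffuse.
    entropy = torch.distributions.Categorical(probs=last_layer[-1]).entropy().item()

    return {
        "prompt": prompt,
        "generation": tokenizer.decode(output_ids[0], skip_special_tokens=True),
        "last_token_attention_entropy": round(entropy, 3),
    }

print(generate_with_diagnostics("Explainable AI improves trust in automated systems by"))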
SQL Server databases occasionally enter "In Recovery" mode, which can often catch database administrators off guard. This status occurs during a restart, database restore, or unexpected shutdown, as SQL Server replays or undoes incomplete transactions to maintain data integrity. While this process is typically automatic, it can sometimes take longer than expected — or even appear stuck — leaving administrators unsure of how to proceed. If you’ve encountered this issue, don’t worry. This article will help you understand what’s happening behind the scenes and teach you how to respond. Here's a quick look at what you'll learn: What "In Recovery" Mode Means — Why your database enters this state and what SQL Server is doing in the background.The 3 Phases of Recovery — A clear breakdown of the Analysis, Redo, and Undo phases that SQL Server follows during recovery.Common Causes of Delays — From large transaction logs to excessive virtual log files (VLFs), see what might be slowing down the process.How to Get Back Online — Learn practical steps to restore your database to a consistent state, from waiting it out to using SQL repair tools.When to Seek Advanced Help — What to do if the recovery process seems stuck and no progress is being made. By the end of this guide, you'll have a solid understanding of SQL Server’s recovery process and the tools you can use to get your database back online as quickly as possible. Understanding "In Recovery" Mode in SQL Server When SQL Server restarts or a database is restored from backup, it enters 'In Recovery' mode to maintain data integrity. During this phase, SQL Server replays or undoes incomplete transactions to prevent data corruption and ensure transactional consistency. After restarting SQL Server, the database moves to “In Recovery” mode. You may also see the SQL Server database in recovery state on its startup or when restoring it from backup. Figure 1- SQL Database "In Recovery" Mode The database "recovering" state means that the database performs a recovery process and will automatically come online once the process is complete. However, you may experience that the recovery is slow, and the database is stuck in a recovery state. Your DB might still be in recovery state, as SQL databases undergo three phases of recovery, which can take time depending on your database files' size. The 3 Phases of SQL Database Recovery Usually, when the database is not shut down properly on SQL Server restart, it undergoes crash recovery, ensuring that the DB remains consistent. There are three phases of recovery that an SQL DB needs to go through: Phase 1: Analysis This phase starts from the "last checkpoint till the end of the transaction log." It creates a 'Dirty Page Table' (DPT) table that helps determine all the dirty pages at the time of the crash. Also, it creates an 'Active Transaction Table' (ATT) table to identify uncommitted transactions when the SQL Server stopped. Phase 2: Redo In this phase, SQL Server rolls forward all the changes that happened after the checkpoint and before the crash. Essentially, in the redo phase, all the transactions which are committed but not yet written to the SQL data file (.mdf/.ldf) via checkpoint need to be rolled forward. Phase 3: Undo If there were any uncommitted transactions at the time of database recovery, they have to be rolled back in the undo phase to bring the DB to a consistent state. What to Do if Your Database Is Stuck in Recovery Mode? 
Check the SQL Server Error Log for the first message logged for the database, which may look similar to:

Plain Text
Starting up database ‘DatabaseName’

This means that the DB files have been opened and the recovery process has started. Some time later, you should see SQL Server going through the three phases of recovery. If you're looking for guidance on how to back up and restore your database, check out this guide on backing up and restoring Azure SQL Databases.

Phase 1 of database recovery is shown below:

Plain Text
Recovery of database ‘DatabaseName’ (9) is 0% complete (approximately 95 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
Recovery of database ‘DatabaseName’ (9) is 3% complete (approximately 90 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.

After completion of Phase 1, SQL Server moves through Phases 2 and 3 of recovery:

Plain Text
Recovery of database ‘DatabaseName’ (9) is 5% complete (approximately 85 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required…
Recovery of database ‘DatabaseName’ (9) is 95% complete (approximately 40 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
Phase 3 of 3. This is an informational message only. No user action is required.

Once Phases 2 and 3 are complete, you will see something similar to:

Plain Text
3807 transactions rolled forward in database ‘DatabaseName’ (9). This is an informational message only. No user action is required.
0 transactions rolled back in database ‘DatabaseName’ (9). This is an informational message only. No user action is required.
Recovery is writing a checkpoint in database ‘DatabaseName’ (9). This is an informational message only. No user action is required.
Recovery completed for database DatabaseName (database ID 9) in 30 second(s) (analysis 1289 ms, redo 29343 ms, undo 72 ms.) This is an informational message only. No user action is required.

In the error log, note the recurring message ‘No user action is required’: the database is in a recovery state and SQL Server is working through it on its own. However, the recovery may take longer than expected, and the database can appear stuck in recovery mode.

Reasons Behind SQL Database Stuck in “In Recovery” Mode
The following reasons may cause an SQL database to get stuck in recovery mode:

A long-running transaction is rolling back.
The transaction log file is very large.
There are too many Virtual Log Files (VLFs) inside the DB transaction log.
A bug in SQL Server (since fixed) is involved.

What Can You Do to Bring the Database Back to a Consistent State?

Workaround 1: Wait for the Database Recovery to Complete
The most obvious solution to bring the database back online is to be patient and wait for the recovery process to complete; this could take hours or days. If the recovery is taking longer than expected for a database on SQL Server 2008 or 2008 R2, applying Microsoft's fixes may help.

Note: Avoid running the RESTORE command to bring the DB online in a consistent state, as SQL Server is already attempting to perform the same task. Running ‘RESTORE WITH RECOVERY’ forces the DB to go through the same steps again.

Workaround 2: Use a Professional SQL Database Repair Tool
If the recovery completes but fails to bring the database to a consistent state, a specialized SQL repair tool may help restore the DB to its original state.
Stellar Repair for MS SQL — A specialized tool that helps restore SQL databases to their original state after corruption or failure.
ApexSQL Recover — This tool helps recover deleted, truncated, or corrupted SQL Server database data.
dbForge SQL Complete — While primarily an IDE extension, it offers useful error handling and troubleshooting features.
Redgate SQL Data Recovery — Redgate provides a range of SQL Server tools, including data recovery features.
SysTools SQL Recovery Tool — Known for recovering corrupted or damaged SQL database files (.MDF and .NDF) and bringing them back to a usable state.
Kernel for SQL Database Recovery — This tool can recover and restore SQL Server databases from both corruption and unexpected shutdowns.
Aryson SQL Database Recovery — Another tool that can repair and restore MDF and NDF files in SQL Server.

Conclusion
In this article, we covered what it means when a SQL database is stuck in 'In Recovery' mode, the three critical phases of recovery (Analysis, Redo, and Undo), and what you can do to bring the database back online. We also discussed possible causes, from large transaction logs to too many virtual log files (VLFs).

If your SQL database is still stuck in recovery mode, remember that patience is key. Avoid running a RESTORE command, as this restarts the recovery process. For severe issues where manual intervention fails, consider using a professional SQL database repair tool to restore your data.

If you found this guide helpful, share it with your team or leave a comment with your SQL recovery experiences.
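One final tip: rather than repeatedly tailing the error log, you can check whether recovery is actually progressing by querying SQL Server's dynamic management views. The sketch below assumes SQL Server 2008 or later; the estimates it returns fluctuate while recovery runs.

SQL
-- Shows progress of databases currently running startup recovery or a restore
SELECT session_id,
       command,
       DB_NAME(database_id) AS database_name,
       percent_complete,
       estimated_completion_time / 60000.0 AS estimated_minutes_remaining
FROM sys.dm_exec_requests
WHERE command IN ('DB STARTUP', 'RESTORE DATABASE');

If percent_complete keeps climbing between runs, recovery is progressing and waiting it out (Workaround 1) is usually the right call.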
Developers play a crucial role in modern companies. If we want our product to be successful, we need to have a developer-first approach and include observability from day one. Read on to understand why. The World Has Changed Many things have changed in the last decade. In our quest for greater scalability, resilience, and flexibility within the digital infrastructure of our organization, there has been a strategic pivot away from traditional monolithic application architectures towards embracing modern software engineering practices such as microservices architecture coupled with cloud-native applications. This shift acknowledges that in today's fast-paced technological landscape, building isolated and independently deployable services offers significant advantages over the legacy of intertwined codebases characteristic of monolithic systems. Moreover, by adopting cloud-native principles tailored for public or hybrid cloud environments, we've further streamlined our application development and delivery process while ensuring optimal resource utilization through container orchestration tools like Kubernetes — which facilitate scalable deployment patterns such as horizontal scaling to match demand fluctuations. This paradigm shift not only allows us more efficient use of cloud resources but also supports the DevOps culture, fostering an environment where continuous integration and delivery become integral components that accelerate time-to-market for new features or enhancements in alignment with our business objectives. To deal with the fast-changing world, we've shifted our approach to reduce the complexity of deployments; they have become frequent daily tasks rather than rare challenging events due to a move from laborious manual processes to streamlined CI/CD pipelines and the creation of infrastructure deployment tools. This transition has substantially complicated system architectures across various dimensions including but not limited to infrastructure, configuration settings, security protocols, machine learning integrations, etc., where we've gained proficiency in managing these complexities through our deployments. Nevertheless, the intricate complexity of databases hasn’t been addressed adequately; it has surged dramatically with each application now leveraging multiple database types — ranging from SQL and NoSQL systems to specialized setups for specific tasks like machine learning or advanced vector search operations due to regular frequent deployments. Because these changes are often rolled out asynchronously, alterations in the schema of databases or background jobs can occur at any time without warning which has a cascading effect on performance issues throughout our interconnected systems. This not only affects business directly but also complicates resolution efforts for developers and DevOps engineers who lack the expertise to troubleshoot these database-centric problems alone, thus necessitating external assistance from operations experts or specialized DBAs (Database Administrators). The absence of automated solutions leaves the process vulnerable due to dependence on manual intervention. In the past, we would put the burden of increased complexity on specialized teams like DBAs or operations. Unfortunately, this is not possible anymore. The complexity of the deployments and applications increased enormously due to the hundreds of databases and services we deploy every day. 
Nowadays, we face multi-tenant architectures with hundreds of databases, thousands of serverless applications, and millions of changes going through the pipelines each day. Even if we wanted to handle this complexity with specialized teams of DBAs or DevOps engineers, it’s simply impossible. Thinking that this remains irrelevant to mainstream business applications couldn’t be farther from the truth. Let’s read on to understand why. Developers Are Evaluating Your Business Many companies realized that streamlining developers’ work inevitably brings multiple benefits to the whole company. This happens mostly due to two reasons: performance improvement and new domains. Automation in development areas can significantly reduce MTTR and improve velocity. All business problems of today’s world need to be addressed by the digital solutions that are ultimately developed and maintained by developers. Keeping developers far from the end of the funnel means higher MTTR, more bugs, and longer troubleshooting. On the other hand, if we reorganize the environment to let developers work faster, they can directly impact all the organizational metrics. Therefore, our goal is to involve developers in all the activities and shift-left as much as possible. By putting more tasks directly on the development teams, we impact not only the technical metrics but also the business KPIs and customer-facing OKRs. The second reason is the rise of new domains, especially around machine learning. AI solutions significantly reshape our today’s world. With large language models, recommendation systems, image recognition, and smart devices around, we can build better products and solve our customers’ issues faster. However, AI changes so rapidly that only developers can tame this complexity. This requires developers to understand not only the technical side of the AI solutions but also the domain knowledge of the business they work on. Developers need to know how to build and train the recommendation systems, but also why these systems recommend specific products and how societies work. This turns developers into experts in sociology, politics, economics, finances, communication, psychology, and any other domain that benefits from AI. Both of these reasons lead to developers playing a crucial role in running our businesses. Days of developers just taking their tasks from Jira board are now long gone. Developers not only lead the business end-to-end but also the performance of the business strongly depends on the developers’ performance. Therefore, we need to shift our solutions to be more developer-centric to lower the MTTR, improve velocity, and enable developers to move faster. Developers are increasingly advocating for an ecosystem where every component, from configuration changes to deployment processes, is encapsulated within code — a philosophy known as infrastructure as code (IaC). This approach not only streamlines the setup but also ensures consistency across various environments. The shift towards full automation further emphasizes this trend; developers are keen on implementing continuous integration and delivery pipelines that automatically build, test, and deploy software without human intervention whenever possible. They believe in removing manual steps to reduce errors caused by human error or oversight and speed up the overall development cycle. 
Furthermore, they aim for these automated processes to be as transparent and reversible as needed — allowing developers quick feedback loops when issues arise during testing stages while ensuring that any rollback can happen seamlessly if necessary due to a failed deployment or unexpected behavior in production environments. Ultimately, the goal is an efficient, error-resistant workflow where code not only dictates functionality but also governs infrastructure changes and automation protocols — a vision of development heavily reliant on software for its operational needs rather than traditional manual processes. Developers critically evaluate each tool under their purview — whether these be platforms for infrastructure management like Puppet or Chef, continuous integration systems such as Jenkins, deployment frameworks including Kubernetes, monitoring solutions (perhaps Prometheus or Grafana), or even AI and machine learning applications. They examine how maintenance-friendly the product is: Can it handle frequent updates without downtime? Does its architecture allow for easy upgrades to newer versions with minimal configuration changes required by developers themselves? The level of automation built into these products becomes a central focus - does an update or change trigger tasks automatically, streamlining workflows and reducing the need for manual intervention in routine maintenance activities? Beyond mere functionality, how well does it integrate within their existing pipelines? Are its APIs easily accessible so that developers can extend capabilities with custom scripts if necessary? For instance, integrating monitoring tools into CI/CD processes to automatically alert when a release has failed or rolled back due to critical issues is an essential feature assessed by savvy devs who understand the cascading effects of downtime in today's interconnected digital infrastructure. Their focus is not just immediate utility but future-proofing: they seek out systems whose design anticipates growth, both in terms of infrastructure complexity and the sheer volume of data handled by monitoring tools or AI applications deployed across their stacks — ensuring that what today might be cutting edge remains viable for years to come. Developers aim not just at building products but also curating ecosystem components tailored towards seamless upkeep with minimal manual input required on everyday tasks while maximizing productivity through intelligent built-in mechanisms that predict, prevent, or swiftly rectify issues. Developers play an essential role in shaping technology within organizations by cooperating with teams at various levels — management, platforms engineering, and senior leaders — to present their findings, proposed enhancements, or innovative solutions aimed to improve efficiency, security, scalability, user experience, or other critical factors. These collaborations are crucial for ensuring that technological strategies align closely with business objectives while leveraging the developers' expertise in software creation and maintenance. By actively communicating their insights through structured meetings like code reviews, daily stand-ups, retrospectives, or dedicated strategy sessions, they help guide informed decision-making at every level of leadership for a more robust tech ecosystem that drives business success forward. This suggests that systems must keep developers in mind to be successful. 
Your System Must Be Developer-First Companies are increasingly moving to platform solutions to enhance their operational velocity, enabling faster development cycles and quicker time-to-market. By leveraging integrated tools and services, platform solutions streamline workflows, reduce the complexity of managing multiple systems, and foster greater collaboration across teams. This consolidated approach allows companies to accelerate innovation, respond swiftly to market changes, and deliver value to customers more efficiently, ultimately gaining a competitive edge in the fast-paced business environment. However, to enhance the operational velocity, the solutions must be developer-first. Let's look at some examples of products that have shifted towards prioritizing developers. The first is cloud computing. Manual deployments are a thing of the past. Developers now prefer to manage everything as code, enabling repeatable, automated, and reliable deployments. Cloud platforms have embraced this approach by offering code-centric mechanisms for creating infrastructure, monitoring, wikis, and even documentation. Solutions like AWS CloudFormation and Azure Resource Manager allow developers to represent the system's state as code, which they can easily browse and modify using their preferred tools. Another example is internal developer platforms (IDPs), which empower developers to build and deploy their services independently. Developers no longer need to coordinate with other teams to create infrastructure and pipelines. Instead, they can automate their tasks through self-service, removing dependencies on others. Tasks that once required manual input from multiple teams are now automated and accessible through self-service, allowing developers to work more efficiently. Yet another example is artificial intelligence tools. AI is significantly enhancing developer efficiency by seamlessly integrating with their tools and workflows. By automating repetitive tasks, such as code generation, debugging, and testing, AI allows developers to focus more on creative problem-solving and innovation. AI-powered tools can also provide real-time suggestions, detect potential issues before they become problems, and optimize code performance, all within the development environment. This integration not only accelerates the development process but also improves the quality of the code, leading to faster, more reliable deployments and ultimately, a more productive and efficient development cycle. Many tools (especially at Microsoft) are now enabled with AI assistants that streamline the developers’ work. Observability 2.0 to the Rescue We saw a couple of solutions that kept developers’ experience in mind. Let’s now see an example domain that lacks this approach — monitoring and databases. Monitoring systems often prioritize raw and generic metrics because they are readily accessible and applicable across various systems and applications. These metrics typically include data that can be universally measured, such as CPU usage or memory consumption. Regardless of whether an application is CPU-intensive or memory-intensive, these basic metrics are always available. Similarly, metrics like network activity, the number of open files, CPU count, and runtime can be consistently monitored across different environments. The issue with these metrics is that they are too general and don’t provide much insight. For instance, a spike in CPU usage might be observed, but what does it mean? 
Or perhaps the application is consuming a lot of memory — does that indicate a problem? Without a deeper understanding of the application, it's challenging to interpret these metrics meaningfully. Another important consideration is determining how many metrics to collect and how to group them. Simply tracking "CPU usage" isn't sufficient: we need to categorize metrics based on factors like node type, application, country, or other relevant dimensions. However, this approach can introduce challenges. If we aggregate all metrics under a single "CPU" label, we might miss critical issues affecting only a subset of the sources. For example, if you have 100 hosts and only one experiences a CPU spike, this won't be apparent in aggregated data. While metrics like p99 or tm99 can offer more insights than averages, they still fall short. If each host experiences a CPU spike at different times, these metrics might not detect the problem. When we recognize this issue, we might attempt to capture additional dimensions, create more dashboards for various subsets, and set thresholds and alarms for each one individually. However, this approach can quickly lead to an overwhelming number of metrics. There is a discrepancy between what developers want and what evangelists or architects think the right way is. Architects and C-level executives promote monitoring solutions that developers just can’t stand. Monitoring solutions are just wrong because they swamp the users with raw data instead of presenting curated aggregates and actionable insights. To make things better, the monitoring solutions need to switch gears to observability 2.0 and database guardrails. First and foremost, developers aim to avoid issues altogether. They seek modern observability solutions that can prevent problems before they occur. This goes beyond merely monitoring metrics: it encompasses the entire software development lifecycle (SDLC) and every stage of development within the organization. Production issues don't begin with a sudden surge in traffic; they originate much earlier when developers first implement their solutions. Issues begin to surface as these solutions are deployed to production and customers start using them. Observability solutions must shift to monitoring all the aspects of SDLC and all the activities that happen throughout the development pipeline. This includes the production code and how it’s running, but also the CI/CD pipeline, development activities, and every single test executed against the database. Second, developers deal with hundreds of applications each day. They can’t waste their time manually tuning alerting for each application separately. The monitoring solutions must automatically detect anomalies, fix issues before they happen, and tune the alarms based on the real traffic. They shouldn’t raise alarms based on hard limits like 80% of the CPU load. Instead, they should understand if the high CPU is abnormal or maybe it’s inherent to the application domain. Last but not least, monitoring solutions can’t just monitor. They need to fix the issues as soon as they appear. Many problems around databases can be solved automatically by introducing indexes, updating the statistics, or changing the configuration of the system. These activities can be performed automatically by the monitoring systems. Developers should be called if and only if there are business decisions to be taken. 
And when that happens, developers should be given a full context of what happens, why, where, and what choice they need to make. They shouldn’t be debugging anything as all the troubleshooting should be done automatically by the tooling. Stay In the Loop With Developers In Mind Over the past decade, significant changes have occurred. In our pursuit of enhanced scalability, resilience, and flexibility within our organization’s digital infrastructure, we have strategically moved away from traditional monolithic application architectures. Instead, we have adopted modern software engineering practices like microservices architecture and cloud-native applications. This shift reflects the recognition that in today’s rapidly evolving technological environment, building isolated, independently deployable services provides substantial benefits compared to the tightly coupled codebases typical of monolithic systems. To make this transition complete, we need to make all our systems developer-centric. This shifts the focus on what we build and how to consider developers and integrate with their environments. Instead of swamping them with data and forcing them to do the hard work, we need to provide solutions and answers. Many products already shifted to this approach. Your product shouldn’t stay behind.
API testing is a process that confirms that the API under test is working as expected. In Agile teams, thanks to shift-left testing, API testing is performed earlier in the SDLC, which provides major benefits like faster feedback and lets the team fix bugs early in the cycle. There are multiple tools and frameworks available these days that help perform API testing quickly. Playwright is one such test automation framework that has gained a lot of popularity. Backed by Microsoft, it supports web and API automation testing in multiple programming languages. In this tutorial, we will learn to use Playwright with Java and test PUT API requests in automation testing.

Getting Started
It is recommended to check out the previous tutorial blog, where the prerequisites, setup, and configuration are discussed in detail.

Application Under Test
We will be using the RESTful e-commerce APIs that are available on GitHub and free to use. This project has multiple APIs related to the e-commerce application's order management functionality and allows creating, updating, fetching, and deleting orders. It can be set up locally using NodeJS or Docker.

What Is a PUT Request?
PUT requests are ideally used to update resources: they replace the existing data of the target resource with the data supplied in the request. As with POST requests, the Content-Type header plays an important role in sending the data to the resource in the required format. A PUT request generally returns Status Code 200 with the updated data in the response; however, this depends on the requirement — some APIs don't return any data in the response, depending on how that API's response is designed.

Difference Between POST and PUT Request
The major difference between PUT and POST is that PUT is used for updating an existing resource, while POST is used for creating a new resource. A PUT request is idempotent: if the same request with the same body is sent multiple times, the resource ends up in the same state and there are no additional side effects.

PUT /updateOrder/{id} Endpoint
The /updateOrder/{id} endpoint updates an existing order using its order ID. This API is part of the RESTful e-commerce application and will be used later in this blog to demo-test PUT requests using Playwright Java. It takes the id (i.e., order_id) as a path parameter to look up the order, and the updated order details must be supplied in the request body in JSON format. It is important to note that, since this is a PUT request, we need to send the full order details even if we only want to update a single field. The API also requires an authentication token to be supplied for the update; an error is returned if the token is missing or invalid.

The PUT request returns Status Code 200 with the updated order details on a successful update. If the update fails, several response codes and error messages can be returned, based on the following criteria:

Status Code 400 — If the token authentication fails
Status Code 400 — If an incorrect body or no body is sent in the request
Status Code 403 — If the token is not supplied with the request
Status Code 404 — If there is no order for the respective order_id supplied

How to Test PUT APIs Using Playwright Java
We will be using the following test scenario to demonstrate testing PUT APIs using Playwright Java.
Test Scenario 1: Update the Order Start the RESTful e-commerce service.Use POST request to create some orders in the system.Update all the order details of order_id “2.”Verify that the Status Code 200 is returned in the response.Verify that the order details have been updated correctly. Test Implementation This test scenario will be implemented in a new test method testShouldUpdateTheOrderUsingPut() in the existing test class HappyPathTests. Java @Test public void testShouldUpdateTheOrderUsingPut() { final APIResponse authResponse = this.request.post("/auth", RequestOptions.create().setData(getCredentials())); final JSONObject authResponseObject = new JSONObject(authResponse.text()); final String token = authResponseObject.get("token").toString(); final OrderData updatedOrder = getUpdatedOrder(); final int orderId = 2; final APIResponse response = this.request.put("/updateOrder/" + orderId, RequestOptions.create() .setHeader("Authorization", token) .setData(updatedOrder)); final JSONObject updateOrderResponseObject = new JSONObject(response.text()); final JSONObject orderObject = updateOrderResponseObject.getJSONObject("order"); assertEquals(response.status(), 200); assertEquals(updateOrderResponseObject.get("message"), "Order updated successfully!"); assertEquals(orderId, orderObject.get("id")); assertEquals(updatedOrder.getUserId(), orderObject.get("user_id")); assertEquals(updatedOrder.getProductId(), orderObject.get("product_id")); assertEquals(updatedOrder.getProductName(), orderObject.get("product_name")); assertEquals(updatedOrder.getProductAmount(), orderObject.get("product_amount")); assertEquals(updatedOrder.getTotalAmt(), orderObject.get("total_amt")); } The following three steps are required to be taken care of while updating the order. Generate the Authentication token.Generate the Update Order Test Data.Update the Order using PUT request. 1. Generating the Authentication Token The POST /auth API endpoint will allow you to generate the token and return the generated token in the response. The login credentials for this API are as follows: username — “admin”password — “secretPass123” When the correct credentials are passed in the POST request, the API returns Status Code 200, along with the token in response. This token value can be used further in the test to execute the PUT request. The token has been added as a security measure in the RESTful e-commerce app so only the users who know login credentials, i.e., trusted users, can update the order. 2. Generating the Test Data for Updating Order The second step is to generate a new set of data that would replace the existing order details. We would be creating a new method : getUpdatedOrder() in the existing class OrderDataBuilder We used it in the POST request tutorial blog to generate the new order test data. 
Java public static OrderData getUpdatedOrder() { int userId = FAKER.number().numberBetween(4, 5); int productId = FAKER.number().numberBetween(335,337); int productAmount = FAKER.number().numberBetween(510, 515); int quantity = FAKER.number().numberBetween(1, 2); int taxAmount = FAKER.number().numberBetween(35,45); int totalAmount = (productAmount*quantity)+taxAmount; return OrderData.builder() .userId(String.valueOf(userId)) .productId(String.valueOf(productId)) .productName(FAKER.commerce().productName()) .productAmount(productAmount) .qty(quantity) .taxAmt(taxAmount) .totalAmt(totalAmount) .build(); } This getUpdatedOrder() method generates totally new data for the order; hence, when this method is called in the PUT request, it will replace all the existing order details for the order ID. 3. Update the Order Using PUT Request Now, we have come to the stage where we will be testing the PUT request using Playwright Java. We would be creating a new test method testShouldUpdateTheOrderUsingPut() in the existing test class HappyPathTests. Java @Test public void testShouldUpdateTheOrderUsingPut() { final APIResponse authResponse = this.request.post("/auth", RequestOptions.create().setData(getCredentials())); final JSONObject authResponseObject = new JSONObject(authResponse.text()); final String token = authResponseObject.get("token").toString(); final OrderData updatedOrder = getUpdatedOrder(); final int orderId = 2; final APIResponse response = this.request.put("/updateOrder/" + orderId, RequestOptions.create() .setHeader("Authorization", token) .setData(updatedOrder)); final JSONObject updateOrderResponseObject = new JSONObject(response.text()); final JSONObject orderObject = updateOrderResponseObject.getJSONObject("order"); assertEquals(response.status(), 200); assertEquals(updateOrderResponseObject.get("message"), "Order updated successfully!"); assertEquals(orderId, orderObject.get("id")); assertEquals(updatedOrder.getUserId(), orderObject.get("user_id")); assertEquals(updatedOrder.getProductId(), orderObject.get("product_id")); assertEquals(updatedOrder.getProductName(), orderObject.get("product_name")); assertEquals(updatedOrder.getProductAmount(), orderObject.get("product_amount")); assertEquals(updatedOrder.getTotalAmt(), orderObject.get("total_amt")); } This method will generate the token, create a new set of updated order data, and send the PUT request to update the order. Let’s break this huge method into smaller chunks and understand it better. The first part of this method generates the token. It will use the /auth endpoint and send a POST request along with the valid login credentials. The getCredentials() is a static method that is available in the TokenBuilder class. It will generate and provide the valid credentials in the JSON format to be used in the /auth POST request. Java public class TokenBuilder { public static TokenData getCredentials() { return TokenData.builder().username("admin") .password("secretPass123") .build(); } } The getCredentials() method returns the TokenData object that contains the fields username and password. Java @Getter @Builder public class TokenData { private String username; private String password; } The response will be extracted and stored as JSON Object in the authResponseObject variable. Finally, the token value will be stored in the token variable in String format, which will be used in the test while sending POST requests. Next, the updated order details need to be fetched. 
The getUpdatedOrder() is a static method, and every time, it will generate a new order when it is called. We will be using the order with order_id “2” to update the order details. The order will be updated next using the put() method of Playwright. The additional details, Authorization header, and request body will be supplied in the put() method parameter. The RequestOptions.create() will create an object for the PUT request and allow attaching the header and request body. The setHeader() method will allow adding the Authorization header and the setData() will add the updatedOrder object as body to the request. Once the request is executed, the response is parsed and stored in the updateOrderResponseObject variable in JSONObject type. Response body: JSON { "message": "Order updated successfully!", "order": { "id": 1, "user_id": "1", "product_id": "1", "product_name": "iPhone 15 Pro Max", "product_amount": 503, "qty": 1, "tax_amt": 5.99, "total_amt": 508.99 } } The details of orders are retrieved in the “order” object in the response, and hence, we are storing the array in the orderObject variable that has the JSONObject type. The final part of the code performs assertions to check that the order details retrieved in the response are the same that were sent in the request body and accordingly the order has been updated. The updatedOrder object stores all the values that were sent in the request; hence, it is kept as the expected result, and the data retrieved in the response, i.e., orderObject object, becomes the actual output. Test Execution We need to first create orders before updating them. So, let’s create a new testng.xml file — testng-restfulecommerce-updateorder.xml to execute the update order tests. This testng.xml file will have only two methods to execute, the first one being testShouldCreateNewOrders() and then the update order test method — testShouldUpdateTheOrderUsingPut(). XML <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE suite SYSTEM "http://testng.org/testng-1.0.dtd"> <suite name="Restful ECommerce Test Suite"> <test name="Testing Happy Path Scenarios of Creating and Updating Orders"> <classes> <class name="io.github.mfaisalkhatri.api.restfulecommerce.HappyPathTests"> <methods> <include name="testShouldCreateNewOrders"/> <include name="testShouldUpdateTheOrderUsingPut"/> </methods> </class> </classes> </test> </suite> The following screenshot of the test execution shows that the tests were executed successfully. First, the orders were created, and then an order was updated using the PUT request. Test Scenario 2: Try Updating the Order With an Invalid Order ID This is a sad path scenario where we will be supplying an invalid order ID while updating the order using a PUT request. Using a PUT request, try updating an order with an invalid order that does not exist in the system. For example, order_id = 90Verify that Status Code 404 is returned in the responseVerify that the message text “No Order found with the given parameters!” is returned in the response. Test Implementation We will be adding a new test method testShouldNotUpdateOrder_WhenOrderIdIsNotFound() in the existing test class SadPathTests. 
Java @Test public void testShouldNotUpdateOrder_WhenOrderIdIsNotFound() { final APIResponse authResponse = this.request.post("/auth", RequestOptions.create().setData(getCredentials())); final JSONObject authResponseObject = new JSONObject(authResponse.text()); final String token = authResponseObject.get("token").toString(); final OrderData updatedOrder = getUpdatedOrder(); final int orderId = 90; final APIResponse response = this.request.put("/updateOrder/" + orderId, RequestOptions.create() .setHeader("Authorization", token) .setData(updatedOrder)); final JSONObject responseObject = new JSONObject(response.text()); assertEquals(response.status(), 404); assertEquals(responseObject.get("message"), "No Order found with the given Order Id!"); } A hard-coded order_id “90” will be passed in the test as we need to send an invalid order ID. The implementation of this test scenario remains the same as we did for the previous test scenario. All the steps remain the same; we need to generate the token first, then attach it to the Header and use the put() method from the APIRequestContext interface to finally send the PUT request to update the order. The same getUpdatedOrder() static method from the OrderDataBuilder class would be used to send the request body. Finally, we would be asserting that the Status Code 404 and message text “No Order found with the given Order Id!” is returned in the response. Test Execution Let’s add a new test block in the testng-restfulecommerce-updateorder.xml file and execute it. XML <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE suite SYSTEM "http://testng.org/testng-1.0.dtd"> <suite name="Restful ECommerce Test Suite"> <test name="Testing Happy Path Scenarios of Creating and Updating Orders"> <classes> <class name="io.github.mfaisalkhatri.api.restfulecommerce.HappyPathTests"> <methods> <include name="testShouldCreateNewOrders"/> <include name="testShouldUpdateTheOrderUsingPut"/> </methods> </class> </classes> </test> <test name="Testing Sad Path Scenarios of Updating Order"> <classes> <class name="io.github.mfaisalkhatri.api.restfulecommerce.SadPathTests"> <methods> <include name="testShouldNotUpdateOrder_WhenOrderIdIsNotFound"/> </methods> </class> </classes> </test> </suite> The following screenshot from the IntelliJ IDE shows that the test was executed successfully. Summary Testing PUT requests is equally important from an end-to-end testing perspective, as users will be using the PUT API requests to modify/update the records. While updating the data using test automation, we should remember to supply the required headers and data in the desired format. If the authorization token is needed, it should be supplied as well. Testing Happy and Sad Paths in test automation allows regression testing to be compact and can help uncover defects quickly. Playwright Java can help in writing the API automation test scripts for PUT requests for updating the records. We learned how to run the tests in sequential order using testng.xml as it is a recommended way to run the tests locally as well as in the CI/CD pipeline.
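As a follow-up exercise beyond the two scenarios above, the 403 case (a PUT request sent without a token) can be covered in the same style. The sketch below is an assumed addition to the existing SadPathTests class; the method name is hypothetical, and no response-message assertion is made since the exact text isn't documented here.

Java
@Test
public void testShouldNotUpdateOrder_WhenTokenIsMissing() {
    final OrderData updatedOrder = getUpdatedOrder();
    final int orderId = 2;

    // No Authorization header is set, so the API should reject the update with a 403
    final APIResponse response = this.request.put("/updateOrder/" + orderId,
        RequestOptions.create().setData(updatedOrder));

    assertEquals(response.status(), 403);
}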
MuleSoft is a powerful integration platform that often deals with high-throughput workloads that require robust database connection management. One solution that stands out in optimizing database interactions is HikariCP, a high-performance JDBC connection pool known for its speed and reliability. HikariCP is widely used in applications that require efficient connection management. In this article, we'll discuss the integration of HikariCP with MuleSoft, its benefits, and best practices for configuring it to maximize performance. What Is HikariCP? HikariCP is a lightweight, high-performance JDBC connection pool known for its minimal overhead and advanced optimization features. It provides fast connection acquisition, efficient connection pool management, and built-in tools to reduce latency and improve database interaction reliability. Key Features High Performance: Low-latency operations and efficient resource utilization make it one of the fastest connection pools available.Reliability: Features like connection validation and leak detection enhance stability.Scalability: Supports high-concurrency applications with minimal thread contention.Lightweight: Minimal footprint in terms of memory and CPU usage. Why Use HikariCP in MuleSoft? MuleSoft applications often interact with databases to process real-time and batch data. Efficient management of database connections is critical to meeting high transaction per minute (TPM) requirements and ensuring system reliability. HikariCP offers: Faster Response Times: Reduced connection acquisition time leads to lower latency in API responses.Enhanced Throughput: Optimized connection pooling ensures better handling of concurrent requests.Thread Management: Prevents thread saturation and reduces CPU overhead.Error Handling: Automatically detects and manages problematic connections, reducing application downtime. Integrating HikariCP With MuleSoft To integrate HikariCP in MuleSoft, follow these steps: Step 1: Configure the HikariCP Module MuleSoft does not natively include HikariCP (it comes with C3P0 by default), it has to be added as a custom dependency. Update your project's pom.xml to include the HikariCP library: XML <dependency> <groupId>com.zaxxer</groupId> <artifactId>HikariCP</artifactId> <version>5.0.1</version> <!-- Use the latest stable version --> </dependency> Step 2: Define HikariCP Configuration Add Spring dependencies in pom.xml. The easiest way to add Spring jars is via "Spring Authorization Filter" from the Mule palette: XML <dependency> <groupId>org.mule.modules</groupId> <artifactId>mule-spring-module</artifactId> <version>1.3.6</version> <classifier>mule-plugin</classifier> </dependency> <dependency> <groupId>org.springframework.security</groupId> <artifactId>spring-security-core</artifactId> <version>5.4.2</version> </dependency> <dependency> <groupId>org.springframework</groupId> <artifactId>spring-beans</artifactId> <version>5.3.2</version> </dependency> <dependency> <groupId>org.springframework.security</groupId> <artifactId>spring-security-config</artifactId> <version>5.4.2</version> </dependency> <dependency> <groupId>org.springframework</groupId> <artifactId>spring-context</artifactId> <version>5.3.2</version> </dependency> <dependency> <groupId>org.springframework</groupId> <artifactId>spring-core</artifactId> <version>5.3.2</version> </dependency> Step 3: Create Spring Configuration Define a Spring configuration XML file (spring-config.xml) to initialize the HikariCP DataSource and expose it as a Spring Bean. 
Add this XML config file to src/main/resources.

XML
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd">

    <!-- HikariCP DataSource Configuration -->
    <bean id="dataSource" class="com.zaxxer.hikari.HikariDataSource" destroy-method="close">
        <property name="jdbcUrl" value="jdbc:mysql://localhost:3306/mydb"/>
        <property name="username" value="dbuser"/>
        <property name="password" value="dbpassword"/>
        <property name="maximumPoolSize" value="20"/>
        <property name="minimumIdle" value="5"/>
        <property name="idleTimeout" value="30000"/>
        <property name="connectionTimeout" value="30000"/>
        <property name="maxLifetime" value="1800000"/>
        <property name="poolName" value="MuleHikariCP"/>
    </bean>
</beans>

HikariCP Pool Size (Maximum Pool Size)
Here's a guide on how to right-size the pool and why you should avoid setting it too high. For example, with 2400 TPM (i.e., 40 TPS) and each query taking about 1 second, you need around 40 connections; adding a 10-20% buffer to handle spikes gives a total pool size of around 44-48 connections.

Caution: Avoiding an excessively high connection pool size in HikariCP (or any connection pooling mechanism) is critical for optimizing resource usage and ensuring stable application performance. A higher-than-necessary pool size can lead to resource contention, database overload, and system instability.

1. Right-Size the Pool
Use the formula below to calculate the optimal pool size:

Plain Text
Optimal Pool Size = (Core Threads) × (1 + (Wait Time / Service Time))

Core Threads: Number of threads available for executing queries.
Wait Time: Time the application can afford to wait for a connection.
Service Time: Average time for a query to execute.

2. Connection Timeout
Set connectionTimeout to a value less than your SLA, such as 500-700 milliseconds, so that threads do not wait too long for a connection that is not available.

3. Idle Timeout
Configure idleTimeout to a lower value, like 30,000 ms (30 seconds), so that idle connections are quickly released, avoiding resource waste.

4. Max Lifetime
Set maxLifetime slightly shorter than the database's connection timeout (e.g., 30 minutes) to avoid connections being closed abruptly by the database.

5. Connection Validation
Use a validation query (connectionTestQuery) or tune validationTimeout to ensure connections are always valid while keeping latency minimal.

6. Database Connection Utilization
Ensure that queries are optimized for performance on the database side.
Monitor database resources (CPU, memory, and indexes) to see if adjustments can be made for better utilization.

These settings should improve how connections are used and help achieve your target response time of under 1 second. If you're still experiencing issues, consider analyzing query performance and optimizing database operations.
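To make the sizing arithmetic above concrete, here is a small, self-contained sketch. The numbers are the hypothetical ones from the 2400 TPM example, not measurements from a real system.

Java
// Rough pool-size estimate for the example above: 2400 TPM at ~1 second per query
public class PoolSizeEstimate {
    public static void main(String[] args) {
        double transactionsPerSecond = 2400 / 60.0; // 2400 TPM => 40 TPS
        double avgQuerySeconds = 1.0;               // average service time per query
        double spikeBuffer = 0.20;                  // 10-20% headroom for traffic spikes

        // ~40 queries are in flight at any moment, so ~40 connections are needed
        double baseConnections = transactionsPerSecond * avgQuerySeconds;
        long recommendedPoolSize = Math.round(baseConnections * (1 + spikeBuffer));

        System.out.println("Base connections needed: " + Math.round(baseConnections));
        System.out.println("Recommended maximumPoolSize: " + recommendedPoolSize); // ~48
    }
}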
Step 4: Configure Spring in Mule Application Global Elements Add this Spring configuration in global.xml and reference the resulting datasource bean from the database config: XML <mule xmlns:doc="http://www.mulesoft.org/schema/mule/documentation" xmlns:db="http://www.mulesoft.org/schema/mule/db" xmlns:spring="http://www.mulesoft.org/schema/mule/spring" xmlns="http://www.mulesoft.org/schema/mule/core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation=" http://www.mulesoft.org/schema/mule/db http://www.mulesoft.org/schema/mule/db/current/mule-db.xsd http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd http://www.mulesoft.org/schema/mule/spring http://www.mulesoft.org/schema/mule/spring/current/mule-spring.xsd"> <!--Spring beans--> <spring:config name="Spring_Config" doc:name="Spring Config" doc:id="5550c804-20cf-40c0-9331-d3ee45d7444f" files="spring-config.xml" /> <!--DB Config: reference the datasource created in the Spring config file--> <db:config name="Database_Config" doc:name="Database Config" doc:id="f18a2e4d-0f43-4adc-86f3-f76a42ecc3c9" > <db:data-source-connection dataSourceRef="dataSource" /> </db:config> </mule> Step 5: Use the Spring Bean in Mule Flows The dataSource bean is now available in the MuleSoft application and can be referenced in the db module configuration. Step 6: Verify Configuration Run the Mule application to ensure the Spring context is loaded correctly. Test database interactions to validate the connection pooling behavior. Step 7: Simulate Load Tests for HikariCP Pool Size Configuration Test Scenario API Use Case: Mule 4 API handling database queries with HikariCP connection pooling. Transaction Load: 3000 transactions per minute (TPM). Concurrency: 50 concurrent users. Query Execution Time: 200 ms per query. Connection Pool Configuration: Max Pool Size: 240; Min Pool Size: 20; Idle Timeout: 30 seconds; Max Lifetime: 30 minutes; Leak Detection Threshold: 2 seconds. A minimal client-side sketch for driving this load follows.
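The scenario above can be approximated with a small Python load driver. This is a minimal sketch under stated assumptions: the endpoint URL is a placeholder for whatever HTTP listener your Mule flow exposes, the requests package must be installed, and a dedicated tool such as JMeter or Gatling remains the better choice for formal load tests. While it runs, watch the HikariCP pool statistics in the Mule logs to see how many connections are actually in use.
Python
import time
from concurrent.futures import ThreadPoolExecutor

import requests  # pip install requests

API_URL = "http://localhost:8081/api/orders"  # placeholder: the HTTP endpoint exposed by your Mule flow
CONCURRENT_USERS = 50
REQUESTS_PER_USER = 60  # 50 users x 60 requests = 3,000 requests total

def run_user(_user_id):
    latencies = []
    for _ in range(REQUESTS_PER_USER):
        start = time.perf_counter()
        response = requests.get(API_URL, timeout=5)
        response.raise_for_status()
        latencies.append(time.perf_counter() - start)
    return latencies

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=CONCURRENT_USERS) as pool:
        results = list(pool.map(run_user, range(CONCURRENT_USERS)))
    all_latencies = sorted(t for user in results for t in user)
    p95 = all_latencies[int(len(all_latencies) * 0.95) - 1]  # approximate 95th percentile
    print(f"requests: {len(all_latencies)}, p95 latency: {p95:.3f}s")
Pace the per-user loop if you need the requests spread evenly over a full minute rather than fired as fast as possible.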
Conclusion Integrating HikariCP into your MuleSoft applications unlocks a new level of performance, reliability, and scalability for database interactions. By utilizing HikariCP's efficient connection pooling and combining it with MuleSoft's robust integration capabilities, you can confidently handle high-traffic workloads and demanding SLAs. Whether you're building APIs, processing real-time data, or managing complex integrations, HikariCP ensures optimal use of resources, reduced latency, and seamless scalability. With proper configuration and thoughtful integration, you can transform your MuleSoft applications into high-performance engines ready for modern enterprise challenges.
AI/ML workflows excel on structured, reliable data pipelines. To streamline these processes, DBT and Snowpark offer complementary capabilities: DBT handles modular SQL transformations, while Snowpark enables programmatic, Python-driven feature engineering. Here are some key benefits of using DBT, Snowpark, and Snowflake together: Simplifies SQL-based ETL with DBT's modularity and tests. Handles complex computations with Snowpark's Python UDFs. Leverages Snowflake's high-performance engine for large-scale data processing. Here's a step-by-step guide to installing, configuring, and integrating DBT and Snowpark into your workflows. Step 1: Install DBT You can install DBT from your shell with Python's pip. Assuming Python is already installed and added to your PATH, follow these steps: Shell # Set up a Python virtual environment (recommended): python3 -m venv dbt_env source dbt_env/bin/activate # Install DBT and the Snowflake adapter: pip install dbt-snowflake # Verify DBT installation dbt --version Step 2: Install Snowpark Shell # Install Snowpark for Python pip install snowflake-snowpark-python # Install additional libraries for data manipulation pip install pandas numpy # Verify Snowpark installation python -c "from snowflake.snowpark import Session; print('successful Snowpark installation')" Step 3: Configuring DBT for Snowflake DBT requires a profiles.yml file to define connection settings for Snowflake. Locate the DBT Profiles Directory By default, DBT expects the profiles.yml file in the ~/.dbt/ directory. Create the directory if it doesn't exist: Shell mkdir -p ~/.dbt Create the profiles.yml File Define your Snowflake credentials in the following format: YAML my_project: outputs: dev: type: snowflake account: your_account_identifier user: your_username password: your_password role: your_role database: your_database warehouse: your_warehouse schema: your_schema target: dev Replace placeholders like your_account_identifier with your Snowflake account details. Test the Connection Run the following command to validate your configuration: Shell dbt debug If the setup is correct, you'll see a success message confirming the connection. Step 4: Setting Up Snowpark Ensure Snowflake Permissions Before using Snowpark, ensure your Snowflake user has the following permissions: Access to the warehouse and schema. Ability to create and register UDFs (user-defined functions). Create a Snowpark Session Set up a Snowpark session using the same credentials from profiles.yml: Python from snowflake.snowpark import Session def create_session(): connection_params = { "account": "your_account_identifier", "user": "your_username", "password": "your_password", "role": "your_role", "database": "your_database", "warehouse": "your_warehouse", "schema": "your_schema", } return Session.builder.configs(connection_params).create() session = create_session() print("Snowpark session created successfully") Register a Sample UDF Here's an example of registering a simple Snowpark UDF for text processing (note that Snowpark expects DataType objects for the input and return types, and a permanent UDF must be stored on a named stage; the stage name below is a placeholder): Python from snowflake.snowpark.types import StringType def clean_text(input_text): return input_text.strip().lower() session.udf.register( func=clean_text, name="clean_text_udf", input_types=[StringType()], return_type=StringType(), is_permanent=True, stage_location="@my_udf_stage" # placeholder stage for storing the permanent UDF ) print("UDF registered successfully")
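Before wiring the UDF into DBT, it's worth a quick smoke test from Snowpark itself. The sketch below reuses the session from the previous step; the table and column names are placeholders, so point them at a real table in your schema.
Python
from snowflake.snowpark.functions import call_udf, col

# Apply the registered UDF to a Snowpark DataFrame (table and column names are placeholders)
df = session.table("my_database.my_schema.source_table")
cleaned = df.select(
    col("id"),
    call_udf("clean_text_udf", col("text_column")).alias("cleaned_text"),
)
cleaned.show(5)  # spot-check a few rows before referencing the UDF from DBT models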
Step 5: Integrating DBT With Snowpark You have a DBT model named raw_table that contains raw data. raw_table DBT Model Definition SQL -- models/raw_table.sql SELECT * FROM my_database.my_schema.source_table Use Snowpark UDFs in DBT Models Once you've registered a UDF in Snowflake using Snowpark, you can call it directly from your DBT models. SQL -- models/processed_data.sql WITH raw_data AS ( SELECT id, text_column FROM {{ ref('raw_table') }} ), cleaned_data AS ( SELECT id, clean_text_udf(text_column) AS cleaned_text FROM raw_data ) SELECT * FROM cleaned_data; Run DBT Models Execute your DBT models to apply the transformation: Shell dbt run --select processed_data Step 6: Advanced AI/ML Use Case For AI/ML workflows, Snowpark can handle tasks like feature engineering directly in Snowflake. Here's an example of calculating text embeddings: Create an Embedding UDF Using Python and a pre-trained model, you can generate text embeddings (as with the earlier UDF, the types are DataType objects and the stage name is a placeholder; in practice you would also cache the model instead of loading it on every call and make sure the third-party packages are available to Snowflake): Python from transformers import pipeline from snowflake.snowpark.types import ArrayType, StringType def generate_embeddings(text): # Loading the pipeline on every call is expensive; cache it in production code model = pipeline("feature-extraction", model="bert-base-uncased") return model(text)[0] session.udf.register( func=generate_embeddings, name="generate_embeddings_udf", input_types=[StringType()], return_type=ArrayType(), is_permanent=True, stage_location="@my_udf_stage" # placeholder stage for storing the permanent UDF ) Integrate UDF in DBT Call the embedding UDF in a DBT model to create features for ML: SQL -- models/embedding_data.sql WITH raw_text AS ( SELECT id, text_column FROM {{ ref('raw_table') }} ), embedded_text AS ( SELECT id, generate_embeddings_udf(text_column) AS embeddings FROM raw_text ) SELECT * FROM embedded_text; Best Practices Use DBT for reusable transformations: Break down complex SQL logic into reusable models. Optimize Snowpark UDFs: Write lightweight, efficient UDFs to minimize resource usage. Test Your Data: Leverage DBT's testing framework for data quality. Version Control Everything: Track changes in DBT models and Snowpark scripts for traceability. Conclusion By combining DBT's SQL-based data transformations with Snowpark's advanced programming capabilities, you can build AI/ML pipelines that are both scalable and efficient. This setup allows teams to collaborate effectively while leveraging Snowflake's computational power to process large datasets. Whether you're cleaning data, engineering features, or preparing datasets for ML models, the DBT-Snowpark integration provides a seamless workflow to unlock your data's full potential.
As we reach the end of 2024, JavaScript continues to dominate as the go-to programming language for web development, owing to its versatility and community-driven evolution. The latest ECMAScript updates introduce several powerful features aimed at improving developer productivity, code readability, and overall efficiency. 1. Pipeline Operator (|>) The pipeline operator is one of the most anticipated additions to JavaScript (it is still working its way through the TC39 proposal process). Borrowed from functional programming paradigms, this operator improves the readability and maintainability of complex function chains by enabling a linear, step-by-step flow of data through multiple functions. How It Works Traditionally, chaining multiple functions requires nested calls, which can quickly become hard to read: JavaScript const result = uppercase(exclaim(addGreeting("Hello"))); With the pipeline operator, the same logic can be written in a cleaner and more readable way: JavaScript const result = "Hello" |> addGreeting(%) |> exclaim(%) |> uppercase(%); Here, % acts as a placeholder for the value passed from the previous operation. This simple syntax improves code readability, especially in projects requiring complex data transformations. Use Cases The pipeline operator is particularly useful for: Functional Programming: Chaining small, reusable functions in a clear sequence. Data Processing: Applying multiple transformations to datasets in a readable manner. Simplifying Async Workflows: Integrating with await to make asynchronous pipelines intuitive (pending further committee discussions on compatibility). 2. Regular Expression Enhancements (v Flag) ES2024 introduces significant improvements to regular expressions with the addition of the v flag. This enhancement enables set operations inside character classes, such as intersection (&&), difference (--), and union (combining multiple sets), that simplify complex pattern matching. Key Features Intersection (&&) Matches characters that are common to two character sets. For example: JavaScript let regex = /[[a-z]&&[^aeiou]]/gv; console.log("crypt".match(regex)); // Matches consonants only: ['c', 'r', 'y', 'p', 't'] Difference (--) Excludes specific characters from a set: JavaScript let regex = /[\p{Decimal_Number}--[0-9]]/gv; console.log("123٤٥٦".match(regex)); // Matches non-ASCII numbers: ['٤', '٥', '٦'] Union Combines multiple sets by listing them together in one character class, allowing broader matches. Practical Applications These operators simplify patterns for advanced text processing tasks, such as: Filtering non-ASCII characters or special symbols in multilingual datasets. Creating fine-tuned matches for validation tasks (e.g., email addresses, custom identifiers). Extracting domain-specific patterns like mathematical symbols or emojis. 3. Temporal API The Temporal API finally provides a modern, robust replacement for the outdated Date object, addressing long-standing issues with time zones, date manipulations, and internationalization. Why Temporal? The existing Date object has numerous flaws, including: Poor handling of time zones. Complex workarounds for date arithmetic. Limited support for non-Gregorian calendars. Temporal offers a more comprehensive and intuitive API for working with dates, times, and durations.
Example: JavaScript const date = Temporal.PlainDate.from("2024-11-21"); const futureDate = date.add({ days: 10 }); console.log(futureDate.toString()); // Outputs: 2024-12-01 Core Features Precise Time Zones: Built-in handling of time zone offsets and daylight saving changes.Immutable Objects: Prevents accidental modifications, making code safer.Duration Arithmetic: Simplifies operations like adding or subtracting time units. Use Cases The Temporal API is ideal for: Scheduling Systems: Handling recurring events with time zone precision.Global Applications: Supporting international users with different calendars.Financial Applications: Accurate interest and payment calculations across time periods. 4. Object.groupBy() Grouping array elements based on specific criteria is a common requirement, and ES2024’s Object.groupBy() method simplifies this process significantly. How It Works Previously, developers had to write custom functions or rely on libraries like Lodash for grouping operations. With Object.groupBy(), grouping becomes straightforward: JavaScript const data = [ { type: "fruit", name: "apple" }, { type: "vegetable", name: "carrot" }, { type: "fruit", name: "banana" }, ]; const grouped = Object.groupBy(data, item => item.type); console.log(grouped); // Output: // { // fruit: [{ type: "fruit", name: "apple" }, { type: "fruit", name: "banana" }], // vegetable: [{ type: "vegetable", name: "carrot" }] // } Advantages Simplicity: Eliminates the need for custom grouping logic.Readability: Provides a declarative approach to data organization.Performance: Optimized for modern JavaScript engines. Applications Categorizing datasets for dashboards or analytics tools.Organizing user input based on metadata, such as tags or roles.Simplifying the preparation of reports from raw data. 5. Records and Tuples The records and tuples proposal introduces immutable data structures to JavaScript, ensuring data safety and reducing unintended side effects. What Are They? Records: Immutable key-value pairs, similar to objects.Tuples: Immutable, ordered lists, similar to arrays. Example: JavaScript const record = #{ name: "Alice", age: 25 }; const tuple = #[1, 2, 3]; console.log(record.name); // Output: "Alice" console.log(tuple[1]); // Output: 2 Key Benefits Immutability: Helps prevent bugs caused by accidental data mutations.Simplified Equality Checks: Records and Tuples are deeply compared by value, unlike objects and arrays. Use Cases Storing configuration data that must remain unchanged.Implementing functional programming techniques.Creating reliable data snapshots in state management systems like Redux. Conclusion With ES2024, JavaScript solidifies its position as a cutting-edge programming language that evolves to meet the demands of modern development. The pipeline operator, regex enhancements, Temporal API, Object.groupBy(), and immutable data structures like records and tuples are poised to streamline workflows and solve long-standing challenges. As these features gain adoption across browsers and Node.js environments, developers should explore and integrate them to write cleaner, more efficient, and future-proof code.
If you are already using DuckDB, this guide will help you with some optimization techniques that can improve your application's performance. If you are new to DuckDB, don't fret — you'll still learn something new. I will share some of the practical tips that helped me optimize my applications. Let's dive in! Why DuckDB? Before we jump into optimization techniques, let's quickly discuss what makes DuckDB stand out. In the official DuckDB documentation, many benefits are listed. Give it a read. Loading Data One thing to remember about data loading in DuckDB is that the file format you choose makes a huge difference. Here's what I've learned: SQL -- Here's how I usually load my Parquet files SELECT * FROM read_parquet('my_data.parquet'); -- And here's a neat trick for CSV files CREATE TABLE my_table AS SELECT * FROM read_csv_auto('my_data.csv'); Tip: If you're working with CSV files, consider converting them to Parquet. Parquet files are compressed, columnar, and super fast to query.
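If you want to do that conversion with DuckDB itself, a few lines of Python are enough. The file names below are placeholders; swap in your own paths.
Python
import duckdb

# An in-memory connection is enough for a file-to-file conversion
con = duckdb.connect()

# Read the CSV with DuckDB and write it back out as Parquet (file names are placeholders)
con.execute("""
    COPY (SELECT * FROM read_csv_auto('my_data.csv'))
    TO 'my_data.parquet' (FORMAT PARQUET)
""")

# Subsequent queries can hit the Parquet file directly
print(con.execute("SELECT COUNT(*) FROM read_parquet('my_data.parquet')").fetchall())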
Chunking: Because Size Matters! I've successfully processed datasets in chunks, especially the larger ones. It's not only efficient, but it also makes it much easier to debug any issues. Python import duckdb import pandas as pd def process_big_data(file_path, chunk_size=100000): # Let's break this elephant into bite-sized pieces conn = duckdb.connect(':memory:') print("Ready to tackle this big dataset!") processed_count = 0 while True: # Grab a chunk (LIMIT/OFFSET re-reads the file on each pass, which is fine for moderately sized files) chunk = conn.execute(f""" SELECT * FROM read_csv_auto('{file_path}') LIMIT {chunk_size} OFFSET {processed_count} """).fetchdf() if len(chunk) == 0: break # Implement your logic here process_chunk(chunk) processed_count += len(chunk) print(f"Processed {processed_count:,} rows... Keep going!") I like to think of this as eating a pizza — you wouldn't try to stuff the whole thing in your mouth at once, right? The same goes for data processing. Query Optimization I've written queries that would make any database groan. I learned some of the best practices the hard way (well, hard on the database, too). Here are some tips: 1. Use EXPLAIN ANALYZE to See What's Happening Under the Hood This will show you exactly how DuckDB is processing your query and should inform how to tune it further. SQL EXPLAIN ANALYZE SELECT category, COUNT(*) as count FROM sales WHERE date >= '2024-01-01' GROUP BY category; 2. Be Specific With Columns It's like packing for a weekend trip — do you really need to bring your entire wardrobe? SQL -- Good: Only fetching what we need SELECT user_id, SUM(amount) as total_spent FROM purchases WHERE category = 'books' GROUP BY user_id; -- Not great: Why fetch all columns when we only need two? SELECT * FROM purchases WHERE category = 'books'; 3. Smart Joins Make Happy Databases It's more like organizing a party — you wouldn't invite everyone in town and then figure out who knows each other, right? SQL -- Good: Filtering before the join SELECT u.name, o.order_date FROM users u JOIN orders o ON u.id = o.user_id WHERE u.country = 'Canada' AND o.status = 'completed'; -- Not optimal: Joining everything first SELECT u.name, o.order_date FROM (SELECT * FROM users) u JOIN (SELECT * FROM orders) o ON u.id = o.user_id WHERE u.country = 'Canada' AND o.status = 'completed'; 4. Window Functions Done Right It's like keeping a running score in a game — you update as you go, not by recounting all points for each play. SQL -- Good: Efficient window function usage SELECT product_name, sales_amount, SUM(sales_amount) OVER (PARTITION BY category ORDER BY sale_date) as running_total FROM sales WHERE category = 'electronics'; -- Less efficient: Using subqueries instead SELECT s1.product_name, s1.sales_amount, (SELECT SUM(sales_amount) FROM sales s2 WHERE s2.category = s1.category AND s2.sale_date <= s1.sale_date) as running_total FROM sales s1 WHERE category = 'electronics'; Memory Management Here's another thing that I learned the hard way: always set memory limits. Here's how I keep things under control: Set memory_limit to 50-70% of available system RAM. Set max_memory to about half of memory_limit. Monitor and adjust based on your workload. First, let's understand how DuckDB uses memory: SQL -- Check current memory settings SELECT * FROM duckdb_settings() WHERE name LIKE '%memory%'; -- View current memory usage SELECT * FROM pragma_database_size(); Basic Memory Settings Think of these settings as setting a budget for your shopping: memory_limit is like your total monthly budget, max_memory is like setting a limit for each shopping trip, and temp_directory is like having a storage unit when your closet gets full. SQL -- Set overall memory limit SET memory_limit='4GB'; -- Set maximum memory per query SET max_memory='2GB'; -- Set a directory for spilling to disk when memory runs out SET temp_directory='/path/to/disk'; Monitoring Memory Usage SQL -- Enable progress bar to monitor operations SET enable_progress_bar=true; -- Enable detailed profiling SET enable_profiling=true; PRAGMA enable_profiling; -- After running your query, check the profile PRAGMA profile; Memory-Related Warning Signs Watch out for these signs of memory pressure: Slow query performance. System becoming unresponsive. Query failures with "out of memory" errors. Excessive disk activity (spilling to disk). Clean Up Regularly Drop temporary tables and vacuum when needed: SQL -- Clean up temporary objects DROP TABLE IF EXISTS temp_results; VACUUM; Conclusion Always start with the basics, measure everything, and optimize where it matters most. Here's a quick checklist I use: Is my data in the right format? (Parquet is usually the answer) Am I processing data in chunks when dealing with large datasets? Are my queries as specific as possible? Have I set reasonable memory limits?
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Observability and Performance: The Precipice of Building Highly Performant Software Systems. "Quality is not an act, it's a habit," said Aristotle, a principle that rings true in the software world as well. Specifically for developers, this means delivering user satisfaction is not a one-time effort but an ongoing commitment. To achieve this commitment, engineering teams need to have reliability goals that clearly define the baseline performance that users can expect. This is precisely where service-level objectives (SLOs) come into the picture. Simply put, SLOs are reliability goals for products to achieve in order to keep users happy. They serve as the quantifiable bridge between abstract quality goals and the day-to-day operational decisions that DevOps teams must make. Because of this importance, it is critical to define them effectively for your service. In this article, we will go through a step-by-step approach to defining SLOs with an example, followed by some challenges with SLOs. Steps to Define Service-Level Objectives Like any other process, defining SLOs may seem overwhelming at first, but by following some simple steps, you can create effective objectives. It's important to remember that SLOs are not set-and-forget metrics. Instead, they are part of an iterative process that evolves as you gain more insight into your system. So even if your initial SLOs aren't perfect, it's okay — they can and should be refined over time. Figure 1. Steps to define SLOs Step 1: Choose Critical User Journeys A critical user journey refers to the sequence of interactions a user takes to achieve a specific goal within a system or a service. Ensuring the reliability of these journeys is important because it directly impacts the customer experience. Critical user journeys can be identified by evaluating the revenue/business impact when a certain workflow fails and by identifying frequent flows through user analytics. For example, consider a service that creates virtual machines (VMs). Some of the actions users can perform on this service are browsing through the available VM shapes, choosing a region to create the VM in, and launching the VM. If the development team were to order them by business impact, the ranking would be: Launching the VM, because this has a direct revenue impact. If users cannot launch a VM, then the core functionality of the service has failed, affecting customer satisfaction and revenue directly. Choosing a region to create the VM. While users can still create a VM in a different region, it may lead to a degraded experience if they have a regional preference. This choice can affect performance and compliance. Browsing through the VM catalog. Although this is important for decision making, it has a lower direct impact on the business because users can change the VM shape later. Step 2: Determine Service-Level Indicators That Can Track User Journeys Now that the user journeys are defined, the next step is to measure them effectively. Service-level indicators (SLIs) are the metrics that developers use to quantify system performance and reliability. For engineering teams, SLIs serve a dual purpose: They provide actionable data to detect degradation, guide architectural decisions, and validate infrastructure changes. They also form the foundation for meaningful SLOs by providing the quantitative measurements needed to set and track reliability targets.
For instance, when launching a VM, some of the SLIs can be availability and latency. Availability: Out of the X requests to launch a VM, how many succeeded? A simple formula to calculate this is: Availability = (successful launch requests ÷ total launch requests) × 100. If there were 1,000 requests and 998 of them succeeded, the availability is (998 ÷ 1,000) × 100 = 99.8%. Latency: Out of the total number of requests to launch a VM, what time did the 50th, 95th, or 99th percentile of requests take to launch the VM? The percentiles here are just examples and can vary depending on the specific use case or service-level expectations. In a scenario with 1,000 requests where 900 requests were completed in 5 seconds and the remaining 100 took 10 seconds, the 95th percentile latency would be 10 seconds, because the slowest 5% of requests all fall in the 10-second group. While averages can also be used to calculate latencies, percentiles are typically recommended because they account for tail latencies, offering a more accurate representation of the user experience.
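To make these two SLIs concrete, here is a small Python sketch that computes availability and a nearest-rank percentile from a list of request outcomes; the request counts and latencies are the made-up numbers from the example above.
Python
import math

def availability(successful_requests, total_requests):
    # Availability = successful requests / total requests, expressed as a percentage
    return 100.0 * successful_requests / total_requests

def percentile(latencies_seconds, pct):
    # Nearest-rank percentile: the smallest value such that pct% of requests are at or below it
    ordered = sorted(latencies_seconds)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# 1,000 launch requests: 998 succeeded; 900 took 5 seconds and 100 took 10 seconds
print(availability(998, 1000))                      # 99.8
print(percentile([5.0] * 900 + [10.0] * 100, 95))   # 10.0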
Step 3: Identify Target Numbers for SLOs Simply put, SLOs are the target numbers we want our SLIs to achieve in a specific time window. For the VM scenario, the SLOs can be: The availability of the service should be greater than 99% over a 30-day rolling window. The 95th percentile latency for launching the VMs should not exceed eight seconds. When setting these targets, some things to keep in mind are: Using historical data. If you need to set SLOs based on a 30-day rolling period, gather data from multiple 30-day windows to define the targets. If you lack this historical data, start with a more manageable goal, such as aiming for 99% availability each day, and adjust it over time as you gather more information. Remember, SLOs are not set in stone; they should continuously evolve to reflect the changing needs of your service and customers. Considering dependency SLOs. Services typically rely on other services and infrastructure components, such as databases and load balancers. For instance, if your service depends on a SQL database with an availability SLO of 99.9%, then your service's SLO cannot exceed 99.9%. This is because the maximum availability is constrained by the performance of its underlying dependencies, which cannot guarantee higher reliability. Challenges of SLOs It might be tempting to set the SLO to 100%, but this is unrealistic. A 100% availability target, for instance, leaves no room for important activities like shipping features, patching, or testing. Defining SLOs also requires collaboration across multiple teams, including engineering, product, operations, QA, and leadership. Ensuring that all stakeholders are aligned and agree on the targets is essential for the SLO to be successful and actionable. Step 4: Account for Error Budget An error budget is the measure of downtime a system can afford without upsetting customers or breaching contractual obligations. Below is one way of looking at it: If the error budget is nearly depleted, the engineering team should focus on improving reliability and reducing incidents rather than releasing new features. If there's plenty of error budget left, the engineering team can afford to prioritize shipping new features as the system is performing well within its reliability targets. There are two common approaches to measuring the error budget: time based and event based. Let's explore how the statement, "The availability of the service should be greater than 99% over a 30-day rolling window," applies to each. Time-Based Measurement In a time-based error budget, the statement above translates to the service being allowed to be down for roughly 7 hours and 12 minutes in a 30-day window (about 3.65 days over a full year). Here's how to calculate it: Determine the number of data points. Start by determining the number of time units (data points) within the SLO time window. For instance, if the base time unit is 1 minute and the SLO window is 30 days: 30 days × 24 hours × 60 minutes = 43,200 one-minute data points. Calculate the error budget. Next, calculate how many data points can "fail" (i.e., downtime). The error budget is the percentage of allowable failure: 1% of 43,200 = 432 data points. Convert this to time: 432 minutes ≈ 7 hours and 12 minutes. This means the system can experience about 7 hours and 12 minutes of downtime in a 30-day window. Last but not least, the remaining error budget is the difference between the total allowable downtime and the downtime already used. Event-Based Measurement For event-based measurement, the error budget is measured in terms of percentages. The aforementioned statement translates to a 1% error budget in a 30-day rolling window. Let's say there are 43,200 data points in that 30-day window, and 100 of them are bad. You can calculate how much of the error budget has been consumed using this formula: consumed error budget = (bad data points ÷ total data points) × 100 = (100 ÷ 43,200) × 100 ≈ 0.23%. Now, to find out how much error budget remains, subtract this from the total allowed error budget (1%): 1% − 0.23% = 0.77%. Thus, the service can still tolerate 0.77% more bad data points. Advantages of Error Budget Error budgets can be utilized to set up automated monitors and alerts that notify development teams when the budget is at risk of depletion. These alerts enable them to recognize when greater caution is required while deploying changes to production. Teams often face ambiguity when it comes to prioritizing features vs. operations. The error budget can be one way to address this challenge. By providing clear, data-driven metrics, engineering teams are able to prioritize reliability tasks over new features when necessary. The error budget is among the well-established strategies to improve accountability and maturity within engineering teams. Cautions to Take With Error Budgets When there is extra budget available, developers should actively look into using it. This is a prime opportunity to deepen the understanding of the service by experimenting with techniques like chaos engineering. Engineering teams can observe how the service responds and uncover hidden dependencies that may not be apparent during normal operations. Last but not least, developers must monitor error budget depletion closely, as unexpected incidents can rapidly exhaust it. Conclusion Service-level objectives represent a journey rather than a destination in reliability engineering. While they provide important metrics for measuring service reliability, their true value lies in creating a culture of reliability within organizations. Rather than pursuing perfection, teams should embrace SLOs as tools that evolve alongside their services. Looking ahead, the integration of AI and machine learning promises to transform SLOs from reactive measurements into predictive instruments, enabling organizations to anticipate and prevent failures before they impact users. Additional resources: Implementing Service Level Objectives, Alex Hidalgo, 2020 "Service Level Objectives," Chris Jones et al., 2017 "Implementing SLOs," Steven Thurgood et al., 2018 Uptime/downtime calculator This is an excerpt from DZone's 2024 Trend Report, Observability and Performance: The Precipice of Building Highly Performant Software Systems. Read the Free Report
Look, I'll be honest — when my team first started using AI coding assistants last year, I was skeptical — really skeptical. After 15 years of writing code, I didn't believe a language model could meaningfully help with real development work. Six months later, I had to eat my words. Our team's velocity increased by roughly 40%, and our code quality metrics actually improved. But here's the thing - it's not as simple as "AI makes coding easier." The reality is more nuanced, more interesting, and frankly, more useful than the marketing hype suggests. The Reality on the Ground Let me share something that happened last week. I was debugging a nasty memory leak in our Node.js backend. Traditionally, this would have meant hours of combing through code, adding console logs, and maybe buying more coffee. Instead, I fed the relevant code and memory profiles into our AI assistant and asked it to analyze the patterns. Within minutes, it spotted something I'd missed — we were accidentally holding onto WebSocket connections in a closure. It was a simple mistake, but tricky to spot. The AI didn't magically fix it, but it pointed me in the right direction much faster than traditional debugging would have. What Actually Works Based on my team's experience over the past year: 1. Code Reviews The biggest surprise was how useful AI became in code reviews. Example from our codebase: Original code: Python def process_user_data(data): if data['status'] == 'active': user = User.objects.get(id=data['user_id']) if user.subscription: if user.subscription.status == 'valid': return handle_active_user(user) else: return handle_inactive_subscription(user) return handle_inactive_user() Our AI assistant flagged this and suggested the following: Refactored version: Python def process_user_data(data): if data['status'] != 'active': return handle_inactive_user() user = User.objects.get(id=data['user_id']) if not user.subscription: return handle_inactive_user() return (handle_active_user(user) if user.subscription.status == 'valid' else handle_inactive_subscription(user)) It's not revolutionary, but it consistently catches these readability issues faster than human reviewers. 2. Test Generation This was another surprise winner. Here's a real example: Original function: Python def calculate_order_total(items, discount_code=None): subtotal = sum(item.price * item.quantity for item in items) if discount_code: discount = get_discount_amount(discount_code, subtotal) return subtotal - discount return subtotal The AI generated these test cases: Python def test_calculate_order_total(): items = [ Item(price=10.0, quantity=2), Item(price=15.0, quantity=1) ] assert calculate_order_total(items) == 35.0 def test_calculate_order_total_with_discount(): items = [Item(price=100.0, quantity=1)] assert calculate_order_total(items, 'SAVE20') == 80.0 def test_calculate_order_total_empty(): assert calculate_order_total([]) == 0 def test_calculate_order_total_invalid_discount(): items = [Item(price=10.0, quantity=1)] assert calculate_order_total(items, 'INVALID') == 10.0 It caught edge cases I probably would have missed initially. What Doesn't Work 1. Complex Architectural Decisions Last month, we needed to redesign our authentication system to handle multi-tenant SSO. The AI suggestions were technically correct but missed crucial real-world considerations about session management across different client types. 2. Context-Heavy Debugging When a bug involves multiple services, complex state, or race conditions, AI tools still struggle.
They can help analyze individual components but often miss the bigger picture. Real-World Impact on Development Workflow Here's how our development workflow has actually changed: Before AI Integration Plain Text 1. Write code (2 hours) 2. Write tests (1 hour) 3. Debug issues (2 hours) 4. Code review (1 hour) Total: ~6 hours per feature After AI Integration Plain Text 1. Write code with AI assistance (1 hour) 2. AI generates test cases, developer adjusts (30 mins) 3. Debug with AI analysis (1 hour) 4. AI-assisted code review (30 mins) Total: ~3 hours per feature But here's the catch - this only works because we learned how to use AI tools effectively. The first month was actually slower as we figured out the right workflows. Best Practices We've Learned 1. Prompt Engineering for Developers Instead of: Plain Text "Write a function to process user data" We do: Plain Text "Write a Python function that: - Takes a user_id and data dictionary - Validates required fields: name, email, age - Handles missing fields with default values - Raises ValidationError for invalid data - Returns a processed user object Use type hints and include error handling." The difference in output quality is dramatic. 2. Code Review Strategy We now do: First pass: AI review for style, potential bugs, and obvious issuesSecond pass: Human review for business logic and architectural concernsFinal pass: AI check for security vulnerabilities 3. Documentation Generation This has been a game-changer. Example: Python # Original function with minimal docs def process_payment(amount, user_id, method): # ... implementation ... AI expanded this to: Python def process_payment( amount: Decimal, user_id: str, method: PaymentMethod ) -> PaymentResult: """Process a payment transaction for a user. Args: amount: Transaction amount in user's local currency. user_id: Unique identifier for the user. method: Payment method object containing card/bank details. Returns: PaymentResult object containing transaction ID and status. Raises: InsufficientFundsError: If payment method has insufficient funds. InvalidPaymentMethodError: If payment method is expired/invalid. PaymentProcessingError: If payment gateway encounters an error. Example: >>> result = process_payment( ... amount=Decimal('99.99'), ... user_id='usr_123', ... method=PaymentMethod(type='credit_card', token='tok_xyz') ... ) >>> print(result.transaction_id) 'tx_abc123' """ Security Considerations One area where we've had to be extremely careful is security. Some lessons learned: 1. Never Let AI Generate Security-Critical Code Example of what not to do: Python # DON'T: Let AI generate authentication logic def verify_password(plain_text, hashed): return hashlib.md5(plain_text.encode()).hexdigest() == hashed 2. Always Review Generated SQL We've seen AI suggest vulnerable queries: SQL -- DON'T: Raw string formatting f"SELECT * FROM users WHERE id = '{user_id}'" -- DO: Parameterized queries "SELECT * FROM users WHERE id = %s", (user_id,) Looking Forward Based on current trends and my experience, here's what's actually changing: 1. IDE Integration Is Getting Serious The latest AI-powered IDEs don't just suggest code - they understand entire codebases. Last week, our IDE flagged a potential race condition in an async function by analyzing how it's called across different services. 2. Specialized Models Are Coming We're seeing AI models trained specifically for certain frameworks or languages. The TypeScript-specific suggestions we're getting now are notably better than generic code generation. 
3. Testing Is Being Transformed AI is getting better at generating edge cases and stress tests that humans might miss. Our test coverage has actually increased since adopting AI tools. Conclusion Look, AI isn't replacing developers anytime soon. What it is doing is making us more efficient, helping catch bugs earlier, and handling the boring parts of coding. The key is understanding its limitations and using it as a tool, not a replacement for human judgment. The developers who'll thrive in this new environment aren't the ones who can write the most code - they're the ones who can effectively collaborate with AI tools while maintaining a clear vision of what they're building and why. And yes, I used AI to help write parts of this article. That's the point — it's a tool, and I'm not ashamed to use it effectively.