Measuring Internal Product Attributes: Software Size


Course Name: Software Metrics

Chapter 5:
Measuring Internal
Product Attributes:
Software Size
compiled by Samuel Ashagrre

SOFTWARE METRICS 1
Contents
► Software size
► Size: Length (code, specification, design)
► Size: Reuse
► Size: Functionality (function point, feature
point, object point, use-case point)
► Size: Complexity
Software Size

• Software size can be described with three attributes:
– Length
– Functionality
– Complexity
• Length is the physical size of the product, and
• Functionality measures the functions supplied by
the product to the user.
• Complexity can be interpreted in different ways.
Different types of Complexity

• Problem complexity (also called computational complexity) measures the complexity of the underlying problem.
• Algorithmic complexity reflects the complexity of the
algorithm implemented to solve the problem; in some sense,
this type of complexity measures the efficiency of the
software.
• Structural complexity measures the structure of the
software used to implement the algorithm.
– For example, we look at control flow structure, hierarchical structure
and modular structure to extract this type of measure.
• Cognitive complexity measures the effort required to understand the software.

Software Size
■ Internal product attributes describe a software
product in a way that is dependent only on the
product itself.
■ One of the most useful attributes is the size of a software product, which can be measured statically, i.e., without executing the system.
■ A size measurement should reflect effort, cost and productivity.
Software Size: Length

■ Length is the “physical size” of the product.
■ In a software development effort, there are three major development products: specification, design, and code.
■ The length of the specification can indicate how
long the design is likely to be, which in turn is a
predictor of code length.
■ Traditionally, code length refers to text-based code
length.
Length - Code

Traditional code measures
• The most commonly used measure of source code program length is the number of lines of code (LOC).
• But not all lines of code are alike:
– For example, many programmers use spacing and blank
lines to make their programs easier to read.
– If lines of code are being used to estimate programming
effort, then a blank line does not contribute the same
amount of effort.
– Similarly, comment lines improve a program's
understandability, and they certainly require some effort to
write.
– But they may not require as much effort as the code itself
Length: Code - LOC
■ The most commonly used measure of source code
program length is the number of lines of code (LOC).
■ NCLOC: non-commented source line of code or
effective lines of code (ELOC).
■ CLOC: commented source line of code.
■ By measuring NCLOC and CLOC separately we
can define:
total length (LOC) = NCLOC + CLOC
■ The ratio: CLOC/LOC measures the density of
comments in a program.
Length: Code - LOC (cont.)
The SLOC values for various operating systems in Microsoft's Windows NT product line are as follows: (table not reproduced in this text version)
System size
What is a large project?
• Here is a suggestion for classification of Visual Basic
project sizes.
• The classification is based on the total number of
physical lines, excluding control definitions, as this is
the easiest way to measure code size.
• The classification is based on long-time experience
with Visual Basic projects. As programming languages
differ in their uses and power of expression, this
classification may not be directly usable for other
languages.
Length: Code - LOC (cont.)

LINES              Size
0..9,999           Small
10,000..49,999     Medium
50,000..99,999     Semi-large
100,000..499,999   Large
500,000..          Very large


Length: Code - LOC (cont.)
■ Advantages of LOC
■ Simple and automatically measurable
■ Correlates with programming effort (& cost)

■ Disadvantages of LOC
■ Vague definition
■ Language-dependent
■ Not available for early planning
■ Dependent on developers' style and skill
Length: Code - LOC (cont.)
Disadvantages of LOC
• According to the Computer History Museum, Apple developer Bill Atkinson found problems with this practice in 1982:
• When the Lisa team was pushing to finalize their software in 1982, project managers started requiring programmers to submit weekly forms reporting on the number of lines of code they had written. Bill Atkinson thought that was silly. For the week in which he had rewritten QuickDraw’s region calculation routines to be six times faster and 2,000 lines shorter, he put “-2000” on the form. After a few more weeks the managers stopped asking him to fill out the form, and he gladly complied.
Length: Halstead’s Work

■ Maurice Halstead’s Theory (1971~1979):


■ A program P is a collection of tokens, composed of two basic elements: operands and operators.
■ Operands are variables, constants, addresses, e.g., a, b, x, 100
■ Operators are defined operations (language constructs) in a programming language, e.g., if ... else, + - >, =, main(), goto
Halstead's Software Science

• The logic of Halstead’s software science is that “any programming task consists of selecting and arranging a finite number of program ‘tokens,’ which are basic syntactic units distinguishable by a compiler.”
• A computer program, according to software
science, is a collection of tokens that can be
classified as either operators or operands.

Halstead's Software Science

• Based on these primitive measures, Halstead developed a system of equations expressing:
– total vocabulary: the total number of unique operators and unique operands.

Halstead's Software Science

– overall program length: the total number of operator occurrences plus the total number of operand occurrences.
– potential minimum volume for an algorithm,
The potential minimum volume V* is defined as
the volume of the most succinct program in which
a problem can be coded.
Potential Minimum Volume
The potential minimum volume V* is defined as the volume of the most succinct program in which a problem can be coded.
V* = (2 + n2*) * log2 (2 + n2*)
Here, n2* is the count of unique input and output parameters

Halstead's Software Science

– program level (a measure of software complexity):
To rank programming languages by the level of abstraction they provide, the Program Level (L) is considered.
The higher the level of a language, the less effort it takes to develop a program using that language.
L = V* / V

The value of L ranges between zero and one, with L = 1 representing a program written at the highest possible level (i.e., with minimum size).
The estimated program level is L^ = (2 × n2) / (n1 × N2)

Halstead's Software Science

– program difficulty
This parameter shows how difficult the program is to handle.
As the volume of the implementation of a program
increases, the program level decreases and the
difficulty increases.
Thus, programming practices such as redundant
usage of operands, or the failure to use higher-
level control constructs will tend to increase the
volume as well as the difficulty.
Halstead's Software Science

– Programming Effort – Measures the amount of mental activity needed to translate the existing algorithm into an implementation in the specified programming language.
– Language Level – Shows the level of the programming language used to implement the algorithm. The same algorithm demands additional effort if it is written in a low-level programming language. For example, it is easier to program in Pascal than in Assembler.

Halstead's Software Science

– Intelligence Content – Determines the amount of intelligence presented (stated) in the program.
This parameter provides a measurement of program complexity, independent of the programming language in which it was implemented.

Halstead's Software Science

– Programming Time – Shows the time (in minutes) needed to translate the existing algorithm into an implementation in the specified programming language.
T = E / (f * S)
The concept of the processing rate of the human brain, developed by the psychologist John Stroud, is also used. Stroud defined a moment as the time the human brain requires to carry out its most elementary decision. The Stroud number S is therefore the number of Stroud moments per second, with 5 <= S <= 20; Halstead uses 18. The value of S has been developed empirically from psychological reasoning, and its recommended value for programming applications is 18.
• Stroud number S = 18 moments / second
• seconds-to-minutes factor f = 60
Length: Halstead’s Work /2
The basic metrics for these tokens are the following:
 Number of distinct operators in the program (µ1)
 Number of distinct operands in the program (µ2)
 Total number of occurrences of operators in the program (N1)
 Total number of occurrences of operands in the program (N2)
Program vocabulary (µ)
µ = µ1 + µ2
 Program Difficulty – This parameter shows how difficult the program is to handle.

D = (µ1 / 2) * (N2 / µ2)

D = 1 / L
As the volume of the implementation of a program
increases, the program level decreases and the difficulty
increases.
Thus, programming practices such as redundant usage
of operands, or the failure to use higher-level control
constructs will tend to increase the volume as well as
the difficulty.
 Intelligence Content – Determines the amount of intelligence presented (stated) in the program.
 This parameter provides a measurement of program complexity, independent of the programming language in which it was implemented.

I = V / D
 Program estimated length: N^ = µ1 log2 µ1 + µ2 log2 µ2

 Effort required to generate program P, measured in elementary discriminations: E = V / L

 Time required for developing program P is the total effort divided by the number of elementary discriminations per second: T = E / β

 In cognitive psychology β is usually a number between 5 and 20
 Halstead claims that β = 18

 Remaining bugs: the number of bugs left in the software at delivery time

 Conclusion: a bigger program needs more time to be developed and has more bugs remaining
Counting rules for C language/1

 Comments are not considered.
 All variables and constants are considered operands.
 Global variables used in different modules of the same program are counted as multiple occurrences of the same variable.
 Local variables with the same name in different functions are counted as unique operands.
 Function calls are considered operators.
Counting rules for C language /2

 All looping statements, e.g., do {…} while ( ), while ( ) {…}, for ( ) {…}, and all control statements, e.g., if ( ) {…}, if ( ) {…} else {…}, etc., are considered operators.
 In the control construct switch ( ) {case:…}, switch as well as all the case statements are considered operators.
Counting rules for C language /3

 Reserved words like return, default, continue, break, sizeof, etc., are considered operators.
 All brackets, commas, and terminators are considered operators.
 GOTO is counted as an operator and the label as an operand.
Counting rules for C language /4

 The unary and binary occurrences of “+” and “-” are counted separately. Similarly, “*” as the multiplication operator and “*” as the dereference operator are counted separately.
 In array variables such as “array-name [index]”, “array-name” and “index” are considered operands and [ ] is considered an operator.
Counting rules for C language /5

 In structure variables such as “struct-name . member-name” or “struct-name -> member-name”, struct-name and member-name are taken as operands and ‘.’, ‘->’ are taken as operators.
 Same names of member elements in different structure variables are counted as unique operands.
 All hash directives are ignored.
List out the operators and operands and also calculate the values of the software science measures for:

int sort (int x[ ], int n)
{
  int i, j, save, im1;
  /* This function sorts array x in ascending order */
  if (n < 2) return 1;
  for (i = 2; i <= n; i++)
  {
    im1 = i - 1;
    for (j = 1; j <= im1; j++)
      if (x[i] < x[j])
      {
        save = x[i];
        x[i] = x[j];
        x[j] = save;
      }
  }
  return 0;
}
Result:

μ1 = 14
N1 = 53
μ2=10
N2=38
Result:

Therefore, N = 91
μ = 24
V = 417.23 bits
N^ = 86.51
V* = 11.6
L = 0.027
D = 37.03
T = 610 seconds
Exercise
boolean sort (int x[ ], int n)
{
  int i, j, save, im1;
  /* This function sorts array x in ascending order */
  if (n < 2) return true;
  for (i = 2; i <= n; i++)
  {
    im1 = i - 1;
    for (j = 1; j <= im1; j++)
      if (x[i] < x[j])
      {
        save = x[i];
        x[i] = x[j];
        x[j] = save;
      }
  }
  return false;
}
Critics of Halstead’s work
• Developed in the context of assembly languages; too fine-grained for modern programming languages.
• The treatment of basic and derived measures is somewhat confusing.
• The notions of time to develop and remaining bugs are arguable.
• Cannot be extended to cover the size of specifications and designs.
• It depends on the usage of operators and operands in completed code.
• It has no use in predicting program complexity at the design level.
Advantages of Halstead
• Does not require in-depth control-flow analysis of the program.
• Predicts effort, error rate, and time.
• Useful in scheduling projects.
Length: Alternative Methods
 Alternative methods for text-based measurement of
code length:
1) Source memory size: Measuring length in terms of
number of bytes of computer storage required for the
program text. Excludes library code.
2) Char size: Measuring length in terms of number
of characters (CHAR) in program text.
3) Object memory size: Measuring length in terms of an
object (executable or binary) file. Includes library code.
 All are relatively easy to measure (or estimate).
Length: Code - Problems /1
 One of the problems with text-based definition of length
is that the line of code measurement is increasingly less
meaningful as software development turns to more
automated tools, such as:
 Tools that generate code from specifications
 Visual programming tools
Code Length in OOP

Object-oriented development also suggests new ways to measure length.
Pfleeger found that a count of objects and methods led to more accurate productivity estimates than those using lines of code.
Code Length in OOP

Lorenz found in his research at IBM that the average class contained 20 methods, and the average method size was 8 lines of code for Smalltalk and 24 for C++.
He also notes that size differs with system and application type.
Importance of Reuse in Size

This reuse of software (including requirements, designs, documentation, and test data and scripts as well as code)
◦ improves our productivity and quality,
◦ allows us to concentrate on new problems, rather than continuing to solve old ones again.
NASA/Goddard's Software Engineering Laboratory
◦ Reused verbatim: the code in the unit which was
reused without any changes.
◦ Slightly modified: fewer than 25% of the lines of code in the unit were modified.
◦ Extensively modified: 25% or more of the lines of
code were modified.
◦ New: none of the code comes from a previously
constructed unit.
Hewlett-Packard considers three levels of code: new code, reused code, and leveraged code.
◦ Reused code is used as is, without modification.
◦ Leveraged code is existing code that is modified in some way.
The Hewlett-Packard reuse ratio includes both reused and leveraged code as a percentage of total code delivered.

Measuring Software Size:
 Function Point (FP),
 Feature Point,
 Object Point and
 Use-case Point
Function-Oriented Metrics
 Function Point (FP) is a weighted measure of
software functionality.
 The idea is that a product with more functionality will
be larger in size.
 Function-oriented metrics are indirect measures of
software which focus on functionality and utility.
 The first function-oriented metric was proposed by Albrecht (1979~1983), who suggested a productivity measurement approach called the Function Point (FP) method.
 Function points (FPs) measure the amount of functionality
in a system based upon the system specification.
 Estimation before implementation!
Function Points (FP) can be used to size software applications accurately.
 FP are becoming widely accepted as the standard metric for measuring software size.
 FP have made adequate sizing possible.
 Without a reliable sizing metric, relative changes in productivity or relative changes in quality cannot be calculated.
Keywords

 Function Point: a unit of measure for quantifying the software deliverable (functionality) based upon the user view.
 User: any person or thing that communicates or interacts with the software at any time.
 User View: the Functional User Requirements as perceived by the user.
 Functional user requirements: a subset of user requirements that describe what the software shall do (functions), in terms of tasks and services.
Calculating Function Points
Determine the number of components (EI, EO, EQ, ILF, and ELF)
EI − The number of external inputs. These are elementary processes in which derived data passes across the boundary from outside to inside; the application receives information from outside its boundary.
 In an example library database system: enter an existing patron's library card number.
EO − The number of external outputs. These are elementary processes in which derived data passes across the boundary from inside to outside; the application presents information to the user.
 In an example library database system: display a list of books checked out to a patron.
Calculating Function Points

EQ − The number of external inquiries. These are elementary processes with both input and output components that result in data retrieval from one or more internal logical files and external interface files.
 In an example library database system: determine what books are currently checked out to a patron.
ILF − The number of internal logical files. These are user-identifiable groups of logically related data that reside entirely within the application's boundary and are maintained through external inputs; they contain permanent data that is relevant to the user, which the information system references and maintains.
 In an example library database system: the file of books in the library.
Calculating Function Points

ELF − The number of external interface files. These are user-identifiable groups of logically related data that are used for reference purposes only and reside entirely outside the system; they also contain permanent data relevant to the user, but the data is maintained by another information system.
 In an example library database system: the file that contains transactions in the library's billing system.

Calculating FP (2)

Calculating FP (3)

Components of TCF
Example
FP: Advantages - Summary

This FP can then be used in various metrics, such as:


Cost = $ / FP
Quality = Errors / FP
Productivity = FP / person-month
FP: Advantages - Summary
 Can be counted before design or code documents exist
 Can be used for estimating project cost, effort, schedule
early in the project life-cycle
 Helps with contract negotiations

 Is standardized (though several competing standards exist)


 It is independent of the programming language,
technology, techniques.
"Guesstimate" Instead of Estimate!

 FP is a subjective measure: affected by the selection of weights by external users.
 Function point calculation requires a full software
system specification. It is therefore difficult to use
function points very early in the software development
lifecycle.
 Physical meaning of the basic unit of FP is unclear.
 Unable to account for new versions of I/O, such as
data streams, intelligent message passing, etc.
 Not suitable for “complex” software, e.g., real-time and
embedded applications.
Extended Function Point (EFP) Metrics

■ The FP metric has been further extended to compute:
A. Feature points.
B. 3D function points.
Feature Point /1
■ Function points were originally designed to be applied
to business information systems. Extensions have been
suggested called feature points which may enable this
measure to be applied to other software engineering
applications.
■ Feature points accommodate applications in which the
“algorithmic complexity” is high such as real-time, process
control, and embedded software applications.
■ For conventional software and information systems, function points and feature points produce similar results. For complex systems, feature points often produce counts about 20%~35% higher than function points.
Feature Point /2
■ Feature points are calculated the same way as FPs
with the additions of algorithms as an additional
software characteristic.
■ Counts are made for the five FP categories, i.e.,
number of external inputs, external outputs,
inquiries, internal files, external interfaces, plus:
■ Algorithm (Na):
A bounded computational problem such as
inverting a matrix, decoding a bit string, or handling an
interrupt.
Object Point
Object Point /1
 Object points are used as an initial size measure very early in the development cycle, during feasibility studies.
 An initial size measure is determined by counting the
number of screens, reports, and third-generation components
that will be used in the application.
 Each object is classified as simple, medium, or difficult.
Use-Case Point
Use-Case Point /1
 Function Point is a method to measure
software size from a requirements
perspective.
 Use-Case is a method to develop
requirements. Use Cases are used to
validate a proposed design and to ensure it
meets all requirements.

 Question: How can Use-Cases be used to measure function points, and vice versa?
Use-Case Point / 2
Question: How can Use-Cases be used to measure function points, and vice versa?
Two methods:
1. Identify and weight actors and use-cases
2. Count the inputs, outputs, files and data inquiries from
use-cases (using the use-case definition and activity diagram).
Complexity
The complexity of a solution can be regarded in terms of the resources needed to implement it.
We can view solution complexity as having at least two aspects:
◦ time complexity: where the resource is computer time
◦ space complexity: where the resource is computer memory.
Time Complexity
◦ Time complexity represents the amount of time needed by the program to run to completion.
◦ Time complexity is most commonly estimated by counting the number of elementary operations performed by the algorithm.
◦ Since an algorithm's performance may vary with different types of input data, we usually use the worst-case time complexity, because that is the maximum time taken over all inputs of a given size.

Space Complexity
◦ It is the amount of memory space required by the algorithm during the course of its execution.
◦ An algorithm generally requires space for the following components:
◦ Instruction Space: the space required to store the executable version of the program. This space is fixed, but varies depending on the number of lines of code in the program.
◦ Data Space: the space required to store all constant and variable values.
◦ Environment Space: the space required to store the environment information needed to resume a suspended function.
 To compare the efficiency of algorithms, a measure of the degree of difficulty of an algorithm, called computational complexity, is used.
Computational complexity indicates how much effort
is needed to execute an algorithm, or what its cost is.
This cost can be expressed in terms of execution time
(time efficiency, the most common factor) or memory
(space efficiency).