Developer Compendium: Functions

Introduction
Parameters and arguments
Do one thing
Extract common logic
Alternative views
- Inlined code
- YAGNI
Conclusion

Introduction

Functions, or "methods" if you're working in OOP-land, are where most of your codebase's complexity lives, so it makes sense to stick to a few conventions to keep things clean and easy to maintain. These conventions are generally aimed at making the next developer's life easier - it may not always be you who returns to the code, but whoever looks at it next will surely be grateful for the extra effort.

Some of the rules and conventions inherent to Bob Martin's idea of "clean code" can be overdone, so it pays to be conscious of the risk of going too far. Over the course of my (admittedly relatively short) professional career, I've found that an overengineered codebase can be as much of a nightmare to work with as one which appears to follow no principles at all - so, as with all things in life, striking a balance is key.

In its simplest form, a function should read like a book: left to right, top to bottom. If you need to move in any other direction whilst reading through the code then you should probably rethink your approach. Use of language is imperative to comprehension; write it well, and structure it properly.

The examples used in this article are written in C#, but the principles should translate well across a variety of languages.

Quick tip: write it badly, then refactor

Sometimes the best way to start a function is to just get it working, no matter how badly it's written. Chuck all of the logic you need into a single function if you have to - it may be long, and littered with duplications, ugly loops, and hard-coded strings, but the idea is that you eventually fix the mess you've made!

Once the function is working, add tests if you want, implement Test Driven Development if you haven't already, then get to work on refactoring using some (or all) of the following guidelines.

Parameters and arguments

I'll just clarify the following before continuing:

I use parameter when referring the type and name of a value in a function's signature
I use argument when referring to the value which is being passed into the function from elsewhere.

Functions perform all manner of...functions...and many require values to be passed into them from a calling function. If you're creating or updating an entity, you need to provide all of the values for the properties which make up that entity, for example:

public async Task CreateEmployee(
    string forename,
    string surname,
    string email,
    string address,
    string phone,
    string department,
    bool isManager,
    int salary)
{
    // ...
}

This function clearly has a lot of arguments. A developer scanning over this code may not care about each individual value, and is therefore having to read through information which they don't need to know about. What's worse is that a call to this function may look something like the following:

await _employeeService.CreateEmployee(
    employee.Forename,
    employee.Surname,
    employee.Email,
    employee.Address,
    employee.Phone,
    "Sales",
    false,
    30000);

The first five values are clear because the properties are well named, however the last three are ambiguous - what do "Sales", false and 30000 refer to? Sure, a developer may be able to deduce the meaning from the type, but they shouldn't need to, and they'll likely have to refer to the function's signature to confirm.

This leads neatly into two methods which can help improve a function signature's readability:

using named parameters
defining an object type

Named parameters

In cases where the meaning of an argument being passed into a function can't be determined by its value, either by a variable name, or from very simple deduction, named parameters become very useful in removing that ambiguity.

await _employeeService.CreateEmployee(
    employee.Forename,
    employee.Surname,
    employee.Email,
    employee.Address,
    employee.Phone,
    department: "Sales",
    isManager: false,
    salary: 30000);

The few extra keystrokes and slightly larger files are meaningless in the grand scheme of things, and far outweighed by the benefits of making a developer's life slightly easier. This doesn't need to be used everywhere, for example using named parameters for the employee arguments above adds no real value, only where clarity is required.

Defining an object type

When working with functions which handle the create, read, update and delete (CRUD) behaviours of an object, in this example an employee, the standard approach is to define a class and pass that object to the function.

public async Task CreateEmployee(Employee employee)
{
    // ...
}

public class Employee
{
    public string Forename { get; set; }
    public string Surname { get; set; }
    public string Email { get; set; }
    public string Address { get; set; }
    public string Phone { get; set; }
    public string Department { get; set; }
    public bool IsManager { get; set; }
    public int Salary { get; set; }
}

This shortens the function signature making it much quicker to understand, and ensures that changes to the Employee model are reflected in this function without the need to update the parameter list. The same principle can just as easily be applied to values which aren't directly related to the entity in question, but are related in regards to the function being called, for example:

public class CreateEmployeeRequest
{
    public string Forename { get; set; }
    public string Surname { get; set; }
    public string Email { get; set; }
    ...
    public User CreatedBy { get; set; } 👈
}

public async Task CreateEmployee(CreateEmployeeRequest request)
{
    // ...
}

This model includes all of the properties needed to create an Employee as well as a CreatedBy property, which has no bearing on the Employee entity itself, but can be used for auditing purposes as part other functionality, without having to alter the function's signature.

Combining function parameters into class objects also makes testing less cumbersome. In the CreateEmployee example above, you may wish to test the behaviour of the isManager condition as an isolated unit. You've ensured that the rest of the inputs either generate a mocked response, or are irrelevant and won't impact the test. However, because they are all listed as non-nullable arguments without defaults, you will need to provide values for each one to satisfy the function parameters every time the function is called.

Null and optional arguments

In the ongoing quest to reduce repetitive code (DRY), you may be tempted to try and bundle as much logic as you can into a single function. This can often lead to functions with bloated parameter lists, and optional arguments which are only relevant to certain logical paths.

public IShape Create3dShape(int length, int? height, int? depth, int? diameter) { ... } ❌

var cuboid = Create3dShape(5, 5, 5, null); ❌
var cylinder = Create3dShape(10, diameter: 3); ❌

When calling a function, if you find yourself having to pass null for an optional argument, or using named arguments to avoid passing null, then it's likely that your function is too DRY. Essentially, if one or more parameters are irrelevant to a certain logical path within a function, break it up and create a new function, or an overload which is tailored specifically to it's purpose.

public void CreateCuboid(int width, int height, int depth) { ... } ✅
public void CreateCylinder(int length, int? diameter) { ... } ✅

Which brings us nicely onto the next section.

Do one thing

The single responsibility principle (the "S" in SOLID), is typically used in the context of a class or object, however a similar train of thought can be applied to function definitions and behaviours as well. The idea is that a function should only perform one action, whether this be to send a message, modify an object, or retrieve some data.

If a function performs multiple tasks then it's a good sign that it needs to be broken up, as reading its signature won't give developers enough information regarding additional functionality - this additional functionality may be ambiguous or hidden, and is known as a side effect if it isn't properly acknowledged with an accurate function name. Take the following example:

public async Task CreateTenant(
    string tenantName,
    string userName,
    string userEmail)
{
    var tenant = await _context.Tenants.Add(new() { Name = tenantName });
    var user = await _context.Users.Add(new() { Name = userName, Email = userEmail, Tenant = tenant.Id });
}

If you're a developer reading this function's signature, you may expect it to create a tenant, nothing more, nothing less - you may even find it odd and question why a function called "CreateTenant" accepts arguments for a userName and a userEmail. You may have learned to expect that a tenant should have at least one user account associated to it in order to manage it, and that the next logical step after calling this function is to call another one called CreateUser, however this function doesn't exist because both actions are handled in the same place.

This sort of uncertainty doesn't really have a place in modern programming, and the issue described above ties in nicely to the previous article in this series on the importance of proper naming. To avoid repeating content, this article won't go into the naming of functions, other than to point out that functions are much easier to name when they only have a single purpose.

This particular function is relatively small because all it does is interact with a database to save some values, however it still does two things and acts on two entirely separate entities, Tenant and User.

You could clean up this function in one of two ways:

rename it (e.g. CreateTenantWithInitialUser)
split it into two methods (CreateTenant, CreateUser)

Since this is the "do one thing" section, #2 is the "ideal" solution, HOWEVER, the idea here is that there are no hidden side effects, and for such a small method I'd personally just go with #1 and rename it so that it's obvious what is happening. This example shows one of those occasions where stringently following a rule doesn't really provide any added benefit. In this case, a tenant needs to have an initial user, and it makes sense to combine the two actions into a single function - just make sure there is no ambiguity about the function's behaviour.

On the other hand, if the logic were more complex, perhaps with some conditional trees, validation, mapping logic, or error handling, you can see that it quickly starts to become quite long:

public async Task CreateTenantWithInitialUser(
    string tenantName,
    string userName,
    string userEmail)
{
    if (string.IsNullOrWhiteSpace(tenantName) || string.IsNullOrWhiteSpace(userName) || string.IsNullOrWhiteSpace(userEmail))
    {
        // do something
    }
    else
    {
        // do something else
    }

    var tenantExists = await _tenantService.GetTenantByName(tenantName);

    if (tenantExists)
    {
        // do something
    }
    else
    {
        // do something else
    }

    if (userName == "Joey Jo-Jo Junior Shabadoo")
    {
        return BadRequest("That's the worst name I ever heard");
    }

    var userExists = await _userService.GetUserByEmailAddress(userEmail);

    if (userExists)
    {
        // do something
    }
    else
    {
        // do something else
    }

    var tenant = new Tenant
    {
        Name = tenantName,
        Created = DateTime.UtcNow,
        Plan = Subscription.Basic
    };

    var user = new User
    {
        Name = userName,
        Email = userEmail,
        Created = DateTime.UtcNow,
        Tenant = tenant
    };

    try
    {
        await _context.Tenants.Add(tenant);
        await _context.Users.Add(user);
    }
    catch (Exception ex)
    {
        _logger.Error("Failed to create new tenant!" + ex.Message);
        throw ex;
    }
}

As can be seen here, only the two lines within the try block are relevant to the creation of our tenant and user, with the rest being mostly noise. The next section will look into a few approaches which can help focus the behaviour of our function by splitting logic into smaller, more manageable (and more easily testable) units.

Extract common logic

The "do one thing" principle discussed in the previous section identifies a function's unexpected "side effects" as the main culprit in making code hard to understand. However, this principle also applies to functionality which is related, or even required by the function in order to perform its duty, but can still obscure its true purpose.

Size matters

A function only doing one thing is more likely to be smaller than one doing many. A smaller function is easier to read and understand, and makes its purpose easier to absorb from a quick scan. There are no hard and fast rules to stick to, but just try to imagine reading through the function from the perspective of a developer who may be more junior than you, or less knowledgeable about the codebase, or even yourself in six months time when you've forgotten what you wrote, and ask yourself, "can I make this clearer?"

The four things mentioned in the previous section (validation, conditional branches, mapping and error handling) are all very common behaviours to see built into a function. In smaller examples it often make sense to keep all of the associated logic together, but more complex logic soon makes the code difficult to work with.

We'll briefly cover a few methods you can employ to help ensure your functions remain focused on a single purpose by looking into common separation practises.

Validation

Depending on the the type of app, and the language or framework used, validation can be handled in a multitude of ways. If you're writing client-side code in JavaScript, or back-end code on your API endpoints, you'll typically have some kind of input or request validation in place to make sure you handle invalid data properly, as well as feed any useful information back to the caller. The previous example included a few examples of validation to check for:

nulls (was a value provided?)
legitimacy (was the value provided valid?)
conflicts (does the value already exist?)

Using the example of an API endpoint written with .NET, validation can be separated from the primary function using a couple of popular conventions:

model binding with data annotations
FluentValidation

The first option makes use of a method mentioned earlier, by combining function parameters into a request object:

public class CreateTenantWithInitialUserRequest
{
    public string TenantName { get; set; }
    public string UserName { get; set; }
    public string UserEmail { get; set; }
}

This model can than be annotated with validation attributes, some which come built into .NET, and others which can be defined for custom behaviour:

public class CreateTenantWithInitialUserRequest
{
    [Required]
    [Remote(action: "VerifyName", controller: "Tenant")]
    public string TenantName { get; set; }

    [Required]
    [RealisticName]
    public string UserName { get; set; }

    [Required]
    [Remote(action: "VerifyEmail", controller: "User")]
    public string UserEmail { get; set; }
}

See the Microsoft documentation detailing model validation for more information about these attributes, including details on how to write your own custom validators. To validate this model validation in your function endpoint, simply add a check to the model state, either in the function itself, or to keep it lean, extract it into a base controller's OnActionExecutionAsync function.

if (!ModelState.IsValid)
{
    return BadRequest();
}

The second option uses a third party library called FluentValidation, which offers a similar (perhaps greater) level of flexibility and customisation, and uses the popular fluent syntax.

public class UserValidator : AbstractValidator<User>
{
    public UserValidator()
    {
        RuleFor(x => x.UserName).NotEmpty().Must(BeARealisticName).WithMessage("That's the worst name I ever heard");
        RuleFor(x => x.UserEmail).NotEmpty().Must(NotMatchExistingUser).WithMessage("That user already exists");
    }
}

private async Task<bool> BeARealisticName(string userName)
{
    return userName != "Joey Jo-Jo Junior Shabadoo";
}

private async Task<bool> NotMatchExistingUser(string userEmail)
{
    return await _userService.UserExists(userEmail);
}

Validators can then be registered in the service container to enable dependency injection:

services.AddScoped<IValidator<User>, UserValidator>();

Conditional branching

When a function branches off to perform varying actions based on a condition, it's a good indicator that you've found a logical point in which to split that action into a separate function.

Using the initial example, we may want to implement some custom behaviour if the provided user name already exists, for example, by appending some random characters to make it automatically unique, rather than returning an error:

var tenantExists = await _tenantService.GetTenantByName(tenantName);

if (tenantExists)
{
    tenantName += DateTime.UtcNow.ToUnixTimeMilliseconds();
}

Similarly, if you're performing data validation, or checking if values are null or fall within a acceptable range, the logic surrounding these can muddy the main purpose of the function, and could possibly also be abstracted for clarity.

Alternatively, if the logical tree if fairly simple and easy to follow, it's probably ok to leave it in - maybe even convert it to a ternary to reduce the amount of lines:

public async Task DoStuff(bool isAuthenticated)
{
    isAuthenticated
        ? await _myService.PerformAction()
        : throw new Exception("User needs to be logged in!");
}

However, if the logic begins to span more lines, or includes a condition which is more complex than a simple boolean, for example if it includes a switch statement, references values from other sources (e.g. checking a user exists) or contains numerous conditional statements (i.e. multiple AND or OR operators) then it gets harder to read and would benefit from being split into separate functions.

Switch statements or expression trees can be quite prone the complicating a function depending on the number of cases.

switch on (value)
    case "1":
        // do stuff
        // do stuff
        // do stuff
        // do stuff
        // do stuff

If the logic begins to extend beyond three or four lines, it's worthwhile extracting each case into a function and just calling the function.

switch on (value)
    case "1":
        // call a function
    case "2":
        // call another function

Object mapping

When handling data from various sources, you'll often come across functions which map one object type to another, for example a database entity model to a view model which is returned to the client.

Object mapping can become cumbersome and repetitive very quickly. If you find yourself mapping objects inside a function, there are a few methods you can employ to tidy them up:

Extract the mapping to a private function within the relevant class: this is useful if the mapping is isolated to a single class, and not likely to be reused
Extract the mapping to a public function in a common location: this is useful if you need to map the same object types across multiple classes and helps reduce duplication
Abstract the mapping entirely to custom tooling: the most notable in .NET being AutoMapper. However there are many developers who advocate against the use of AutoMapper, because it can add unnecessary complexity to projects which only need simple object mapping.

Error handling

Error handling is important, but if it obscures logic, it's wrong
- Michael Feathers, Clean Code

Anticipating runtime issues ahead of time and coding defensively is a great principle to follow to avoid potential headaches down the line, however it is very easy to fall into the trap of obscuring functional behaviour amongst a cloud of error handling logic. A common tactic is to wrap code in a try/catch block:

try
{
    await _myService.PerformTask(args);
}
catch (SqlException ex)
{
    _logger.LogCritical("Oops: " + ex.Message);
    throw ex;
}
catch (Exception ex)
{
    _logger.LogError("Oops: " + ex.Message);
    throw ex;
}

There's nothing inherently wrong with this, but it becomes a lot easier to stray away from a project-wide convention when handling logic is defined in every function, and if you're catching multiple exception types it can become quite chunky.

There are a number of ways in which we can handle errors more effectively, for example:

graceful error management: you don't always need to throw an exception. If you can initiate a retry of the failing code, or try an alternative method before returning a failure to the caller then that is far more valuable than throwing a generic server error
meaningful messaging: a useful error message is one which allows the user to adjust their behaviour and try again if was caused by a mistake on their part, or gives them some information to pass onto an administrator if they encountered a genuine issue. Instead of logging "Oops" like the example above, try describing the action attempted, or offer potential reasons the exception may have occurred
centralised logic: moving error handling out of individual functions and into a centralised class allows you to remove all of those messy try/catch blocks, reduce the potential for duplication, and keep your handling logic consistent across the solution. An example in .NET is to implement something akin to an exception filter.

Alternative views

Programming preferences can vary from one developer to the next, and arguments between two distinct (and often opposing) principles are commonplace. This article covered various suggestions which adhere (loosely) to the "Clean Code" guidelines, however there are some who dispute them. In this final section I'll cover a few which I've come across in the wild which relate to the writing of functions.

Inlined code

...you should be made constantly aware of the full horror of what you are doing
- John Carmack

In complete contrast to the "do one thing" guideline discussed near the beginning of this article, Carmack, a respected game developer from the Doom and Quake franchises, suggests that everything you need for a function should be kept together, with the reasoning being that the behaviour is visible in its entirety to anyone who reads it, rather than hidden behind a well-meaning function name.

Thanks to my colleague Mark Youngman for bringing this viewpoint to my attention.

YAGNI

The YAGNI (You Aren't Gonna Need It) principle, has been written about in great detail by a number of authors and programmers such as Martin Fowler, so I won't go into too much detail about it. In summary, it encourages developers to not think too long and hard about the potential future application of the code, and instead focus on the matter at hand. Spending too much time considering the what ifs of a system's architecture can often lead down the path of analysis paralysis, which can result in more time spent thinking about code than actually writing it, whilst attempting to anticipate future benefits which may never be realised.

For example, you may spend half a day refactoring a group of functions you've written for a new piece of functionality because you want to make them reusable in the future - but what if they're never re-used? Has that time been wasted? If the code was readable and well-written enough before the refactor, then it could be argued that yes, you've wasted your (or your employer's/client's) time and/or money.

The idea is that you can't always predict future use, so, to an extent, why bother trying? Essentially all code is perfect until it needs to be updated or extended to encompass some new parameters, it's only then when the cracks start to become more apparent and the time you may have spent future-proofing your code could even end up being counter-productive.

Conclusion

We've covered a few common practices to help you write effective functions. By carefully managing things like parameters and arguments you can make functions easier to use and maintain, and ensuring functions only do one thing promotes readability. Extracting common logic, such as validation, error handling, and object mapping, helps keep your code focused on a single purpose, reduces duplication and enhances consistency across your codebase.

Metadata

Title: Developer Compendium: Functions

Author: Tom Jones

Created: 28/09/2024

Tags: programming productivity