Refactoring a real-case
As a result of publishing the series of post of the clock-in/out system, I had the need to write a function to import the data from a large XLS file to a Postgres database using the ORM (TypeORM).
This function was written quickly as a script without thinking too much about the quality of the code, but simply "work". However, once the system is working and a test suite is available, it is a good idea to go through a constant refactoring to improve the code.
Next I will show different refactoring steps that I have applied to this function to improve the quality of the code. Therefore, the function that we are going to improve is shown first.
const xls = require('node-xlsx').parse(__dirname + '/schedule.xls');
export function XSLToJson() {
const schedulers = [];
const users = [];
xls[0].data.slice(2).forEach(teacher => {
let schedule = {};
let user = {};
let name = teacher[0].split(',');
if (name.length > 1) {
name = name[0] + ', ' + name[1].trim()[0] + '.';
}
user = {
uid: name,
name,
};
users.push(user);
const monday = [...teacher.slice(1, 7), ...teacher.slice(8, 14)];
const tuesday = [...teacher.slice(14, 20), ...teacher.slice(21, 27)];
const wednesday = [...teacher.slice(27, 33), ...teacher.slice(34, 40)];
const thursday = [...teacher.slice(40, 46), ...teacher.slice(47, 53)];
const friday = [...teacher.slice(53, 59), ...teacher.slice(60, 66)];
[monday, tuesday, wednesday, thursday, friday].forEach((day, numberOfDay) =>
day.forEach((room, hour) => {
if (room) {
schedule = {
user: {
uid: name,
},
room: room.replace(/\n/g, ' - '),
day: numberOfDay,
hour,
};
schedulers.push(schedule);
}
}),
);
});
return [schedulers, users];
}
Firstly, we can observe that there are different concepts in the same function, this gives us the warning that we need to extract functions. The following functions can be identified in the code:
- normalizedUID: this function is used to generate the name/uid (string) of each teacher.
- getDaysByTeacher: Returns an array in which each position represents each user's day. Each of the days is also represented by an array. Hence, this function returns an array of arrays in which we will have organized the weekly working hours of each user (fixed schedules).
- factorySchedule: Create a schedule object from another object composed of the room, hour and number of day.
import.ts
function normalizeUID(teacher): string {
let name = teacher[0].split(',');
if (name.length > 1) {
name = name[0] + ', ' + name[1].trim()[0] + '.';
}
return name;
}
function getDaysByTeacher(teacher): [any[], any[], any[], any[], any[]] {
return [
[...teacher.slice(1, 7), ...teacher.slice(8, 14)],
[...teacher.slice(14, 20), ...teacher.slice(21, 27)],
[...teacher.slice(27, 33), ...teacher.slice(34, 40)],
[...teacher.slice(40, 46), ...teacher.slice(47, 53)],
[...teacher.slice(53, 59), ...teacher.slice(60, 66)],
];
}
function factorySchedule({ name, room, hour, numberOfDay }) {
return {
user: {
uid: name,
},
room: room.replace(/\n/g, ' - '),
day: numberOfDay,
hour,
};
}
The main code would be as follows after the extraction of functions:
export function XSLToJson() {
const schedulers = [];
const users = [];
xls[0].data.slice(2).forEach(teacher => {
let schedule = {};
let user = {};
const name = normalizeUID(teacher);
user = {
uid: name,
name,
};
users.push(user);
const days = getDaysByTeacher(teacher);
days.forEach((day, numberOfDay) =>
day.forEach((room, hour) => {
if (room) {
const schedule = factorySchedule({ name, room, hour, numberOfDay });
schedulers.push(schedule);
}
}),
);
});
return [schedulers, users];
}
Before going on to address the next phase of refactoring, we will eliminate the variables that are understood using the domain of the problem, also the temporary variables will be removed.
export function XSLToJson() {
const schedulers = [];
const users = [];
xls[0].data.slice(2).forEach(teacher => {
const name = normalizeUID(teacher);
const user = {
uid: name,
name,
};
users.push(user);
const days = getDaysByTeacher(teacher);
days.forEach((day, numberOfDay) =>
day.forEach((room, hour) => {
if (room) {
const schedule = factorySchedule({ name, room, hour, numberOfDay });
schedulers.push(schedule);
}
}),
);
});
return [schedulers, users];
}
In this moment, we see that we are making use of one of the most powerful features of JavaScript, functions as a parameter in the forEachmethods. But really, the forEach methods are only acting as syntactic sugar against the classical for of the imperative programming. Therefore, we will be modifying each forEach method by the most appropriate one according to the task that they perform.
The first forEach loop that we are going to address is the following:
days.forEach((day, numberOfDay) =>
day.forEach((room, hour) => {
if (room) {
const schedule = factorySchedule({ name, room, hour, numberOfDay });
schedulers.push(schedule);
}
}),
);
This loop can be transformed into the method reduce because of what we want to get a single value (an array) after iterating through each of the days. In addition, the schedule variable can be omitted and the factorSchedule method can be invoked directly since the creation of this variable does not add semantic value in the code.
days.forEach((day, numberOfDay) => {
schedulers = day.reduce((acc, room, hour) => {
if (room) {
acc.push(factorySchedule({ name, room, hour, numberOfDay }));
}
return acc;
}, []);
});
If we observe the function performed in the body of the method reduce, it is an if-else which can be transformed into a ternary operation that we can combine with the potential of immutability (another characteristic of functional programming) leaving the code as follows:
days.forEach((day, numberOfDay) => {
schedulers = day.reduce((acc, room, hour) => {
return room
? [...acc, factorySchedule({ name, room, hour, numberOfDay })]
: acc;
}, []);
});
Another small step in the refactoring would be to omit the explicit invocation of return when using fat-arrow (lambda functions) of JavaScript, leaving the code as follows:
days.forEach((day, numberOfDay) => {
schedulers = day.reduce(
(acc, room, hour) =>
room ? [...acc, factorySchedule({ name, room, hour, numberOfDay })] : acc,
[],
);
});
If we look at the code, the next natural step will be to replace the forEach method that includes the reduce by another method reduce. It may seem complicated the code that will generate since we are using the features of fat-arrow and spread operator (...) which if you never are used before can seem abstract (we will solve it by creating functions that allow you to decompose the task).
schedulers1 = days.reduce(
(acc, day, numberOfDay) => [
...acc,
day.reduce(
(schedulers, room, hour) =>
generateScheduler({ room, schedulers, hour, numberOfDay, name }),
[],
),
],
[],
);
Therefore, the next step is to build small functions that perform the task that is done within each of the reduce. But we have a small problem, we need to pass extra parameters to each of the callbacks of the reduce methods. This problem is easily solved in JavaScript with different techniques such as overwriting the scope (this), using variables from a larger scope, or your own such as the bind, apply or call methods.
In our case, we are going to create several small functions that combined with the bind method will solve the problem of passing parameters in the callbacks. In this way, the resulting code until this moment would be the following:
export function XSLToJson() {
const users = [];
let schedulers = [];
xls[0].data.slice(2).forEach(teacher => {
const name = normalizeUID(teacher);
const user = {
uid: name,
name,
};
users.push(user);
const days = getDaysByTeacher(teacher);
schedulers = days.reduce(generateSchedulers.bind(null, name), []);
});
return [schedulers, users];
}
function generateSchedulers(name, schedulers, day, numberOfDay) {
return [
...schedulers,
day.reduce(generateScheduler.bind(null, numberOfDay, name), []),
];
}
function generateScheduler(numberOfDay, name, schedulers, room, hour) {
return room
? [...schedulers, factorySchedule({ name, room, hour, numberOfDay })]
: schedulers;
}
The next step is to address the forEach method that encompasses the entire main method. The elimination of this method will allow us to eliminate the two variables that act as accumulators (users and schedulers). In such a way, that the method reduce will receive as an accumulator an array that contains two arrays (schedulers and users), which will be unstructured to be able to be concatenated easily in the method reduced.
After refactoring the code we find the following version:
export function XSLToJson() {
return xls[0].data.slice(2).reduce(
([schedulers, users], teacher) => {
const name = normalizeUID(teacher);
const user = {
uid: name,
name,
};
const days = getDaysByTeacher(teacher);
const scheduler = days.reduce(generateSchedulers.bind(null, name), []);
return [[...schedulers, scheduler], [...users, user]];
},
[[], []],
);
}
In the same way as we did in the previous case, the method reduce receives a callback that is the one that performs the conversion to JSON, and finally we rename the content of the file to a variable called xls. So the partial result is the following:
export function XSLToJson() {
return xls.reduce(convertToJson, [[], []]);
}
function convertToJson([schedulers, users], teacher) {
const name = normalizeUID(teacher);
const user = {
uid: name,
name,
};
const days = getDaysByTeacher(teacher);
const scheduler = days.reduce(generateSchedulers.bind(null, name), []);
return [[...schedulers, scheduler], [...users, user]];
}
Finally the result of the script after making these changes is shown below:
const xls = require('node-xlsx')
.parse(__dirname + '/schedule.xls')[0]
.data.slice(2);
export function XSLToJson() {
return xls.reduce(convertToJson, [[], []]);
}
function convertToJson([schedulers, users], teacher) {
const name = normalizeUID(teacher);
const user = {
uid: name,
name,
};
const days = getDaysByTeacher(teacher);
const scheduler = days.reduce(generateSchedulers.bind(null, name), []);
return [[...schedulers, scheduler], [...users, user]];
}
function normalizeUID(teacher): string {
let name = teacher[0].split(',');
if (name.length > 1) {
name = name[0] + ', ' + name[1].trim()[0] + '.';
}
return name;
}
function getDaysByTeacher(teacher): [any[], any[], any[], any[], any[]] {
return [
[...teacher.slice(1, 7), ...teacher.slice(8, 14)],
[...teacher.slice(14, 20), ...teacher.slice(21, 27)],
[...teacher.slice(27, 33), ...teacher.slice(34, 40)],
[...teacher.slice(40, 46), ...teacher.slice(47, 53)],
[...teacher.slice(53, 59), ...teacher.slice(60, 66)],
];
}
function factorySchedule({ name, room, hour, numberOfDay }) {
return {
user: {
uid: name,
},
room: room.replace(/\n/g, ' - '),
day: numberOfDay,
hour,
};
}
function generateSchedulers(name, schedulers, day, numberOfDay) {
return [
...schedulers,
day.reduce(generateScheduler.bind(null, numberOfDay, name), []),
];
}
function generateScheduler(numberOfDay, name, schedulers, room, hour) {
return room
? [...schedulers, factorySchedule({ name, room, hour, numberOfDay })]
: schedulers;
}
Summary
- Extract functions.
- Name the functions and variables must be an identification value.
- Use the built-in language methods for your use (forEach against of reduce).
- Abstract complexity and details of the code to auxiliary functions of lower level.