Uncategorized – Gadre Consulting

How to design a software

Mak — Wed, 14 Feb 2024 16:50:38 +0000

Remember that software must be designed to meet the business goals, and only the business goals.

Software design is not just the design itself; you also have to deal with financial considerations and with the ground shifting under you.

Establish the goals and non-goals of the business.

Identify the features needed for the Minimum Viable Product (MVP), and then negotiate with the marketing team to establish a Minimum Marketable Product. Most probably it will be the MVP plus more bells and whistles.

Armed with the above, draw up a high-level componentized design, with the view that these components will be pluggable and replaceable at will.
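As a minimal sketch of what "pluggable and replaceable" can look like in code, the example below hides one component behind an interface so that the rest of the design depends only on the contract. The names MessageStore, InMemoryMessageStore and OrderService are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical component contract: the rest of the system sees only this.
interface MessageStore {
    void save(String message);

    List<String> loadAll();
}

// One replaceable implementation; a database-backed one could be swapped in later.
class InMemoryMessageStore implements MessageStore {
    private final List<String> messages = new ArrayList<>();

    @Override
    public void save(String message) {
        messages.add(message);
    }

    @Override
    public List<String> loadAll() {
        return new ArrayList<>(messages);
    }
}

// A consumer wired to the contract, not to a concrete component.
class OrderService {
    private final MessageStore store;

    OrderService(MessageStore store) {
        this.store = store;
    }

    void placeOrder(String item) {
        store.save("ORDER:" + item);
    }
}
```

Replacing the component then means writing another class that implements MessageStore and passing it to OrderService; OrderService itself does not change.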

Now, for each component, work out which sub-components are needed, e.g. Database, Messaging System, Data Persistence Silo, Statistical Analysis Engine etc.

Collect the sub-component lists of all components, bring them together to form a master list of the sub-components needed, and remove duplicates.
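The merge-and-deduplicate step can be sketched in a few lines. The class and method names here are hypothetical; the only real point is that a set keeps each sub-component once.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SubComponentInventory {
    // Merges the per-component lists into one master list without duplicates,
    // keeping the order in which each sub-component was first seen.
    static Set<String> masterList(List<List<String>> perComponentLists) {
        Set<String> master = new LinkedHashSet<>();
        for (List<String> subComponents : perComponentLists) {
            master.addAll(subComponents);
        }
        return master;
    }
}
```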

Now, for each sub-component, evaluate whether you can or want to build it yourself, or whether you need to take a 3rd party dependency. Realistically, some OS/low-level dependencies cannot be avoided, e.g. one cannot realistically hope to write one's own OS, IIS/Tomcat/WebSphere replacement or DB server from scratch. These underlying things are well understood and are the industry-standard hard dependencies.

Now worry about the other dependencies:

If you want to take on a 3rd party dependency, understand the long-term implications of that dependency. If the component is an open-source “DIY” component with public forums (Stack Overflow etc.) as the only support group :-), always bear in mind that you are taking on a tough dependency. Anticipate and account for the possibility that you may have to assign engineers to understand the open-source code, modify it, recompile it and, like a good Samaritan, contribute back to the open-source movement. [On another side note: these pieces love to break when some version of a dependent component gets updated by Gradle or Maven or npm etc. Do not depend on Maven & Co. to pull down all the dependencies at compile time. Pull them down yourself, reference them and add them to your source repository as external dependencies. Update them only when you are satisfied that everything works with the updated version. Forgive me for my digression; this is not Software Design, it comes under “how to actually run a dev project”.] Another thing to worry about is that these components may go out of fashion (like Hadoop and Ruby did, and Blockchain is on its way); once they are out of fashion, you will not find anyone who even wants to talk about them.

If dealing with a commercial 3rd party component, understand the licensing and SLA promises and costs.

Now look at the skillset available in the job market. Do we do this in C# / Java / PHP / Python, or whatever? This depends upon the availability of personnel. Your beloved language does not matter. For example, I love C, but there is nothing popular happening in C; it is used only for real low-level stuff, AND HENCE there are no C programmers available in the market anyway.

[A side-story: In one of the projects, my client inherited a large piece of code written by someone else in a language called Clojure, a Lisp dialect that runs on top of Java. It was impossible to find Clojure coders in the job market, so this piece remained a mysterious black box, and nobody wanted to touch it because nobody understood it. If a defect came up or modifications were needed, they would just pre-munge and post-munge the data and code flow. Recently the company actually invested in a few engineers to learn Clojure and understand this code, and they have started rewriting the functionality in Java (because the company is a Java shop and Java programmers are available in the market).]

Now take this (hand in glove with the Marketing folks) to the business owners and show the costing with choices (.NET vs. JRE, DB2 vs. Oracle vs. MySQL etc.). In one instance I have seen, the business owners thought that the Oracle license was too expensive and that we should use the free version of MySQL. From the componentized design point of view it was just another RDBMS, but from the infrastructure point of view, they had to have a person dedicated to supporting the MySQL instances. And they call all this Software Design Costs… forgive them, for they do not know what they are doing.

Now design some plug-in points for feature creep from the Marketing folks. Marketing folks keep looking at the competition and always want to be the “warmest and fluffiest croissant” in the market. Keep some “Reserved” parameters in APIs, or undocumented secret APIs, so that you can sneak in these late-comer features without shaking the whole foundation. It is like a spare tire (en-US) / tyre (en-GB) / Stepney (en-IN) in the trunk of your car: you pray fervently to the almighty that it will never have to be used, but you keep it anyway.
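One possible shape for a "Reserved" parameter is an extra key/value slot that callers leave empty today. This is only a sketch under that assumption; ReportRequest, its fields and the "format" key are hypothetical names, not an established API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical API request with a "reserved" extension slot.
class ReportRequest {
    private final String reportName;
    // Reserved for future use: callers may pass an empty map today.
    private final Map<String, String> reserved;

    ReportRequest(String reportName, Map<String, String> reserved) {
        this.reportName = reportName;
        this.reserved = Collections.unmodifiableMap(new HashMap<>(reserved));
    }

    String reportName() {
        return reportName;
    }

    // Late-comer features read their settings from the reserved slot,
    // falling back to a default when the caller did not supply one.
    String option(String key, String defaultValue) {
        return reserved.getOrDefault(key, defaultValue);
    }
}
```

Existing callers keep passing an empty map and see the default behaviour, while a late feature can be switched on by a new key without changing the method signatures.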

Many marketing folks have the “Ostrich Syndrome”, i.e. whenever they see some shiny thing in the industry, they, like an ostrich, want to pluck it. Even if your software handles tiny 6 MB files, they will want to use a Spark/Hadoop cluster, blockchain, ChatGPT or whatever is in fashion, so that they can say so in the marketing messages, and they will keep asking for the latest fashionable technologies to be included during the product development cycle. Sometimes you have to succumb to their demands. As a software engineer you must clearly call out the cost of incorporating and maintaining such irrelevant add-on components.

Deployment is not a part of Software design. Whether to use AWS or Azure or whatever else is a part of the deployment strategy.

Use with caution! – Getters Setters in JAVA and Properties in C#

Mak — Mon, 21 Aug 2023 13:10:51 +0000

A lot of programmers and gurus on the internet talk about Getters and Setters in Java (almost similar to Properties in C#), and most of them seem to love these constructs. The claim that “they hide the private implementation” is very valid, and it is the root cause of this evil.

Let us take the case of a simple class sitting in a JAR, coming from a 3rd party library:

public class BankAccount {
    // Body not shown: the caller sees only the method signature.
    public double getBalance() { … }
    …
}

The innocent programmer sees that getBalance() is a method on a BankAccount object, starts calling it, and the program starts dragging its feet. The problem is that getBalance() does a DB query under the hood and is therefore slow. The method name should have been getCurrentBalanceByLiveQuery() and any semi-witted programmer would have guessed “Ah! that would be slow”.

I have mandated that, in the teams that I (try to) control, Getters and Setters should purely get or set the variable and just that. No extra validations, or whatever else. And if that is all they do, then why not just expose the fields directly? Because some code generation tools “depend” upon Getters and Setters; so my rule is that Getters and Setters do only what their names suggest.

If there is a method setBalance(double qMoney) and it does a DB update, I would call it setBalanceAndUpdateDB()…
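A minimal sketch of this naming rule follows. The BalanceSource interface is a hypothetical stand-in for the database, and the constructor is invented for illustration; only the method names getBalance() and getCurrentBalanceByLiveQuery() come from the text above.

```java
// Hypothetical stand-in for a slow data source such as a database.
interface BalanceSource {
    double queryBalance(String accountId);
}

class BankAccount {
    private final String accountId;
    private final BalanceSource source;
    private final double balance; // balance as loaded into memory

    BankAccount(String accountId, BalanceSource source, double balance) {
        this.accountId = accountId;
        this.source = source;
        this.balance = balance;
    }

    // Plain getter: returns the field and does nothing else.
    double getBalance() {
        return balance;
    }

    // The name warns the caller that this goes to the data source and may be slow.
    double getCurrentBalanceByLiveQuery() {
        return source.queryBalance(accountId);
    }
}
```

A caller who loops over thousands of accounts can now tell from the name alone which call is cheap and which one is not.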

This discussion also applies to Object Properties in C#.

Unicode: What is it?

Mak — Sat, 22 Oct 2022 11:45:08 +0000

There are many scripts in the world. In this article the word script means the system of writing, alphabets, characters; it does not mean VBScript or JavaScript…

Scripts

A script is a set of characters (an alphabet) used for writing textual content. http://en.wikipedia.org/wiki/ISO_15924 has the list of scripts standardized by ISO. Scripts and languages are in a many-to-many correspondence.

We have many examples of one script used to write different languages. E.g.

Latin Script (ISO:LATN) is used to write English, French, German etc.
Arabic Script (ISO:ARAB) is used to write Arabic, Farsi, Urdu, Pashto etc.
Devanagari Script (ISO:DEVA) is used to write Sanskrit, Hindi, Marathi, Konkani, Nepali etc.

There are languages which can be written in different scripts. E.g.

Panjabi is written in Gurmukhi Script (ISO:GURU) in India, but in Arabic Script (ISO:ARAB) in Pakistan.
Serbian is written in Latin Script (ISO:LATN) or Cyrillic (ISO:CYRL) depending upon the geographical area or preference.
English which is generally written in Latin Script (ISO:LATN), is also written in Braille Script (ISO:BRAI).

One should consciously think about a script as a way of representing the language without speaking.

History

From the early 1960s to the 1980s, computers were almost exclusively used by the scientific and engineering community for heavy calculations. A unit of storage in computers is a byte, which consists of 8 bits. Each bit can be either ON or OFF, represented as 1 or 0. A set of 8 bits, i.e. a byte, can represent 256 numbers (0 to 255), which are the possible combinations of 1s and 0s. This is called the Binary Numeral System. For excellent information on the Binary Numeral System, please refer to the Wikipedia article http://en.wikipedia.org/wiki/Binary_numeral_system . The earliest computers had no display screens or printers; the computer used a series of lights (ON or OFF) to display its answer.

People were not happy with just numbers and blinking lights, and electric typewriters and printers were available. Computer folks came up with the concept of representing characters with numbers. They designed the first encodings, and the encoding called ASCII was adopted widely. ASCII used 7 bits, i.e. 128 possible numbers, and mapped characters to them, e.g. the character ‘A’ was represented by the number 65 and so on. Then they wrote computer instructions to print the shape of the character ‘A’ whenever the binary number 65 came up. The 128 available codes covered all the characters and punctuation required by English, plus some non-printable characters which were actually commands to the printers, e.g. 10 meant “advance to next line”, 12 meant “advance to next page” etc. The first goal was to be able to print business information (bank statements, inventory information, invoices etc.) and programming instructions (code) written in programming languages like FORTRAN, COBOL, BASIC etc. Later, display screens were developed and the printer technology was extended to these displays, so the shapes of characters started showing up on monitor screens.
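The character-to-number mapping can be seen directly in code. This is a small illustration only; the class and method names are hypothetical.

```java
class AsciiDemo {
    // Returns the numeric code that represents the given character.
    static int codeOf(char c) {
        return (int) c;
    }

    // Returns the character represented by the given numeric code.
    static char charOf(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A'));   // prints 65
        System.out.println(charOf(66));    // prints B
        System.out.println(codeOf('\n'));  // prints 10, "advance to next line"
        System.out.println(codeOf('\f'));  // prints 12, "advance to next page"
    }
}
```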

In Western Europe, French, German and Spanish were widely used, so ASCII was extended to 8 bits (Extended ASCII), allowing 256 possible values, and that took care of the alphabets of the languages of Western Europe. People in Greece, Russia and other countries started to use computers and found that they needed room for their own characters, so they started using some of the 256 values to represent characters specific to Greek or Russian. These maps are called Code Pages; please refer to https://en.wikipedia.org/wiki/Code_page . Soon other languages joined the party and wanted their share of characters in the available 256.

So there came a situation when, depending upon how you looked at it, the character code number say, 195 represented

Thai letter “Ro Rua” ร
Greek letter capital “gamma” Γ
Arabic letter “Alef with hamza above” أ
…

and one could not reliably tell what the text was unless one knew in advance which script (alphabet) one was trying to read.
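The ambiguity can be reproduced by decoding the same byte under different code pages. The sketch below assumes that the local JDK ships the charsets named "TIS-620" (Thai), "ISO-8859-7" (Greek) and "ISO-8859-6" (Arabic); they are not among the charsets every Java platform must provide, so the code checks before using them. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

class CodePageDemo {
    // Decodes a single byte value using the named single-byte code page.
    static String decode(int value, String charsetName) {
        byte[] data = { (byte) value };
        return new String(data, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Charset names are assumptions about what the local JDK provides.
        String[] codePages = { "TIS-620", "ISO-8859-7", "ISO-8859-6" };
        for (String name : codePages) {
            if (Charset.isSupported(name)) {
                System.out.println(name + ": " + decode(195, name));
            } else {
                System.out.println(name + ": not available in this JDK");
            }
        }
    }
}
```

The byte is identical in all three cases; only the code page chosen by the reader decides which letter appears.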

And then the Japanese, Korean and Chinese alphabets entered the fray bringing with them the thousands of letters of their alphabet…

So, over the years people of the world came together and created the Unicode standard where every character got its own unique value. Of course, they could not fit everything in 256 characters so they decided to look at 2 bytes i.e., 16 bits together giving a possible range of 65536 characters and all scripts get their own space. With Unicode, all the data can now coexist in one file or document or communication without being misinterpreted.

Since Unicode took up twice the space, people did not adopt it easily. Those who had data in only one language and never sent it internationally did not care, and kept on using the single-byte systems. In the last 10 years or so, the cost of data storage both in RAM and on disk has come down, and communication speed has increased, so the data size being twice as large does not matter anymore. UTF-8 is a popular encoding for storing and sharing Unicode text; it uses one to four bytes per character, and ASCII characters keep their original single-byte codes.

Unicode is neither a linguistic standard nor a phonetic standard. It standardizes scripts.

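Later, it was acknowledged that 65536 characters were not enough for all scripts of the world, so Unicode extended its range beyond 16 bits. The code space now runs from U+0000 to U+10FFFF, which gives room for 1,114,112 code points. In the 16-bit UTF-16 encoding, characters beyond the first 65536 are written as a pair of special values called surrogates.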
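The storage cost of a character under different encodings can be checked with a short program. Java strings are held as UTF-16 internally, which makes the surrogate pairs visible. The class and method names are hypothetical, and the three sample characters are my own choice.

```java
import java.nio.charset.StandardCharsets;

class EncodingSizes {
    // Number of bytes the text occupies when stored as UTF-8.
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of 16-bit units the text occupies in UTF-16 (Java's in-memory form).
    static int utf16Units(String text) {
        return text.length();
    }

    // Number of Unicode characters (code points) in the text.
    static int codePoints(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        String latin = "A";               // U+0041
        String devanagari = "\u0928";     // U+0928, DEVANAGARI LETTER NA
        String emoji = "\uD83D\uDE00";    // U+1F600, beyond the first 65536 characters
        for (String s : new String[] { latin, devanagari, emoji }) {
            System.out.println(utf8Bytes(s) + " UTF-8 byte(s), "
                    + utf16Units(s) + " UTF-16 unit(s), "
                    + codePoints(s) + " code point(s)");
        }
    }
}
```

The Latin letter takes 1 byte in UTF-8, the Devanagari letter takes 3, and the emoji takes 4 bytes in UTF-8 and two 16-bit units (a surrogate pair) in UTF-16, while each of them is still a single code point.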

Transliteration

Generally, languages are written in their traditionally standardized scripts. For example, the traditional way to write the word “knowledge” is in LATIN script, but one can write “नॉलेज” in Devanagari or “నోలేజ్” in Telugu, and the pronunciation will be almost the same. Transliteration is the writing of a language in a script which is not traditionally used for that language. Many text messages are written transliterated in LATIN script because of the universal acceptance of LATIN characters. E.g., it is common in India to send a text message with the greeting “Namaste” in LATIN, though traditionally it would be written in Devanagari as “नमस्ते”.

Fonts

Fonts are data files that define visual shapes of characters. Here is an example of the same text in different fonts.

The last two examples are interesting, because they show Latin script text (i.e. English text) in practically unreadable form using the fonts Wingdings and Shusha. In the days before Unicode, fonts were created for the 256 places of a Code Page, and, following the same idea as Wingdings, font designers put different-looking shapes in the places of Latin characters so that the text looked like a different alphabet. This is referred to as “Font based encoding”, which, in other words, means that you cannot read the text unless you have the font to go with it. If you do not have the font, the data will look like gibberish.

If the data is saved in Unicode, the data retains its identity even if the font does not have a visual representation of the character. In such cases a rectangular box is displayed on the screen. However, if you are seeing Question Marks instead of characters, it means either that the text was not originally written in Unicode, but was converted to Unicode using the wrong Code Page, or the program which is displaying the text is not Unicode compliant, and uses Code Pages.

The Unicode Consortium’s web site is http://www.unicode.org and the Wikipedia page is https://en.wikipedia.org/wiki/Unicode . Please visit them to get a better and more “official” understanding of Unicode than what I have summarized in the few paragraphs above.

Block Chain: Do I need it?

Mak — Thu, 15 Sep 2022 15:11:08 +0000

Let me explain block chain as I understand it today. There are three important manifest aspects to block chain.

Immutable Records

This is something like sculpting on a stone slab with a chisel. A chiseled stone cannot be changed once a Mark Master Mason has marked it with his unique symbol. If anyone changes the content, the tampering can be easily recognized. Further, if a change or new information has to be added, editing in place is not possible; therefore a new stone slab will be sculpted with the new information, and it will carry a reference to the earlier stone slab. Something like this:

The little mark after the name of the mason (James Bond, Hercule Poirot etc.) is the unique mark of the mason.
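In software, the stone slabs can be approximated by a hash chain. The sketch below is one possible illustration, not a description of any particular block chain product: a SHA-256 digest stands in for the mason's mark (a real system would typically also use digital signatures), and the names Slab, SlabChain and the "GENESIS" starting value are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// One "stone slab": content, a reference to the previous slab, and its own mark.
class Slab {
    final String content;
    final String previousMark;
    final String mark;

    // Sculpts a new slab and computes its mark.
    Slab(String content, String previousMark) {
        this(content, previousMark, computeMark(content, previousMark));
    }

    // Rebuilds a slab from stored values (the mark is taken as given, not recomputed).
    Slab(String content, String previousMark, String mark) {
        this.content = content;
        this.previousMark = previousMark;
        this.mark = mark;
    }

    // The mark is a SHA-256 digest over the previous slab's mark and the content.
    static String computeMark(String content, String previousMark) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest((previousMark + "|" + content).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }
}

class SlabChain {
    private final List<Slab> slabs = new ArrayList<>();

    // New information never edits an old slab; it is appended as a new one.
    void append(String content) {
        String previousMark = slabs.isEmpty() ? "GENESIS" : slabs.get(slabs.size() - 1).mark;
        slabs.add(new Slab(content, previousMark));
    }

    // Returns a copy of the slabs in order.
    List<Slab> slabs() {
        return new ArrayList<>(slabs);
    }

    // Recomputes every mark; altered content or a broken reference is detected.
    static boolean isIntact(List<Slab> chain) {
        String expectedPrevious = "GENESIS";
        for (Slab slab : chain) {
            if (!slab.previousMark.equals(expectedPrevious)) {
                return false;
            }
            if (!slab.mark.equals(Slab.computeMark(slab.content, slab.previousMark))) {
                return false;
            }
            expectedPrevious = slab.mark;
        }
        return true;
    }
}
```

Because each mark depends on the previous one, changing the content of an early slab without re-marking every later slab makes isIntact return false.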

Distributed Ledger

Multiple identical replicas of the ledgers are kept in different places. [Of course, in the above example it would be practically impossible to make replicas of the chiseled stones.] Before the widespread use of computers, hand transcribed & certified true copies or photostat copies of ledgers could be taken and stored in different locations. Later when computers arrived, copies could be stored on paper tapes or cards, floppy disks, magnetic tapes, or optical storage like CDs. However, these were primarily used for archival purposes.

Now, with the advent of networked computers, one can easily replicate the ledgers to be stored on different computers. The important aspect to note is that these copies of ledgers can be accessed rapidly.

The word “distributed” in distributed computing means something else to software engineers; they basically think of “distributed” as different parts of the program or algorithm executing on different computers. However, a distributed ledger generally means that the whole ledger is replicated on different computers.

Consensus Approval

This is democratic voting. 51% or more voters decide the fate of the proposal. In this case the computers are voting via their algorithms.

Example

Now let us take an example with a pseudo-real-life scenario.

Monopoly Game – Normal – without the block chain:

Six friends are playing a game of Monopoly and a seventh friend is the designated banker. The game is played with the usual Monopoly rules with one difference: players do not handle Monopoly money; instead the banker maintains the “accounts”, i.e. the balances for every player and the status of properties. Every transaction, like buying property, paying rent, paying $50 to get out of jail, getting $200 on “PASSING GO” etc., is validated, executed and recorded by the banker, who is the sovereign authority for all account ledgers.

Let us say that Player A wants to buy, say, “Boardwalk (US Edition)” or “Mayfair (UK Edition)”. She/he will request the banker to execute the transaction. The banker will check if the property is for sale and if Player A’s ledger has enough money. If satisfied, the banker will do the needful and proceed to update the account balance & property card in the banker’s records.
If Player A “PASSES GO”, the banker will add $200 to her/his account.
and so on…

Monopoly Game – with the block chain:

Six friends are playing a game of Monopoly, but there is neither a banker nor any Monopoly money. Every player maintains the “account” balances of all the players and the status of all properties, and there is a notional virtual “bank” having infinite money.

Let us say that Player A wants to buy, say, “Boardwalk (US Edition)” or “Mayfair (UK Edition)”. She/he will send the intended transaction to the remaining 5 players, who are tasked with validating the transaction. They look up their own records and validate that the property is not owned by anyone else and that Player A has enough money in the account to buy it. As soon as 3 out of 5 (i.e. 51%+) agree that the transaction can go through, the transaction is completed and all players update their copies of the records of Player A and the property card.
If Player A “PASSES GO”, a transaction of adding $200 to Player A’s account is sent for approval to the remaining 5 players, who have to ensure that Player A has indeed passed Go (there is no need to check the bank because it has infinite money). Once 3 out of 5 say “OK” to the impending transaction, all 6 update their own records of Player A.
and so on …
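The banker-less game can be sketched as follows. This is a deliberately simplified illustration, not how any real block chain network is implemented: only balances are tracked (property ownership is left out), every validator votes immediately, and the names PlayerLedger and ConsensusGame are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One player's private copy of the ledger (balances only, for brevity).
class PlayerLedger {
    private final Map<String, Integer> balances = new HashMap<>();

    PlayerLedger(Map<String, Integer> openingBalances) {
        balances.putAll(openingBalances);
    }

    // A validator approves a purchase only if its own records show enough money.
    boolean approvesPurchase(String buyer, int price) {
        return balances.getOrDefault(buyer, 0) >= price;
    }

    // Deducts the price from the buyer's balance in this copy of the ledger.
    void recordPurchase(String buyer, int price) {
        balances.merge(buyer, -price, Integer::sum);
    }

    int balanceOf(String player) {
        return balances.getOrDefault(player, 0);
    }
}

class ConsensusGame {
    // Applies the purchase to every ledger only if more than half of the
    // validators (the players other than the buyer) approve it.
    static boolean proposePurchase(String buyer, int price,
                                   PlayerLedger buyerLedger, List<PlayerLedger> validators) {
        int approvals = 0;
        for (PlayerLedger validator : validators) {
            if (validator.approvesPurchase(buyer, price)) {
                approvals++;
            }
        }
        boolean accepted = approvals * 2 > validators.size();
        if (accepted) {
            buyerLedger.recordPurchase(buyer, price);
            for (PlayerLedger validator : validators) {
                validator.recordPurchase(buyer, price);
            }
        }
        return accepted;
    }
}
```

With 5 validators, 3 approvals are enough for the purchase to go through, after which all six copies of the ledger are updated.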

When you “PASS GO” you get $200 out of thin air from Monopoly’s non-existent virtual bank, just like Bitcoins.

Further pondering

Nowadays, many software products aspire to use block chain, and the concept has been marketed very well. I have spoken to quite a few software startups and many of them want to use block chain, but in my opinion they don’t need it. It is similar to an “item-song” in a Bollywood movie :-); this song has nothing to do with the storyline of the movie. Using block chain is the “in vogue, chic, and trendy” thing to do these days. A lot of times product promoters, marketing and product positioning teams think that having “We use block-chain! No kidding…!” as a slogan will attract funding (or better funding) and/or more customers, and it may actually be true.

What I think is important are the “immutable records” i.e., written once, read many times (like a CD). These are essential for auditing purposes. If any record needs to be modified, a new record has to be created with the reference of the existing record so that the audit trail is preserved.

Another way to record the trail is to keep a “record of change” for every modification that happens. Traditional databases (MSSQL, Oracle DB, IBM DB2 etc.) can do this.

To make the whole ledger tamper resistant, one can keep multiple copies of the records (replicated ledger) and ensure that all copies are synchronized.

Let me ponder over where I could use parts of block-chain…

An airline company may want to keep a track of where a particular aircraft traveled throughout the year (SEA to SFO to CDG to LHR to YVR… and so on…) and correlate it to maintenance costs or fuel costs. For this purpose, immutable records YES, distributed ledger (maybe) as a backup, but consensus voting NOT NEEDED.
A democratic organization wants to provision e-voting in an election. A vote is recorded once and retrieved for counting purpose. All one needs is to ensure immutability. For this purpose, immutable records YES, distributed ledger (maybe) as a backup, consensus voting NOT NEEDED.

When it comes to “Consensus Voting” I have my doubts. Maybe 51% is not enough for some organizations, like banks; they may want a black-ball system where 100% of the ledger-keepers must validate a proposed transaction. I have never been really convinced of 51% consensus voting for transactions which have financial ramifications, e.g., movement of funds or inventory. I would go for 100% voting even if I have distributed the ledger on multiple computers.

A 51% attack is possible when a hacker gets control of 51% of the ledger-keepers and forces nefarious results. Many erudite software engineers have written on how to deal with the 51% attack. Navigate to https://www.google.com/search?q=51+attack+blockchain and you will see a lot of information.

Let us say that in the consensus voting 54% say YES and the remaining 46% say NO or abstain. There should be a root-cause analysis to see why these 46% did not say YES, yet nobody seems to mention this…

The concepts of immutable records and distributed ledgers have existed for more than 30 years. The block chain concept tries to formalize the process in a consistent way.

If you have a hammer in your hand and walk around the house, many objects look tantalizingly like nails. Blockchain is one such hammer, and it is fashionable too; therefore the temptation to use it for anything can be overwhelming.

Software Engineers or Software Plumbers

Mak — Mon, 29 Aug 2022 15:44:28 +0000

What do we have here? Software Engineers or Software Plumbers?

I always like to use metaphors so here we go!

The construction industry is very similar to the software industry.

We have

Builders

Builders seize the opportunity to build an apartment building, and work through the determination of price point, profitability, legal aspects of land acquisition, financing etc. They eventually decide that they should have, say, 10 condos of 150 sq. meters and 10 condos of 120 sq. meters; a total of 2700 sq. meters of sellable area.

Architects & Engineers:

Architects/Engineers create the building plans in accordance with the builders’ aspirations and requirements. They study the utilitarian and aesthetic aspects, come up with building plans and get them approved by the interested parties. They decide on the technical aspects of the actual construction work, like the size of beams and columns, the steel required to withstand the load, which brands of building materials to use, electrical loads and requirements etc., so that the building looks as close as possible to the approved plan.

Construction Crew:

Excavators, Plumbers, Masons, Electricians, Painters etc. Construction crews follow the orders of the Architects & Structural Engineers and do the actual construction.

Plumbers get the orders to install a shower stall, a sink and a WC in apartments at the specified locations. They are told the brand of shower head, the brand and color of washbasin and WC to use and so on. They assemble the parts, put the appropriate pipes in the appropriate places and complete the work.

Masons lay the bricks and build walls with the thickness and height etc. following the specifications given by the Engineers and Architects.

Electricians do the wiring and install appliances and points as per specifications given by Engineers and Architects.

…

Plumbers do not have the competence to decide where to put the shower stall, or the diameter / slope of the sewage pipe required for the effluent of the building. Masons do not know how thick the wall should be and where to build it, and electricians do not know the gauge of wire required to carry the necessary current.

On the other hand, the Architects & Engineers know where to build a wall (and maybe even theoretically know how to lay the bricks), but a skilled mason can do a much better job of actually building the wall. Architects & Engineers know what the slope and diameter of a sewage pipe should be and how it should be laid, but a specialist plumber has the expertise to install it well, and ditto for electricians.

If I translate this to software building parlance, we have the Business Owners, Software Architects/Engineers and Software Plumbers.

Business Owners:

The business owners envision the software with target market in mind, procure the finance, do the competitive analyses, legal analyses etc. and engage Software Architects/Engineers to design the product for them.

Software Architects/Engineers do deep analyses and decide what existing software can be reused and repurposed, or what needs to be created from scratch.

If something needs to be created from scratch, they either prototype it themselves or hire/engage other software engineers to create it under their supervision. These software engineers work together, innovating and challenging the established practices. They might write some new code, or sometimes take a partially usable piece from some other software, understand it and repurpose it to suit the needs.

And then we have the software plumbers. They put together the pieces already made available to them by the Software Architects/Engineers, and complete the product.

Over the last few years, I have interviewed hundreds of candidates for Software Engineering jobs, and most of them are software plumbers, and mediocre software plumbers at that.

Python is the latest hot hero/heroine of the industry (Ruby used to be it in 2012-13). Python *Programmers* just know how to call NumPy, scikit-learn etc. People get offended when I call them Python Plumbers.

Maybe most of the good things have already been invented/innovated, and we do not care about resource contention anymore because processors, RAM and disk space are cheaper and faster. Most of the newer graduates from colleges have never heard of page faults and clustered indexes…

I wonder if the world will need just software plumbers in the future. What is in a name? Let us call them Software Engineers.