Gadre Consulting
https://gadreconsulting.com
Expert software architecture and design

How to design a software
https://gadreconsulting.com/2024/02/14/how-to-design-a-software/
Wed, 14 Feb 2024 16:50:38 +0000

Remember that software must be designed to meet the business goals, and only to meet the business goals.

Software design is more than the design itself: you also have to worry about financial considerations and the ground shifting under you.

Establish the goals and non-goals of the business.

Identify the features needed for the Minimum Viable Product (MVP), and then negotiate with the marketing team to establish a Minimum Marketable Product; most probably it will be the MVP plus more bells and whistles.

Armed with the above, draw up a high-level componentized design, with the view that these components will be pluggable and replaceable at will.
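As a minimal sketch of what "pluggable and replaceable" can look like in Java: the consumer depends only on an interface and receives the implementation from outside. All names here (MessageStore, InMemoryMessageStore, OrderNotifier) are hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical component contract: the rest of the system depends only on this.
interface MessageStore {
    void save(String key, String message);
    String load(String key);
}

// One replaceable implementation; a database-backed one could be swapped in later.
final class InMemoryMessageStore implements MessageStore {
    private final Map<String, String> data = new HashMap<>();

    @Override
    public void save(String key, String message) {
        data.put(key, message);
    }

    @Override
    public String load(String key) {
        return data.get(key);
    }
}

// A consumer that is handed its component instead of constructing it.
final class OrderNotifier {
    private final MessageStore store;

    OrderNotifier(MessageStore store) {
        this.store = store;
    }

    String recordShipment(String orderId) {
        store.save(orderId, "shipped");
        return store.load(orderId);
    }
}
```

Swapping the persistence component then means writing another MessageStore implementation and passing it to OrderNotifier; the consumer code does not change.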

Now, for each component, work out which sub-components are needed, e.g. Database, Messaging System, Data Persistence Silo, Statistical Analysis Engine etc.

Collect the lists of sub-components, merge them into a master list of the sub-components needed, and remove duplicates.

Now, for each sub-component, evaluate whether you can (or want to) build it yourself or need to take a 3rd party dependency. Realistically, some OS/low-level dependencies cannot be avoided; e.g. nobody can realistically hope to write their own OS, IIS/Tomcat/WebSphere replacement or DB server from scratch. These underlying things are well understood and are the industry-standard hard dependencies.

Now worry about the other dependencies:

If you want to take on a 3rd party dependency, understand its long-term implications. If the component is an open-source “DIY” component with public forums (Stack Overflow etc.) as its only support group :-), bear in mind that you are taking on a tough dependency. Anticipate and account for the possibility that you may have to assign engineers to understand the open-source code, modify it, recompile it and, like a good Samaritan, contribute back to the open-source movement. [On another side note: these pieces love to break when some version of a dependent component gets updated by Gradle or Maven or npm etc. Do not depend on Maven & Co. to pull down all the dependencies at compile time. Pull them down yourself, reference them and add them to your source repository as external dependencies. Update them only when you are satisfied that everything works with the update. Forgive me for my digression; this is not Software Design, it comes under “how to actually run a dev project”.] Another thing to worry about is that these components may go out of fashion (like Hadoop and Ruby did, and Blockchain is on its way); once they are out of fashion, you will not find anyone who even wants to talk about them.

If dealing with a commercial 3rd party component, understand the licensing and SLA promises and costs.

Now look at the skillset available in the job market. Do we do this in C# / Java / PHP / Python, whatever? This depends upon the availability of personnel. Your beloved language does not matter. For example, I love C, but nothing popular is happening in C; it is used only for real low-level stuff, AND HENCE hardly any C programmers are available in the market anyway.

[A side-story: In one of the projects, my client inherited a large piece of code written by someone else in a language called Clojure, a Lisp-like language that runs on top of Java. It was impossible to find Clojure coders in the job market, so this piece remained a mysterious black box, and nobody wanted to touch it because nobody understood it. If a defect surfaced or modifications were needed, they would just pre-munge and post-munge the data and code flow. Recently the company actually invested in a few engineers to learn Clojure and understand this code, and they have started rewriting the functionality in Java (because the company is a Java shop and Java programmers are available in the market).]

Now take this (hand in glove with the Marketing folks) to the business owners and show the costing with choices (.NET vs. JRE, DB2 vs. Oracle vs. MySQL etc.). In one instance I have seen, the business owners thought that the Oracle license was too expensive and that we should use the free version of MySQL. From the componentized design point of view it was just another RDBMS, but from the infrastructure point of view, they had to have a person dedicated to supporting the MySQL instances. And they call all this Software Design Costs… forgive them, for they do not know what they are doing.

Now design some plug-in points for feature creep from the Marketing folks. Marketing folks keep looking at the competition and always want to be the “warmest and fluffiest croissant” in the market. Keep some “Reserved” parameters in your APIs, or undocumented secret APIs, so that you can sneak in these late-comer features without shaking the whole foundation. It is like the spare tire (en-US) / tyre (en-GB) / Stepney (en-IN) in the trunk of your car: you pray fervently to the almighty that it will never have to be used, but you keep it anyway.

Many marketing folks have the “Ostrich Syndrome”, i.e. whenever they see some shiny thing in the industry, they, like an ostrich, want to pluck it. Even if your software handles tiny 6 MB files, they will want to use a Spark/Hadoop cluster, blockchain, ChatGPT or whatever is in fashion, so that they can mention it in the marketing messages, and they will keep asking for the latest fashionable technologies throughout the product development cycle. Sometimes you have to succumb to their demands. As a software engineer, you must clearly call out the cost of incorporating and maintaining such irrelevant add-on components.

Deployment is not a part of Software Design. Whether to use AWS or Azure or whatever else is a part of the deployment strategy.

Use with caution! – Getters Setters in JAVA and Properties in C#
https://gadreconsulting.com/2023/08/21/use-with-caution-getters-setters-in-java-and-properties-in-c/
Mon, 21 Aug 2023 13:10:51 +0000

A lot of programmers and gurus on the internet talk about Getters and Setters in Java (and the almost-similar Properties in C#), and most of them seem to love these constructs. The claim that “they hide the private implementation” is very valid, and it is the root cause of this evil.

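Let us take the case of a simple class sitting in a JAR, coming from a 3rd party library:

public class BankAccount {
    public double getBalance() {
        // Body hidden inside the 3rd party JAR; the caller sees only the signature.
        return 0.0; // placeholder so that the sketch compiles
    }
}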

The innocent programmer sees that getBalance() is a method on a BankAccount object, starts calling it, and the program starts dragging its feet. The problem is that getBalance() does a DB query under the hood and is therefore slow. Had the method been named getCurrentBalanceByLiveQuery(), any semi-witted programmer would have guessed “Ah! that would be slow”.

I have mandated that, in the teams that I (try to) control, Getters and Setters should purely get or set the variable and do just that: no extra validations or anything else. And if that is all they do, then why not just expose the fields directly? Because some code generation tools “depend” upon Getters and Setters; so my rule is that Getters and Setters do only what their names suggest.

If there is a method setBalance(double qMoney) and it does a DB update, I would call it setBalanceAndUpdateDB()…
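A minimal Java sketch of this naming rule follows. The Account class is hypothetical, and the database is simulated with a counter so that the example stays self-contained; only the method names getCurrentBalanceByLiveQuery() and setBalanceAndUpdateDB() come from the discussion above.

```java
// Hypothetical class; the database round trip is simulated with a counter.
final class Account {
    private double balance;        // in-memory value
    private int dbRoundTrips = 0;  // counts simulated (slow) database calls

    // Plain getter: returns the field and does nothing else.
    double getBalance() {
        return balance;
    }

    // Plain setter: assigns the field and does nothing else.
    void setBalance(double balance) {
        this.balance = balance;
    }

    // The name announces the cost: this call goes to the database.
    double getCurrentBalanceByLiveQuery() {
        dbRoundTrips++;            // stand-in for a slow DB query
        return balance;
    }

    // The name announces the side effect: this call writes to the database.
    void setBalanceAndUpdateDB(double balance) {
        this.balance = balance;
        dbRoundTrips++;            // stand-in for a DB update
    }

    int dbRoundTripCount() {
        return dbRoundTrips;
    }
}
```

A caller reading this API can tell from the names alone which calls are cheap and which ones are worth caching or batching.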

This discussion applies to Object Properties in C#.

Software Globalization and Localization
https://gadreconsulting.com/2022/10/29/software-globalization-and-localization/
Sat, 29 Oct 2022 15:47:58 +0000

English works pretty well for many markets. However, in countries like France (where English is not really adored), Thailand (where the British never ruled), or the many parts of the world where English is understood only at a rudimentary level, software in the local language has a better chance of success in the world market.

Globalization is the ability to handle data in multiple languages and with local conventions, while Localization is the ability to manifest the user interface in the local languages.

1    Globalization

To be able to handle data in any language, or in many languages, the software should be Unicode compliant. Newer languages like C#, Java, Python and JavaScript are already Unicode compliant. C/C++ code needs extra effort to ensure that it is Unicode compliant.

1.1       Unicode

One should make an effort to understand the basics of Unicode. One can refer to Unicode.org or (for a simplistic version) my article: https://bit.ly/3sxUng7 .

1.2       Data Persistence

To be able to persist data in any language, or in many languages, the data must be stored in a Unicode compliant encoding. Common choices are UTF-8 (where ASCII remains 1 byte, and endian-ness does not matter) and UTF-16 (where every character is encoded in at least 2 bytes, and endian-ness matters). If most of the data is ASCII based, it is convenient and easy to use UTF-8. UTF-8 needs more than one byte for every character beyond the ASCII range, e.g., the French character ‘ç’ is represented in 2 bytes \xc3\xa7, the Hebrew character ‘ש’ in 2 bytes \xd7\xa9, the Devanagari character ‘ॐ’ in 3 bytes \xe0\xa5\x90, the Japanese character ‘聞’ in 3 bytes \xe8\x81\x9e, and so on. In spite of the increase in size, UTF-8 remains the most widely used character encoding, possibly because of its universal acceptance, the availability of cheap storage and the high speeds of networks.
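The byte counts quoted above can be checked with a few lines of Java. This is only a sketch; the class and method names are made up for the illustration, and the characters are written as Unicode escapes so that the source file itself stays pure ASCII.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Sizes {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes in UTF-16 (big-endian, without a byte-order mark).
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // 'A', c-cedilla, Hebrew shin, Devanagari om, and the Japanese character from the text.
        String[] samples = {"A", "\u00E7", "\u05E9", "\u0950", "\u805E"};
        for (String s : samples) {
            System.out.println(s + "  UTF-8: " + utf8Length(s)
                    + " bytes, UTF-16: " + utf16Length(s) + " bytes");
        }
    }
}
```

For these samples UTF-16 always takes 2 bytes, while UTF-8 takes 1, 2, 2, 3 and 3 bytes respectively, which is why UTF-8 is attractive when most of the data is ASCII.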

1.3       Sending/Receiving Data

Though not directly connected with Globalization, there are two ways to successfully send/receive/transport data across machines:

1.3.1        Transporting persisted data

A persisted version of data (saved file) is sent and received. The sender and the receiver must conform to the format agreed upon. If a file containing textual matter is stored in UTF-8 and if both the receiver and sender know that the data is UTF-8 encoded, it will be interpreted correctly.

1.3.2        Transporting ephemeral data

This typically happens during client-server communication, or when devices send data to peers or servers. The sender-receiver protocols should be agreed upon either in advance or dynamically. As an example, in SOAP/REST API interaction, the data is sent in XML/JSON/Base64 encoded formats or in LZCompressed+Base64 encoded format, and the HTTP headers (e.g. Content-Type) can carry the information about the format used. For web pages, the default encoding in HTML5 is UTF-8; see this article: https://webhint.io/docs/user-guide/hints/hint-meta-charset-utf-8/. In case of binary data transfers using custom ports, care should be taken to understand and appropriately interpret the endian-ness / floating point representation of the data being sent.

1.4       Date and Time storage

1.4.1        Date and Time Formats

This is the part where the software needs to worry about the way data is represented locally. Misinterpretation of Date stamps and Time stamps can lead to serious problems.

e.g.

If the date entered by a user is 6/11/12, it is open to interpretation. In USA people will think it is 11th June 2012, in Europe/India it will be thought of as 6th November 2012, and in Japan, it will be considered to be 12th November 2006. Therefore, it is important to send the date format (d/M/yy etc.) with the date if the date is coming in as a text string, or always use a standard format yyyy-MM-dd.
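The ambiguity can be reproduced with java.time. In this sketch the input is zero-padded to “06/11/12”, because Java's two-letter pattern fields (yy, MM, dd) expect exactly two digits; the class name is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class AmbiguousDate {
    // Parses the same text under an explicitly supplied pattern.
    static LocalDate parse(String text, String pattern) {
        return LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern));
    }

    // ISO 8601 text (yyyy-MM-dd) that no locale can misread.
    static String toIso(LocalDate date) {
        return date.format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    public static void main(String[] args) {
        String input = "06/11/12";
        System.out.println(toIso(parse(input, "MM/dd/yy"))); // reading in the USA
        System.out.println(toIso(parse(input, "dd/MM/yy"))); // reading in Europe/India
        System.out.println(toIso(parse(input, "yy/MM/dd"))); // reading in Japan
    }
}
```

The same six digits yield three different dates, so the pattern has to travel with the text, or the text has to be in the standard yyyy-MM-dd form.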

It also matters where the date was intended for. If a bank transaction happens at 0900 HRS in Mumbai on 12/January/2022, it is still 11/January/2022 in Seattle/San Francisco. This brings us to the consideration of Time Stamps. It makes sense to record not just the date but also the time and time zone with the date.

One can convert a date-time stamp to the number of milliseconds (or any other measure of time) elapsed since a known epoch point and place to eliminate the ambiguity. Please refer to https://en.wikipedia.org/wiki/Epoch_(computing). A Java Date is internally a long number, which is the number of milliseconds passed since 1st January 1970 at 00:00:00 HRS UTC. UTC is Coordinated Universal Time. Similarly, in .NET, time is counted in 100-nanosecond intervals starting at 1st January 0001 at 00:00:00 HRS UTC.
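The Mumbai example can be sketched in Java as follows. The class name is hypothetical, and the zone identifiers Asia/Kolkata and America/Los_Angeles are my choice of IANA zones to stand for Mumbai and Seattle.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class TransactionTime {
    static final ZoneId MUMBAI = ZoneId.of("Asia/Kolkata");
    static final ZoneId SEATTLE = ZoneId.of("America/Los_Angeles");

    // The bank transaction from the text: 09:00 in Mumbai on 12 January 2022.
    static ZonedDateTime mumbaiTransaction() {
        return ZonedDateTime.of(LocalDateTime.of(2022, 1, 12, 9, 0), MUMBAI);
    }

    // Milliseconds since 1970-01-01T00:00:00Z: one unambiguous number to store.
    static long toEpochMillis(ZonedDateTime when) {
        return when.toInstant().toEpochMilli();
    }

    // The calendar date on which the stored instant falls in a given zone.
    static LocalDate dateIn(long epochMillis, ZoneId zone) {
        return Instant.ofEpochMilli(epochMillis).atZone(zone).toLocalDate();
    }

    public static void main(String[] args) {
        long stored = toEpochMillis(mumbaiTransaction());
        System.out.println("Stored value: " + stored);
        System.out.println("Mumbai date:  " + dateIn(stored, MUMBAI));
        System.out.println("Seattle date: " + dateIn(stored, SEATTLE));
    }
}
```

The stored number is the same for everyone; only the conversion for display differs by zone.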

It is important to store a Date as a Date Time stamp using one of these conventions. A good practice is to store everything in UTC on the server and convert to the appropriate time zone for display and calculations. If the time stamps are stored in a SQL database, check how the server records them: many SQL servers record time stamps accurately, but in the server's own time setting.

Even if the software is not intended to be used internationally, the system may need to keep the time zone information, because many countries have multiple time zones. Keeping the Date and Time in UTC helps.

Some countries follow daylight savings time adjustment. Therefore, if the software is counting time difference from 5 November 2022 at 0900HRS to 6 November 2022 at 0900 HRS, in USA, it would be 25 hours, because on 6 November 2022, the clocks will turn back 1 hour at 0200 hrs. The operating systems handle this fine and if the software maintains the Date Time stamps in UTC, it then becomes just a matter of adding or subtracting difference between UTC and the target time zone.
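The 25-hour day can be verified with java.time, which takes the daylight saving rules from the time zone database. This is a sketch with a hypothetical class name; America/New_York is used as a representative US zone.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class DaylightSavingGap {
    // Elapsed hours between two wall-clock times observed in the same zone.
    static long elapsedHours(LocalDateTime from, LocalDateTime to, ZoneId zone) {
        ZonedDateTime start = ZonedDateTime.of(from, zone);
        ZonedDateTime end = ZonedDateTime.of(to, zone);
        return Duration.between(start, end).toHours();
    }

    public static void main(String[] args) {
        LocalDateTime from = LocalDateTime.of(2022, 11, 5, 9, 0);
        LocalDateTime to = LocalDateTime.of(2022, 11, 6, 9, 0);
        // A US zone that observes daylight saving time.
        System.out.println(elapsedHours(from, to, ZoneId.of("America/New_York")));
        // UTC has no daylight saving adjustment.
        System.out.println(elapsedHours(from, to, ZoneId.of("UTC")));
    }
}
```

Subtracting the two wall-clock values naively would give 24 hours in both cases; going through zone-aware instants gives the real elapsed time.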

Different countries require different Date and Time formats. The software should never store the formatted text for Dates and Time. The order of Year, Month, Days is different and month names are also different.

e.g., to store 12th August 2022, one can use 2022/08/12, 8/12/22, 8/12/2022, 12/8/22, 12/8/2022, 2022-August-12, 2022-août-12 (French), २०२२-अगस्त-१२ (Hindi), etc. Similar to the dates, time can be represented in multiple ways, 24 hours format, 12 hours format with AM/PM suffix. There are many ways to write say 6:30 in the evening: 6:30 PM, 1830 HRS, 18h30 etc.

One must remember never to store date and time information as formatted strings, but to format it to the target's desired setting just before display/printing.
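A small Java sketch of “store neutral, format late”: the date is kept as a LocalDate value and turned into text only at display time, for whichever locale the viewer uses. The class name is hypothetical, and the exact output text depends on the locale data shipped with the JDK, so no output is claimed here.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

final class DisplayDate {
    // Formats a stored date for one viewer's locale, at display time only.
    static String forDisplay(LocalDate stored, Locale locale) {
        return stored.format(
                DateTimeFormatter.ofLocalizedDate(FormatStyle.LONG).withLocale(locale));
    }

    public static void main(String[] args) {
        LocalDate stored = LocalDate.of(2022, 8, 12); // 12th August 2022
        System.out.println(forDisplay(stored, Locale.US));
        System.out.println(forDisplay(stored, Locale.FRANCE));
        System.out.println(forDisplay(stored, Locale.forLanguageTag("hi-IN")));
    }
}
```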

1.4.2        Calendars

It is common for governments and official documents to use the official calendar of the country. E.g., the official calendar of Thailand is the Buddhist Calendar and the official calendar of India is the Indian National Calendar; both track the Gregorian calendar with fixed offsets. The Saudi Arabian government uses the official Hijri Calendar, which is a lunar calendar, and the Israeli government uses the Hebrew Calendar (which is similar to the Hindu Lunar Calendar). It makes sense for the software to convert to standard Gregorian dates in UTC for storing and convert back for display/printing.
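Java ships a few of these calendars in java.time.chrono (Thai Buddhist and Hijrah among them); other calendars, such as the Indian National or Hebrew calendar, would need a separate library. A minimal sketch of converting for display and back for storage, with a hypothetical class name:

```java
import java.time.LocalDate;
import java.time.chrono.HijrahDate;
import java.time.chrono.ThaiBuddhistDate;
import java.time.temporal.ChronoField;

final class CalendarConversion {
    // Year in the Thai Buddhist calendar for a stored Gregorian (ISO) date.
    static int thaiBuddhistYear(LocalDate stored) {
        return ThaiBuddhistDate.from(stored).get(ChronoField.YEAR);
    }

    // Converts a Thai Buddhist date entered by a user back to the stored ISO form.
    static LocalDate fromThaiBuddhist(int year, int month, int day) {
        return LocalDate.from(ThaiBuddhistDate.of(year, month, day));
    }

    public static void main(String[] args) {
        LocalDate stored = LocalDate.of(2022, 8, 12);
        System.out.println(ThaiBuddhistDate.from(stored)); // Thai Buddhist view of the same day
        System.out.println(HijrahDate.from(stored));       // Hijri (Umm al-Qura) view of the same day
        System.out.println(fromThaiBuddhist(2565, 8, 12)); // back to the ISO date
    }
}
```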

1.4.3        Money and Numbers

Typically, in the USA, the money amount 123456.78 is displayed as “$123,456.78”. In France, however, it will be displayed as “123 456,78 €”. In India, “₹ 1,23,456.78” will be the appropriate format. In some countries, e.g. South Korea, the amounts in jeon (1/100 of a Won) are never used in normal transactions and amounts are just written as ₩ 123.

One needs to remember that whatever is displayed, frequently comes back and the software needs to parse it back. One has to use the appropriate format. In USA, UK, India, the Decimal Separator is the ‘.’ (dot) but it is ‘,’ (comma) in France or other parts of Europe.
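Formatting for display and parsing what comes back can both be delegated to java.text.NumberFormat. The sketch below uses a hypothetical class name and a double for brevity; real money handling would more likely use BigDecimal. The exact separator and symbol characters produced for France and India depend on the JDK's locale data.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

final class LocalizedNumbers {
    // Formats an amount with the currency conventions of the given locale.
    static String asCurrency(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    // Parses user-entered text using the decimal separator of the given locale.
    static double parseNumber(String text, Locale locale) {
        try {
            return NumberFormat.getNumberInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number for " + locale + ": " + text, e);
        }
    }

    public static void main(String[] args) {
        double amount = 123456.78;
        System.out.println(asCurrency(amount, Locale.US));
        System.out.println(asCurrency(amount, Locale.FRANCE));
        System.out.println(asCurrency(amount, Locale.forLanguageTag("en-IN")));
        // The same digits, entered with a French decimal comma.
        System.out.println(parseNumber("123456,78", Locale.FRANCE));
    }
}
```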

It is important to store the currency of the transaction. Generally, not many transactions in a normal user's account happen in different currencies; even if a US person does a transaction in Canada, the amount is converted to USD by the financial institution. However, if the software is for financial institutions dealing in multiple currencies, one must (obviously!) record the currency.

1.4.4        Names of countries

There are a few cases where some country / region names are unacceptable in some countries, e.g., the China – Taiwan issue. One should avoid referring to Taiwan as R.O.C. in mainland China (P.R.C.).

1.4.5        Display Direction

This matters in the languages or countries where the culturally correct look and feel is Right-To-Left, i.e., most of the countries in the middle east.

1.4.5.1             Desktop and Mobile Applications

Underlying operating systems like Windows, Linux, iOS, Android have support for right-to-left display and text. One should try and leverage these features of the operating systems. Here is a screenshot of Hebrew localized version of Microsoft Excel.



1.4.5.2             Web pages

HTML generally works fine if the proper attribute (dir="rtl") is placed on the appropriate elements; all the elements marked this way will flow Right-To-Left. One can visit the web sites of Arabic / Hebrew / Urdu newspapers, e.g., https://www.aleqt.com/, and study the use of this HTML attribute.

1.4.6        Text Display

Display of text of many (almost all) Unicode Scripts is generally well handled by standard controls in HTML, Windows, iOS, Linux, and Android. Writing a language compliant editor even without the bells and whistles like spelling checker and grammar checker, is a monumental task and there are major software companies who have invested immense efforts in these. Here are a few interesting text rendering complexities apart from the Right to Left text flow.

  • Shape Shifters: Characters that change shape depending upon what is before and after the character. This is always so for Arabic script and many Indic scripts like Devanagari.
  • Ligatures: Two or more characters form a single shape on the screen. This is font dependent too. Latin script also has some ligatures which are mostly used in culturally correct and aesthetic representation of text. Please refer to https://en.wikipedia.org/wiki/Ligature_(writing).
  • Kashida justification:  In Arabic scripts some characters are connected horizontally by a ghost “kashida” character during display for calligraphic purposes. Please refer to https://en.wikipedia.org/wiki/Kashida.
  • No space between words: Thai script generally does not use spaces between words. A comparative example in English would be the two sentences: “There was a handout for me”, and “There was a hand out for me”. In Thai both would look like therewasahandoutforme, and the code in edit controls uses complex dictionary and grammar rules to figure out the word-breaks.
  • Vertical Writing: Traditional Japanese, Mongolian scripts are written top-to-bottom and paragraphs flow Right-To-Left in Japanese and Left to Right in Mongolian.

In general, the best bet is to use standard well known controls to display text.

2    Localization

One should think about the target audience and decide the languages / countries to localize for. This can open a can of worms. Please refer to the languages specified in ISO-639. In USA, one may want to support Spanish (Mexican). In Canada one may be forced to support French (Canada). When it comes to the European Union or India, there are many that one can decide to support but one must remember that English (United States) is different from English (Great Britain) for spellings (color v/s colour).

Localization has three important things to worry about: Images, Colors and Text.

2.1       Images

One must be very cautious about creating images / logos / clip arts depending upon the target consumer. Images that seem to be perfectly fine in one culture may be offensive in another. Take the example of the Thumbs Up sign. It signifies a positive emotion like “Yes / I Agree / Good job / Ready to go” to Americans, Indians and Britons, but it suggests an expletive in Greece, Russia, the Middle East and Latin America, similar to what the middle finger means to Americans. See https://www.deseret.com/2011/4/15/20371322/international-business-international-symbol-icon-blunders-can-be-avoided.

More serious issues arise when political or religious meanings are attached to certain symbols. E.g., the symbol of “Swastika” is pretty much prohibited in many countries, because the Nazis used it; however, it is the auspicious symbol of “Well Being” in Hinduism, Buddhism and Jainism and is reverently displayed on entrances to homes and temples in Nepal, India, Thailand, Mongolia, Sri Lanka, China and Japan. Similar issues arise when an image or icon resembles a religious symbol like the crescent of the moon, or the trident; these can start a backlash from religious factions or governments. In the current era of political and cultural correctness, it makes sense to avoid any images which may spur adverse reactions.

The images of flags of countries are not a big problem, as long as one sticks to the standard images similar to the ones used by payment gateways or web sites that show a drop down of flags to choose country (e.g. xe.com)


Screenshot from www.xe.com

Extra care should be taken while displaying maps of countries due to the ever-changing border disputes between countries. One should use well known map providers (Google, Bing, MapBox etc.) and make sure to have a disclaimer (vetted by the legal folks) under the map, something like : “This map is for indicative purposes only and may not accurately depict the international borders”.

2.2       Colors

Though not a very big-ticket item, some colors have regional/cultural significances. A person in India will associate Orange (which is almost Saffron) with Hinduism, Blue with Buddhism, and Green with Islam, but in Ireland, Green is associated with Saint Patrick’s Day. Color red could mean a positive movement in stock markets in Japan, but it means a negative movement in USA.

This is more in the theming and branding realm, but one should be aware of overlaying symbols on colors, it might inadvertently mean something else. e.g. putting a X sign on the color of religious significance/ political party may be considered as a “Ban that religion / political party” message and interested activists may start threatening.

2.3       Text and Strings

2.3.1        Translation

Getting a native speaker of the language to translate or tweak machine translated text is the correct way to go. Machine translation services do a good job of translating but native speakers of the language can tweak the machine translated text to incorporate more apt phrases and the subtleties of the languages.

Never construct sentences in parts. Let us take an example: Online shopping site wants to show friendly messages.

Your basket contains one large yellow shirt.

Novice programmers will code the string as:

Your basket contains <QTY> <SIZE> <COLOR> <ITEM>

Replace QTY, SIZE, COLOR, ITEM at run time and add a small logic to use plural of the <ITEM> in case QTY is more than one, or put a “(s)” after the <ITEM> This will fly ok in English.

But in French, adjectives have a gender and singular/plural variants, and <SIZE> goes before and the <COLOR> goes after the <ITEM>

  • Votre panier contient une grande chemise jaune.

If instead of one shirt it were one hat, it would be

  • Votre panier contient un grand chapeau jaune.

In Hindi even the verbs have genders and change according to the subject.

The subject – object – verb positioning is different in different languages. In English verbs come after the subject: for example, the genderless English sentence “I had gone to Mumbai”, will become “मैं मुंबई गया था” (for a man) or “मैं मुंबई गयी थी” (for a woman), in Hindi; the appropriate form of the verb will be used depending upon the gender of the subject.

Therefore, one should translate whole strings and never construct them in parts.

“Your basket contains <QTY> <SIZE> <COLOR> <ITEM>”

In French will be

“Votre panier contient <QTY> <SIZE> <ITEM> <COLOR>”

And in Hindi it will be

आपकी टोकरी में <QTY> <SIZE> <COLOR> <ITEM> है
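One way to sketch such whole-string patterns in Java is with java.text.MessageFormat, whose numbered placeholders let each translation decide its own word order. The class name, the in-code pattern table and the placeholder numbering below are assumptions made for the illustration; the values passed in are already-translated words.

```java
import java.text.MessageFormat;
import java.util.HashMap;
import java.util.Map;

final class BasketMessage {
    // Whole-sentence patterns, one per language, as a translator would deliver them.
    // {0} = quantity, {1} = size, {2} = color, {3} = item; the translator sets the order.
    private static final Map<String, String> PATTERNS = new HashMap<>();
    static {
        PATTERNS.put("en", "Your basket contains {0} {1} {2} {3}.");
        PATTERNS.put("fr", "Votre panier contient {0} {1} {3} {2}.");
    }

    // Fills the pattern of the given language with already-translated words.
    static String render(String language, String qty, String size, String color, String item) {
        return MessageFormat.format(PATTERNS.get(language), qty, size, color, item);
    }

    public static void main(String[] args) {
        System.out.println(render("en", "one", "large", "yellow", "shirt"));
        System.out.println(render("fr", "une", "grande", "jaune", "chemise"));
    }
}
```

The code that calls render() is identical for both languages; only the pattern and the supplied words differ.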

However more code will be needed to pick the correct gender for <SIZE> and <COLOR> depending upon the gender of the <ITEM>. Instead, one can think about doing something like this:

  • Your Basket – Item: <ITEM>, Quantity: <QTY>, Size: <SIZE>, Color: <COLOR>

And all languages may be satisfied.

English- Your Basket – Item: Shirt, Quantity: 1, Size: Large, Color: Yellow

French  – Votre panier – Article : Chemise, Quantité : 1, Taille : Grand / Grande, Couleur : Jaune

Hindi – आपकी टोकरी – वस्तु : कमीज, संख्या: १, आकार : बडा / बडी, रंग: पीला/पीली

Etc.

But it is not as friendly as “Your basket contains one large yellow shirt” … This is a compromise for reducing localization costs and finally ends up being a business decision.

The cost of translation itself may be small but the cost of assuring its quality can be an expensive item and needs strict review by native speakers of the language.

2.3.2        Hardcoded Strings and Images

One should never hardcode any strings or even images that will be seen by the user. There are many ways to pick up the correct strings and images at run time. They could be in resource files or string tables loaded at run time by the rendering program. The program itself should be completely content agnostic and should be able to take any strings / images at run time.
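A minimal Java sketch of run-time string lookup follows. In a real project the texts would typically live in per-language resource files; here small in-memory texts in .properties format stand in for those files so that the example is self-contained. The class name, keys and texts are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

final class UiStrings {
    // In-memory stand-ins for per-language resource files (hypothetical content).
    private static final Map<String, String> FILES = new HashMap<>();
    static {
        FILES.put("en", "basket.title=Your Basket\nbasket.item=Item");
        FILES.put("fr", "basket.title=Votre panier\nbasket.item=Article");
    }

    // Loads the strings for a language, falling back to English when none exist.
    static ResourceBundle bundleFor(String language) {
        String text = FILES.getOrDefault(language, FILES.get("en"));
        try {
            return new PropertyResourceBundle(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("Could not read resource text", e);
        }
    }

    // The rendering code asks for a key; it never contains user-visible text itself.
    static String text(String language, String key) {
        return bundleFor(language).getString(key);
    }

    public static void main(String[] args) {
        System.out.println(text("en", "basket.title"));
        System.out.println(text("fr", "basket.title"));
        System.out.println(text("de", "basket.item")); // no German strings: falls back to English
    }
}
```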

All platforms have ways of picking up correct resources at run time. One can use them or invent one’s own ways but one should remember never to hardcode. Here are a few links to various platform documentation

https://developer.android.com/guide/topics/resources/localization

https://developer.apple.com/localization/

https://docs.oracle.com/javase/8/docs/technotes/guides/intl/index.html

https://learn.microsoft.com/en-us/dotnet/core/extensions/localization

and one can find specific details about recommended strategies for different technologies PHP, Python, React, Angular, etc. on respective web sites.


Ever friendly and informative WIKIPEDIA has this great article: https://en.wikipedia.org/wiki/Internationalization_and_localization

Unicode: What is it?
https://gadreconsulting.com/2022/10/22/unicode-what-is-it/
Sat, 22 Oct 2022 11:45:08 +0000

There are many scripts in the world. In this article the word script means the system of writing, alphabets, characters; it does not mean VB Script or Java Script…

Scripts

A script is a set of characters (an alphabet) used for writing textual content. http://en.wikipedia.org/wiki/ISO_15924 has the list of scripts standardized by ISO. Scripts relate to languages in a many-to-many correspondence.

We have many examples of one script used to write different languages. E.g.

  • Latin Script (ISO:LATN) is used to write English, French, German etc.
  • Arabic Script (ISO:ARAB) is used to write Arabic, Farsi, Urdu, Pashto etc.
  • Devanagari Script (ISO:DEVA) is used to write Sanskrit, Hindi, Marathi, Konkani, Nepali etc.

There are languages which can be written in different scripts. E.g.

  • Panjabi is written in the Gurmukhi Script (ISO:GURU) in India, but in the Arabic Script (ISO:ARAB) in Pakistan.
  • Serbian is written in Latin Script (ISO:LATN) or Cyrillic (ISO:CYRL) depending upon the geographical area or preference.
  • English which is generally written in Latin Script (ISO:LATN), is also written in Braille Script (ISO:BRAI).

One should consciously think about a script as a way of representing the language without speaking.

History

From the early 1960s to the 1980s, computers were almost exclusively used by the scientific and engineering community for heavy calculations. A unit of storage in computers is a byte, which consists of 8 bits. Each bit can be either ON or OFF, represented as 1 or 0. A set of 8 bits, i.e. a byte, can represent 256 numbers, which are the possible combinations of 1s and 0s. This is called the Binary Numeral System. For excellent information on the Binary Numeral System, please refer to the Wikipedia article http://en.wikipedia.org/wiki/Binary_numeral_system . In the earliest computers, there were no display screens or printers; the computer used a series of lights (ON or OFF) to display its answer.

People were not happy with just numbers and blinking lights, and electric typewriters and printers were available. Computer folks came up with the concept of representing characters with numbers. They designed the first encodings, and the encoding called ASCII was adopted widely. ASCII used 7 bits, i.e. 128 possible numbers, and mapped characters to them, e.g. the character ‘A’ was represented by the number 65 and so on. Then they wrote computer instructions to print the shape of the character ‘A’ whenever the binary number 65 came up. The 128 available values encoded all the letters and punctuation required by English, plus some non-printable characters which were actually commands to the printers, e.g. 10 meant “advance to next line”, 12 meant “advance to next page” etc. The first goal was to be able to print business information (bank statements, inventory information, invoices etc.) and the programming instructions (code) written in programming languages like FORTRAN, COBOL, BASIC etc. Later, display screens were developed and the printer technology was extended to these displays, so the shapes of characters started showing up on monitor screens.

In Western Europe, French, German, Spanish languages were widely used and ASCII was extended to 8 bits (Extended ASCII) allowing 256 possible values, and that took care of the alphabet of the languages of Western Europe i.e., French, German, Spanish etc. People in Greece and Russia and other countries started to use computers and found out that they needed place for their characters, so they started using some characters from the 256 to represent characters specific to Greek or Russian. These maps are called Code Pages; please refer to https://en.wikipedia.org/wiki/Code_page . Soon other languages joined the party and wanted their share of characters in the available 256.

So there came a situation when, depending upon how you looked at it, the character code number say, 195 represented

  • Thai letter “Ro Rua” ร
  • Greek letter capital “gamma” Γ
  • Arabic letter “Alef with hamza above” أ

and one could not reliably tell what the text was unless one knew in advance, what script (alphabet) one was trying to read.
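The ambiguity is easy to reproduce in Java by decoding the single byte 195 under different code pages. This sketch assumes that the JDK in use ships the legacy charsets named below (a full JDK normally does); the class name is hypothetical.

```java
import java.nio.charset.Charset;

final class CodePageAmbiguity {
    // Decodes one byte value under the named legacy code page.
    static String decode(int byteValue, String codePage) {
        byte[] raw = {(byte) byteValue};
        return new String(raw, Charset.forName(codePage));
    }

    public static void main(String[] args) {
        // Thai, Greek, Arabic and Western European code pages, in that order.
        String[] codePages = {"TIS-620", "ISO-8859-7", "windows-1256", "ISO-8859-1"};
        for (String codePage : codePages) {
            System.out.println(codePage + " reads byte 195 as " + decode(195, codePage));
        }
    }
}
```

The byte never changes; only the assumption about the code page does, and each assumption produces a different letter.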

And then the Japanese, Korean and Chinese alphabets entered the fray bringing with them the thousands of letters of their alphabet…

So, over the years people of the world came together and created the Unicode standard where every character got its own unique value. Of course, they could not fit everything in 256 characters so they decided to look at 2 bytes i.e., 16 bits together giving a possible range of 65536 characters and all scripts get their own space. With Unicode, all the data can now coexist in one file or document or communication without being misinterpreted.

Since Unicode took up twice the space, people did not adopt it easily. Those who had data in only one language and never sent it internationally did not care, and kept on using the single-byte systems. In the last 10 years or so, the cost of data storage, both in RAM and on disk, has come down and communication speed has increased, so the data size being twice does not matter anymore. UTF8 is a popular format (called UTF8 encoding) for storing and sharing Unicode text in which ASCII codes remain the same.

Unicode is neither a linguistic standard not a phonetic standard. It standardizes scripts.

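Later, it was acknowledged that 65536 characters were not enough for all scripts of the world, so Unicode extended its code space beyond 16 bits. Code points now run from U+0000 to U+10FFFF, which allows a little over 1.1 million possible values. In the 16-bit UTF-16 form, a character beyond the first 65536 is written as a pair of special 16-bit values called surrogates.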
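As a rough sketch of how a character outside the first 65536 is stored in 16-bit units, the Python snippet below computes the surrogate pair by hand and compares it with what the built-in UTF-16 encoder produces. The helper name `to_surrogate_pair` is mine, and the smiley U+1F60A is just a convenient example character.

```python
# Split a code point above U+FFFF into its UTF-16 surrogate pair.
def to_surrogate_pair(code_point):
    """Return (high, low) 16-bit surrogate values for a supplementary code point."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000          # 20 bits remain after the offset
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


smiley = "\U0001F60A"
high, low = to_surrogate_pair(ord(smiley))

# Cross-check against Python's own big-endian UTF-16 encoder.
encoded = smiley.encode("utf-16-be")
assert encoded == high.to_bytes(2, "big") + low.to_bytes(2, "big")

print(f"U+{ord(smiley):X} -> {high:04X} {low:04X} ({len(encoded)} bytes in UTF-16)")
```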

Transliteration

Generally, languages are written in their traditionally standardized scripts. For example, the traditional way to write the word “knowledge” is using the LATIN script, but one can write “नॉलेज” in Devanagari or “నోలేజ్” in Telugu, and the pronunciation will be almost the same. Transliteration is the writing of a language in a script which is not traditionally used for that language. Many text messages are written transliterated in LATIN script because of the universal acceptance of LATIN characters. E.g., it is common in India to send a text message with the greeting “Namaste” in LATIN, though traditionally it would be written in Devanagari as “नमस्ते”.

Fonts

Fonts are data files that define visual shapes of characters. Here is an example of the same text in different fonts.

The last two examples are interesting, because they show Latin script text (i.e. English text) in practically unreadable form using the fonts Wingdings and Shusha. In the days before Unicode, fonts were created for the 256 character positions of a Code Page, and, in the manner of Wingdings, font designers put different-looking shapes in the positions of the Latin characters, so the text looked as if it were written in a different alphabet. This is referred to as “Font based encoding”, which, in other words, means that you cannot read the text unless you have the font to go with it. If you do not have the font, the data will look like gibberish.

If the data is saved in Unicode, the data retains its identity even if the font does not have a visual representation of the character. In such cases a rectangular box is displayed on the screen. However, if you are seeing Question Marks instead of characters, it means either that the text was not originally written in Unicode, but was converted to Unicode using the wrong Code Page, or the program which is displaying the text is not Unicode compliant, and uses Code Pages.

Unicode Consortium’s web site is http://www.unicode.org and Wikipedia page is https://en.wikipedia.org/wiki/Unicode , please visit them to get a better understanding of Unicode, better and more “official” than what I have summarized in a few paragraphs above.

]]>
https://gadreconsulting.com/2022/10/22/unicode-what-is-it/feed/ 0
Block Chain: Do I need it? https://gadreconsulting.com/2022/09/15/block-chain-do-i-need-it/ https://gadreconsulting.com/2022/09/15/block-chain-do-i-need-it/#comments Thu, 15 Sep 2022 15:11:08 +0000 https://gadreconsulting.com/?p=59 Let me explain block chain as I understand it today. There are three important manifest aspects to block chain.

Immutable Records

This is something like sculpting on a stone slab with a chisel. A chiseled stone cannot be changed once a Mark Master Mason has marked it with his unique symbol. If anyone changes the content, the tampering can be easily recognized. Further, if a change or new information has to be added, editing in place is not possible; therefore a new stone slab will be sculpted with the new information, and it will carry a reference to the earlier stone slab. Something like this:

The little mark after the name of the mason (James Bond, Hercule Poirot etc.) is the unique mark of the mason.
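In software, the mason's mark is usually approximated with a cryptographic hash. The following is only a sketch of the idea, not how any particular block chain product does it: the function names (`mark`, `add_slab`, `is_untampered`), the record layout and the choice of SHA-256 are my assumptions for illustration.

```python
import hashlib
import json


def mark(content, previous_mark):
    """Compute the 'mason's mark': a SHA-256 hash over the content and the previous mark."""
    payload = json.dumps({"content": content, "previous": previous_mark}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_slab(chain, content):
    """Append a new slab that refers back to the mark of the slab before it."""
    previous_mark = chain[-1]["mark"] if chain else ""
    chain.append({
        "content": content,
        "previous": previous_mark,
        "mark": mark(content, previous_mark),
    })


def is_untampered(chain):
    """Recompute every mark in order; any edited slab breaks the chain."""
    previous_mark = ""
    for slab in chain:
        if slab["previous"] != previous_mark:
            return False
        if slab["mark"] != mark(slab["content"], previous_mark):
            return False
        previous_mark = slab["mark"]
    return True


slabs = []
add_slab(slabs, "Wall of the east wing is 3 meters high")
add_slab(slabs, "Correction: wall of the east wing is 4 meters high")
print(is_untampered(slabs))   # True

slabs[0]["content"] = "Wall of the east wing is 9 meters high"  # edit in place
print(is_untampered(slabs))   # False
```

A plain hash only makes tampering detectable to someone who holds a trusted copy of the marks; real systems typically add digital signatures and replication on top of this.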

Distributed Ledger

Multiple identical replicas of the ledgers are kept in different places. [Of course, in the above example it would be practically impossible to make replicas of the chiseled stones.] Before the widespread use of computers, hand transcribed & certified true copies or photostat copies of ledgers could be taken and stored in different locations. Later when computers arrived, copies could be stored on paper tapes or cards, floppy disks, magnetic tapes, or optical storage like CDs. However, these were primarily used for archival purposes.

Now, with the advent of networked computers, one can easily replicate the ledgers to be stored on different computers. The important aspect to note is that these copies of ledgers can be accessed rapidly.

The word “distributed” in distributed computing means something else to software engineers; they basically think of “distributed” as different parts of the program or algorithm executing on different computers. However, a distributed ledger generally means that the whole ledger is replicated on different computers.

Consensus Approval

This is democratic voting. 51% or more of the voters decide the fate of the proposal. In this case the computers are voting via their algorithms.

Example

Now let us take an example with a pseudo-real-life scenario.

Monopoly Game – Normal – without the block chain:

Six friends are playing a game of Monopoly and a 7th friend is the designated banker. The game is played with the usual Monopoly rules, with one difference: players do not handle Monopoly-money, but the banker maintains the “accounts”, i.e., balances for every player and the status of properties. Every transaction, like buy property, pay rent, pay $50 to get-out-of-jail, get $200 on “PASSING GO” etc., is validated, executed and recorded by the banker, who is the sovereign authority for all account ledgers.

  • Let us say that “Player-A” wants to buy, say, “Boardwalk (US Edition)” or “Mayfair (UK Edition)”; she/he will request the banker to execute the transaction. The banker will check if the property is for sale and if Player-A’s ledger has enough money. If satisfied, the banker will do the needful and proceed to update the account balance & property card in the banker’s records.
  • If Player-A “PASSES GO”, the banker will add $200 to her/his account.
  • and so on…

Monopoly Game – with the block chain:

Six friends are playing a game of Monopoly, but there is neither a banker nor any Monopoly-money. Every player maintains the “account” balances of all the players and the status of all properties, and there is a notional virtual “bank” having infinite money.

  • Let us say that “Player-A” wants to buy, say, “Boardwalk (US Edition)” or “Mayfair (UK Edition)”; she/he will send the intended transaction to the remaining 5 players, who are tasked with validating the transaction. They look up their own records, validate that the property is not owned by anyone else, and that Player-A has enough money in the account to buy it. As soon as 3 out of 5 (i.e. 51%+) agree that the transaction can go through, the transaction is completed and all players update their copies of the records of Player-A and the property card.
  • If Player-A “PASSES GO”, a transaction of adding $200 to Player-A’s account is sent for approval to the remaining 5 players, who have to ensure that Player-A has actually passed Go (there is no need to check the bank because it has infinite money). Once 3 out of 5 say “OK” to the impending transaction, all 6 update their own records of Player-A.
  • and so on …
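The banker-less game above can be sketched in a few lines of Python. This is a toy model under my own assumptions: the function names, the ledger layout, the starting balance of 1500 and the price of 400 are illustrative, and real block chains use far more elaborate consensus mechanisms than a show of hands.

```python
import copy

PLAYERS = ["Player-A", "Player-B", "Player-C", "Player-D", "Player-E", "Player-F"]

# Every player keeps a full copy of all balances and all property owners.
STARTING_LEDGER = {"balances": {p: 1500 for p in PLAYERS}, "owners": {}}
ledgers = {p: copy.deepcopy(STARTING_LEDGER) for p in PLAYERS}


def validate_purchase(ledger, buyer, prop, price):
    """One validator checks its own records: property unowned, buyer can pay."""
    return prop not in ledger["owners"] and ledger["balances"][buyer] >= price


def consensus_purchase(ledgers, buyer, prop, price):
    """Apply the purchase to every ledger only if a majority of the other players approve."""
    validators = [p for p in ledgers if p != buyer]
    approvals = sum(validate_purchase(ledgers[p], buyer, prop, price) for p in validators)
    if approvals * 2 <= len(validators):   # e.g. needs at least 3 of 5
        return False
    for ledger in ledgers.values():        # all six copies are updated
        ledger["balances"][buyer] -= price
        ledger["owners"][prop] = buyer
    return True


first_ok = consensus_purchase(ledgers, "Player-A", "Boardwalk", 400)
second_ok = consensus_purchase(ledgers, "Player-B", "Boardwalk", 400)
print(first_ok, second_ok)   # True False
```

The second request fails because every validator's copy already shows Boardwalk as owned, so nobody approves it.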

When you “PASS GO” you get $200 out of thin air from the Monopoly’s non-existent virtual bank, just like Bitcoins 😊

Further pondering

Nowadays, many software products aspire to use block chain, and the concept has been marketed very well. I have spoken to quite a few software startups and many of them want to use block chain but, in my opinion, they don’t need it. It is similar to an “item-song” in a Bollywood movie :-); this song has nothing to do with the storyline of the movie. Using block chain is the “in vogue, chic, and trendy” thing to do these days. A lot of times product promoters, marketing and product positioning teams think that having “We use block-chain! No kidding…!” as a slogan will attract funding (or better funding) and/or more customers, and it may actually be true.

What I think is important are the “immutable records” i.e., written once, read many times (like a CD). These are essential for auditing purposes. If any record needs to be modified, a new record has to be created with the reference of the existing record so that the audit trail is preserved.

Another way to record the trail is to keep a “record of change” for every modification that happens. Traditional databases (MSSQL, Oracle DB, IBM DB2 etc.) can do this.

To make the whole ledger tamper resistant, one can keep multiple copies of the records (replicated ledger) and ensure that all copies are synchronized.

Let me ponder over where I could use parts of block-chain…

  • An airline company may want to keep track of where a particular aircraft traveled throughout the year (SEA to SFO to CDG to LHR to YVR… and so on…) and correlate it to maintenance costs or fuel costs. For this purpose, immutable records YES, distributed ledger (maybe) as a backup, but consensus voting NOT NEEDED.
  • A democratic organization wants to provision e-voting in an election. A vote is recorded once and retrieved for counting purpose. All one needs is to ensure immutability. For this purpose, immutable records YES, distributed ledger (maybe) as a backup, consensus voting NOT NEEDED.

When it comes to “Consensus Voting” I have my doubts. Maybe 51% is not enough for some organizations, like banks; they may want a black-ball system where 100% of the ledger-keepers must validate a proposed transaction. I have never been really convinced by 51% consensus voting for transactions which have financial ramifications, e.g., movement of funds or inventory. I would go for 100% voting even if I have distributed the ledger on multiple computers.

A 51% attack is possible when a hacker gets control of 51% of the ledger-keepers and forces nefarious results. Many erudite software engineers have written on how to deal with the 51% attack. Navigate to https://www.google.com/search?q=51+attack+blockchain and you will see a lot of information.

Let us say that in the consensus voting 54% say YES, and the remaining 46% say NO or abstain. There should be a root-cause analysis to see why these 46% did not say YES, though nobody seems to mention this…

The concepts of immutable records and distributed ledgers have existed for more than 30 years. The block chain concept tries to formalize the process in a consistent way.

If you have a hammer in your hand and walk around the house, many objects look tantalizingly like nails. Blockchain is one such hammer, and it is fashionable too, therefore the temptation to use it for anything can be overwhelming 😊.

]]>
https://gadreconsulting.com/2022/09/15/block-chain-do-i-need-it/feed/ 2
Software Engineers or Software Plumbers https://gadreconsulting.com/2022/08/29/software-engineers-or-software-plumbers/ https://gadreconsulting.com/2022/08/29/software-engineers-or-software-plumbers/#respond Mon, 29 Aug 2022 15:44:28 +0000 https://gadreconsulting.com/?p=45 What do we have here? Software Engineers or Software Plumbers?

I always like to use metaphors so here we go!

The construction industry is very similar to the software industry.

We have

  • Builders:

Builders seize the opportunity of building an apartment building, and work through the determination of price point, profitability, legal aspects of land acquisition, financing etc. They eventually decide that they should have, say, 10 condos of 150 sq. meters and 10 condos of 120 sq. meters; a total of 2700 sq. meters of sellable area.

  • Architects & Engineers:

Architects/Engineers create the building plans in accordance with the builders’ aspirations and requirements. They study the utilitarian and aesthetic aspects, come up with building plans and get them approved by the interested parties. They decide on the technical aspects of the actual construction work, like the size of beams and columns, the steel required to withstand the load, which brands of building materials to use, electrical loads and requirements etc., so that the building looks as close as possible to the approved plan.

  • Construction Crew:

Excavators, Plumbers, Masons, Electricians, Painters etc. Construction crews follow the orders of the Architects & Structural Engineers and do the actual construction.

Plumbers get the orders to install a shower stall, a sink and a WC in apartments at the specified locations. They are told the brand of shower head, the brand and color of washbasin and WC to use, and so on. They assemble the parts together, put appropriate pipes in the appropriate places and complete the work.

Masons lay the bricks and build walls with the thickness and height etc. following the specifications given by the Engineers and Architects.

Electricians do the wiring and install appliances and points as per specifications given by Engineers and Architects.

Plumbers do not have the competence to decide where to put the shower stall, or the diameter / slope of the sewage pipe required for the effluent of the building. Masons do not know how thick the wall should be and where to build it, and electricians do not know the gauge of wire required to carry the necessary current.

On the other hand, the Architects & Engineers know where to build a wall (and maybe even theoretically know how to lay the bricks), but a skilled mason can do a much better job of actually building the wall. Architects & Engineers know what the slope and diameter of a sewage pipe should be and how it should be laid, but a specialist plumber has the expertise to install it well, and ditto for electricians.

If I translate this to software building parlance, we have the Business Owners, Software Architects/Engineers and Software Plumbers

Business Owners:

The business owners envision the software with target market in mind, procure the finance, do the competitive analyses, legal analyses etc. and engage Software Architects/Engineers to design the product for them.

Software Architects/Engineers do deep analyses and decide what existing software can be reused and repurposed, or what needs to be created from scratch.

If something needs to be created from scratch, they either prototype it themselves or hire/engage other software engineers to create it under their supervision. These software engineers work together, innovating and challenging the established practices. They might write some new code or sometimes take a partially usable piece from some other software, understand it and repurpose it to suit the needs.

And then we have the software plumbers, they put together the pieces already made available to them by the Software Architects/Engineers, and complete the product.

Over the last few years, I have interviewed hundreds of candidates for Software Engineering jobs, and most of them are software plumbers, and mediocre ones at that.

Python is the latest hot hero/heroine of the industry (Ruby used to be it in 2012-13). Python *Programmers* just know how to call NumPy, scikit-learn etc. People get offended when I call them Python Plumbers.

Maybe it is that most of the good things have already been invented/innovated, and we do not care about resource contention anymore because processors, RAM and disk space are cheaper and faster. Most of the newer graduates from colleges have never heard of page faults and clustered indexes…

I wonder if the world will just need software plumbers in the future. What is in a name? Let us call them Software Engineers.

]]>
https://gadreconsulting.com/2022/08/29/software-engineers-or-software-plumbers/feed/ 0
DDOS or DOS Attacks https://gadreconsulting.com/2022/08/02/ddos-or-dos-attacks/ https://gadreconsulting.com/2022/08/02/ddos-or-dos-attacks/#comments Tue, 02 Aug 2022 05:00:44 +0000 https://gadreconsulting.com/?p=22 DDOS or DOS Attacks – Can you really prevent them?

Let us face the truth: DDOS or DOS attacks cannot be prevented.

Here is a metaphorical story…

John Doe owned a small restaurant, “My Little Hut”, which had a serving/seating capacity of 60 people at 9 tables.

Tom Cat used to be a partner in the business with John Doe for a few years but they separated supposedly amicably. Tom held a secret grudge against John, but never showed it. Tom Cat wanted to bring down John Doe’s restaurant business to dust.

Tom Cat, being a popular person, had many devoted groupies, and he ordered 3000 of them to go to “My Little Hut”, stand in line as soon as the restaurant opened, get seated at a table, wait for the server, peruse the menu, spend some time idling & drinking water, and after a while leave without ordering anything.

“My Little Hut”’s legitimate 30-40 customers were also in this line of 3000, but never got a chance to eat at the restaurant and were denied the service. Denial Of Service accomplished.

How can John Doe mitigate this attack the next time Tom Cat wants to pull the same stunt? Realistically he cannot, but he can try to reduce its impact on the business.

Various methods to mitigate DOS attacks are mentioned on the internet.

The first one is the IP filtering / fencing / fire-walling etc.

Metaphorically, continuing the above example, John had a list of his known customers’ IDs, the “White List”. He employed security guards, and as soon as anyone joined the line, the security guards checked their ID against the “White List” and threw them out of the line if their ID was not on it. This process of checking against the “White List” took, say, 15 seconds per ID, so John employed more security guards to handle the 3000 in parallel. John kept detailed logs of rejected IDs and created a “Black List”.

This worked for some time but had an undesired side effect: legitimate customers were also thrown out if they forgot to carry their ID, and brand new legitimate customers could not enter. John started a new registration counter where new legitimate customers registered their IDs in the “White List” and then could join the restaurant waiting line. Once Tom realized this, he gathered more groupies and sent another 3000 to the registration counter, and it became overwhelmed with registration requests. John employed more staff to handle the registration process, and the war went on…
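Translated back from the restaurant to a server, the security guard's check could look roughly like the Python sketch below. The address ranges are reserved documentation ranges, and the names `ALLOW_LIST`, `is_allowed` and `screen_visitors` are hypothetical; a real deployment would do this in a firewall or load balancer rather than in application code.

```python
import ipaddress

# The "White List": networks and single addresses of known customers.
# These are reserved documentation ranges, used here purely as placeholders.
ALLOW_LIST = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_allowed(address, allow_list=ALLOW_LIST):
    """The guard's check: is this visitor's address inside any allowed network?"""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in allow_list)


def screen_visitors(addresses):
    """Split visitors into admitted ones and a log of rejected ones (the 'Black List')."""
    admitted, rejected = [], []
    for address in addresses:
        (admitted if is_allowed(address) else rejected).append(address)
    return admitted, rejected


admitted, rejected = screen_visitors(["192.0.2.44", "203.0.113.9", "198.51.100.17"])
print("admitted:", admitted)
print("rejected:", rejected)
```

Note that every visitor, legitimate or not, still costs the guard one check, which is exactly why filtering alone reduces the damage but does not make the attack free to absorb.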

The second method is distributing the load or increasing the bandwidth, also called Service Point / Server over-provisioning / redundancy.

In the above example, John opened a side door, stationed his security guards there, and had already told his legitimate registered customers to use the side door if the regular door was too busy or closed during regular business hours. This too worked only for some time, because Tom started sending his 3000 rogue groupies to the side door. So John borrowed from banks and opened ten new restaurants in the same area, using a common kitchen. John’s customer base remained pretty much the same but his expenses skyrocketed.

As soon as the 3000 stood in the restaurant line or the registration line, John could have just shut down the restaurant, but that would have meant that John’s customers could not use the restaurant either, and Tom’s DOS attack would still have succeeded.

Now, John has hired Sherlock Holmes and Hercule Poirot to figure out who is behind these attacks. He has given them the detailed logs of IDs. Let us see what these two geniuses can infer from the detailed logs.

Most of the other methods described in various internet sources are forensics or postmortem analyses; and these are important. Since one cannot really be 100% shielded from a DOS/DDOS attack, strategies and workflows should be established to minimize the impact on business.

External Links:

https://en.wikipedia.org/wiki/Denial-of-service_attack

https://www.cisa.gov/uscert/ncas/tips/ST04-015

https://us.norton.com/internetsecurity-emerging-threats-dos-attacks-explained.html#

]]>
https://gadreconsulting.com/2022/08/02/ddos-or-dos-attacks/feed/ 1