Artificial Intelligence (AI) and Machine Learning (ML) are terms a lot of people use, but do you know the technical nuts and bolts of how they actually work?
If you don't, that's OK: I've broken down some of the most common terms you might hear in the world of AI/ML into simple explanations anyone can understand. Plus, if you're using Google Colab, there's a built-in AI assistant called Gemini that can help you understand and write code, but you need to know the right questions to ask it first.
1. NLP (Natural Language Processing)
NLP is a branch of AI that helps computers understand and work with human language. Think of it as the technology behind things like chatbots, language translators, and voice assistants like Siri or Alexa. It allows machines to “read” and make sense of text or speech, just like we do.
2. BERT (Bidirectional Encoder Representations from Transformers)
BERT is a special AI model developed by Google that helps computers understand the meaning of words in a sentence — not just individually, but in context. For example, the word "bank" can mean a riverbank or a financial institution. BERT helps AI figure out which meaning makes sense depending on the sentence.
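To make that concrete, here's a minimal sketch using the Hugging Face transformers library (my own addition, not something the post relies on; you'd install it with pip install transformers torch, and the exact words the model suggests may vary):

```python
from transformers import pipeline

# Load a pre-trained BERT model that fills in a hidden ([MASK]) word.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT reads the whole sentence, so the same blank gets different
# suggestions depending on the surrounding context:
print(fill_mask("I deposited my money at the [MASK].")[0]["token_str"])
print(fill_mask("We sat on the river [MASK] and watched the water.")[0]["token_str"])
```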
3. Data Cleaning
Before you can train an AI model, you need clean data. Data cleaning is the process of fixing or removing incorrect, incomplete, or messy data. It's like tidying up a spreadsheet before using it — removing duplicates, fixing spelling errors, or filling in missing information.
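Here's what that tidying can look like in code, using the pandas library and a tiny made-up table:

```python
import pandas as pd

# A small, made-up spreadsheet with the kinds of problems described above.
df = pd.DataFrame({
    "name": ["Alice", "Alice", "Bob", "Carol"],
    "age": [30, 30, None, 25],
})

df = df.drop_duplicates()                         # remove duplicate rows
df["age"] = df["age"].fillna(df["age"].median())  # fill in missing values
print(df)
```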
4. Data Preprocessing
This step comes after cleaning and involves preparing the data so a machine can understand it. That might include turning text into numbers, resizing images, or organizing data in a consistent format. It’s like prepping ingredients before cooking — everything needs to be in the right form before starting. For example, if your data contains timestamps, they all need to be in the same format; otherwise it's like trying to compare the price of something in Dollars with something else in Pounds.
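As a sketch of the timestamp example (the dates are invented, and the format="mixed" option needs pandas 2.0 or newer):

```python
import pandas as pd

# Hypothetical timestamps typed in three different styles.
raw = pd.Series(["2024-01-05", "05/01/2024", "Jan 5 2024"])

# Convert everything into one consistent datetime format so the
# values can actually be compared with each other.
timestamps = pd.to_datetime(raw, format="mixed", dayfirst=True)
print(timestamps)
```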
5. K-NN (K-Nearest Neighbours)
This is a simple AI technique used to make predictions. It works by finding the closest “neighbors” (similar examples) to a new piece of data and making a decision based on them. For example, if your neighbors all like a certain movie, the algorithm might suggest you’ll like it too.
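A minimal sketch of the movie example with scikit-learn (the numbers below are invented purely for illustration):

```python
from sklearn.neighbors import KNeighborsClassifier

# Made-up data: [age, horror films watched this year] -> liked the movie (1) or not (0).
X = [[25, 1], [30, 2], [22, 10], [40, 12], [35, 11]]
y = [0, 0, 1, 1, 1]

model = KNeighborsClassifier(n_neighbors=3)  # look at the 3 closest neighbours
model.fit(X, y)
print(model.predict([[28, 9]]))  # decides based on the most similar people
```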
6. Clustering
Clustering is a way of grouping similar things together without knowing much about them in advance. It’s like sorting a box of mixed Lego pieces by color or shape, even if you don’t know what the pieces are for. AI uses clustering to find patterns or groupings in data. For example, the coordinates of migrating birds' locations can be analysed in this way.
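Here's a sketch of that bird example using K-Means, one common clustering method, via scikit-learn (the coordinates are made up):

```python
from sklearn.cluster import KMeans

# Hypothetical (latitude, longitude) sightings of migrating birds.
coords = [[51.5, -0.1], [51.6, -0.2], [48.8, 2.3], [48.9, 2.4], [51.4, -0.1]]

# Ask for 2 groups -- no labels needed, the algorithm finds the pattern itself.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(coords)
print(kmeans.labels_)  # which cluster each sighting was placed in
```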
7. Classification
Classification is when an AI model learns to put things into categories. For instance, it can look at an email and decide if it’s spam or not. The goal is to assign labels (or classes) to new data based on what it learned from old examples. AI is good at this kind of task; just give it plenty of examples to learn from.
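A minimal sketch of the spam example with scikit-learn (the four "emails" are made up, and a real spam filter would train on far more data):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up training set of labelled emails.
emails = ["win free money now", "meeting at 3pm today",
          "free prize claim now", "lunch tomorrow?"]
labels = ["spam", "not spam", "spam", "not spam"]

# Turn the text into word counts, then learn which words signal spam.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
model = MultinomialNB().fit(X, labels)

print(model.predict(vectorizer.transform(["claim your free money"])))  # ['spam']
```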
8. TensorFlow
TensorFlow is a tool made by Google to help build and train AI models. It’s kind of like a construction kit for AI, used by professionals and beginners alike. With TensorFlow, people can create systems that recognize images, understand speech, or even recommend songs. You build your own AI model by feeding it sample data to learn from; it then makes predictions that can be tested and evaluated, and that cycle can be repeated to fine-tune the model.
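Here's a toy sketch of that learn-predict-evaluate cycle: a one-neuron model that learns the made-up rule y = 2x from five examples:

```python
import numpy as np
import tensorflow as tf

# Sample data: the model should learn the rule y = 2x.
xs = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
ys = xs * 2

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1),  # one number in, one number out
])
model.compile(optimizer="sgd", loss="mean_squared_error")

model.fit(xs, ys, epochs=200, verbose=0)  # learn from the sample data
print(model.predict(np.array([[6.0]])))   # should come out close to 12
```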
9. Scikit-learn
Scikit-learn (often written as sklearn) is another popular tool for building machine learning models. It's known for being user-friendly, especially for beginners. It's is an open-source Python library which helps in
making machine learning more accessible. It provides a straightforward,
consistent interface for a variety of tasks like classification,
regression, clustering, data preprocessing and model evaluation.
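To show that consistent interface, here's the same fit/predict/evaluate pattern on iris, a small flower dataset that ships with the library:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Every scikit-learn model follows the same pattern: fit, predict, evaluate.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)
predictions = model.predict(X_test)
print(accuracy_score(y_test, predictions))  # fraction it got right
```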
10. Tokenization
Tokenization is a step in processing text data. It breaks down text into smaller pieces, like words or even parts of words, so that a computer can understand and analyze it. For example, the sentence “AI is amazing” would be tokenized into [“AI”, “is”, “amazing”]. For a university assignment I used this technique to find out whether I was spreading hate on social media: I downloaded my entire Facebook data, tokenized all the words from my posts and comments so I could check for hate words (don't worry, there weren't any!), and then displayed the most common words in a word cloud.
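A stripped-down sketch of that word-cloud pipeline (with two stand-in "posts" instead of a real Facebook download):

```python
from collections import Counter

posts = ["AI is amazing", "I love learning about AI"]  # stand-ins for real posts

# Tokenize: break each post into individual words (a simple whitespace split;
# real NLP libraries offer smarter tokenizers).
tokens = [word.lower() for post in posts for word in post.split()]

print(tokens)                          # ['ai', 'is', 'amazing', 'i', ...]
print(Counter(tokens).most_common(3))  # the counts a word cloud would use
```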
Personal Advice: Use Google Colab + Gemini AI
If you're curious to try any of this out, Google Colab is a free, beginner-friendly coding platform that runs in your browser — no setup needed. It also has a built-in AI assistant called Gemini that can explain code, help you write it, or answer questions as you learn. AI is basically an advanced form of data analysis, which can be very CPU-intensive, so I'd rather tell Google Colab to do the task in the cloud while I work on other things and save my computer from overheating and crashing.
For more detail I recommend:
https://www.geeksforgeeks.org/