Do you overthink making your yet-to-be-built app scalable? Great, you're not the only one. But should you?
Here are some scalability stories from famous tech products you use every day. Hope you'll find them insightful.

1) Facebook

Facebook started as a little Harvard project and then scaled campus by campus across universities. Do you know how they used to do that?
Facebook used to run a separate server for each campus, each with its own code, MySQL database, and Memcached. Every time FB onboarded another campus, they would set up a new server with its own code and database.
Each user was restricted to their own campus, and each campus had a weird "harvard.facebook.com"-style URL. Later, when they decided to let users from different campuses talk to each other, it took them a year to come up with an efficient so-called single global table, where users could add each other globally.

2) Gmail

Yeah, I mean the de facto standard for email, Gmail. Do you know how it was initiated and scaled?
A Google engineer named Paul Buchheit didn't like the existing email clients because he wasn't satisfied with their UI. So he hacked together a little tool just to preview his emails in the Google Groups UI, and he liked it. Then he kept adding small features whenever he felt it needed something else: let's add an option to write emails, a delete button... One day he showed his little email client to a co-worker, who kind of liked it, but the problem was that the co-worker couldn't use his own email in it, because Paul's address was hardcoded. Paul realized he should make it flexible enough to work with accounts other than his own.
People started using it inside Google. One day it went down, and someone came to Paul and said, "Do you know Gmail has been down since morning?" He said, "No, I was busy with work, I didn't know that." :D

3) Twitch

When Twitch started, video streaming costs were very high, and every little detail on their web page would fire a query to fetch data, e.g. watch count, like count, etc.
They decided to cut costs by serving all website data from a cache except the video and the chat. So the video played, but none of the buttons were interactive except chat; all that data was loaded from local storage.
Later they figured out ways to cache using different strategies.

4) Google

At some point, Google's page-crawling algorithm became inefficient; it would take 3 weeks to crawl new pages.
Google stopped crawling new pages for 6 months and showed only the previously crawled pages in search results. The cool part was that users didn't even notice.

5) Friendster vs MySpace

Friendster used to have a feature that showed you how far a connection was from you. But this little so-called important feature was very expensive: for each of the user's friends, it fired queries over and over, walking the existing connections to figure out how that person was linked to the current user. The feature fired tons of queries and took a lot of time.
MySpace shipped the same feature, but instead of finding the connection through multiple queries, they fired a single query to check whether the person was your friend; otherwise they just showed the text, "this is your far connection" :D

Source: Y Combinator

Conclusion: don't stress too much about scaling before you even start. Start small and scale over time.
1. The Ternary Operator
The ternary operator ( ? ) is a shorthand conditional expression. It is often preferred over an if-else statement when you have to choose between exactly two outcomes, for example to run some arithmetic logic or to quickly decide what to render on your page. It works like this: the condition is evaluated first; if it is true, the expression after the question mark ( ? ) runs, and if not, the expression after the colon ( : ) runs.
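For instance, a minimal sketch (the variable and values are made up purely for illustration):

const age = 20;
// condition ? value-if-true : value-if-false
const message = age >= 18 ? "You can vote" : "You cannot vote yet";
console.log(message); // "You can vote"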
2. Destructuring

JavaScript's destructuring feature lets you unpack values from arrays, or properties from objects, into separate variables in one go on the left side of the assignment operator.

3. Spread Operator

To put it simply, the spread operator takes an iterable and expands it into individual elements, for example inside another array or object.
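A quick sketch of both features (the data here is invented for illustration):

// Destructuring: unpack values from an array and properties from an object
const [first, second] = [10, 20];                        // first = 10, second = 20
const { name, city } = { name: "Sara", city: "Lahore" }; // name = "Sara", city = "Lahore"

// Spread: expand an iterable into individual elements
const base = [1, 2, 3];
const extended = [...base, 4, 5];             // [1, 2, 3, 4, 5]
const person = { name, city };
const personWithAge = { ...person, age: 30 }; // { name: "Sara", city: "Lahore", age: 30 }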
4. Array Methods

.pop() and .push(): .pop() removes the item at the end of an array, and .push() adds an element at the end of an array. There are other methods too, such as filter, reduce, sort, includes, find, forEach, splice, concat, shift, and unshift.
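For example:

const stack = ["a", "b", "c"];
const last = stack.pop();   // removes and returns "c"; stack is now ["a", "b"]
stack.push("d");            // appends "d"; stack is now ["a", "b", "d"]
console.log(last, stack);   // "c" ["a", "b", "d"]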
5. Arrow Functions

Arrow functions help us create functions in a simpler, more compact way.
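For example, the same function written both ways:

// Classic function expression
const addClassic = function (a, b) {
  return a + b;
};

// Arrow function: shorter, with an implicit return for a single expression
const add = (a, b) => a + b;

console.log(addClassic(2, 3), add(2, 3)); // 5 5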
6. Promises

Promises work much like real human promises: they represent some future result, which makes them an asynchronous construct. A promise can be: Resolved, if it completes its job; Rejected, if what it tries to do fails; or Pending, while the result is not yet determined. The .then() method takes two callback functions as parameters: the first runs when the promise is resolved, and the optional second one runs when the promise is rejected. The .catch() method is used for error handling in promises.
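A minimal sketch of creating and consuming a promise (the delay and messages are invented for illustration):

const orderPizza = new Promise((resolve, reject) => {
  const delivered = true; // flip to false to see the rejection path
  setTimeout(() => {
    delivered ? resolve("Pizza delivered") : reject(new Error("Pizza lost"));
  }, 1000);
});

orderPizza
  .then((result) => console.log(result))            // runs when the promise is resolved
  .catch((error) => console.error(error.message));  // runs when the promise is rejected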
7. The Fetch API

The fetch() method starts the process of fetching a resource from a server. It returns a Promise that resolves to a Response object.

8. Async/Await

Async/await provides a better and cleaner way to work with Promises. It lets us write promise-based code as if it were synchronous, pausing further execution inside the function until the promise is resolved or rejected. To make it work, you first put the async keyword before a function declaration, for example async function promise() {}. Marking a function async means it always returns a promise. Inside an async function you can use the await keyword to suspend further execution until that promise settles; await can only be used inside an async function.

9. Import/Export Components

In React, a component you declare has to be exported and then rendered by another component, typically App.js. You create a component, export it with export default Component, then go to App.js and import it with import Component from './Component'. (Adapted in part from a freeCodeCamp article.)
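To tie points 7 and 8 together, here is a hedged sketch that fetches JSON with async/await; the URL is a public placeholder API used only as an example:

// async marks the function; await pauses it until the promise settles
async function getTodo() {
  try {
    const response = await fetch("https://jsonplaceholder.typicode.com/todos/1");
    const data = await response.json();
    console.log(data);
  } catch (error) {
    console.error("Request failed:", error);
  }
}

getTodo();

And for point 9, a minimal export/import sketch (file names are illustrative):

// Component.js
export default function Component() {
  return <h1>Hello from Component</h1>;
}

// App.js
import Component from "./Component";

export default function App() {
  return <Component />;
}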
In the rapidly advancing field of Artificial Intelligence (AI), understanding the foundations is key, especially when dealing with Large Language Models (LLMs). This guide aims to simplify complex topics for beginners, taking you through essential concepts like neural networks, natural language processing (NLP), and LLMs. We'll explore how LLMs are built and trained, the key terminology around them, and the challenges they face, like biases and hallucinations.

What Are Neural Networks?

A neural network is a machine learning model that makes decisions in a way loosely inspired by the human brain, mimicking how biological neurons work together to recognize patterns, evaluate choices, and reach conclusions. Neural networks are the backbone of AI models, including LLMs, and are organized into layers that process data and learn from it:

Input Layer: where the data first enters the model.
Hidden Layers: where computations happen, allowing the network to learn patterns.
Output Layer: where the final prediction or decision is made.

For example, a neural network can be trained to recognize images of cats. It processes an image through its layers of neurons, enabling it to identify a cat among different shapes or objects.

What Is Natural Language Processing (NLP)?

Natural language processing (NLP) is the field focused on enabling machines to understand and generate human language, both spoken and written. NLP powers everything from chatbots to voice assistants, translating human language into something a machine can process. It involves tasks like:

Tokenization: breaking text into smaller components (e.g., words or subwords).
Parsing: understanding the structure of sentences.
Sentiment Analysis: determining whether a piece of text is positive, negative, or neutral.

Without NLP, LLMs wouldn't be able to grasp the nuances of human language.

What Are Large Language Models (LLMs)?

Large Language Models (LLMs) are advanced neural networks designed to understand and generate human language. They are trained on vast amounts of text data, allowing them to learn patterns, context, and nuances of language. LLMs can perform various tasks, such as answering questions, writing essays, translating languages, and engaging in conversations. Their primary goal is to produce human-like text that captures the intricacies of natural language.

The core idea behind LLMs is simple: predict the next word in a sentence. For example, if you input "The sun rises in the...", the LLM should predict "east." While this may seem basic, this single task gives rise to complex, emergent abilities like text generation, reasoning, and even creativity.

Key LLM Terminologies

The Transformer

A Transformer is a neural network architecture that revolutionized natural language processing by enabling models to handle sequences of data more efficiently. It was introduced in the groundbreaking paper Attention Is All You Need. Traditional models like Recurrent Neural Networks (RNNs) processed sequential data while maintaining an internal state, which allowed them to handle sequences like sentences. However, they struggled with long sequences due to issues like the vanishing gradient problem: over time, they would forget earlier information.
This happened because the adjustments made to improve the model became too small to have any real impact. The Transformer addressed these challenges with a mechanism called attention, which lets the model focus on different parts of a sentence or document more effectively, regardless of their position. This innovation laid the foundation for groundbreaking models like GPT-4, Claude, and LLaMA.

The architecture was originally designed as an encoder-decoder framework. In this setup, the encoder processes input text, picking out the important parts and creating a representation of it. The decoder then transforms this representation back into readable text. This approach is useful for tasks like summarization, where the decoder creates summaries based on the input passed to the encoder. The encoder and decoder can work together or separately, offering flexibility for various tasks: some models use only the encoder to turn text into a vector, while others rely on just the decoder, which is the foundation of large language models.

Language Modeling

Language modeling refers to teaching LLMs the probability distribution of words in a language. This allows models to predict the most likely next word in a sentence, a critical task in generating coherent text. Generating coherent and contextually appropriate text is crucial in many applications, such as text summarization, translation, and conversational agents.

Tokenization

Tokenization is the first step when working with large language models (LLMs). It means breaking a sentence down into smaller parts, called tokens. Depending on the model, these tokens can be anything from individual letters to whole words, and how they're split can affect how well the model works. For example, consider the sentence: The developer's favorite machine. If we split the text by spaces, we get:
["The", "developer's", "favorite", "machine."]
Here, punctuation like the apostrophe in developer’s and the period at the end of machine. stays attached to the words. But we can also split the sentence based on spaces and punctuation:
["The", "developer", "'", "s", "favorite", "machine", "."]
The way text is split into tokens depends on the model, and many advanced models use methods like subword tokenization. This breaks words into smaller, meaningful parts. For example, the sentence It's raining can be split as:
["It", "'", "s", "rain", "ing", "."]
In this case, raining is broken into rain and ing, which helps the model understand the structure of words. By splitting words into their base forms and endings (like rain and ing for raining), the model can learn meanings more effectively without needing to store a different version of every word.

During tokenization, the text is scanned and each token is assigned a unique ID in a dictionary. This allows the model to quickly refer to the dictionary when processing the text, making the input easier to understand and work with.

Embeddings

After tokenization, the next step is to convert these tokens into something a computer can work with, which is done using embeddings. Embeddings are a way to represent tokens (words or parts of words) as numbers that the computer can understand. These numbers help the model recognize relationships between words and their context.

For example, let's say we have the words happy and joyful. The model assigns each word a set of numbers (its embedding) that captures its meaning. If two words are similar, like happy and joyful, their numbers will be close together, even though the words themselves are different. At first, the model assigns random numbers to each token. But as the model trains, by reading and learning from large amounts of text, it adjusts those numbers so that tokens with similar meanings end up with similar sets of numbers, helping the model understand the connections between them.

Although it may sound complicated, embeddings are just lists of numbers that allow the model to store and process information efficiently. Using these numbers (or vectors) makes it easier for the model to understand how tokens relate to one another. Let's look at a simple example of how embeddings work:
Imagine we have three words: cat, dog, and car. The model will assign each word a set of numbers, like this:

cat → [1.2, 0.5]
dog → [1.1, 0.6]
car → [4.0, 3.5]
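To make "close together" concrete, here is a toy sketch that measures the distance between these made-up vectors (real embeddings have hundreds or thousands of dimensions, not two):

// Euclidean distance: smaller means the embeddings are closer, i.e. more related
function distance(a, b) {
  return Math.sqrt(a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0));
}

const cat = [1.2, 0.5];
const dog = [1.1, 0.6];
const car = [4.0, 3.5];

console.log(distance(cat, dog).toFixed(2)); // 0.14 -> close, related meanings
console.log(distance(cat, car).toFixed(2)); // 4.10 -> far apart, unrelated meanings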
In this toy example, cat and dog have similar numbers because they are both animals, so their meanings are related. Car, on the other hand, has very different numbers because it's a vehicle, not an animal.

Training and Fine-Tuning

Large language models (LLMs) are trained by reading massive amounts of text and learning to predict the next word in a sentence. The model's goal is to adjust its internal settings to improve the chances of making accurate predictions based on the patterns it observes in the text. Initially, LLMs are trained on general datasets from the internet, like The Pile or CommonCrawl, which cover a wide variety of topics. For specialized knowledge, the model might also be trained on focused datasets, such as Reddit posts, which help it learn specific areas like programming. This initial phase is called pre-training; during it, the model's internal weights (its settings) are adjusted so it predicts the next word more accurately on the training data.

Once pre-training is done, the model usually undergoes a second phase called fine-tuning. In fine-tuning, the model is trained on smaller datasets focused on specific tasks or domains, like medical text or financial reports. This helps the model apply what it learned during pre-training to perform better on specific tasks, such as translating text or answering questions about a particular field. For advanced models like GPT-4, fine-tuning involves complex techniques and even larger amounts of data to reach their impressive performance levels.

Prediction

After training or fine-tuning, the model can generate text by predicting the next word (or, to be precise, the next token) in a sentence. It does this by analyzing the input and giving each possible next token a score based on how likely it is to come next. The token with the highest score is chosen, and the process repeats for each new token. This way, the model can generate text of any length, but it's important to remember that it can only handle a certain amount of text at a time as input, known as its context size.

Context Size

The context size, or context window, is a crucial aspect of LLMs. It is the maximum number of tokens the model can process in a single request, which determines how much information the model can handle in one go and affects the quality of its output. Different models have different context sizes. For instance, OpenAI's gpt-3.5-turbo-16k model can handle up to 16,000 tokens (tokens being words or parts of words). Smaller models might manage only 1,000 tokens, while bigger ones like gpt-4-0125-preview can process up to 128,000 tokens. This limit also affects how much text the model can generate at one time.

Scaling Laws

Scaling laws describe how a language model's performance is affected by factors such as the number of parameters, the size of the training dataset, the available compute, and the model's design. These laws, discussed in the Chinchilla paper, help us understand how to best use resources to train models effectively and offer insights into optimizing performance. According to scaling laws, the following elements determine a language model's performance:

Number of Parameters (N): Parameters are like tiny parts of the model's brain that help it learn. When the model reads data, it adjusts these parameters to get better at recognizing patterns.
The more parameters a model has, the more complex and detailed patterns it can pick up in the data.

Training Dataset Size (D): The training dataset is the collection of text or data the model learns from. The bigger the training dataset, the more the model can learn and the more patterns it can recognize across different texts.

FLOPs (Floating Point Operations): This measures the total amount of compute used to train the model, i.e. how many calculations it performs during training. More FLOPs allow for larger models and datasets, and therefore more capable models, but also require more computational resources.

Emergent Abilities in LLMs

As LLMs grow in size and complexity, they start exhibiting emergent abilities that were not explicitly programmed into them. For example, GPT-4 can summarize long texts or perform basic arithmetic without being specifically trained for those tasks. These abilities emerge because the model learns so much about language and data during training.

Prompts

Prompts are the instructions you give an LLM to generate a desired output. Designing the right prompt can significantly improve the quality of the generated text. For example:

1. Use Clear Language: Be specific in your prompts to get better results.
Less clear: Write about Allama Iqbal.
More clear: Write a 500-word article on Allama Iqbal, the great poet of the subcontinent.

2. Provide Enough Context: Context helps the model know what you want.
Less context: Write a story.
More context: Write a short story about a baby girl lost in the woods, with a happy ending.

3. Try Different Variations: Experiment with different prompt styles to see what works best.
Original: Write a blog post about the benefits of programming.
Variation 1: Write a 1000-word blog post on the mental and financial benefits of regularly practicing programming.
Variation 2: Create an engaging blog post highlighting the top 10 benefits of programming.

4. Review Outputs: Always check the generated responses for accuracy before sharing them.

Hallucinations

Hallucinations occur when LLMs generate content that is factually incorrect or nonsensical. For instance, an LLM might state that "The capital of Australia is Sydney," when the correct answer is Canberra. This happens because the model is focused on generating likely text based on its training data, not on verifying facts.

Biases

Bias in LLMs arises when the training data reflects cultural, gender, or racial biases. For example, if a model is trained predominantly on English text from Western sources, it may produce outputs that favor Western perspectives. Efforts are being made to minimize these biases, but they remain a critical challenge in the field.
In today's world, where systems keep growing in complexity, we want to automate our processes, and that was the case for us too. We wanted to update the configs we were storing in our database in a systematic way. Our approach was to keep a clone of the database configs in separate files, one per primary id. The problem was that we wanted those configs to be updated in the database whenever we changed them in their files. So here comes the solution: using GitHub Workflows, we wrote a script that runs on every change to the main branch and updates only the configs that changed. How? Let's learn.

Overview: The implemented solution uses GitHub Workflows to automate the execution of scripts based on changes made in commits. Here's an overview of the process:

GitHub Workflows Setup: The process begins with setting up a GitHub Workflow file within the .github/workflows directory of the repository. This file defines the conditions under which scripts should be executed, such as on pushes to specific branches like main.

Detecting Changes in Commits: Within the workflow, a step is dedicated to identifying changes between the last commit and the current one. This is achieved using the git diff command, comparing the files in the previous commit (github.event.before) with the files in the current commit (github.sha).

Extracting Changed Files: The output of the git diff command is the list of files that have been modified, added, or deleted between the commits. This list is captured and processed into a format suitable for further use.

Passing Changed File Names to the Script: The list of changed files is passed as arguments to the command that executes the script. This ensures the script operates only on the files modified since the last commit, minimizing unnecessary processing and improving efficiency.

Executing the Script: Finally, the script is executed with the changed file names as arguments, so it can perform its intended actions, such as running tests, generating documentation, or deploying code, specifically targeting the files altered in the latest commit.

The following .github/workflows file accomplishes the above task:

name: Deployment

on:
  push:
    branches:
      - main

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Don't forget this line, otherwise git wouldn't be able to detect the hashes of the commits

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: "18"

      - name: Install dependencies
        run: npm install

      - name: Identify changed files
        id: changed-files
        run: |
          changed_files=$(git diff --name-only ${{ github.event.before }} ${{ github.sha }})
          # Replace newline characters with spaces to avoid data loss. When this variable is transferred
          # to the next step, the data after the first newline character is lost
          changed_files=${changed_files//$'\n'/ }
          echo "::set-output name=changed_files::$changed_files"

      - name: Run script with changed files
        run: |
          changed_files="${{ steps.changed-files.outputs.changed_files }}"
          echo "$changed_files"
          echo "Executing command: node index.js $changed_files"
          node index.js $changed_files

The question is, how would we get the names of the changed files inside our script file?
Easy peasy:

async function main() {
  // 'process.argv' is an array containing the command line arguments provided when the Node.js process was invoked.
  // '.slice(2)' extracts elements starting from index 2, since indexes 0 and 1 contain the node binary and index.js respectively.
  const allModifiedFiles = process.argv.slice(2);

  // Filter only those files that you want to track
  const modifiedFiles = allModifiedFiles.filter((file) =>
    file.startsWith("data/")
  );

  modifiedFiles.forEach(async (file) => {
    // Write code as per your requirements
  });
}

main();
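For illustration, assuming the tracked config files live under a data/ directory as the filter above implies (the file names below are hypothetical), a push that touches three files would end up invoking the script roughly like this:

node index.js data/101.json data/102.json README.md
# Inside main(): allModifiedFiles = ["data/101.json", "data/102.json", "README.md"]
#                modifiedFiles    = ["data/101.json", "data/102.json"]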
Introduction: Pagination is a crucial feature when dealing with large datasets, allowing you to manage and display data in smaller, more manageable chunks. Unlike popular SQL and NoSQL databases such as MySQL, PostgreSQL, and MongoDB, pagination in DynamoDB works a bit differently. In the aforementioned database management systems, you can build a pagination system by counting all records and setting appropriate values for limit and offset. Consider the following example of a MySQL query (using the Sequelize ORM), which fetches all projects in the "In Progress" status with pagination:

const { pageNumber } = req.query;
const limit = 10; // Or get it from the request query as req.query.limit if provided by the frontend
const offset = (pageNumber - 1) * limit;

const { count, rows } = await Projects.findAndCountAll({
  where: { status: "In Progress" },
  order: [["createdAt", "DESC"]],
  limit: limit,
  offset: offset,
});

res.status(200).json({
  count: count,
  totalPages: Math.ceil(count / limit),
  pageNumber: pageNumber,
  projects: rows,
});

From the above example, you can see that there are a few things to keep track of on the frontend, namely totalPages and pageNumber. Using totalPages, we can show the number of pages available, and pageNumber shows the current page a user is viewing. If you want to control limit from the frontend as well, you can include it in the query and handle it on the backend, no problem at all. The important thing is that count is needed to calculate the total pages, or to tell the frontend how many records exist in total.

The problem in DynamoDB is that there is no count-specific query. To count, we would have to scan the whole table and then take the length of the returned array, which we want to avoid (what is the point of pagination if we are scanning the whole table?). Imagine you have millions of records and you are scanning every bit of them. Scary!! How do we implement pagination in DynamoDB then? Let's learn.

Pagination in DynamoDB: Every scan or query operation in DynamoDB returns a property called LastEvaluatedKey, which indicates the last item that was read in the scan or query operation. It serves as a marker or token for the next page of results. When LastEvaluatedKey is undefined, it typically indicates that there are no more pages of results to fetch beyond the current page. There is another parameter in DynamoDB query operations, ExclusiveStartKey, which takes a primary key as its value (both partition key and sort key, if both are present) and is used in paginated queries to specify where to start fetching results from.

The sad news is that, as discussed earlier, there is no count query in DynamoDB, so we can't get the total number of records or the total number of pages. It is therefore better to have an infinite scroll or a "Load more" UI on the frontend when showing paginated DynamoDB items. Using LastEvaluatedKey and ExclusiveStartKey, we can implement our pagination as follows:

import { DynamoDBDocument } from "@aws-sdk/lib-dynamodb";
import { DynamoDB } from "@aws-sdk/client-dynamodb";

// In case of the Serverless framework
let { lastEvaluatedKey, limit } = event.queryStringParameters;
// In the Express framework
let { lastEvaluatedKey, limit } = req.query;

limit = Number(limit);

if (lastEvaluatedKey) {
  // Since the lastEvaluatedKey may contain special characters that may
  // cause problems in a URL, the best approach is to encode it on the
  // frontend before sending it to the backend. Since the value is URL
  // encoded, we need to decode it on the backend. A typical lastEvaluatedKey
  // may have a value (without URL encoding) like the following:
  // { projectId: "40290" }
  lastEvaluatedKey = JSON.parse(decodeURIComponent(lastEvaluatedKey));
}

const dynamodb = DynamoDBDocument.from(
  new DynamoDB({
    region: DYNAMODB_REGION,
    credentials: {
      accessKeyId: CUSTOM_AWS_ACCESS_KEY_ID,
      secretAccessKey: CUSTOM_AWS_SECRET_ACCESS_KEY,
    },
  })
);

const { Items, Count, LastEvaluatedKey } = await dynamodb.query({
  TableName: "Projects",
  // Note: a Query call also needs a KeyConditionExpression for the table's
  // partition key; alternatively, use scan() to paginate over the whole table.
  ExclusiveStartKey: lastEvaluatedKey ? lastEvaluatedKey : undefined,
  Limit: limit, // Limit the number of items scanned
});

// LastEvaluatedKey must be sent to the frontend along with Items so that
// it can keep track of the pagination

Count represents the number of items in the response. Use the Limit parameter to control the maximum number of items returned per page, optimizing throughput and reducing resource consumption. If ExclusiveStartKey is undefined, the initial set of items is returned according to the specified limit. However, if a valid value is provided, the query returns a set of items beginning after that key, which is the primary key of the last evaluated item from the previous page.
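On the frontend side, here is a minimal hedged sketch of a "Load more" handler (the /projects endpoint name is hypothetical) that sends the previous LastEvaluatedKey back, URL-encoded, exactly as the backend above expects:

// Fetch one page; pass the LastEvaluatedKey from the previous call (or null for the first page)
async function loadMoreProjects(previousKey, limit = 10) {
  let url = `/projects?limit=${limit}`;
  if (previousKey) {
    // Stringify and URL-encode the key so it survives the trip; the backend
    // reverses this with decodeURIComponent + JSON.parse
    url += `&lastEvaluatedKey=${encodeURIComponent(JSON.stringify(previousKey))}`;
  }
  const response = await fetch(url);
  const { Items, LastEvaluatedKey } = await response.json();
  // When LastEvaluatedKey is undefined, there are no further pages to load
  return { items: Items, nextKey: LastEvaluatedKey };
}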