Fresh off its developer-focused I/O conference, Google has an overwhelming amount of AI news to share with developers. As always, Google I/O has left us utterly amazed, and this year was no exception, only this time it was all done in AI style!
Every year, Google opens its developer conference with a flurry of announcements, including numerous unveilings of new projects. With the Google I/O 2024 keynote now behind us, it's clear that the main themes were Google Gemini and new AI capabilities, which gave us a preview of what's ahead for our digital lives. CEO Sundar Pichai was justified in portraying the event as Google's own version of The Eras Tour, with the "Gemini Era" at the very top.
Coming a day after OpenAI announced GPT-4o, Google unveiled some of its most important products, including the quicker Flash version of its flagship Gemini AI model, which is now equipped to contend with that new, faster GPT iteration.
I took on the task of giving you a concise, easy-to-follow summary of the most significant details from the keynote, in case you weren't able to sit through the entire two-hour presentation on Tuesday.
Enhanced Photo Search Performance
Google has added a number of powerful visual search capabilities to Google Photos. Ask Photos is a new tool that lets you ask Gemini to search your photos, returning far more refined results than before. As a demonstration, give it your license plate number and it will use context clues to identify your vehicle in every photo you have ever taken. Amazing, isn't it?
Gemini: Off to work!
Google is also bringing AI into its Workspace office suite. Many Google products, including Gmail, Google Drive, Docs, Sheets, and Slides, will get a toggle for Google's Gemini AI. The Gemini assistant can answer questions, draft emails or documents, and summarize lengthy documents or email threads.
AI Teammate, powered by Gemini, is also being integrated into apps like Docs and Gmail. Call it what you like; it's similar to having a productivity buddy at work. With the aid of AI Teammate, you can generate to-do lists, keep track of project files, organize team discussions more effectively, and monitor assignment completion. Like a Slackbot with more power, isn't it?
Google's recently released Circle to Search is getting a revamp and will soon be able to help children with homework, such as delivering step-by-step solutions to arithmetic problems.
Enhanced security features
Stronger safeguards for screen sharing, better protection against cell site simulators, and on-device live threat detection to identify dangerous apps are just a few of the security and privacy features Google said it is bringing to Android.
In addition, Google unveiled Theft Detection Lock, an AI-powered feature that recognizes motion commonly associated with theft, such as a sudden jerk in the opposite direction, to safeguard devices in the real world. Once such motion is detected, the phone screen instantly locks, preventing further use of the device without clearing any safeguards you've put in place.
The Private Space feature
This is a fun one. The new Android feature called Private Space lets users wall off a dedicated area of the operating system for confidential data. It partitions specific apps into a "container," somewhat akin to the mobile operating system's Incognito mode.
The space can be locked behind an additional layer of authentication and is accessible through the launcher. Apps in Private Space will not show up in notifications, settings, or recents. As long as the private space is unlocked, users can still access its applications via the system sharesheet and photo picker in the main space.
Amazing Gemini Models
Google's Gemini AI gains two new models, each tailored to different kinds of tasks. The quicker, lower-latency Gemini 1.5 Flash is designed for applications where speed matters most. Let me brief you on both models!
Gemini 1.5 Flash
1.5 Flash is the newest model in the Gemini family and the quickest Gemini model available through the API. It's tailored for high-volume, high-frequency jobs at scale, costs less to serve, and includes Google's breakthrough long context window.
Gemini 1.5 Pro
1.5 Pro can now follow increasingly complex and nuanced instructions at the product level, including ones that specify role, format, and style. For certain use cases, such as crafting the persona and response style of a chat agent or automating workflows with multiple function calls, Google has enhanced control over the model's responses. Additionally, Google lets users steer how the model behaves by providing system instructions.
Gemini Nano
Additionally, starting with Chrome 126, Google is building Gemini Nano, the smallest of its AI models, straight into the Chrome desktop client. According to Google, developers will be able to power their own AI features using the on-device model. The "Help me write" feature from Workspace Labs in Gmail is one example of a feature Google intends to power with this new capability.
Project Astra!
Project Astra is an AI assistant that works like a visually supercharged Google Lens. By opening their phone's camera and simply pointing it at things, users can ask questions about almost anything in their immediate surroundings. In a demo video Google released, a user repeatedly asked Astra different questions depending on their environment.
Google claims that Astra's superior spatial and contextual understanding lets it recognize things in the real world, such as the town you are in, the inner workings of a computer's code, or even a name for your dog's band. The demonstration showed how Astra's voice-driven interactions work using phone cameras and certain unnamed smart eyewear.
Google Firebase Genkit
Firebase Genkit is a recent addition to the Firebase platform, designed to help developers build AI-powered JavaScript/TypeScript apps, with support for Go coming soon. With this open-source framework, released under the Apache 2.0 license, programmers can incorporate AI into both new and existing applications with ease.
Learning with Generative AI
LearnLM is a new family of generative AI models that Google has "fine-tuned" for learning, a joint effort between Google Research and its DeepMind AI research division.
According to Google, LearnLM models are designed to "conversationally" tutor students in a variety of subjects. Google says LearnLM could help instructors discover fresh ideas, engaging content, and activities, as well as resources tailored to the needs of specific student cohorts.
A Search Evolution
Even though it started out as a small search-focused company, Google is still the biggest name in the search business!
When you're searching for something with contextual complexity, a new feature called multi-step reasoning lets you uncover several layers of information about a topic. Using travel as an example, Google demonstrated how Search can tap Maps to find hotels and build travel itineraries. It then offered restaurant recommendations and helped plan the trip's meals; you can narrow the results down to vegetarian options or particular cuisines. All of this information is presented to you in an organized fashion.
Updates on Gemma 2
In June, Google will release the next generation of its Gemma models. Responding to one of developers' top requests, a larger model, Google plans to add a new 27-billion-parameter model to Gemma 2. According to Google, Nvidia optimized this size for next-generation GPUs, and it can run efficiently on a single TPU host and in Vertex AI.
Imagen 3
Imagen 3 is Google's highest-quality text-to-image model yet. Compared to its previous models, it renders an astonishing amount of detail and produces lifelike, photorealistic images with considerably fewer distracting visual artifacts.
In addition to picking up on small nuances in longer prompts, Imagen 3 better understands natural language and the intent behind your prompt. That deeper comprehension helps the model master a wide variety of styles.
It's also Google's finest model to date for rendering text, a notoriously difficult task for image generation models. This capability can be used to create personalized birthday cards, presentation title slides, and more.
Veo: The most capable video generation model yet!
Veo is Google's most capable video generation model to date. It produces high-quality, 1080p videos that can run past one minute, in a variety of cinematic and visual styles.
By understanding prompts for a wide range of cinematic effects, such as time lapses or aerial shots of a landscape, Veo gives creators an unprecedented level of creative control while accurately capturing a prompt's nuance and tone. Google says the tools built around its video generation model will let everyone produce video. Whether you're an experienced filmmaker, an aspiring artist, or an educator hoping to share knowledge, Veo opens up new opportunities for storytelling, education, and more.
Project IDX
Google has moved Project IDX, its AI-centric, browser-based development environment, into open beta. This release brings integrations with Chrome DevTools and Lighthouse to ease application debugging, as well as a Google Maps Platform integration in the IDE to help developers add geolocation features to their apps. According to Google, deploying apps to Cloud Run, Google Cloud's serverless platform for running front-end and back-end services, will soon be possible as well.
I hope you now have a clear picture of the most powerful and innovative advancements Google announced at this year's Google I/O 2024. I have a feeling we'll be surrounded by far more technical breakthroughs and highly efficient models in the time ahead :)