

Chinese researchers build a large multi-modal dataset, and evaluation suite:
…’Zero’ makes it easier to develop AI systems for the Chinese cultural context…
Chinese researchers with startup Qihoo 360 AI Research and the Department of Automation at Tsinghua University have built Zero, a benchmark for assessing the quality of vision-text Chinese AI models.

DeepMind builds a (very preliminary) general AI agent:
…AKA: The dawn of really preliminary, general AI systems…
In the past few years, the dumbest thing has tended to work surprisingly well. Take for example GPT3: just scale up next-word prediction on an internet-scale corpus and you wind up with something capable of few-shot learning, fielding a vast range of NLP capabilities. Another example is computer vision systems: just create a vast dataset and you wind up with increasingly robust vision systems. Or contrastive learning: just embed a couple of modalities into the same space and sort of flip-flop between them through the learning process and you get powerful multimodal systems like CLIP. Now DeepMind has done the same thing for reinforcement learning with GATO, an agent where DeepMind takes a bunch of distinct tasks in different modalities, embeds them into the same space, then learns prediction tasks from them. The result is a system where “the same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens.” This is wild stuff!

What GATO can do: After training, GATO can do okay at tasks ranging from DeepMind Lab, to robot manipulation, to the procgen benchmark, to image captioning, to natural language generation.

It’s a big deal: The fact you can take a bunch of different tasks from different modalities and just… tokenize them… and it works? That’s wild! It’s a) wildly dumb, b) wildly effective, and c) another nice example of ‘The Bitter Lesson’, where given enough compute/scale, the dumb things (aka, the simple ones) tend to work really well.

An even crazier thing: The GATO model only has a context window of 1024 tokens (by comparison, GPT3’s was 2048 when it launched), so the fact 1024 tokens is enough to get a somewhat capable multimodal agent is pretty surprising.

In a small package: The largest (disclosed here) GATO agent is 1.18 billion parameters, making it fairly small in the grand scheme of recent AI developments.

Why this matters: “Although still at the proof-of-concept stage, the recent progress in generalist models suggests that safety researchers, ethicists, and most importantly, the general public, should consider their risks and benefits,” DeepMind writes.
Check out the blog: A Generalist Agent (DeepMind website).
Read more: A Generalist Agent (DeepMind PDF).
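
To make the "just tokenize everything" idea behind GATO concrete, here is a minimal Python sketch of mapping text, button presses, and joint torques into one shared token space. It is a toy illustration under stated assumptions, not DeepMind's implementation: the byte-level text tokenizer, the vocabulary sizes, the uniform binning of continuous values, and every function name are hypothetical. Only the 1024-token context window comes from the write-up above.

```python
from typing import List, Sequence

CONTEXT_WINDOW = 1024   # context length reported in the write-up
TEXT_VOCAB_SIZE = 256   # assumption: toy byte-level text vocabulary
NUM_BUTTONS = 18        # assumption: number of distinct button presses
NUM_BINS = 256          # assumption: bins used to discretize continuous values

# Each modality gets its own id range inside one shared vocabulary.
BUTTON_OFFSET = TEXT_VOCAB_SIZE
CONTINUOUS_OFFSET = BUTTON_OFFSET + NUM_BUTTONS


def text_tokens(text: str) -> List[int]:
    """Toy text tokenizer: one token per UTF-8 byte (ids 0..255)."""
    return list(text.encode("utf-8"))


def button_token(button: int) -> int:
    """Map a discrete button index into the shared vocabulary."""
    if not 0 <= button < NUM_BUTTONS:
        raise ValueError(f"button index out of range: {button}")
    return BUTTON_OFFSET + button


def continuous_token(value: float, low: float = -1.0, high: float = 1.0) -> int:
    """Clip a continuous value (e.g. a joint torque) and bin it uniformly."""
    clipped = min(max(value, low), high)
    fraction = (clipped - low) / (high - low)
    bin_index = min(int(fraction * NUM_BINS), NUM_BINS - 1)
    return CONTINUOUS_OFFSET + bin_index


def build_sequence(text: str, buttons: Sequence[int], torques: Sequence[float]) -> List[int]:
    """Concatenate all modalities into one flat sequence, keeping the most recent tokens."""
    tokens = text_tokens(text)
    tokens += [button_token(b) for b in buttons]
    tokens += [continuous_token(t) for t in torques]
    return tokens[-CONTEXT_WINDOW:]


if __name__ == "__main__":
    print(build_sequence("hi", buttons=[3], torques=[0.0]))
```

Once everything is a flat list of integers, a single sequence model can be trained to predict the next token regardless of which modality it came from, which is the sense in which the approach is both simple and general.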
…A basic dataset for a useful capability…
Researchers with the University of Electronic Science and Technology of China have built a dataset for recognizing Chinese license plates. The authors use the dataset to train some models that get state-of-the-art accuracy while running at 30 frames per second.

The dataset: The Chinese Road Plate Dataset (CRPD) contains 25k images (around 30k total). Each image is annotated with the Chinese and English characters of the depicted license plate, the coordinates of the vertices of the license plates, and the type of license plate (e.g., whether for police cars, small cars, etc). Images for the dataset were “collected from electronic monitoring systems in most provinces of mainland China in different periods and weather conditions,” the authors write.

Why this matters: Datasets like CRPD represent the basic infrastructure on which AI capabilities get developed. It’s also notable how universities in China can access large-scale surveillance datasets.
Read more: Unified Chinese License Plate Detection and Recognition with High Efficiency (arXiv).
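
As a rough illustration of the per-plate annotation described for CRPD (plate characters, vertex coordinates, plate type), one way to hold such a record in Python is sketched below. The class name, field names, example values, and the bounding-box helper are all hypothetical; the dataset's actual file format is not described here.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class PlateAnnotation:
    """Hypothetical container for one annotated license plate."""
    characters: str               # Chinese and English characters on the plate
    vertices: Tuple[Point, ...]   # corner coordinates of the plate, in pixels
    plate_type: str               # e.g. "police" or "small_car" (labels are assumed)

    def bounding_box(self) -> Tuple[float, float, float, float]:
        """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing the vertices."""
        xs = [x for x, _ in self.vertices]
        ys = [y for _, y in self.vertices]
        return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    # Made-up example values, purely for illustration.
    plate = PlateAnnotation(
        characters="川A12345",
        vertices=((10.0, 20.0), (110.0, 22.0), (108.0, 52.0), (9.0, 50.0)),
        plate_type="small_car",
    )
    print(plate.bounding_box())
```

Keeping the raw vertices rather than only a bounding box preserves the plate's perspective distortion, which is useful if a recognizer needs to rectify the plate before reading its characters.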
